---
title: "Everything I Know About Types: Introduction"
date: 2022-06-26T18:36:01-07:00
tags: ["Type Systems", "Programming Languages"]
series: "Everything I Know About Types"
draft: true
---
{{< katex_component_js >}}
{{< bergamot_js_css >}}
I am in love with types and type systems. They are, quite probably,
my favorite concept in the world. Most of us mere
mortals use types as a way to make sure we aren't writing
bad code - they help us catch things that are certainly wrong,
such as dividing a string by a number, or calling a method
on an object that doesn't exist. Types, however, can do
a lot more than that. In languages with particularly
powerful type systems, like Idris or Coq, you can prove
facts about your program, confirming, for instance,
that a function doesn't loop infinitely and eventually
returns a result. More generally, some powerful systems
can be used in place of Zermelo-Fraenkel set theory as
foundations for _all of mathematics_. There's a lot of depth to types.

A lot of the fascinating ideas in type systems are contained
in academic writing, rather than "real-world" programming
languages. This, unfortunately, makes them harder to access -
the field of programming languages isn't exactly known
for being friendly to newbies. I want to introduce these fascinating
ideas to you, and I want to prove to you that these ideas are _not_
"academic autofellatio" -- they have practical use. At the
same time, I want to give you the tools to explore programming
languages literature yourself. The notation in the field seemed
to me to be scary and overwhelming at first, but nowadays my eyes
go straight to the equations in PL papers. It's not that scary
once you get used to it, and it really does save a lot of ambiguity.

So I am going to try to opt for a hybrid approach. My aim is to explore
various ideas in programming languages, starting from the absolute
basics and working my way up to the limits of what I know. I will
try to motivate each exploration with examples from the real world,
using well-known typed languages, perhaps TypeScript, Java, C++, and Haskell.
At the same time, I will build up the tools to describe type systems
mathematically, the way it's done in papers.

Before all of that, though, I want to go slightly deeper into the
motivation behind type systems, and some reasons why I find them
so interesting.
### The Impossibility of Knowing
Wouldn't it be nice if there was a programming language that could
know, always and with complete certainty, if you wrote bad code?
Maybe it screams at you, like the [skull of regret](https://imgur.com/gallery/vQm8O).
Forgot to increment a loop counter? _AAAA!_ Flipped the arguments to a function?
_AAAA!_ Caused an infinite loop? _AAAAAAA!_ No mistake escapes its all-seeing
eye, and if your code compiles, it's always completely correct. You
might think that this compiler only exists in your mind, but guess what? You're right.

The reason, broadly speaking, is [Rice's theorem](https://en.wikipedia.org/wiki/Rice%27s_theorem).
It states that every non-trivial question about a program's behavior is undecidable:
no algorithm can answer it correctly for all programs. We can't determine for sure
whether an arbitrary program will run forever, or throw an exception, or do something silly.
It's just not possible. We're out of luck.

In software, though, there's a bit of discrepancy between theory and practice. They
say that computer scientists love dismissing problems as NP-hard, while working
software engineers write code that, while not fast in the general case, is perfectly
adequate in the real world. What I'm trying to say is: a theoretical restriction,
no matter how stringent, shouldn't stop us from trying to find something "good enough in practice".

This is where types come in. They are decidedly not magic - you can't do anything
with a type system that violates Rice's theorem. And yet, modern type systems
can reject a huge number of incorrect programs, and at the same time allow
most "correct-looking" programs through. They can't eliminate _all_ errors,
but they can eliminate _a lot_ of them.

Moreover, this sort of checking is _cheap_. For example, [Algorithm J](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system#Algorithm_J),
which is used to typecheck polymorphic functional programs, runs in close-to-linear
time on real-world programs (the worst case is exponential, but it virtually never
comes up in practice). This is a huge jump, taking us from "undecidable" to
"efficient". Once again, approximations prevail. What we have is a fast and
practical way of catching our mistakes.

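As a small taste of what this kind of inference buys you: GHC will work out a fully polymorphic type for the function below with no annotations at all (the function `twice` is a toy example of my own, not anything from the algorithm itself):

```Haskell
-- No type signature needed: GHC infers
--   twice :: (a -> a) -> a -> a
-- entirely on its own.
twice f x = f (f x)

main :: IO ()
main = print (twice (+ 1) (0 :: Int)) -- prints 2
```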
### Types for Documentation
To me, type systems aren't only about checking if your program is correct;
they're also about documentation. This way of viewing programs comes to
me from experience with Haskell. In Haskell, the compiler is usually able to
check your program without you ever having to write down any types. You
can write your programs, like `length xs + sum xs`, and feel assured that what
you wrote isn't complete nonsense. And yet, most Haskell programmers
(myself included) _do_ write the types for all of their functions, by hand.
How come?

The answer is that types can tell you what your code does. Even nontrivial functions
can often have types that fit on a single line (and that are quite easy to read).
For example, given a function `typecheck`, you could try to figure out what it returns by reading
the code, or by meditating on its name. Or, you could read its type:
```Haskell
typecheck :: Expr -> Either Error Type
```
This can be read "given an expression, `typecheck` returns either an error or a type".
Now you know what to expect when calling the function, what arguments it needs,
and you have some idea of what it does. You get all this without ever looking
at the code. As another example, the function `partition` has the following type:
```Haskell
partition :: (a -> Bool) -> [a] -> ([a], [a])
```
This one reads, "given a function that returns true or false, and a list, `partition` returns two lists".
It's not too hard to infer that `partition` divides the given list into two, one list
for items that made the input function true, and one for those that made it false. I say this
without having read a word of `partition`'s documentation, and having never seen its code.
The type is enough to convey that information.

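In fact, we can check that guess directly. `partition` lives in `Data.List`, and behaves exactly as its type suggests:

```Haskell
import Data.List (partition)

-- Split [1..6] into the numbers that satisfy `even`
-- and the numbers that don't.
main :: IO ()
main = print (partition even [1 .. 6 :: Int])
-- prints ([2,4,6],[1,3,5])
```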
There's more. Types, when viewed in this way, don't just help you understand code you already
have. They can also be used to _find_ code that does what you want. This is the idea behind
[Hoogle](https://hoogle.haskell.org/). Here's an example.

Suppose I have a list of people's middle names. A middle name is just a string (`String` in Haskell).
Not everyone has a middle name, though. To represent a value that may or may not be present,
we can use the `Maybe` type constructor. A list of people's middle names, then, would have
the type `[Maybe String]`, a list of (potentially absent) strings. Now, suppose I wanted to take
this list, and filter out all of the absent middle names, resulting in just a list of strings, `[String]`.
Do I have to do this myself, or is there a function for it? I put a type signature into Hoogle:
```Haskell
[Maybe String] -> [String]
```
Hoogle responds with a function that fits the search, in the `Data.Maybe` module:
```Haskell
catMaybes :: [Maybe a] -> [a]
```
Looks like I don't have to implement it myself after all! When working with Haskell,
I consult Hoogle a lot to figure out the names of existing functions that match my needs.

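Sticking with the middle-names example, `catMaybes` does exactly what the Hoogle search promised (the names below are made up, of course):

```Haskell
import Data.Maybe (catMaybes)

-- Some people have middle names, some don't.
middleNames :: [Maybe String]
middleNames = [Just "Marie", Nothing, Just "Lee", Nothing]

-- catMaybes drops the Nothings and unwraps the Justs.
main :: IO ()
main = print (catMaybes middleNames)
-- prints ["Marie","Lee"]
```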
I singled out Haskell in this section, but types in a variety of other languages
make for great documentation. When I wrote Rust code, the types were very useful for
understanding how to use a crate; check out the
[documentation page for `Vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html), for instance.
Documentation of Elm packages always lists functions' types (see [documentation for `Parser`](https://package.elm-lang.org/packages/elm/parser/latest/Parser), for example). Even C++ type signatures
listed by Doxygen can be quite useful; I, for one, got a lot out of the [LLVM documentation](https://llvm.org/doxygen/classllvm_1_1IRBuilderBase.html).
### Exercises, Bergamot and You
One of the reasons I love the [Software Foundations](https://softwarefoundations.cis.upenn.edu/)
series is the exercises. They are within the text, and they are machine-checked:
you use a computer tool to work through the tasks and verify that you did
them correctly. I hope to do something similar. Exercises will look something
like this:
{{< bergamot_exercise label="sample exercise" id="exercise-1" >}}
Here I tell you to do something that I believe would be instructive.
{{< /bergamot_exercise >}}
To achieve my ideal of interactive exercises, I developed a tool called Bergamot.
It's a tiny little programming language for writing _inference rules_, which are an invaluable tool
in a type theorist's toolbox. I introduced the tool in [a separate post on this
site]({{< relref "bergamot" >}}). Throughout this series, I'll be using Bergamot
for exercises and mild amounts of interactive content. This is completely optional:
my aim is to make everything I write self-contained and useful without any
particular tools. However, I think that having a way to interactively play with inference
rules is conducive to learning the concepts.

The unfortunate problem with making a tool for exercises is that you also need
to teach others how to use the tool. Some exercises will be more specific
to the tool than to type theory itself; I will denote these exercises as such where
possible. Also, whenever the context of the exercise can be loaded into
Bergamot, I will denote this with a play button.
{{< bergamot_preset name="intro-preset" prompt="PromptConverter @ prompt(type(?term, ?t)) <- input(?term);" >}}
TNumber @ type(lit(?n), number) <- num(?n);
TPlusI @ type(plus(?e_1, ?e_2), number) <-
type(?e_1, number), type(?e_2, number);
{{< /bergamot_preset >}}
{{< bergamot_exercise label="bergamot; sample exercise" preset="intro-preset" id="exercise-2" >}}
Try typing `1+1` into the input field!
{{< /bergamot_exercise >}}
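For a rough sense of what those two rules say, here is a minimal Haskell sketch of the same system (the `Expr` and `Ty` names are mine, not Bergamot's): a numeric literal has type `number`, and a `plus` expression has type `number` when both of its operands do.

```Haskell
data Expr = Lit Int | Plus Expr Expr

data Ty = TNumber deriving (Show, Eq)

-- TNumber: a numeric literal has type number.
-- TPlusI:  e1 + e2 has type number if both operands do.
typecheck :: Expr -> Either String Ty
typecheck (Lit _) = Right TNumber
typecheck (Plus e1 e2) =
  case (typecheck e1, typecheck e2) of
    (Right TNumber, Right TNumber) -> Right TNumber
    _ -> Left "both operands of + must be numbers"

-- The `1+1` from the exercise above, as an Expr.
main :: IO ()
main = print (typecheck (Plus (Lit 1) (Lit 1)))
-- prints Right TNumber
```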