---
title: "Everything I Know About Types: Introduction"
date: 2022-06-26T18:36:01-07:00
tags: ["Type Systems", "Programming Languages"]
series: ["Everything I Know About Types"]
draft: true
---

{{< katex_component_js >}} {{< bergamot_js_css >}}

I am in love with types and type systems. They are, quite probably, my favorite concept in the world. Most of us mere mortals use types as a way to make sure we aren't writing bad code - they help us catch things that are certainly wrong, such as dividing a string by a number, or calling a method on an object that doesn't exist. Types, however, can do a lot more than that. In languages with particularly powerful type systems, like Idris or Coq, you can prove facts about your program, confirming, for instance, that a function doesn't loop infinitely and eventually returns a result. More generally, some powerful type systems can be used in place of Zermelo-Fraenkel set theory as a foundation for all of mathematics. There's a lot of depth to types.

A lot of the fascinating ideas in type systems are contained in academic writing, rather than "real-world" programming languages. This, unfortunately, makes them harder to access - the field of programming languages isn't exactly known for being friendly to newbies. I want to introduce these fascinating ideas to you, and I want to prove to you that these ideas are not "academic autofellatio" -- they have practical use. At the same time, I want to give you the tools to explore programming languages literature yourself. The notation in the field seemed to me to be scary and overwhelming at first, but nowadays my eyes go straight to the equations in PL papers. It's not that scary once you get used to it, and it really does save a lot of ambiguity.

So I am going to try to opt for a hybrid approach. My aim is to explore various ideas in programming languages, starting from the absolute basics and working my way up to the limits of what I know. I will try to motivate each exploration with examples from the real world, using well-known typed languages, perhaps TypeScript, Java, C++, and Haskell. At the same time, I will build up the tools to describe type systems mathematically, the way it's done in papers.

Before all of that, though, I want to go slightly deeper into the motivation behind type systems, and some reasons why I find them so interesting.

## The Impossibility of Knowing

Wouldn't it be nice if there was a programming language that could know, always and with complete certainty, if you wrote bad code? Maybe it screams at you, like the skull of regret. Forgot to increment a loop counter? AAAA! Flipped the arguments to a function? AAAA! Caused an infinite loop? AAAAAAA! No mistake escapes its all-seeing eye, and if your code compiles, it's always completely correct. You might think that this compiler only exists in your mind, but guess what? You're right.

The reason, broadly speaking, is Rice's theorem. It states that every non-trivial question about a program's behavior is undecidable: no algorithm can answer it correctly for all programs. We can't determine for sure whether an arbitrary program will run forever, or throw an exception, or do something silly. It's just not possible. We're out of luck.

In software, though, there's a bit of discrepancy between theory and practice. They say that computer scientists love dismissing problems as NP-hard, while working software engineers write code that, while not fast in the general case, is perfectly adequate in the real world. What I'm trying to say is: a theoretical restriction, no matter how stringent, shouldn't stop us from trying to find something "good enough in practice".

This is where types come in. They are decidedly not magic - you can't do anything with a type system that violates Rice's theorem. And yet, modern type systems can reject a huge number of incorrect programs, and at the same time allow most "correct-looking" programs through. They can't eliminate all errors, but they can eliminate a lot of them.
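As a small illustration (in Haskell, though any statically typed language would do): the first definition below compiles, while the commented-out one is rejected by GHC, because there is no sensible way to add a string to a number.

```haskell
-- A well-typed definition: both operands of (+) are numbers, so GHC accepts it.
ok :: Int
ok = 1 + 2

-- An ill-typed definition: GHC rejects it, complaining that String
-- is not an instance of Num (i.e., a string is not a number).
-- bad = "hello" + 1
```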

Moreover, this sort of checking is cheap. For example, Algorithm J, which infers types for polymorphic functional programs in the Hindley-Milner system, runs in close to linear time in the size of the program in practice (its worst case is much slower, but real programs rarely come close to it). This is a huge jump, taking us from "undecidable" to "efficient". Once again, approximations prevail. What we have is a fast and practical way of catching our mistakes.

## Types for Documentation

To me, type systems aren't only about checking if your program is correct; they're also about documentation. This way of viewing programs comes from my experience with Haskell. In Haskell, the compiler is usually able to check your program without you ever having to write down any types. You can write your programs, like `length xs + sum xs`, and feel assured that what you wrote isn't complete nonsense. And yet, most Haskell programmers (myself included) do write the types for all of their functions, by hand. How come?
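Before answering that, here's a quick illustration of the claim that the compiler can figure out types entirely on its own. In a GHCi session (the exact output formatting may vary between GHC versions), you can ask for the inferred type of an unannotated expression:

```haskell
ghci> :type \xs -> length xs + sum xs
\xs -> length xs + sum xs :: Foldable t => t Int -> Int
```

No annotations anywhere, yet GHC works out that `xs` must be some container of `Int`s and that the result is an `Int`.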

The answer is that types can tell you what your code does. Even nontrivial functions can often have types that fit on a single line (and that are quite easy to read). For example, given a function `typecheck`, you could try to figure out what it returns by reading the code, or by meditating on its name. Or, you could read its type:

```haskell
typecheck :: Expr -> Either Error Type
```

This can be read "given an expression, `typecheck` returns either an error or a type". Now you know what to expect when calling the function, what arguments it needs, and you have some idea of what it does. You get all this without ever looking at the code. As another example, the function `partition` has the following type:

```haskell
partition :: (a -> Bool) -> [a] -> ([a], [a])
```

This one reads, "given a function that returns true or false, and a list, `partition` returns two lists". It's not too hard to infer that `partition` divides the given list into two: one list for items that made the input function true, and one for those that made it false. I say this without having read a word of `partition`'s documentation, and having never seen its code. The type is enough to convey that information.
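As a quick sanity check (a GHCi session, using the standard `Data.List.partition`):

```haskell
ghci> import Data.List (partition)
ghci> partition even [1, 2, 3, 4, 5, 6]
([2,4,6],[1,3,5])
```

Exactly what the type suggested: the items satisfying the predicate end up in the first list, and the rest in the second.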

There's more. Types, when viewed in this way, don't just help you understand code you already have. They can also be used to find code that does what you want. This is the idea behind Hoogle. Here's an example.

Suppose I have a list of people's middle names. A middle name is just a string (`String` in Haskell). Not everyone has a middle name, though. To represent a value that may or may not be present, we can use the `Maybe` type constructor. A list of people's middle names, then, would have the type `[Maybe String]`, a list of (potentially absent) strings. Now, suppose I wanted to take this list, and filter out all of the absent middle names, resulting in just a list of strings, `[String]`. Do I have to do this myself, or is there a function for it? I put a type signature into Hoogle:

```haskell
[Maybe String] -> [String]
```

Hoogle responds with a function that fits the search, in the `Data.Maybe` module:

```haskell
catMaybes :: [Maybe a] -> [a]
```

Looks like I don't have to implement it myself after all! When working with Haskell, I consult Hoogle a lot to figure out the names of existing functions that match my needs.
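Here's what that looks like on a made-up list of middle names (the names themselves are just examples):

```haskell
import Data.Maybe (catMaybes)

-- Some people have a middle name, some don't.
middleNames :: [Maybe String]
middleNames = [Just "Ada", Nothing, Just "Marie", Nothing]

-- catMaybes drops the Nothings and unwraps the Justs:
-- >>> catMaybes middleNames
-- ["Ada","Marie"]
```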

I singled out Haskell in this section, but types in a variety of other languages make for great documentation. When I wrote Rust code, the types were very useful for understanding how to use a crate; check out the documentation page for `Vec`, for instance. Documentation of Elm packages always lists functions' types (see the documentation for `Parser`, for example). Even C++ type signatures listed by Doxygen can be quite useful; I, for one, got a lot out of the LLVM documentation.

## Exercises, Bergamot and You

One of the reasons I love the Software Foundations series is the exercises. They are within the text, and they are machine-checked: you use a computer tool to work through the tasks, and it verifies that you did them correctly. I hope to do something similar. Exercises will look something like this:

{{< bergamot_exercise label="sample exercise" id="exercise-1" >}} Here I tell you to do something that I believe would be instructive. {{< /bergamot_exercise >}}

To achieve my ideal of interactive exercises, I developed a tool called Bergamot. It's a tiny little programming language for writing inference rules, which are an invaluable tool in a type theorist's toolbox. I introduced the tool in [a separate post on this site]({{< relref "bergamot" >}}). Throughout this series, I'll be using Bergamot for exercises and mild amounts of interactive content. This is completely optional: my aim is to make everything I write self-contained and useful without any extra tools. However, I think that having a way to interactively play with inference rules is conducive to learning the concepts.

The unfortunate problem with making a tool for exercises is that you also need to teach others how to use the tool. Some exercises will be more specific to the tool than to type theory itself; I will denote these exercises as such where possible. Also, whenever the context of the exercise can be loaded into Bergamot, I will denote this with a play button.

{{< bergamot_preset name="intro-preset" prompt="PromptConverter @ prompt(type(?term, ?t)) <- input(?term);" >}} TNumber @ type(lit(?n), number) <- num(?n); TPlusI @ type(plus(?e_1, ?e_2), number) <- type(?e_1, number), type(?e_2, number); {{< /bergamot_preset >}}

{{< bergamot_exercise label="bergamot; sample exercise" preset="intro-preset" id="exercise-2" >}} Try typing 1+1 into the input field! {{< /bergamot_exercise >}}