From 5ef10238e3c1597744bdaecff6e28ba81a71a07e Mon Sep 17 00:00:00 2001 From: Danila Fedorin Date: Thu, 30 Jun 2022 21:58:57 -0700 Subject: [PATCH] Start working on the basics part of the type systems articles --- content/blog/01_types_basics.md | 203 ++++++++++++++++++++++++++++++++ 1 file changed, 203 insertions(+) create mode 100644 content/blog/01_types_basics.md diff --git a/content/blog/01_types_basics.md b/content/blog/01_types_basics.md new file mode 100644 index 0000000..3e7880a --- /dev/null +++ b/content/blog/01_types_basics.md @@ -0,0 +1,203 @@ +--- +title: "Everything I Know About Types: Basics" +date: 2022-06-30T19:08:50-07:00 +tags: ["Type Systems", "Programming Languages"] +draft: true +--- + +It's finally time to start looking at types. As I mentioned, I want +to take an approach that draws a variety of examples from the real +world - I'd like to talk about examples from real programming +languages. Before doing that, though, let's start with a (working) +definition of what a type even is. Let's try the following: + +__A type is a category of values in a programming language.__ + +Values are grouped together in a category if they behave similarly +to each other, in some sense. All integers can be added together +or multiplied; all strings can be reversed and concatenated; +all objects of some class `Dog` have a method called `bark`. +It is precisely this categorization that makes type systems +practical; since values of a type have common behavior, it's +sufficient to reason about some abstract (non-specific) value +of a particular type, and there is no need to consider how a +program behaves in all possible scenarios. Furthermore, since +values of _different_ types may behave differently, it is _not_ +safe to use a value of one type where a value of another type +was expected. + +In the above paragraph, I already introduced some examples +of types. Let's take a look at some examples in the wild, +starting with numbers. TypeScript gives us the easiest time in +this case: it has a type called `number`. In the below code +snippet, I assign a number to the variable `x` so that I can +explicitly write its type. + +```TypeScript +const x: number = 1; +``` + +In other languages, the situation is slightly more complicated; it's +frequently necessary to distinguish between values that could have +a fractional portion (real numbers) and numbers that are always whole +(integers). For the moment, let's focus on integers. These are +ubiquitous in various programming languages; Java has `int`: + +```Java +int x = 0; +``` + +Things in C++, C#, and many other languages look very similar. +In rust, we have to make an even finer distinction: we have to +distinguish between integers represented using 32 bits and those +represented by 64 bits. Focusing on the former, we +could write: + +```Rust +let x: i32 = 0; +``` + +In Haskell, we can confirm the type of a value without having to +assign it to a variable; the following suffices. + +```Haskell +1 :: Int +``` + +That should be enough examples of integers for now. I'm sure you've seen +them in your programming or computer science career. What you +may not have seen, though, is the formal / mathematical way of +stating that some expression or value has a particular type. +In the mathematical notation, too, there's no need to assign a value to +a variable to state its type. The notation is actually very similar +the that of Haskell; here's how one might write the claim that 1 is a number. + +{{< latex >}} +1:\text{number} +{{< /latex >}} + +There's one more difference between mathematical notation and the +code we've seen so far. If you wrote `num`, or `aNumber`, or anything +other than just `numbeer` in the TypeScript example (or if you similarly +deviated from the "correct" name in other languages), you'd be greeted with +an error. The compilers or interpreters of these languages only understand a +fixed set of types, and we are required to stick to names in that set. We have no such +duty when using mathematical notation. The main goal of a mathematical definition +is not to run the code, or check if it's correct; it's to communicate something +to others. As long as others understand what you mean, you can do whatever you want. +I _chose_ to use the word \\(\\text{number}\\) to represent the type +of numbers, mainly because it's _very_ clear what that means. A theorist writing +a paper might cringe at the verbosity of such a convention. My goal, however, is +to communicate things to _you_, dear reader, and I think it's best to settle for +clarity over brevity. + +Actually, this illustrates a general principle. It's not just the names of the types +that we have full control over; it's the whole notation. We could just as well have +written the claim as follows: + +{{< latex >}} +\cdot\ \text{nummie}\ \sim +{{< /latex >}} + +As long as the reader knew that a single dot represents the number 1, "nummie" +represents numbers, and the tilde represents "has type", we'd technically +be fine. Of course, this is completely unreadable, and certainly unconventional. +I will do my best to stick to the notational standards established in programming +languages literature. Nevertheless, keep this in mind: __we control the notation__. +It's perfectly acceptable to change how something is written if it makes it easier +to express whatever you want to express, and this is done frequently in practice. +Another consequence of this is that not everyone agrees on notation; according +to [this paper](https://labs.oracle.com/pls/apex/f?p=LABS:0::APPLICATION_PROCESS%3DGETDOC_INLINE:::DOC_ID:959), +27 different ways of writing down substitutions were observed in the POPL conference alone. + +One more thing. So far, we've only written down one claim: the value 1 is a number. +What about the other numbers? To make sure they're accounted for, we need similar +rules for 2, 3, and so on. + +{{< latex >}} +2:\text{number} \quad 3:\text{number} \quad ... +{{< /latex >}} + +This gets tedious quickly. All these rules look the same! It would be much nicer if we could +write down the "shape" of these rules, and understand that there's one such rule for each number. +This is exactly what is done in PL. We'd write the following. + +{{< latex >}} +n:\text{number} +{{< /latex >}} + +What's this \\(n\\)? First, recall that notation is up to us. I'm choosing to use the letter +\\(n\\) to stand for "any value that is a number". We write a symbol, say what we want it to mean, +and we're done. But then, we need to be careful. It's important to note that the letter \\(n\\) is +not a variable like `x` in our code snippets above. In fact, it's not at all part of the programming +language we're discussing. Rather, it's kind of like a variable in our _rules_. + +This distinction comes up a lot. The thing is, the notation we're building up to talk about programs is its own +kind of language. It's not meant for a computer to execute, mind you, but that's not a requirement +for something to be language (ever heard of English?). The bottom line is, we have symbols with +particular meanings, and there are rules to how they have to be written. The statement "1 is a number" +must be written by first writing 1, then a colon, then \\(\text{number}\\). It's a language. + +Really, then, we have two languages to think about: +* The _object language_ is the programming language we're trying to describe and mathematically + formalize. This is the language that has variables like `x`, keywords like `let` and `const`, and so on. +* The _meta language_ is the notation we use to talk about our object language. It consists of + the various symbols we define, and is really just a system for communicating various things + (like type rules) to others. + +Using this terminology, \\(n\\) is a variable in our meta language; this is commonly called +a _metavariable_. A rule such as \\(n:\\text{number}\\) that contains metavariables isn't +really a rule by itself; rather, it stands for a whole bunch of rules, one for each possible +number that \\(n\\) can be. We call this a _rule schema_. + +Alright, that's enough theory for now. Let's go back to the real world. Working with +plain old values like `1` gets boring quickly. There's not many programs you can write +with them! Numbers can be added, though, why don't we look at that? All mainstream +languages can do this quite easily. Here's Typescript: + +``` +const y = 1+1; +``` + +When it comes to adding whole numbers, every other language is pretty much the same. +Throwing addition into the mix, and branching out to other types of numbers, we +can arrive at our first type error. Here it is in Rust: + +```Rust +let x = 1.1 + 1; +// ^ no implementation for `{float} + {integer}` +``` + +You see, numbers that are not whole are represented on computers differently +than whole numbers are. The former are represented using something called _floating point_ +(hence the type name `float`). Rust wants the user to be fully aware that they are adding +two numbers that have different representations, so it makes such additions a type error +by default, preventing it from happening on accident. The type system is used to enforce this. +In Java, addition like this is perfectly legal, and conversion is performed automatically. + +```Java +double x = 1.1 + 1; +``` + +The addition produces a double-precision (hence `double`) floating point value. If +we were to try to claim that `x` is an integer, we would be stopped. + +```Java +int x = 1.1 + 1; +// ^ incompatible types: possible lossy conversion from double to int +``` + +If we tried to save the result of `1.1+1` as an integer, we'd have to throw away the `.1`; +that's what Java means by "lossy". This is something that the Java designers didn't +want users to do accidentally. The type system ensures that if either number +being added is a `float` (or `double`), then so is the result of the addition. + +In TypeScript, all numbers have the same representation, so there's no way to create +a type error by adding two of them. + +```TypeScript +const x: number = 1.1 + 1; // just fine! +``` + +That concludes the second round of real-world examples. Let's take a look at formalizing +all of this mathematically.