Start working on the basics part of the type systems articles

---
title: "Everything I Know About Types: Basics"
date: 2022-06-30T19:08:50-07:00
tags: ["Type Systems", "Programming Languages"]
draft: true
---
It's finally time to start looking at types. As I mentioned, I want
to take an approach that draws a variety of examples from the real
world - I'd like to talk about examples from real programming
languages. Before doing that, though, let's start with a (working)
definition of what a type even is. Let's try the following:

__A type is a category of values in a programming language.__

Values are grouped together in a category if they behave similarly
to each other, in some sense. All integers can be added together
or multiplied; all strings can be reversed and concatenated;
all objects of some class `Dog` have a method called `bark`.

It is precisely this categorization that makes type systems
practical; since values of a type have common behavior, it's
sufficient to reason about some abstract (non-specific) value
of a particular type, and there is no need to consider how a
program behaves in all possible scenarios. Furthermore, since
values of _different_ types may behave differently, it is _not_
safe to use a value of one type where a value of another type
was expected.

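To make this concrete, here's a minimal TypeScript sketch of the `Dog`
example (the `greet` helper and the `name` field are illustrative choices,
not from any particular library):

```TypeScript
// Values of type Dog share common behavior: every Dog can bark.
class Dog {
  constructor(private name: string) {}

  bark(): string {
    return `${this.name} says woof!`;
  }
}

// Because every Dog behaves this way, greet can be checked against
// an abstract, non-specific Dog; it works for any particular one.
function greet(dog: Dog): string {
  return dog.bark();
}

greet(new Dog("Rex")); // fine: a Dog was expected, and a Dog was given
// greet(42); // type error: a number is not a Dog, so it has no `bark`
```
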
I already introduced some examples of types above. Let's take a look
at some examples in the wild, starting with numbers.
TypeScript gives us the easiest time in
this case: it has a type called `number`. In the below code
snippet, I assign a number to the variable `x` so that I can
explicitly write its type.
```TypeScript
const x: number = 1;
```

In other languages, the situation is slightly more complicated; it's
frequently necessary to distinguish between numbers that could have
a fractional portion (real numbers) and numbers that are always whole
(integers). For the moment, let's focus on integers. These are
ubiquitous in various programming languages; Java has `int`:
```Java
int x = 0;
```

Things in C++, C#, and many other languages look very similar.
In Rust, we have to make an even finer distinction, telling apart
integers represented using 32 bits from those represented using
64 bits. Focusing on the former, we could write:
```Rust
let x: i32 = 0;
```

In Haskell, we can confirm the type of a value without having to
assign it to a variable; the following suffices.
```Haskell
1 :: Int
```

That should be enough examples of integers for now. I'm sure you've seen
them in your programming or computer science career. What you
may not have seen, though, is the formal / mathematical way of
stating that some expression or value has a particular type.
In mathematical notation, too, there's no need to assign a value to
a variable to state its type. The notation is actually very similar
to that of Haskell; here's how one might write the claim that 1 is a
number (reading the colon as "has type").

{{< latex >}}
1:\text{number}
{{< /latex >}}

There's one more difference between mathematical notation and the
code we've seen so far. If you wrote `num`, or `aNumber`, or anything
other than just `number` in the TypeScript example (or if you similarly
deviated from the "correct" name in other languages), you'd be greeted with
an error. The compilers or interpreters of these languages only understand a
fixed set of types, and we are required to stick to names in that set. We have no such
duty when using mathematical notation. The main goal of a mathematical definition
is not to run the code, or check if it's correct; it's to communicate something
to others. As long as others understand what you mean, you can do whatever you want.
I _chose_ to use the word \\(\\text{number}\\) to represent the type
of numbers, mainly because it's _very_ clear what that means. A theorist writing
a paper might cringe at the verbosity of such a convention. My goal, however, is
to communicate things to _you_, dear reader, and I think it's best to settle for
clarity over brevity.

Actually, this illustrates a general principle. It's not just the names of the types
that we have full control over; it's the whole notation. We could just as well have
written the claim as follows:

{{< latex >}}
\cdot\ \text{nummie}\ \sim
{{< /latex >}}

As long as the reader knew that a single dot represents the number 1, "nummie"
represents numbers, and the tilde represents "has type", we'd technically
be fine. Of course, this is completely unreadable, and certainly unconventional.
I will do my best to stick to the notational standards established in programming
languages literature. Nevertheless, keep this in mind: __we control the notation__.
It's perfectly acceptable to change how something is written if it makes it easier
to express whatever you want to express, and this is done frequently in practice.
Another consequence of this is that not everyone agrees on notation; according
to [this paper](https://labs.oracle.com/pls/apex/f?p=LABS:0::APPLICATION_PROCESS%3DGETDOC_INLINE:::DOC_ID:959),
27 different ways of writing down substitutions were observed in the POPL conference alone.

One more thing. So far, we've only written down one claim: the value 1 is a number.
What about the other numbers? To make sure they're accounted for, we need similar
rules for 2, 3, and so on.

{{< latex >}}
2:\text{number} \quad 3:\text{number} \quad \ldots
{{< /latex >}}

This gets tedious quickly. All these rules look the same! It would be much nicer if we could
write down the "shape" of these rules, and understand that there's one such rule for each number.
This is exactly what is done in programming languages research. We'd write the following.

{{< latex >}}
n:\text{number}
{{< /latex >}}

What's this \\(n\\)? First, recall that notation is up to us. I'm choosing to use the letter
\\(n\\) to stand for "any value that is a number". We write a symbol, say what we want it to mean,
and we're done. But then, we need to be careful. It's important to note that the letter \\(n\\) is
not a variable like `x` in our code snippets above. In fact, it's not at all part of the programming
language we're discussing. Rather, it's kind of like a variable in our _rules_.

This distinction comes up a lot. The thing is, the notation we're building up to talk about programs is its own
kind of language. It's not meant for a computer to execute, mind you, but that's not a requirement
for something to be a language (ever heard of English?). The bottom line is, we have symbols with
particular meanings, and there are rules to how they have to be written. The statement "1 is a number"
must be written by first writing 1, then a colon, then \\(\\text{number}\\). It's a language.

Really, then, we have two languages to think about:
* The _object language_ is the programming language we're trying to describe and mathematically
formalize. This is the language that has variables like `x`, keywords like `let` and `const`, and so on.
* The _meta language_ is the notation we use to talk about our object language. It consists of
the various symbols we define, and is really just a system for communicating various things
(like type rules) to others.

Using this terminology, \\(n\\) is a variable in our meta language; this is commonly called
a _metavariable_. A rule such as \\(n:\\text{number}\\) that contains metavariables isn't
really a rule by itself; rather, it stands for a whole bunch of rules, one for each possible
number that \\(n\\) can be. We call this a _rule schema_.
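
For instance, picking a concrete number to stand in for \\(n\\) (say, 42; the
choice is arbitrary) yields one of the many rules that this schema stands for:

{{< latex >}}
42:\text{number}
{{< /latex >}}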

Alright, that's enough theory for now. Let's go back to the real world. Working with
plain old values like `1` gets boring quickly. There aren't many programs you can write
with them! Numbers can be added, though, so why don't we look at that? All mainstream
languages can do this quite easily. Here's TypeScript:
```TypeScript
const y = 1 + 1;
```

When it comes to adding whole numbers, every other language is pretty much the same.
Throwing addition into the mix, and branching out to other types of numbers, we
can arrive at our first type error. Here it is in Rust:
```Rust
let x = 1.1 + 1;
// ^ no implementation for `{float} + {integer}`
```

You see, numbers that are not whole are represented on computers differently
than whole numbers are. The former are represented using something called _floating point_
(hence the type name `float`). Rust wants the user to be fully aware that they are adding
two numbers that have different representations, so it makes such additions a type error
by default, preventing it from happening by accident; to add the two numbers anyway, the
user has to convert one of them explicitly (writing something like `1.1 + (1 as f64)`).
The type system is used to enforce this.

In Java, addition like this is perfectly legal, and conversion is performed automatically.
```Java
double x = 1.1 + 1;
```

The addition produces a double-precision (hence `double`) floating point value. If
we were to try to claim that `x` is an integer, we would be stopped.
```Java
int x = 1.1 + 1;
// ^ incompatible types: possible lossy conversion from double to int
```

If we tried to save the result of `1.1+1` as an integer, we'd have to throw away the `.1`;
that's what Java means by "lossy". This is something that the Java designers didn't
want users to do accidentally, so it requires an explicit opt-in via a cast (something
like `int x = (int) (1.1 + 1);`). The type system ensures that if either number
being added is a `float` (or `double`), then so is the result of the addition.

In TypeScript, all numbers have the same representation (double-precision
floating point, as in JavaScript), so there's no way to create a type error
by adding two of them.
```TypeScript
const x: number = 1.1 + 1; // just fine!
```
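
In fact, the notation we've built up is already enough to write down a claim
along these lines. Using two metavariables \\(n_1\\) and \\(n_2\\), each standing
for any number, the following rule schema (just a sketch; the precise treatment
comes later) states that adding two numbers produces a number:

{{< latex >}}
n_1+n_2:\text{number}
{{< /latex >}}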

That concludes the second round of real-world examples. Let's take a look at formalizing
all of this mathematically.