Start working on the basics part of the type systems articles

parent d2e8f809d0
commit 5ef10238e3

content/blog/01_types_basics.md | 203 | Normal file

@@ -0,0 +1,203 @@
---
title: "Everything I Know About Types: Basics"
date: 2022-06-30T19:08:50-07:00
tags: ["Type Systems", "Programming Languages"]
draft: true
---

It's finally time to start looking at types. As I mentioned, I want
to take an approach that draws a variety of examples from the real
world - I'd like to talk about examples from real programming
languages. Before doing that, though, let's start with a (working)
definition of what a type even is. Let's try the following:

__A type is a category of values in a programming language.__

Values are grouped together in a category if they behave similarly
to each other, in some sense. All integers can be added together
or multiplied; all strings can be reversed and concatenated;
all objects of some class `Dog` have a method called `bark`.
It is precisely this categorization that makes type systems
practical; since values of a type have common behavior, it's
sufficient to reason about some abstract (non-specific) value
of a particular type, and there is no need to consider how a
program behaves in all possible scenarios. Furthermore, since
values of _different_ types may behave differently, it is _not_
safe to use a value of one type where a value of another type
was expected.
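
To make this concrete, here's a small TypeScript sketch (my own illustration, not tied to any particular codebase). Any two strings can be concatenated, and any string can be reversed; but handing a string to something that expects a number is rejected before the program ever runs.

```TypeScript
// Strings share common behavior: concatenation and reversal work
// for every string, no matter its specific value.
const greeting: string = "hello" + " world";
const reversed: string = greeting.split("").reverse().join("");

// Using a value of one type where another is expected is a type error:
// const n: number = greeting; // error: Type 'string' is not assignable to type 'number'.
```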

In the above paragraph, I already introduced some examples
of types. Let's take a look at some examples in the wild,
starting with numbers. TypeScript gives us the easiest time in
this case: it has a type called `number`. In the below code
snippet, I assign a number to the variable `x` so that I can
explicitly write its type.

```TypeScript
const x: number = 1;
```

In other languages, the situation is slightly more complicated; it's
frequently necessary to distinguish between values that could have
a fractional portion (real numbers) and numbers that are always whole
(integers). For the moment, let's focus on integers. These are
ubiquitous in various programming languages; Java has `int`:

```Java
int x = 0;
```

Things in C++, C#, and many other languages look very similar.
In Rust, we have to make an even finer distinction: we have to
distinguish between integers represented using 32 bits and those
represented using 64 bits. Focusing on the former, we
could write:

```Rust
let x: i32 = 0;
```

In Haskell, we can confirm the type of a value without having to
assign it to a variable; the following suffices.

```Haskell
1 :: Int
```

That should be enough examples of integers for now. I'm sure you've seen
them in your programming or computer science career. What you
may not have seen, though, is the formal / mathematical way of
stating that some expression or value has a particular type.
In the mathematical notation, too, there's no need to assign a value to
a variable to state its type. The notation is actually very similar
to that of Haskell; here's how one might write the claim that 1 is a number.

{{< latex >}}
1:\text{number}
{{< /latex >}}

There's one more difference between mathematical notation and the
code we've seen so far. If you wrote `num`, or `aNumber`, or anything
other than just `number` in the TypeScript example (or if you similarly
deviated from the "correct" name in other languages), you'd be greeted with
an error. The compilers or interpreters of these languages only understand a
fixed set of types, and we are required to stick to names in that set. We have no such
duty when using mathematical notation. The main goal of a mathematical definition
is not to run the code, or check if it's correct; it's to communicate something
to others. As long as others understand what you mean, you can do whatever you want.
I _chose_ to use the word \\(\\text{number}\\) to represent the type
of numbers, mainly because it's _very_ clear what that means. A theorist writing
a paper might cringe at the verbosity of such a convention. My goal, however, is
to communicate things to _you_, dear reader, and I think it's best to settle for
clarity over brevity.

Actually, this illustrates a general principle. It's not just the names of the types
that we have full control over; it's the whole notation. We could just as well have
written the claim as follows:

{{< latex >}}
\cdot\ \text{nummie}\ \sim
{{< /latex >}}

As long as the reader knew that a single dot represents the number 1, "nummie"
represents numbers, and the tilde represents "has type", we'd technically
be fine. Of course, this is completely unreadable, and certainly unconventional.
I will do my best to stick to the notational standards established in programming
languages literature. Nevertheless, keep this in mind: __we control the notation__.
It's perfectly acceptable to change how something is written if it makes it easier
to express whatever you want to express, and this is done frequently in practice.
Another consequence of this is that not everyone agrees on notation; according
to [this paper](https://labs.oracle.com/pls/apex/f?p=LABS:0::APPLICATION_PROCESS%3DGETDOC_INLINE:::DOC_ID:959),
27 different ways of writing down substitutions were observed in the POPL conference alone.

One more thing. So far, we've only written down one claim: the value 1 is a number.
What about the other numbers? To make sure they're accounted for, we need similar
rules for 2, 3, and so on.

{{< latex >}}
2:\text{number} \quad 3:\text{number} \quad ...
{{< /latex >}}

This gets tedious quickly. All these rules look the same! It would be much nicer if we could
write down the "shape" of these rules, and understand that there's one such rule for each number.
This is exactly what is done in PL. We'd write the following.

{{< latex >}}
n:\text{number}
{{< /latex >}}

What's this \\(n\\)? First, recall that notation is up to us. I'm choosing to use the letter
\\(n\\) to stand for "any value that is a number". We write a symbol, say what we want it to mean,
and we're done. But then, we need to be careful. It's important to note that the letter \\(n\\) is
not a variable like `x` in our code snippets above. In fact, it's not at all part of the programming
language we're discussing. Rather, it's kind of like a variable in our _rules_.

This distinction comes up a lot. The thing is, the notation we're building up to talk about programs is its own
kind of language. It's not meant for a computer to execute, mind you, but that's not a requirement
for something to be a language (ever heard of English?). The bottom line is, we have symbols with
particular meanings, and there are rules to how they have to be written. The statement "1 is a number"
must be written by first writing 1, then a colon, then \\(\\text{number}\\). It's a language.

Really, then, we have two languages to think about:

* The _object language_ is the programming language we're trying to describe and mathematically
  formalize. This is the language that has variables like `x`, keywords like `let` and `const`, and so on.
* The _meta language_ is the notation we use to talk about our object language. It consists of
  the various symbols we define, and is really just a system for communicating various things
  (like type rules) to others.

Using this terminology, \\(n\\) is a variable in our meta language; this is commonly called
a _metavariable_. A rule such as \\(n:\\text{number}\\) that contains metavariables isn't
really a rule by itself; rather, it stands for a whole bunch of rules, one for each possible
number that \\(n\\) can be. We call this a _rule schema_.
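
To see the schema at work: picking a concrete number for \\(n\\), say 4, produces one of the many rules the schema stands for.

{{< latex >}}
4:\text{number}
{{< /latex >}}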

Alright, that's enough theory for now. Let's go back to the real world. Working with
plain old values like `1` gets boring quickly. There aren't many programs you can write
with them! Numbers can be added, though, why don't we look at that? All mainstream
languages can do this quite easily. Here's TypeScript:

```TypeScript
const y = 1 + 1;
```

When it comes to adding whole numbers, every other language is pretty much the same.
Throwing addition into the mix, and branching out to other types of numbers, we
can arrive at our first type error. Here it is in Rust:

```Rust
let x = 1.1 + 1;
//          ^ no implementation for `{float} + {integer}`
```

You see, numbers that are not whole are represented on computers differently
than whole numbers are. The former are represented using something called _floating point_
(hence the type name `float`). Rust wants the user to be fully aware that they are adding
two numbers that have different representations, so it makes such additions a type error
by default, preventing them from happening by accident. The type system is used to enforce this.
In Java, addition like this is perfectly legal, and conversion is performed automatically.

```Java
double x = 1.1 + 1;
```

The addition produces a double-precision (hence `double`) floating point value. If
we were to try to claim that `x` is an integer, we would be stopped.

```Java
int x = 1.1 + 1;
//          ^ incompatible types: possible lossy conversion from double to int
```

If we tried to save the result of `1.1+1` as an integer, we'd have to throw away the `.1`;
that's what Java means by "lossy". This is something that the Java designers didn't
want users to do accidentally. The type system ensures that if either number
being added is a `float` (or `double`), then so is the result of the addition.
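
To get a feel for what's being thrown away, here's a quick TypeScript sketch of the same truncation, using `Math.trunc` to mimic a lossy double-to-int conversion. This is purely illustrative: TypeScript has no separate integer type, so nothing here is a type error.

```TypeScript
// 1.1 + 1 produces 2.1; forcing it into a whole number discards the .1.
const sum = 1.1 + 1;               // 2.1
const truncated = Math.trunc(sum); // 2 -- the fractional part is lost
```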

In TypeScript, all numbers have the same representation, so there's no way to create
a type error by adding two of them.

```TypeScript
const x: number = 1.1 + 1; // just fine!
```

That concludes the second round of real-world examples. Let's take a look at formalizing
all of this mathematically.