---
title: "Everything I Know About Types: Basics"
date: 2022-06-30T19:08:50-07:00
tags: ["Type Systems", "Programming Languages"]
draft: true
---

It's finally time to start looking at types. As I mentioned, I want
to take an approach that draws a variety of examples from the real
world - I'd like to talk about examples from real programming
languages. Before doing that, though, let's start with a (working)
definition of what a type even is. Let's try the following:

__A type is a category of values in a programming language.__

Values are grouped together in a category if they behave similarly
to each other, in some sense. All integers can be added together
or multiplied; all strings can be reversed and concatenated;
all objects of some class `Dog` have a method called `bark`.
It is precisely this categorization that makes type systems
practical; since values of a type have common behavior, it's
sufficient to reason about some abstract (non-specific) value
of a particular type, and there is no need to consider how a
program behaves in all possible scenarios. Furthermore, since
values of _different_ types may behave differently, it is _not_
safe to use a value of one type where a value of another type
was expected.

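This last point is easy to see in TypeScript. The snippet below is my own
illustration (not one of the examples that follow): the first two
annotations match their values and are accepted, while the commented-out
line would be rejected, because a value of one type (`number`) appears
where a value of another type (`string`) was expected.

```TypeScript
const n: number = 1;       // a number where a number is expected: fine
const s: string = "woof";  // a string where a string is expected: fine
// const bad: string = 1;  // error: Type 'number' is not assignable to type 'string'
```
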
In the above paragraph, I already introduced some examples
of types. Let's take a look at some examples in the wild,
starting with numbers. TypeScript gives us the easiest time in
this case: it has a type called `number`. In the below code
snippet, I assign a number to the variable `x` so that I can
explicitly write its type.

```TypeScript
const x: number = 1;
```

In other languages, the situation is slightly more complicated; it's
frequently necessary to distinguish between values that could have
a fractional portion (real numbers) and numbers that are always whole
(integers). For the moment, let's focus on integers. These are
ubiquitous in various programming languages; Java has `int`:

```Java
int x = 0;
```

Things in C++, C#, and many other languages look very similar.
In Rust, we have to make an even finer distinction: we have to
distinguish between integers represented using 32 bits and those
represented using 64 bits. Focusing on the former, we
could write:

```Rust
let x: i32 = 0;
```

In Haskell, we can confirm the type of a value without having to
assign it to a variable; the following suffices.

```Haskell
1 :: Int
```

That should be enough examples of integers for now. I'm sure you've seen
them in your programming or computer science career. What you
may not have seen, though, is the formal / mathematical way of
stating that some expression or value has a particular type.
In the mathematical notation, too, there's no need to assign a value to
a variable to state its type. The notation is actually very similar
to that of Haskell; here's how one might write the claim that 1 is a number.

{{< latex >}}
1:\text{number}
{{< /latex >}}

There's one more difference between mathematical notation and the
code we've seen so far. If you wrote `num`, or `aNumber`, or anything
other than just `number` in the TypeScript example (or if you similarly
deviated from the "correct" name in other languages), you'd be greeted with
an error. The compilers or interpreters of these languages only understand a
fixed set of types, and we are required to stick to names in that set. We have no such
duty when using mathematical notation. The main goal of a mathematical definition
is not to run the code, or check if it's correct; it's to communicate something
to others. As long as others understand what you mean, you can do whatever you want.
I _chose_ to use the word \\(\\text{number}\\) to represent the type
of numbers, mainly because it's _very_ clear what that means. A theorist writing
a paper might cringe at the verbosity of such a convention. My goal, however, is
to communicate things to _you_, dear reader, and I think it's best to settle for
clarity over brevity.

Actually, this illustrates a general principle. It's not just the names of the types
that we have full control over; it's the whole notation. We could just as well have
written the claim as follows:

{{< latex >}}
\cdot\ \text{nummie}\ \sim
{{< /latex >}}

As long as the reader knew that a single dot represents the number 1, "nummie"
represents numbers, and the tilde represents "has type", we'd technically
be fine. Of course, this is completely unreadable, and certainly unconventional.
I will do my best to stick to the notational standards established in programming
languages literature. Nevertheless, keep this in mind: __we control the notation__.
It's perfectly acceptable to change how something is written if it makes it easier
to express whatever you want to express, and this is done frequently in practice.
Another consequence of this is that not everyone agrees on notation; according
to [this paper](https://labs.oracle.com/pls/apex/f?p=LABS:0::APPLICATION_PROCESS%3DGETDOC_INLINE:::DOC_ID:959),
27 different ways of writing down substitutions were observed in the POPL conference alone.

One more thing. So far, we've only written down one claim: the value 1 is a number.
What about the other numbers? To make sure they're accounted for, we need similar
rules for 2, 3, and so on.

{{< latex >}}
2:\text{number} \quad 3:\text{number} \quad ...
{{< /latex >}}

This gets tedious quickly. All these rules look the same! It would be much nicer if we could
write down the "shape" of these rules, and understand that there's one such rule for each number.
This is exactly what is done in PL. We'd write the following.

{{< latex >}}
n:\text{number}
{{< /latex >}}

What's this \\(n\\)? First, recall that notation is up to us. I'm choosing to use the letter
\\(n\\) to stand for "any value that is a number". We write a symbol, say what we want it to mean,
and we're done. But then, we need to be careful. It's important to note that the letter \\(n\\) is
not a variable like `x` in our code snippets above. In fact, it's not at all part of the programming
language we're discussing. Rather, it's kind of like a variable in our _rules_.

This distinction comes up a lot. The thing is, the notation we're building up to talk about programs is its own
kind of language. It's not meant for a computer to execute, mind you, but that's not a requirement
for something to be a language (ever heard of English?). The bottom line is, we have symbols with
particular meanings, and there are rules to how they have to be written. The statement "1 is a number"
must be written by first writing 1, then a colon, then \\(\\text{number}\\). It's a language.

Really, then, we have two languages to think about:

* The _object language_ is the programming language we're trying to describe and mathematically
formalize. This is the language that has variables like `x`, keywords like `let` and `const`, and so on.
* The _meta language_ is the notation we use to talk about our object language. It consists of
the various symbols we define, and is really just a system for communicating various things
(like type rules) to others.

Using this terminology, \\(n\\) is a variable in our meta language; this is commonly called
a _metavariable_. A rule such as \\(n:\\text{number}\\) that contains metavariables isn't
really a rule by itself; rather, it stands for a whole bunch of rules, one for each possible
number that \\(n\\) can be. We call this a _rule schema_.

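To tie the schema back to code (my own illustration, not notation from the
literature): each concrete number gives one instance of the rule, just as each
declaration below is one instance of the same annotation pattern.

```TypeScript
// Each line instantiates the same "shape": some literal n has type number.
const a: number = 1; // the instance where n is 1
const b: number = 2; // the instance where n is 2
const c: number = 3; // the instance where n is 3
```
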
Alright, that's enough theory for now. Let's go back to the real world. Working with
plain old values like `1` gets boring quickly. There aren't many programs you can write
with them! Numbers can be added, though, so why don't we look at that? All mainstream
languages can do this quite easily. Here's TypeScript:

```TypeScript
const y = 1+1;
```

When it comes to adding whole numbers, every other language is pretty much the same.
Throwing addition into the mix, and branching out to other types of numbers, we
can arrive at our first type error. Here it is in Rust:

```Rust
let x = 1.1 + 1;
// ^ no implementation for `{float} + {integer}`
```

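As a quick aside (this fix is my own addition, not part of the error message):
Rust is satisfied once we explicitly convert the integer operand, making the
shared representation visible in the code.

```Rust
fn main() {
    // Casting the integer to f64 makes both operands floating-point,
    // so the addition now type-checks.
    let x = 1.1 + (1 as f64);
    // Compare with a tolerance rather than exact equality, as is usual for floats.
    assert!((x - 2.1_f64).abs() < 1e-12);
}
```
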
You see, numbers that are not whole are represented on computers differently
than whole numbers are. The former are represented using something called _floating point_
(hence the type name `float`). Rust wants the user to be fully aware that they are adding
two numbers that have different representations, so it makes such additions a type error
by default, preventing them from happening by accident. The type system is used to enforce this.
In Java, addition like this is perfectly legal, and conversion is performed automatically.

```Java
double x = 1.1 + 1;
```

The addition produces a double-precision (hence `double`) floating point value. If
we were to try to claim that `x` is an integer, we would be stopped.

```Java
int x = 1.1 + 1;
// ^ incompatible types: possible lossy conversion from double to int
```

If we tried to save the result of `1.1+1` as an integer, we'd have to throw away the `.1`;
that's what Java means by "lossy". This is something that the Java designers didn't
want users to do accidentally. The type system ensures that if either number
being added is a `float` (or `double`), then so is the result of the addition.

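If throwing away the `.1` is what we actually want, Java lets us say so with an
explicit cast. This fix is my own addition for illustration; the cast syntax
itself is standard Java.

```Java
public class Cast {
    public static void main(String[] args) {
        // The explicit (int) cast acknowledges the lossy conversion,
        // so the type checker allows it; the fractional part is discarded.
        int x = (int) (1.1 + 1);
        System.out.println(x); // prints 2
    }
}
```
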
In TypeScript, all numbers have the same representation, so there's no way to create
a type error by adding two of them.

```TypeScript
const x: number = 1.1 + 1; // just fine!
```

That concludes the second round of real-world examples. Let's take a look at formalizing
all of this mathematically.