Start working on the basics part of the type systems articles
This commit is contained in:
parent
d2e8f809d0
commit
5ef10238e3
203
content/blog/01_types_basics.md
Normal file
203
content/blog/01_types_basics.md
Normal file
@ -0,0 +1,203 @@
|
||||
---
|
||||
title: "Everything I Know About Types: Basics"
|
||||
date: 2022-06-30T19:08:50-07:00
|
||||
tags: ["Type Systems", "Programming Languages"]
|
||||
draft: true
|
||||
---
|
||||
|
||||
It's finally time to start looking at types. As I mentioned, I want
|
||||
to take an approach that draws a variety of examples from the real
|
||||
world - I'd like to talk about examples from real programming
|
||||
languages. Before doing that, though, let's start with a (working)
|
||||
definition of what a type even is. Let's try the following:
|
||||
|
||||
__A type is a category of values in a programming language.__
|
||||
|
||||
Values are grouped together in a category if they behave similarly
|
||||
to each other, in some sense. All integers can be added together
|
||||
or multiplied; all strings can be reversed and concatenated;
|
||||
all objects of some class `Dog` have a method called `bark`.
|
||||
It is precisely this categorization that makes type systems
|
||||
practical; since values of a type have common behavior, it's
|
||||
sufficient to reason about some abstract (non-specific) value
|
||||
of a particular type, and there is no need to consider how a
|
||||
program behaves in all possible scenarios. Furthermore, since
|
||||
values of _different_ types may behave differently, it is _not_
|
||||
safe to use a value of one type where a value of another type
|
||||
was expected.
|
||||
|
||||
In the above paragraph, I already introduced some examples
|
||||
of types. Let's take a look at some examples in the wild,
|
||||
starting with numbers. TypeScript gives us the easiest time in
|
||||
this case: it has a type called `number`. In the below code
|
||||
snippet, I assign a number to the variable `x` so that I can
|
||||
explicitly write its type.
|
||||
|
||||
```TypeScript
|
||||
const x: number = 1;
|
||||
```
|
||||
|
||||
In other languages, the situation is slightly more complicated; it's
|
||||
frequently necessary to distinguish between values that could have
|
||||
a fractional portion (real numbers) and numbers that are always whole
|
||||
(integers). For the moment, let's focus on integers. These are
|
||||
ubiquitous in various programming languages; Java has `int`:
|
||||
|
||||
```Java
|
||||
int x = 0;
|
||||
```
|
||||
|
||||
Things in C++, C#, and many other languages look very similar.
|
||||
In rust, we have to make an even finer distinction: we have to
|
||||
distinguish between integers represented using 32 bits and those
|
||||
represented by 64 bits. Focusing on the former, we
|
||||
could write:
|
||||
|
||||
```Rust
|
||||
let x: i32 = 0;
|
||||
```
|
||||
|
||||
In Haskell, we can confirm the type of a value without having to
|
||||
assign it to a variable; the following suffices.
|
||||
|
||||
```Haskell
|
||||
1 :: Int
|
||||
```
|
||||
|
||||
That should be enough examples of integers for now. I'm sure you've seen
|
||||
them in your programming or computer science career. What you
|
||||
may not have seen, though, is the formal / mathematical way of
|
||||
stating that some expression or value has a particular type.
|
||||
In the mathematical notation, too, there's no need to assign a value to
|
||||
a variable to state its type. The notation is actually very similar
|
||||
the that of Haskell; here's how one might write the claim that 1 is a number.
|
||||
|
||||
{{< latex >}}
|
||||
1:\text{number}
|
||||
{{< /latex >}}
|
||||
|
||||
There's one more difference between mathematical notation and the
|
||||
code we've seen so far. If you wrote `num`, or `aNumber`, or anything
|
||||
other than just `numbeer` in the TypeScript example (or if you similarly
|
||||
deviated from the "correct" name in other languages), you'd be greeted with
|
||||
an error. The compilers or interpreters of these languages only understand a
|
||||
fixed set of types, and we are required to stick to names in that set. We have no such
|
||||
duty when using mathematical notation. The main goal of a mathematical definition
|
||||
is not to run the code, or check if it's correct; it's to communicate something
|
||||
to others. As long as others understand what you mean, you can do whatever you want.
|
||||
I _chose_ to use the word \\(\\text{number}\\) to represent the type
|
||||
of numbers, mainly because it's _very_ clear what that means. A theorist writing
|
||||
a paper might cringe at the verbosity of such a convention. My goal, however, is
|
||||
to communicate things to _you_, dear reader, and I think it's best to settle for
|
||||
clarity over brevity.
|
||||
|
||||
Actually, this illustrates a general principle. It's not just the names of the types
|
||||
that we have full control over; it's the whole notation. We could just as well have
|
||||
written the claim as follows:
|
||||
|
||||
{{< latex >}}
|
||||
\cdot\ \text{nummie}\ \sim
|
||||
{{< /latex >}}
|
||||
|
||||
As long as the reader knew that a single dot represents the number 1, "nummie"
|
||||
represents numbers, and the tilde represents "has type", we'd technically
|
||||
be fine. Of course, this is completely unreadable, and certainly unconventional.
|
||||
I will do my best to stick to the notational standards established in programming
|
||||
languages literature. Nevertheless, keep this in mind: __we control the notation__.
|
||||
It's perfectly acceptable to change how something is written if it makes it easier
|
||||
to express whatever you want to express, and this is done frequently in practice.
|
||||
Another consequence of this is that not everyone agrees on notation; according
|
||||
to [this paper](https://labs.oracle.com/pls/apex/f?p=LABS:0::APPLICATION_PROCESS%3DGETDOC_INLINE:::DOC_ID:959),
|
||||
27 different ways of writing down substitutions were observed in the POPL conference alone.
|
||||
|
||||
One more thing. So far, we've only written down one claim: the value 1 is a number.
|
||||
What about the other numbers? To make sure they're accounted for, we need similar
|
||||
rules for 2, 3, and so on.
|
||||
|
||||
{{< latex >}}
|
||||
2:\text{number} \quad 3:\text{number} \quad ...
|
||||
{{< /latex >}}
|
||||
|
||||
This gets tedious quickly. All these rules look the same! It would be much nicer if we could
|
||||
write down the "shape" of these rules, and understand that there's one such rule for each number.
|
||||
This is exactly what is done in PL. We'd write the following.
|
||||
|
||||
{{< latex >}}
|
||||
n:\text{number}
|
||||
{{< /latex >}}
|
||||
|
||||
What's this \\(n\\)? First, recall that notation is up to us. I'm choosing to use the letter
|
||||
\\(n\\) to stand for "any value that is a number". We write a symbol, say what we want it to mean,
|
||||
and we're done. But then, we need to be careful. It's important to note that the letter \\(n\\) is
|
||||
not a variable like `x` in our code snippets above. In fact, it's not at all part of the programming
|
||||
language we're discussing. Rather, it's kind of like a variable in our _rules_.
|
||||
|
||||
This distinction comes up a lot. The thing is, the notation we're building up to talk about programs is its own
|
||||
kind of language. It's not meant for a computer to execute, mind you, but that's not a requirement
|
||||
for something to be language (ever heard of English?). The bottom line is, we have symbols with
|
||||
particular meanings, and there are rules to how they have to be written. The statement "1 is a number"
|
||||
must be written by first writing 1, then a colon, then \\(\text{number}\\). It's a language.
|
||||
|
||||
Really, then, we have two languages to think about:
|
||||
* The _object language_ is the programming language we're trying to describe and mathematically
|
||||
formalize. This is the language that has variables like `x`, keywords like `let` and `const`, and so on.
|
||||
* The _meta language_ is the notation we use to talk about our object language. It consists of
|
||||
the various symbols we define, and is really just a system for communicating various things
|
||||
(like type rules) to others.
|
||||
|
||||
Using this terminology, \\(n\\) is a variable in our meta language; this is commonly called
|
||||
a _metavariable_. A rule such as \\(n:\\text{number}\\) that contains metavariables isn't
|
||||
really a rule by itself; rather, it stands for a whole bunch of rules, one for each possible
|
||||
number that \\(n\\) can be. We call this a _rule schema_.
|
||||
|
||||
Alright, that's enough theory for now. Let's go back to the real world. Working with
|
||||
plain old values like `1` gets boring quickly. There's not many programs you can write
|
||||
with them! Numbers can be added, though, why don't we look at that? All mainstream
|
||||
languages can do this quite easily. Here's Typescript:
|
||||
|
||||
```
|
||||
const y = 1+1;
|
||||
```
|
||||
|
||||
When it comes to adding whole numbers, every other language is pretty much the same.
|
||||
Throwing addition into the mix, and branching out to other types of numbers, we
|
||||
can arrive at our first type error. Here it is in Rust:
|
||||
|
||||
```Rust
|
||||
let x = 1.1 + 1;
|
||||
// ^ no implementation for `{float} + {integer}`
|
||||
```
|
||||
|
||||
You see, numbers that are not whole are represented on computers differently
|
||||
than whole numbers are. The former are represented using something called _floating point_
|
||||
(hence the type name `float`). Rust wants the user to be fully aware that they are adding
|
||||
two numbers that have different representations, so it makes such additions a type error
|
||||
by default, preventing it from happening on accident. The type system is used to enforce this.
|
||||
In Java, addition like this is perfectly legal, and conversion is performed automatically.
|
||||
|
||||
```Java
|
||||
double x = 1.1 + 1;
|
||||
```
|
||||
|
||||
The addition produces a double-precision (hence `double`) floating point value. If
|
||||
we were to try to claim that `x` is an integer, we would be stopped.
|
||||
|
||||
```Java
|
||||
int x = 1.1 + 1;
|
||||
// ^ incompatible types: possible lossy conversion from double to int
|
||||
```
|
||||
|
||||
If we tried to save the result of `1.1+1` as an integer, we'd have to throw away the `.1`;
|
||||
that's what Java means by "lossy". This is something that the Java designers didn't
|
||||
want users to do accidentally. The type system ensures that if either number
|
||||
being added is a `float` (or `double`), then so is the result of the addition.
|
||||
|
||||
In TypeScript, all numbers have the same representation, so there's no way to create
|
||||
a type error by adding two of them.
|
||||
|
||||
```TypeScript
|
||||
const x: number = 1.1 + 1; // just fine!
|
||||
```
|
||||
|
||||
That concludes the second round of real-world examples. Let's take a look at formalizing
|
||||
all of this mathematically.
|
Loading…
Reference in New Issue
Block a user