580 lines
27 KiB
Markdown
580 lines
27 KiB
Markdown
---
|
|
title: "Everything I Know About Types: Basics"
|
|
date: 2022-06-30T19:08:50-07:00
|
|
tags: ["Type Systems", "Programming Languages"]
|
|
series: "Everything I Know About Types"
|
|
draft: true
|
|
---
|
|
|
|
{{< katex_component_js >}}
|
|
{{< bergamot_js_css >}}
|
|
|
|
It's finally time to start looking at types. As I mentioned, I want
|
|
to take an approach that draws a variety of examples from the real
|
|
world - I'd like to talk about examples from real programming
|
|
languages. Before doing that, though, let's start with a (working)
|
|
definition of what a type even is. Let's try the following:
|
|
|
|
__A type is a category of values in a programming language.__
|
|
|
|
Values are grouped together in a category if they behave similarly
|
|
to each other, in some sense. All integers can be added together
|
|
or multiplied; all strings can be reversed and concatenated;
|
|
all objects of some class `Dog` have a method called `bark`.
|
|
It is precisely this categorization that makes type systems
|
|
practical; since values of a type have common behavior, it's
|
|
sufficient to reason about some abstract (non-specific) value
|
|
of a particular type, and there is no need to consider how a
|
|
program behaves in all possible scenarios. Furthermore, since
|
|
values of _different_ types may behave differently, it is _not_
|
|
safe to use a value of one type where a value of another type
|
|
was expected.
|
|
|
|
In the above paragraph, I already introduced some examples
|
|
of types. Let's take a look at some examples in the wild,
|
|
starting with numbers. TypeScript gives us the easiest time in
|
|
this case: it has a type called `number`. In the below code
|
|
snippet, I assign a number to the variable `x` so that I can
|
|
explicitly write its type.
|
|
|
|
```TypeScript
|
|
const x: number = 1;
|
|
```
|
|
|
|
In other languages, the situation is slightly more complicated; it's
|
|
frequently necessary to distinguish between values that could have
|
|
a fractional portion (real numbers) and numbers that are always whole
|
|
(integers). For the moment, let's focus on integers. These are
|
|
ubiquitous in various programming languages; Java has `int`:
|
|
|
|
```Java
|
|
int x = 0;
|
|
```
|
|
|
|
Things in C++, C#, and many other languages look very similar.
|
|
In Rust, we have to make an even finer distinction: we have to
|
|
distinguish between integers represented using 32 bits and those
|
|
represented by 64 bits. Focusing on the former, we
|
|
could write:
|
|
|
|
```Rust
|
|
let x: i32 = 0;
|
|
```
|
|
|
|
In Haskell, we can confirm the type of a value without having to
|
|
assign it to a variable; the following suffices.
|
|
|
|
```Haskell
|
|
1 :: Int
|
|
```
|
|
|
|
That should be enough examples of integers for now. I'm sure you've seen
|
|
them in your programming or computer science career. What you
|
|
may not have seen, though, is the formal, mathematical way of
|
|
stating that some expression or value has a particular type.
|
|
In the mathematical notation, too, there's no need to assign a value to
|
|
a variable to state its type. The notation is actually very similar
|
|
the that of Haskell; here's how one might write the claim that 1 is a number.
|
|
|
|
{{< latex >}}
|
|
1:\text{number}
|
|
{{< /latex >}}
|
|
|
|
There's one more difference between mathematical notation and the
|
|
code we've seen so far. If you wrote `num`, or `aNumber`, or anything
|
|
other than just `number` in the TypeScript example (or if you similarly
|
|
deviated from the "correct" name in other languages), you'd be greeted with
|
|
an error. The compilers or interpreters of these languages only understand a
|
|
fixed set of types, and we are required to stick to names in that set. We have no such
|
|
duty when using mathematical notation. The main goal of a mathematical definition
|
|
is not to run the code, or check if it's correct; it's to communicate something
|
|
to others. As long as others understand what you mean, you can do whatever you want.
|
|
I _chose_ to use the word \(\text{number}\) to represent the type
|
|
of numbers, mainly because it's _very_ clear what that means. A theorist writing
|
|
a paper might cringe at the verbosity of such a convention. My goal, however, is
|
|
to communicate things to _you_, dear reader, and I think it's best to settle for
|
|
clarity over brevity.
|
|
|
|
Actually, this illustrates a general principle. It's not just the names of the types
|
|
that we have full control over; it's the whole notation. We could just as well have
|
|
written the claim as follows:
|
|
|
|
{{< latex >}}
|
|
\cdot\ \text{nummie}\ \sim
|
|
{{< /latex >}}
|
|
|
|
As long as the reader knew that a single dot represents the number 1, "nummie"
|
|
represents numbers, and the tilde represents "has type", we'd technically
|
|
be fine. Of course, this is completely unreadable, and certainly unconventional.
|
|
I will do my best to stick to the notational standards established in programming
|
|
languages literature. Nevertheless, keep this in mind: __we control the notation__.
|
|
It's perfectly acceptable to change how something is written if it makes it easier
|
|
to express whatever you want to express, and this is done frequently in practice.
|
|
Another consequence of this is that not everyone agrees on notation; according
|
|
to [this paper](https://labs.oracle.com/pls/apex/f?p=LABS:0::APPLICATION_PROCESS%3DGETDOC_INLINE:::DOC_ID:959),
|
|
27 different ways of writing down substitutions were observed in the POPL conference alone.
|
|
|
|
{{< bergamot_preset name="notation-preset" prompt="PromptConverter @ prompt(type(?term, ?t)) <- input(?term);" >}}
|
|
TNumber @ type(lit(?n), number) <- num(?n);
|
|
{{< /bergamot_preset >}}
|
|
|
|
{{< bergamot_exercise label="bergamot; tweaking notation" preset="notation-preset" id="exercise-1" >}}
|
|
Bergamot, the interactive tool I've developed for doing exercises, supports
|
|
customizing the notation for rules. Try changing the \(:\) symbol to
|
|
the \(\sim\) symbol (denoted in latex as `\sim`).
|
|
|
|
To change the way that rules are rendered, click the "Presentation Rules"
|
|
tab in the "Rules" section. There will be a lot there: I've added rules for
|
|
pretty-printing a fair amount of the standard programming languages notation.
|
|
Scroll down to `LatexTypeBin`, and change `:` to
|
|
`\\sim` on that line (the extra backslash is to handle string
|
|
escaping). Now try typing numbers into the input box; you should see
|
|
something like \(1 \sim \text{number} \)
|
|
{{< /bergamot_exercise >}}
|
|
|
|
One more thing. So far, we've only written down one claim: the value 1 is a number.
|
|
What about the other numbers? To make sure they're accounted for, we need similar
|
|
rules for 2, 3, and so on.
|
|
|
|
{{< latex >}}
|
|
2:\text{number} \quad 3:\text{number} \quad ...
|
|
{{< /latex >}}
|
|
|
|
This gets tedious quickly. All these rules look the same! It would be much nicer if we could
|
|
write down the "shape" of these rules, and understand that there's one such rule for each number.
|
|
This is exactly what is done in PL. We'd write the following.
|
|
|
|
{{< latex >}}
|
|
n:\text{number}
|
|
{{< /latex >}}
|
|
|
|
What's this \(n\)? First, recall that notation is up to us. I'm choosing to use the letter
|
|
\(n\) to stand for "any value that is a number". We write a symbol, say what we want it to mean,
|
|
and we're done.
|
|
|
|
{{< dialog >}}
|
|
{{< message "question" "reader" >}}
|
|
Hold on, isn't this a bit circular? We're saying that \(n\) ranges over all numbers,
|
|
and then saying that a number is anything of the form \(n\). A circular definition
|
|
of this sort doesn't inspire faith in this approach.{{< /message >}}
|
|
{{< message "answer" "Daniel" >}}
|
|
You raise a good point. I am <em>not</em> defining \(n\) to be "any expression that
|
|
has type \(\text{number}\)"; rather, there are <em>two</em> meanings to the word "number"
|
|
in use here. First, there's number-the-symbol. This concept refers to the numerical
|
|
symbols in the syntax of our programming language, like <code>1</code>, <code>3.14</code> or <code>1000</code>.
|
|
Such symbols exist even in untyped languages. Second, there's number-the-type, written
|
|
in our mathematical notation as \(\text{number}\). This is, as we saw earlier, a
|
|
category of values.<br>
|
|
<br>
|
|
The rule, then, associates with numerical constants in our language (represented
|
|
by \(n\)) the type \(\text{number}\).
|
|
{{< /message >}}
|
|
{{< /dialog >}}
|
|
|
|
Actually, to be extra precise, we might want to be explicit about our claim
|
|
that \(n\) is a number, rather than resorting to notational conventions.
|
|
To do so, we'd need to write something like the following:
|
|
|
|
{{< latex >}}
|
|
\cfrac{n \in \texttt{Num}}{n : \text{number}}
|
|
{{< /latex >}}
|
|
|
|
Where \(\texttt{Num}\) denotes the set of numbers in our syntax (`1`, `3.14`, etc.)
|
|
The stuff about the line is called a premise, and it's a simply a condition
|
|
required for the rule to hold. The rule then says that \(n\) has type number --
|
|
but only if \(n\) is a numeric symbol in our language. We'll talk about premises
|
|
in more detail later on.
|
|
|
|
Having introduced this variable-like thing \(n\), we need to be careful.
|
|
It's important to note that the letter \(n\) is
|
|
not a variable like `x` in our code snippets above. In fact, it's not at all part of the programming
|
|
language we're discussing. Rather, it's kind of like a variable in our _rules_.
|
|
|
|
This distinction comes up a lot. The thing is, the notation we're building up to talk about programs is its own
|
|
kind of language. It's not meant for a computer to execute, mind you, but that's not a requirement
|
|
for something to be language (ever heard of English?). The bottom line is, we have symbols with
|
|
particular meanings, and there are rules to how they have to be written. The statement "1 is a number"
|
|
must be written by first writing 1, then a colon, then \(\text{number}\). It's a language.
|
|
|
|
Really, then, we have two languages to think about:
|
|
* The _object language_ is the programming language we're trying to describe and mathematically
|
|
formalize. This is the language that has variables like `x`, keywords like `let` and `const`, and so on.
|
|
|
|
Some examples of our object language that we've seen so far are `1` and `2+3`.
|
|
In our mathematical notation, they look like \(1\) and \(2+3\).
|
|
|
|
* The _meta language_ is the notation we use to talk about our object language. It consists of
|
|
the various symbols we define, and is really just a system for communicating various things
|
|
(like type rules) to others.
|
|
|
|
Expressions like \(n \in \texttt{Num}\) and \(1 : \text{number}\)
|
|
are examples of our meta language.
|
|
|
|
Using this terminology, \(n\) is a variable in our meta language; this is commonly called
|
|
a _metavariable_. A rule such as \(n:\text{number}\) that contains metavariables isn't
|
|
really a rule by itself; rather, it stands for a whole bunch of rules, one for each possible
|
|
number that \(n\) can be. We call this a _rule schema_.
|
|
|
|
Alright, that's enough theory for now. Let's go back to the real world. Working with
|
|
plain old values like `1` gets boring quickly. There's not many programs you can write
|
|
with them! Numbers can be added, though, why don't we look at that? All mainstream
|
|
languages can do this quite easily. Here's Typescript:
|
|
|
|
```TypeScript
|
|
const y = 1+1;
|
|
```
|
|
|
|
When it comes to adding whole numbers, every other language is pretty much the same.
|
|
Throwing other types of numbers into the mix, we
|
|
can arrive at our first type error. Here it is in Rust:
|
|
|
|
```Rust
|
|
let x = 1.1 + 1;
|
|
// ^ no implementation for `{float} + {integer}`
|
|
```
|
|
|
|
You see, numbers that are not whole are represented on computers differently
|
|
than whole numbers are. The former are represented using something called _floating point_
|
|
(hence the type name `float`). Rust wants the user to be fully aware that they are adding
|
|
two numbers that have different representations, so it makes such additions a type error
|
|
by default, preventing it from happening on accident. The type system is used to enforce this.
|
|
In Java, addition like this is perfectly legal, and conversion is performed automatically.
|
|
|
|
```Java
|
|
double x = 1.1 + 1;
|
|
```
|
|
|
|
The addition produces a double-precision (hence `double`) floating point value. If
|
|
we were to try to claim that `x` is an integer, we would be stopped.
|
|
|
|
```Java
|
|
int x = 1.1 + 1;
|
|
// ^ incompatible types: possible lossy conversion from double to int
|
|
```
|
|
|
|
If we tried to save the result of `1.1+1` as an integer, we'd have to throw away the `.1`;
|
|
that's what Java means by "lossy". This is something that the Java designers didn't
|
|
want users to do accidentally. The type system ensures that if either number
|
|
being added is a `float` (or `double`), then so is the result of the addition.
|
|
|
|
In TypeScript, all numbers have the same representation, so there's
|
|
{{< sidenote "right" "typescript-no-error-note" "no way to create a type error" >}}
|
|
Please don't interpret this as praise of TypeScript, or as criticism of
|
|
other languages. In some languages, it makes more sense to throw type errors
|
|
in cases like this, while in others it does not. Instead of arguing
|
|
about some hypothetical "ideal type system", I'd rather explore the various
|
|
approaches that seem to thrive in the real world.
|
|
{{< /sidenote >}}
|
|
by adding two of them.
|
|
|
|
```TypeScript
|
|
const x: number = 1.1 + 1; // just fine!
|
|
```
|
|
|
|
That concludes the second round of real-world examples. Let's take a look at formalizing
|
|
all of this mathematically. As a starting point, we can look at a rule that matches the TypeScript
|
|
view of having only a single number type, \(\text{number}\). This rule needs a little
|
|
bit "more" than the ones we've seen so far; we can't just blindly give things in the
|
|
form \(a+b\) the type \(\text{number}\) (what if we're adding strings?). For our
|
|
rule to behave in the way we have in mind, it's necessary for us to add _premises_.
|
|
Before I explain any further, let me show you the rule.
|
|
|
|
{{< latex >}}
|
|
\frac{e_1:\text{number}\quad e_2:\text{number}}{e_1+e_2:\text{number}}
|
|
{{< /latex >}}
|
|
|
|
In the above (and elsewhere) we will use the metavariable \(e\) as a stand-in for
|
|
any _expression_ in our source language. In general, expressions are things such as `1`,
|
|
`x`, `1.0+someFunction(y)`, and so on. In other words, they're things we can evaluate
|
|
to a value. For the purposes of this article, though, we're only looking at basic
|
|
types such as numbers and strings, as well as adding them. Thus, at this point, expressions look more like
|
|
`1`, `1+2`, `"hello"+1`. Notice from that last one that expressions need not be well-typed.
|
|
For the moment, we will avoid rules for checking _statements_ (like `let x = 5;`).
|
|
|
|
Rules like the above consist of premises (above the line) and conclusions (below the line).
|
|
The conclusion is the claim / fact that we can determine from the rule. In this specific case,
|
|
the conclusion is that \(e_1+e_2\) has type \(\text{number}\).
|
|
For this to be true, however, some conditions must be met; specifically, the sub-expressions
|
|
\(e_1\) and \(e_2\) must themselves be of type \(\text{number}\). These are the premises.
|
|
Reading in plain English, we could pronounce this rule as:
|
|
|
|
> If \(e_1\) and \(e_2\) have type \(\text{number}\), then \(e_1+e_2\) has type \(\text{number}\).
|
|
|
|
Notice that we don't care what the left and right operands are (we say they can be any expression).
|
|
We need not concern ourselves with how to compute _their_ type in this specific rule. Thus, the rule
|
|
would work for expressions like `1+1`, `(1+2)+(3+4)`, `1+x`, and so on, provided other rules
|
|
take care of figuring out the types of expressions like `1` and `x`. In this way, when we add
|
|
a feature to our language, we typically only need to add one or two associated rules; the
|
|
ones for other language features are typically unaffected.
|
|
|
|
Just to get some more practice, let's take a look at a rule for adding strings.
|
|
|
|
{{< latex >}}
|
|
\frac{e_1:\text{string}\quad e_2:\text{string}}{e_1+e_2:\text{string}}
|
|
{{< /latex >}}
|
|
|
|
This rule is read as follows:
|
|
|
|
> If \(e_1\) and \(e_2\) have type \(\text{string}\), then \(e_1+e_2\) has type \(\text{string}\).
|
|
|
|
{{< bergamot_preset name="string-preset" prompt="PromptConverter @ prompt(type(?term, ?t)) <- input(?term);" query="\"hello\"+\"world\"">}}
|
|
TNumber @ type(lit(?n), number) <- num(?n);
|
|
TPlusI @ type(plus(?e_1, ?e_2), number) <-
|
|
type(?e_1, number), type(?e_2, number);
|
|
{{< /bergamot_preset >}}
|
|
|
|
{{< bergamot_exercise label="bergamot; adding rules for strings" preset="string-preset" id="exercise-2" >}}
|
|
Try writing the Bergamot rules that correspond to the inference rule for strings
|
|
above. I've provided the rules for numbers; the rules for strings should be quite
|
|
similar.
|
|
|
|
In Bergamot, the claim that an expression `e` has type `t`
|
|
is written as `type(e, t)`. A rule looks like `RuleName @ conclusion <- premise1, premise2;`.
|
|
Thus, the rule
|
|
|
|
```
|
|
TNumber @ type(lit(?n), number) <- num(?n);
|
|
```
|
|
|
|
Has one premise, that the term \(n\) is a number, and the conclusion is that
|
|
a number literal has type \(\text{number}\). The `num` condition
|
|
is a Bergamot builtin, corresponding to our earlier notation of \(n \in \texttt{Num}\).
|
|
It holds for all numbers: it's always true that `num(1)`, `num(2)`,
|
|
etc. The equivalent builtin for something being a string is `str`.
|
|
|
|
To edit the rules in Bergamot, click the "Editor" button in the "Rules"
|
|
section. You will need to add two rules, just like we did for numbers:
|
|
a rule for string literals (something like \(\texttt{"Hello"} : \text{string}\),
|
|
but more general) and for adding two strings together. I suggest naming
|
|
these two rules `TString` and `TPlusS` respectively.
|
|
|
|
When you're done, you should be able to properly determine the types of
|
|
expressions such as `"Hello"` and `"Hello" + "World"`.
|
|
{{< /bergamot_exercise >}}
|
|
|
|
These rules generally work in other languages. Things get more complicated in languages like Java and Rust,
|
|
where types for numbers are more precise (\(\text{int}\) and \(\text{float}\) instead of
|
|
\(\text{number}\)). In these languages, we need rules for both.
|
|
|
|
{{< latex >}}
|
|
\frac{e_1:\text{int}\quad e_2:\text{int}}{e_1+e_2:\text{int}}
|
|
\quad\quad
|
|
\frac{e_1:\text{float}\quad e_2:\text{float}}{e_1+e_2:\text{float}}
|
|
{{< /latex >}}
|
|
|
|
But then, we also saw that Java is perfectly fine with adding a float to an integer,
|
|
and an integer to a float (the result is a float). Thus, we also add the following
|
|
two rules (but only in the Java case, since, as we have seen, Rust disallows
|
|
adding a float to an integer).
|
|
|
|
{{< latex >}}
|
|
\frac{e_1:\text{int}\quad e_2:\text{float}}{e_1+e_2:\text{float}}
|
|
\quad\quad
|
|
\frac{e_1:\text{float}\quad e_2:\text{int}}{e_1+e_2:\text{float}}
|
|
{{< /latex >}}
|
|
|
|
You might find this process of adding rules for each combination of operand
|
|
types tedious, and I would agree. In general, if a language
|
|
provides a lot of automatic conversions between types, we do _not_ explicitly
|
|
provide rules for all of the possible scenarios. Rather, we'd introduce a general
|
|
framework of subtypes, and then have a small number of rules that are responsible
|
|
for converting expressions from one type to another. That way, we'd only need to list
|
|
the integer-integer and the float-float rules. The rest would follow
|
|
from the conversion rules. Chapter 15 of _Types and Programming Languages_
|
|
by Benjamin Pierce is a nice explanation, but the [Wikipedia page](https://en.wikipedia.org/wiki/Subtyping)
|
|
ain't bad, either.
|
|
|
|
{{< bergamot_preset name="conversion-preset" prompt="PromptConverter @ prompt(type(?term, ?t)) <- input(?term);" >}}
|
|
section "Conversion rules" {
|
|
ConvertsIS @ converts(integer, string) <-;
|
|
ConvertsIF @ converts(integer, float) <-;
|
|
ConvertsFS @ converts(float, string) <-;
|
|
}
|
|
|
|
section "Rules for literals" {
|
|
TInt @ type(lit(?n), integer) <- int(?n);
|
|
TFloat @ type(lit(?f), float) <- float(?f);
|
|
TString @ type(lit(?s), string) <- str(?s);
|
|
}
|
|
|
|
section "" {
|
|
TPlusInt @ type(plus(?e_1, ?e_2), integer) <- type(?e_1, integer), type(?e_2, integer);
|
|
TPlusFloat @ type(plus(?e_1, ?e_2), float) <- type(?e_1, float), type(?e_2, float);
|
|
TPlusString @ type(plus(?e_1, ?e_2), string) <- type(?e_1, string), type(?e_2, string);
|
|
}
|
|
|
|
TConverts @ type(?e, ?tau_2) <- type(?e, ?tau_1), converts(?tau_1, ?tau_2);
|
|
{{< /bergamot_preset >}}
|
|
|
|
{{< bergamot_exercise label="advanced; a taste of conversions" preset="conversion-preset" id="exercise-3" >}}
|
|
This exercise is simply an early taste of formalizing conversions, which
|
|
allow users to (for example) write numbers where the language expects strings, with the
|
|
understanding that the number will be automatically turned into a string.
|
|
|
|
To avoid having an explosion of various rules, we instead define the "converts to"
|
|
relation, \(\tau_1 \preceq \tau_2\), where \(\tau_1\) and \(\tau_2\)
|
|
are types. To say that an integer can be automatically converted to a floating
|
|
pointer number, we can write \(\text{integer} \preceq \text{float}\).
|
|
Then, we add only a single additional rule to our language: `TConverts`.
|
|
This rule says that we can treat an expression of type \(\tau_1\) as
|
|
an expression of type \(\tau_2\), if the former can be converted to the
|
|
latter.
|
|
|
|
I have written some rules using these concepts. Input some expressions into
|
|
the box below that would require a conversion: some examples might be
|
|
`1 + 3.14` (adding an integer to a float), `1 + "hello"` (adding
|
|
an integer to a string), and `1.0 + "hello"` (adding a float to a string).
|
|
Click the "Proof Tree" tab to see how the various rules combine to make
|
|
the expression well-typed.
|
|
|
|
Now, remove the `ConvertsIS` rule that allows integers to be converted to
|
|
strings. Do all of the expressions from the previous paragraph still typecheck?
|
|
Can you see why?
|
|
|
|
{{< /bergamot_exercise >}}
|
|
|
|
Subtyping, however, is quite a bit beyond the scope of a "basics"
|
|
post. For the moment, we shall content ourselves with the tedious approach.
|
|
|
|
Another thing to note is that we haven't yet seen rules for what programs are _incorrect_,
|
|
and we never will. When formalizing type systems we rarely (if ever) explicitly enumerate
|
|
cases that produce errors. Rather, we interpret the absence of matching rules to indicate
|
|
that something is wrong. Since no rule has premises that match \(e_1:\text{float}\) and \(e_2:\text{string}\),
|
|
we can infer that
|
|
{{< sidenote "right" "float-string-note" "given the rules so far," >}}
|
|
I'm trying to be careful here, since adding a float to a string
|
|
is perfectly valid in Java (the float is automatically converted to
|
|
a string, and the two are concatenated).
|
|
{{< /sidenote >}}
|
|
`1.0+"hello"` is invalid. The task of adding good type error messages, then,
|
|
is usually a more practical concern, and is undertaken by compiler developers
|
|
rather than type theorists. There are, of course, exceptions; check out,
|
|
for instance, these papers on improving type error messages, as well
|
|
as this tool that showed up only a week or two before I started writing
|
|
this article.
|
|
|
|
- [Chameleon](https://chameleon.typecheck.me/), a tool for debugging Haskell error messages.
|
|
- [This paper](https://dl.acm.org/doi/10.1145/3412932.3412939), titled _Type debugging with counter-factual type error messages using an existing type checker_.
|
|
- [Another paper](https://dl.acm.org/doi/10.1145/507635.507659), _Compositional explanation of types and algorithmic debugging of type errors_.
|
|
|
|
One last thing. So far, I've been defining what each metavariable means by giving you
|
|
examples. However, this is not common practice in the theory of programming languages.
|
|
There is actually another form of notation that's frequently used from defining what
|
|
various symbols mean; it's called a [grammar](https://en.wikipedia.org/wiki/Formal_grammar).
|
|
A grammar for our little addition language looks like this:
|
|
|
|
{{< latex >}}
|
|
\begin{array}{rcll}
|
|
e & ::= & n & \text{(numbers)} \\
|
|
& | & s & \text{(strings)} \\
|
|
& | & e+e & \text{(addition)}
|
|
\end{array}
|
|
{{< /latex >}}
|
|
|
|
This reads,
|
|
|
|
> An expression is a number or a string or the sum of two expressions.
|
|
|
|
I think this is all I wanted to cover in this part. We've gotten a good start!
|
|
Here's a quick summary of what we've covered:
|
|
|
|
1. It is convenient to view a type as a category of values that have
|
|
similar behavior. All numbers can be added, all strings can be reversed,
|
|
and so on and so forth. Some types in existing programming languages
|
|
are the `number` type from TypeScript, `i32` and `int` for integers in Rust
|
|
and Java, respectively, and `float` for real
|
|
{{< sidenote "right" "float-not-real" "(kinda)" >}}
|
|
A precise reader might point out that a floating point number can only
|
|
represent certain values of real numbers, and definitely not all of them.
|
|
{{< /sidenote >}}
|
|
numbers in C++.
|
|
2. In programming language theory, we usually end up with two languages
|
|
when trying to formalize various concepts. The _object language_
|
|
is the actual programming language we're trying to study, like Python
|
|
or TypeScript. The _meta language_ is the language that we use
|
|
to reason and talk about the object language. Typically, this is
|
|
the language we use for writing down our rules.
|
|
3. The common type-theoretic notation for "expression \(x\)
|
|
has type \(\tau\)" is \(x : \tau\).
|
|
4. In writing more complicated rules, we will frequently make use
|
|
of the inference rule notation, which looks something like
|
|
the following.
|
|
{{< latex >}}
|
|
\frac{P_1 \quad P_2 \quad ... \quad P_n}{P}
|
|
{{< /latex >}}
|
|
The above is read as "if \(P_1\) through \(P_n\) are
|
|
true, then \(P\) is also true."
|
|
5. To support operators like `+` that can work on, say, both numbers
|
|
and strings, we provide inference rules for each such case. If this
|
|
gets cumbersome, we can introduce a system of _subtypes_ into our
|
|
type system, which preclude the need for creating so many rules;
|
|
however, for the time being, this is beyond the scope of the article.
|
|
|
|
Next time we'll take a look at type checking expressions like `x+1`, where
|
|
variables are involved.
|
|
|
|
### This Page at a Glance
|
|
{{< dialog >}}
|
|
{{< message "question" "reader" >}}
|
|
Hey, what's this section? I thought we were done.
|
|
{{< /message >}}
|
|
{{< message "answer" "Daniel" >}}
|
|
We are. However, various parts of this series will build on each other,
|
|
and as we go along, we'll be accumulating more and more various symbols,
|
|
notation, and rules. I thought it would be nice to provide a short-and-sweet
|
|
reference at the bottom for someone who doesn't want to re-read the whole
|
|
section just to find out what a symbol means.
|
|
{{< /message >}}
|
|
{{< message "question" "reader" >}}
|
|
So it's kind of like a TL;DR?
|
|
{{< /message >}}
|
|
{{< message "answer" "Daniel" >}}
|
|
Yup! I also would like to somehow communicate the feeling of reading
|
|
Programming Languages papers once you are familiar with the notation;
|
|
after a little bit of practice, you can just read the important figures
|
|
and already be up-to-speed on a big chunk of the content.
|
|
{{< /message >}}
|
|
{{< /dialog >}}
|
|
|
|
#### Metavariables
|
|
| Symbol | Meaning | Syntactic Category |
|
|
|---------|--------------|-----------------------|
|
|
| \(n\) | Numbers | \(\texttt{Num}\) |
|
|
| \(s\) | Strings | \(\texttt{Str}\) |
|
|
| \(e\) | Expressions | \(\texttt{Expr}\) |
|
|
|
|
#### Grammar
|
|
{{< block >}}
|
|
{{< latex >}}
|
|
\begin{array}{rcll}
|
|
e & ::= & n & \text{(numbers)} \\
|
|
& | & s & \text{(strings)} \\
|
|
& | & e+e & \text{(addition)}
|
|
\end{array}
|
|
{{< /latex >}}
|
|
{{< /block >}}
|
|
|
|
#### Rules
|
|
{{< foldtable >}}
|
|
| Rule | Description |
|
|
|--------------|-------------|
|
|
| {{< latex >}}\frac{n \in \texttt{Num}}{n : \text{number}} {{< /latex >}}| Number literals have type \(\text{number}\) |
|
|
| {{< latex >}}\frac{s \in \texttt{Str}}{s : \text{string}} {{< /latex >}}| String literals have type \(\text{string}\) |
|
|
| {{< latex >}}\frac{e_1 : \text{string}\quad e_2 : \text{string}}{e_1+e_2 : \text{string}} {{< /latex >}}| Adding strings gives a string |
|
|
| {{< latex >}}\frac{e_1 : \text{number}\quad e_2 : \text{number}}{e_1+e_2 : \text{number}} {{< /latex >}}| Adding numbers gives a number |
|
|
|
|
#### Playground
|
|
{{< bergamot_widget id="widget" query="" prompt="PromptConverter @ prompt(type(?term, ?t)) <- input(?term);" >}}
|
|
section "" {
|
|
TNumber @ type(lit(?n), number) <- num(?n);
|
|
TString @ type(lit(?s), string) <- str(?s);
|
|
}
|
|
section "" {
|
|
TPlusI @ type(plus(?e_1, ?e_2), number) <-
|
|
type(?e_1, number), type(?e_2, number);
|
|
TPlusS @ type(plus(?e_1, ?e_2), string) <-
|
|
type(?e_1, string), type(?e_2, string);
|
|
}
|
|
{{< /bergamot_widget >}}
|
|
|