Continue work on the type theory draft

This commit is contained in:
Danila Fedorin 2022-07-02 17:30:31 -07:00
parent bc52c2c685
commit 10a4435760

View File

@ -229,4 +229,100 @@ const x: number = 1.1 + 1; // just fine!
``` ```
That concludes the second round of real-world examples. Let's take a look at formalizing That concludes the second round of real-world examples. Let's take a look at formalizing
all of this mathematically. all of this mathematically. As a starting point, we can look at a rule that matches the TypeScript
view of having only a single number type, \\(\\text{number}\\). This rule needs a little
bit "more" than the ones we've seen so far; we can't just blindly give things in the
form \\(a+b\\) the type \\(\\text{number}\\) (what if we're adding strings?). For our
rule to behave in the way we have in mind, it's necessary for us to add _premises_.
Before I explain any further, let me show you the rule.
{{< latex >}}
\frac{e_1:\text{number}\quad e_2:\text{number}}{e_1+e_2:\text{number}}
{{< /latex >}}
In the above (and elsewhere) we will use the metavariable \\(e\\) as a stand-in for
any _expression_ in our source language. Expressions are things such as `1`,
`x`, `1.0+someFunction(y)`, and so on. In other words, they're things we can evaluate
to a value. For the moment, we will avoid rules for checking _statements_ (like `let x = 5;`).
Rules like the above consist of premises (above the line) and conclusions (below the line).
The conclusion is the claim / fact that we can determine from the rule. In this specific case,
the conclusion is that \\(e_1+e_2\\) has type \\(\\text{number}\\).
For this to be true, however, some conditions must be met; specifically, the sub-expressions
\\(e_1\\) and \\(e_2\\) must themselves be of type \\(\\text{number}\\). These are the premises.
Reading in plain English, we could pronounce this rule as:
> If \\(e_1\\) and \\(e_2\\) have type \\(\\text{number}\\), then \\(e_1+e_2\\) has type \\(\\text{number}\\).
Notice that we don't care what the left and right operands are (we say they can be any expression).
We need not concern ourselves with how to compute _their_ type in this specific rule. Thus, the rule
would work for expressions like `1+1`, `(1+2)+(3+4)`, `1+x`, and so on, provided other rules
take care of figuring out the types of expressions like `1` and `x`. In this way, when we add
a feature to our language, we typically only need to add one or two associated rules; the
ones for other language features are typically unaffected.
Just to get some more practice, let's take a look at a rule for adding strings.
{{< latex >}}
\frac{e_1:\text{string}\quad e_2:\text{string}}{e_1+e_2:\text{string}}
{{< /latex >}}
This rule is read as follows:
> If \\(e_1\\) and \\(e_2\\) have type \\(\\text{string}\\), then \\(e_1+e_2\\) has type \\(\\text{string}\\).
These rules generally work in other languages. Things get more complicated in languages like Java and Rust,
where types for numbers are more precise (\\(\\text{int}\\) and \\(\\text{float}\\) instead of
\\(\\text{number}\\)). In these languages, we need rules for both.
{{< latex >}}
\frac{e_1:\text{int}\quad e_2:\text{int}}{e_1+e_2:\text{int}}
\quad\quad
\frac{e_1:\text{float}\quad e_2:\text{float}}{e_1+e_2:\text{float}}
{{< /latex >}}
But then, we also saw that Java is perfectly fine with adding a float to an integer,
and an integer to a float (the result is a float). Thus, we also add the following
two rules (but only in the Java case, since, as we have seen, Rust disallows
adding a float to an integer).
{{< latex >}}
\frac{e_1:\text{int}\quad e_2:\text{float}}{e_1+e_2:\text{float}}
\quad\quad
\frac{e_1:\text{float}\quad e_2:\text{int}}{e_1+e_2:\text{float}}
{{< /latex >}}
You might find this process of adding rules for each combination of operand
types tedious, and I would agree. In general, if a language
provides a lot of automatic conversions between types, we do _not_ explicitly
provide rules for all of the possible scenarios. Rather, we'd introduce a general
framework of subtypes, and then have a small number of rules that are responsible
for converting expressions from one type to another. That way, we'd only need to list
the integer-integer and the float-float rules. The rest would follow
from the conversion rules.
{{< todo >}}Cite/ reference subtype stuff {{< /todo >}}
Subtyping, however, is quite a bit beyond the scope of a "basics"
post. For the moment, we shall content ourselves with the tedious approach.
Another thing to note is that we haven't yet seen rules for what programs are _incorrect_,
and we never will. When formalizing type systems we rarely (if ever) explicitly enumerate
cases that produce errors. Rather, we interpret the absence of matching rules to indicate
that something is wrong. Since no rule has premises that match \\(e_1:\\text{float}\\) and \\(e_2:\\text{string}\\),
we can infer that
{{< sidenote "right" "float-string-note" "given the rules so far," >}}
I'm trying to be careful here, since adding a float to a string
is perfectly valid in Java (the float is automatically converted to
a string, and the two are concatenated).
{{< /sidenote >}}
`1.0+"hello"` is invalid. The task of adding good type error messages, then,
is usually a more practical concern, and is undertaken by compiler developers
rather than type theorists. There are, of course, exceptions; check out,
for instance, these papers on improving type error messages, as well
as this tool that showed up only a week or two before I started writing
this article.
{{< todo >}}Cite/ reference type error stuff {{< /todo >}}
I think this is all I wanted to cover in this part.