Add a draft of the variables post in the types series
This commit is contained in:
parent
c21757694e
commit
9192b870b6
334
content/blog/02_types_variables.md
Normal file
334
content/blog/02_types_variables.md
Normal file
|
@ -0,0 +1,334 @@
|
||||||
|
---
|
||||||
|
title: "Everything I Know About Types: Variables"
|
||||||
|
date: 2022-08-28T19:05:31-07:00
|
||||||
|
tags: ["Type Systems", "Programming Languages"]
|
||||||
|
draft: true
|
||||||
|
---
|
||||||
|
|
||||||
|
In the [previous article]({{< relref "01_types_basics.md" >}}), we started
|
||||||
|
looking at types. We've looked at the types of strings (`"hello"`),
|
||||||
|
numbers (`1`, `3.14`), and binary operations (`2*3`, `"hello" + "world"`).
|
||||||
|
However, in pretty much all of these examples, one important thing has been
|
||||||
|
missing: variables. We haven't mentioned typechecking code like `x*y`,
|
||||||
|
and the rules we defined in the previous section will not work for such
|
||||||
|
code! In this post, we'll change that.
|
||||||
|
|
||||||
|
Let's take a look at examples of variables in some common programming
|
||||||
|
languages. If you come from a programming background, it's almost certain
|
||||||
|
that you're comfortable with variables. Nevertheless, it can't hurt to
|
||||||
|
see how they appear in many languages (hopefully, with enough variety, at
|
||||||
|
least a little bit of this will be novel to you).
|
||||||
|
|
||||||
|
We've already seen variables in TypeScript. They're declared using `let`
|
||||||
|
or `const`:
|
||||||
|
|
||||||
|
```TypeScript
|
||||||
|
const x: number = 1; // type added just to be explicit
|
||||||
|
let y = x + 1; // x is known to have type number, and thus so is x+1
|
||||||
|
```
|
||||||
|
|
||||||
|
In Kotlin, we have `var` and `val`, for mutable and immutable variables,
|
||||||
|
respectively:
|
||||||
|
```Kotlin
|
||||||
|
val x: String = "hello"; // once again, the type is optional
|
||||||
|
var y = x + 1; // x is known to have type string, and string+int is a string.
|
||||||
|
```
|
||||||
|
|
||||||
|
In Haskell, there's no keyword for declaring variables at the top level. We
|
||||||
|
could just write:
|
||||||
|
```Haskell
|
||||||
|
x :: String -- optional, just to be consistent with previous examples.
|
||||||
|
x = "hello"
|
||||||
|
y = x ++ show 1 -- No automatic conversion from numbers to strings
|
||||||
|
```
|
||||||
|
|
||||||
|
Haskell also has `let` expressions for creating variables inside expressions:
|
||||||
|
```Haskell
|
||||||
|
(let x = 1 in x + x) + x -- outer x is not known
|
||||||
|
```
|
||||||
|
An important thing to note about Haskell `let`-expressions (demonstrated in
|
||||||
|
the above code), is that the variable `x` only exists within their `in`
|
||||||
|
portion. Subsequent code does not get access to `x`, and that last code snippet
|
||||||
|
will result in a type error. This brings us to an important concept, the
|
||||||
|
_scope_ of a variable. A variable's scope is the part of the code where
|
||||||
|
it is available. In the above snippet,
|
||||||
|
{{< sidenote "right" "recursive-let-note" "the part after" >}}
|
||||||
|
Technically, the scope of the variable <code>x</code> starts right
|
||||||
|
after the <code>=</code>; Haskell let-expressions can be recursive.
|
||||||
|
Using only the concepts we've covered so far, we can only use this
|
||||||
|
for silly examples, like <code>let x = x + 1 in x</code>. This piece of
|
||||||
|
code will run infinitely, since it will never be able to finish
|
||||||
|
adding enough <code>1</code>s to find the value of <code>x</code>.
|
||||||
|
However, there are more useful examples of this feature, such
|
||||||
|
as defining temporary recursive functions, or creating infinite
|
||||||
|
lists. We might see these later.
|
||||||
|
{{< /sidenote >}} the `in` and
|
||||||
|
before the `)` is the scope of `x`; referring to it anywhere else is an error.
|
||||||
|
|
||||||
|
In C and C-like languages, the scope
|
||||||
|
of a variable usually continues the end of the _block_ (typically
|
||||||
|
denoted with curly braces) in which it is defined. For instance,
|
||||||
|
here's Rust:
|
||||||
|
```Rust
|
||||||
|
{
|
||||||
|
let x = 1;
|
||||||
|
let y = x + 1;
|
||||||
|
// The scope of x ends here
|
||||||
|
}
|
||||||
|
// Cannot use x here
|
||||||
|
{
|
||||||
|
let x = "hello";
|
||||||
|
// The scope of this x ends here
|
||||||
|
}
|
||||||
|
// Once again, cannot use x
|
||||||
|
```
|
||||||
|
Scopes don't have to be denoted using `{` or `}` -- Haskell, for instance,
|
||||||
|
just has the `in`. In the Crystal programming language, scopes are usually
|
||||||
|
started by some control-flow statement (like `if` or `while`) and ended by
|
||||||
|
`end`.
|
||||||
|
```Crystal
|
||||||
|
if someCondition
|
||||||
|
x = 1
|
||||||
|
# The scope of x ends here
|
||||||
|
else
|
||||||
|
# Cannot use x here
|
||||||
|
end
|
||||||
|
# Crystal is a bit clever, so you can reference
|
||||||
|
# x here, but its type is (Int32 | Nil) to indicate
|
||||||
|
# that it may not have been set.
|
||||||
|
```
|
||||||
|
|
||||||
|
So much for this little tour of variables "in the wild". There are some key
|
||||||
|
aspects of variables that we can gather from the preceding examples:
|
||||||
|
|
||||||
|
1. A variable with the same name can have a different type depending
|
||||||
|
on its location in the code (see the Rust and Crystal examples).
|
||||||
|
2. The type of a variable cannot be guessed from its usage. For
|
||||||
|
instance, both the TypeScript and Kotlin examples have `x+1`,
|
||||||
|
but this expression is of type `number` in the former and `String`
|
||||||
|
in the latter.
|
||||||
|
3. A variable is not always available in every part of the code;
|
||||||
|
it has a scope (Haskell, Rust, Crystal examples).
|
||||||
|
4. In imperative languages, a statement can introduce a variable
|
||||||
|
that subsequent statements can reference (TypeScript, Kotlin,
|
||||||
|
Rust, Crystal).
|
||||||
|
|
||||||
|
To get started with type rules for variables, let's introduce
|
||||||
|
another metavariable, \\(x\\) (along with \\(n\\) from before).
|
||||||
|
Whereas \\(n\\) ranges over any number in our language, \\(x\\) ranges
|
||||||
|
over any variable. It can be used as a stand-in for `x`, `y`, `myVar`, and so on.
|
||||||
|
|
||||||
|
The first property prevents us from writing type rules like the
|
||||||
|
following, since we cannot always assume that a variable has type
|
||||||
|
\\(\\text{number}\\) or \\(\\text{string}\\).
|
||||||
|
{{< latex >}}
|
||||||
|
x : \text{number}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
The second property prohibits us from trying to guess types of
|
||||||
|
compound expressions without caring too much about the variables;
|
||||||
|
that is, the following rule is invalid.
|
||||||
|
{{< latex >}}
|
||||||
|
x + 1 : \text{number}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
I won't list rules that are prohibited by the third property, but
|
||||||
|
I do want to note that it prevents us from just having some global
|
||||||
|
list of variables and types "out there", and referring to that in
|
||||||
|
our rules. If all types were globally known, it wouldn't be possible
|
||||||
|
for a variable to take on different types in different places.
|
||||||
|
|
||||||
|
With these constraints in mind, we have enough to start creating
|
||||||
|
rules for expressions (but not statements yet; we'll get to that).
|
||||||
|
The solution to our problem is to add a third "thing" to our rules:
|
||||||
|
the _environment_, typically denoted using the Greek uppercase gamma,
|
||||||
|
\\(\\Gamma\\). The environment is basically a list of pairs associating
|
||||||
|
variables with their types. For instance, if in some situation
|
||||||
|
the variables `x` and `y` were both declared to be of type `int`, we'd
|
||||||
|
write that as follows.
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\Gamma = \{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
If, on the other hand, they both had type `string`, our equation would
|
||||||
|
look as follows.
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\Gamma = \{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
{{< dialog >}}
|
||||||
|
{{< message "question" "reader" >}}
|
||||||
|
Hey, there's something odd with your typography; how come
|
||||||
|
it says \(\texttt{x}\)? Earlier, you wrote \(x\) for variables.
|
||||||
|
{{< /message >}}
|
||||||
|
{{< message "answer" "Daniel" >}}
|
||||||
|
Good catch, and that is no accident. As I mentioned earlier,
|
||||||
|
\(x\) is a <em>metavariable</em>; it's part of our meta language,
|
||||||
|
and it can stand for many different actual variables in the object
|
||||||
|
language.<br>
|
||||||
|
<br>
|
||||||
|
On the other hand, \(\texttt{x}\) stands for a specific variable,
|
||||||
|
<code>x</code> (it doesn't <em>literally</em> look like <code>x</code>
|
||||||
|
because it's in a mathematical equation, and I can't style it using
|
||||||
|
CSS).<br>
|
||||||
|
<br>
|
||||||
|
In general, \(\textit{italics}\) font is used for metavariables,
|
||||||
|
like \(n\) and \(x\), which can stand for multiple expressions
|
||||||
|
in the object language; on the other hand, \(\texttt{typewriter}\)
|
||||||
|
font will be used for actual, concrete variables, such
|
||||||
|
as \(\texttt{myVariable}\). In practice, the difference is
|
||||||
|
usually clear from context, since concrete variables usually only show up in
|
||||||
|
examples, whereas metavariables only appear in rules. So don't
|
||||||
|
worry too much about spotting the differences in font.
|
||||||
|
{{< /message >}}
|
||||||
|
{{< /dialog >}}
|
||||||
|
|
||||||
|
Okay, so now we know how to write down environments. How does this
|
||||||
|
help us achieve our goal of writing type rules for expressions with
|
||||||
|
variables? Well, the benefits become apparent when we incorporate
|
||||||
|
environments into our typing rules. So far, we've been using the following
|
||||||
|
notation:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
e : \tau
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
This reads,
|
||||||
|
|
||||||
|
> The expression \\(e\\) [another metavariable, this one is used for
|
||||||
|
all expressions] has type \\(\\tau\\) [also a metavariable, for
|
||||||
|
types].
|
||||||
|
|
||||||
|
However, as we've seen, we can't make global claims like this when variables are
|
||||||
|
involved, since the same expression may have different types depending
|
||||||
|
on the situation. Now, we instead write:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\Gamma \vdash e : \tau
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
This version reads,
|
||||||
|
|
||||||
|
> In the environment \\(\\Gamma\\), the expression \\(e\\) has type \\(\\tau\\).
|
||||||
|
|
||||||
|
And here's the difference. The new \\(\\Gamma\\) of ours captures this
|
||||||
|
"depending on the situation" aspect of expressions with variables. It
|
||||||
|
provides us with
|
||||||
|
{{< sidenote "right" "context-note" "much-needed context." >}}
|
||||||
|
In fact, \(\Gamma\) is sometimes called the typing context.
|
||||||
|
{{< /sidenote >}} This version makes it clear that \\(x\\)
|
||||||
|
isn't _always_ of type \\(\\tau\\), but only in the specific situation
|
||||||
|
described by \\(\\Gamma\\). Using our first two-`int` environment,
|
||||||
|
we can make the following (true) claim:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \} \vdash \texttt{x}+\texttt{y} : \text{int}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
Which, in English, can be read as "when `x` and `y` are both integers,
|
||||||
|
the expression `x+y` also results in an integer". The case for strings is similar:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \} \vdash \texttt{x}+\texttt{y} : \text{string}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
This one, can be read "when `x` and `y` are both strings, the expression `x+y`
|
||||||
|
also results in a string".
|
||||||
|
|
||||||
|
Okay, so now we've seen a couple of examples, but these examples are _not_ rules!
|
||||||
|
They capture only specific situations (which we've "hard-coded" by specifying
|
||||||
|
what \\(\\Gamma\\) is). Here's what a general rule __should not look like__:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\{ x_1 : \text{string}, x_2 : \text{string} \} \vdash x_1+x_2 : \text{string}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
{{< dialog >}}
|
||||||
|
{{< message "question" "reader" >}}
|
||||||
|
You pretty much just changed the font and
|
||||||
|
replaced \(x\) and \(y\) with \(x_1\) and \(x_2\).
|
||||||
|
{{< /message >}}
|
||||||
|
{{< message "answer" "Daniel" >}}
|
||||||
|
Yep! Now, the rule has metavariables which range over <em>any</em>
|
||||||
|
(object) variables, so the rule applies whenever any two variables
|
||||||
|
of type \(\text{string}\) are added. Remember, though, this rule
|
||||||
|
is not good. More on that below!
|
||||||
|
{{< /message >}}
|
||||||
|
{{< /dialog >}}
|
||||||
|
|
||||||
|
This rule is bad, and it should feel bad. Here are two reasons:
|
||||||
|
|
||||||
|
1. It only works for expressions like `x+y` or `a+b`, but not for
|
||||||
|
more complicated things like `(a+b)+(c+d)`. This is because
|
||||||
|
by using \\(x\_1\\) and \\(x\_2\\), the metavariables for
|
||||||
|
expressions, it rules out additions that _don't_ add expressions.
|
||||||
|
2. It doesn't play well with other rules; it can't be the _only_
|
||||||
|
rule for addition of integers, since it doesn't work for
|
||||||
|
integer literals (i.e., `1+1` is out).
|
||||||
|
|
||||||
|
The trouble is that this rule is trying to do too much; it's trying
|
||||||
|
to check the environment for variables, but it's _also_ trying to
|
||||||
|
specify the results of adding two integers. That's not how we
|
||||||
|
did it last time! In fact, when it came to numbers, we had two
|
||||||
|
rules. The first said that any number symbol had the \\(\\text{number}\\)
|
||||||
|
type. Previously, we wrote it as follows:
|
||||||
|
|
||||||
|
{{< todo >}}
|
||||||
|
Number vs int. Pick one, probably int.
|
||||||
|
{{< /todo >}}
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
n : \text{number}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
Another rule specified the type of addition, without caring how the
|
||||||
|
sub-expressions \\(e\_1\\) and \\(e\_2\\) were given _their_ types.
|
||||||
|
As long as they had type \\(\\text{number}\\), all was well.
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\frac{e_1 : \text{number} \quad e_2 : \text{number}}{e_1 + e_2 : \text{number}}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
These rules are good, and we should keep them. Now, though, environments
|
||||||
|
are in play. Fortunately, the environment doesn't matter at all when it
|
||||||
|
comes to figuring out what the type of a symbol like `1` is -- it's always
|
||||||
|
a number! We can thus write the updated rule as follows. Leaving \\(\\Gamma\\)
|
||||||
|
unspecified means it can
|
||||||
|
stand for any environment.
|
||||||
|
{{< todo >}}
|
||||||
|
Probably just work in the fact that Gamma is another metavariable.
|
||||||
|
{{< /todo >}}
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\Gamma \vdash n : \text{number}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
We can also translate the addition rule in a pretty straightforward
|
||||||
|
manner, by tacking on \\(\\Gamma\\) for every typing claim.
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\frac{\Gamma \vdash e_1 : \text{number} \quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1 + e_2 : \text{number}}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
So we have a rule for number symbols like `1` or `2`, and we have
|
||||||
|
a rule for addition. All that's left is a rule for variables, like `x`
|
||||||
|
and `y`. This rule needs to make sure that a variable is defined,
|
||||||
|
and that it has a particular type. A variable is defined, and has a type,
|
||||||
|
if a pair \\(x : \\tau\\) is present in the environment \\(\\Gamma\\).
|
||||||
|
Thus, we can write the variable rule like this:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau}
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
Note that we're using the \\(\\tau\\) metavariable to range over any type;
|
||||||
|
this means the rule applies to (object) variables declared to have type
|
||||||
|
\\(\\text{number}\\), \\(\\text{string}\\), or anything else present in
|
||||||
|
our system. A single rule takes care of figuring the types of _all_
|
||||||
|
variables.
|
||||||
|
|
||||||
|
{{< todo >}}
|
||||||
|
The rest of this, but mostly statements.
|
||||||
|
{{< /todo >}}
|
Loading…
Reference in New Issue
Block a user