Add a draft of the variables post in the types series

Danila Fedorin 2022-08-29 21:45:53 -07:00
parent c21757694e
commit 9192b870b6
1 changed files with 334 additions and 0 deletions


@ -0,0 +1,334 @@
---
title: "Everything I Know About Types: Variables"
date: 2022-08-28T19:05:31-07:00
tags: ["Type Systems", "Programming Languages"]
draft: true
---
In the [previous article]({{< relref "01_types_basics.md" >}}), we started
looking at types. We've looked at the types of strings (`"hello"`),
numbers (`1`, `3.14`), and binary operations (`2*3`, `"hello" + "world"`).
However, in pretty much all of these examples, one important thing has been
missing: variables. We haven't mentioned typechecking code like `x*y`,
and the rules we defined in the previous article will not work for such
code! In this post, we'll change that.
Let's take a look at examples of variables in some common programming
languages. If you come from a programming background, it's almost certain
that you're comfortable with variables. Nevertheless, it can't hurt to
see how they appear in many languages (hopefully, with enough variety, at
least a little bit of this will be novel to you).
We've already seen variables in TypeScript. They're declared using `let`
or `const`:
```TypeScript
const x: number = 1; // type added just to be explicit
let y = x + 1; // x is known to have type number, and thus so does x+1
```
In Kotlin, we have `var` and `val`, for mutable and immutable variables,
respectively:
```Kotlin
val x: String = "hello"; // once again, the type is optional
var y = x + 1; // x is known to have type String, and String+Int is a String.
```
In Haskell, there's no keyword for declaring variables at the top level. We
could just write:
```Haskell
x :: String -- optional, just to be consistent with previous examples.
x = "hello"
y = x ++ show 1 -- No automatic conversion from numbers to strings
```
Haskell also has `let` expressions for creating variables inside expressions:
```Haskell
(let x = 1 in x + x) + x -- outer x is not known
```
An important thing to note about Haskell `let`-expressions (demonstrated in
the code above) is that the variable `x` only exists within their `in`
portion. Subsequent code does not get access to `x`, and that last code snippet
will result in a type error. This brings us to an important concept, the
_scope_ of a variable. A variable's scope is the part of the code where
it is available. In the above snippet,
{{< sidenote "right" "recursive-let-note" "the part after" >}}
Technically, the scope of the variable <code>x</code> starts right
after the <code>=</code>; Haskell let-expressions can be recursive.
Using only the concepts we've covered so far, we can only use this
for silly examples, like <code>let x = x + 1 in x</code>. This piece of
code will run infinitely, since it will never be able to finish
adding enough <code>1</code>s to find the value of <code>x</code>.
However, there are more useful examples of this feature, such
as defining temporary recursive functions, or creating infinite
lists. We might see these later.
{{< /sidenote >}} the `in` and
before the `)` is the scope of `x`; referring to it anywhere else is an error.
In C and C-like languages, the scope
of a variable usually continues until the end of the _block_ (typically
denoted with curly braces) in which it is defined. For instance,
here's Rust:
```Rust
{
    let x = 1;
    let y = x + 1;
    // The scope of x ends here
}
// Cannot use x here
{
    let x = "hello";
    // The scope of this x ends here
}
// Once again, cannot use x
```
Scopes don't have to be denoted using `{` or `}` -- Haskell, for instance,
just has the `in`. In the Crystal programming language, scopes are usually
started by some control-flow statement (like `if` or `while`) and ended by
`end`.
```Crystal
if someCondition
  x = 1
  # The scope of x ends here
else
  # Cannot use x here
end
# Crystal is a bit clever, so you can reference
# x here, but its type is (Int32 | Nil) to indicate
# that it may not have been set.
```
So much for this little tour of variables "in the wild". There are some key
aspects of variables that we can gather from the preceding examples:
1. A variable with the same name can have a different type depending
on its location in the code (see the Rust and Crystal examples).
2. The type of a variable cannot be guessed from its usage. For
instance, both the TypeScript and Kotlin examples have `x+1`,
but this expression is of type `number` in the former and `String`
in the latter.
3. A variable is not always available in every part of the code;
it has a scope (Haskell, Rust, Crystal examples).
4. In imperative languages, a statement can introduce a variable
that subsequent statements can reference (TypeScript, Kotlin,
Rust, Crystal).
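TypeScript, the language we started with, behaves the same way. Here's a small sketch of the first and third properties; the function name `scopes` and its local names are made up purely for this illustration.
```TypeScript
// Hypothetical example: the same name, x, with two different types.
function scopes(): [number, string] {
    let first: number;
    let second: string;
    {
        const x = 1;       // in this block, x : number
        first = x + 1;
        // The scope of this x ends here
    }
    {
        const x = "hello"; // a different x; in this block, x : string
        second = x + 1;    // string + number is a string in TypeScript
        // The scope of this x ends here
    }
    // x is not in scope here; uncommenting the next line is a compile error.
    // const oops = x;
    return [first, second];
}

const [first, second] = scopes(); // first is 2, second is "hello1"
```
Both blocks contain the expression `x + 1`, but it has a different type in each one, and neither `x` is visible outside its block.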
To get started with type rules for variables, let's introduce
another metavariable, \\(x\\) (along with \\(n\\) from before).
Whereas \\(n\\) ranges over any number in our language, \\(x\\) ranges
over any variable. It can be used as a stand-in for `x`, `y`, `myVar`, and so on.
The first property prevents us from writing type rules like the
following, since we cannot always assume that a variable has type
\\(\\text{number}\\) or \\(\\text{string}\\).
{{< latex >}}
x : \text{number}
{{< /latex >}}
The second property prohibits us from trying to guess types of
compound expressions without caring too much about the variables;
that is, the following rule is invalid.
{{< latex >}}
x + 1 : \text{number}
{{< /latex >}}
I won't list rules that are prohibited by the third property, but
I do want to note that it prevents us from just having some global
list of variables and types "out there", and referring to that in
our rules. If all types were globally known, it wouldn't be possible
for a variable to take on different types in different places.
With these constraints in mind, we have enough to start creating
rules for expressions (but not statements yet; we'll get to that).
The solution to our problem is to add a third "thing" to our rules:
the _environment_, typically denoted using the Greek uppercase gamma,
\\(\\Gamma\\). The environment is basically a list of pairs associating
variables with their types. For instance, if in some situation
the variables `x` and `y` were both declared to be of type `int`, we'd
write that as follows.
{{< latex >}}
\Gamma = \{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \}
{{< /latex >}}
If, on the other hand, they both had type `string`, our equation would
look as follows.
{{< latex >}}
\Gamma = \{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \}
{{< /latex >}}
{{< dialog >}}
{{< message "question" "reader" >}}
Hey, there's something odd with your typography; how come
it says \(\texttt{x}\)? Earlier, you wrote \(x\) for variables.
{{< /message >}}
{{< message "answer" "Daniel" >}}
Good catch, and that is no accident. As I mentioned earlier,
\(x\) is a <em>metavariable</em>; it's part of our meta language,
and it can stand for many different actual variables in the object
language.<br>
<br>
On the other hand, \(\texttt{x}\) stands for a specific variable,
<code>x</code> (it doesn't <em>literally</em> look like <code>x</code>
because it's in a mathematical equation, and I can't style it using
CSS).<br>
<br>
In general, \(\textit{italic}\) font is used for metavariables,
like \(n\) and \(x\), which can stand for multiple expressions
in the object language; on the other hand, \(\texttt{typewriter}\)
font will be used for actual, concrete variables, such
as \(\texttt{myVariable}\). In practice, the difference is
usually clear from context, since concrete variables usually only show up in
examples, whereas metavariables only appear in rules. So don't
worry too much about spotting the differences in font.
{{< /message >}}
{{< /dialog >}}
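If it helps to think of environments as data, here is one way they could be sketched in TypeScript. This is only an illustration: the names `Type` and `Env` are invented for this sketch, and a `Map` is just one convenient stand-in for a list of pairs.
```TypeScript
// Hypothetical representation of environments; not part of the formal rules.
type Type = "int" | "string";
type Env = Map<string, Type>;

// The environment { x : int, y : int }
const bothInts: Env = new Map<string, Type>([["x", "int"], ["y", "int"]]);
// The environment { x : string, y : string }
const bothStrings: Env = new Map<string, Type>([["x", "string"], ["y", "string"]]);

// Looking up a variable yields its type, or undefined if it isn't declared.
const typeOfX = bothInts.get("x"); // "int"
const typeOfZ = bothInts.get("z"); // undefined
```
Note that the same variable, `x`, is associated with a different type in each of the two environments.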
Okay, so now we know how to write down environments. How does this
help us achieve our goal of writing type rules for expressions with
variables? Well, the benefits become apparent when we incorporate
environments into our typing rules. So far, we've been using the following
notation:
{{< latex >}}
e : \tau
{{< /latex >}}
This reads,
> The expression \\(e\\) [another metavariable, this one is used for
all expressions] has type \\(\\tau\\) [also a metavariable, for
types].
However, as we've seen, we can't make global claims like this when variables are
involved, since the same expression may have different types depending
on the situation. Now, we instead write:
{{< latex >}}
\Gamma \vdash e : \tau
{{< /latex >}}
This version reads,
> In the environment \\(\\Gamma\\), the expression \\(e\\) has type \\(\\tau\\).
And here's the difference. The new \\(\\Gamma\\) of ours captures this
"depending on the situation" aspect of expressions with variables. It
provides us with
{{< sidenote "right" "context-note" "much-needed context." >}}
In fact, \(\Gamma\) is sometimes called the typing context.
{{< /sidenote >}} This version makes it clear that \\(e\\)
isn't _always_ of type \\(\\tau\\), but only in the specific situation
described by \\(\\Gamma\\). Using our first two-`int` environment,
we can make the following (true) claim:
{{< latex >}}
\{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \} \vdash \texttt{x}+\texttt{y} : \text{int}
{{< /latex >}}
Which, in English, can be read as "when `x` and `y` are both integers,
the expression `x+y` also results in an integer". The case for strings is similar:
{{< latex >}}
\{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \} \vdash \texttt{x}+\texttt{y} : \text{string}
{{< /latex >}}
This one can be read as "when `x` and `y` are both strings, the expression `x+y`
also results in a string".
Okay, so now we've seen a couple of examples, but these examples are _not_ rules!
They capture only specific situations (which we've "hard-coded" by specifying
what \\(\\Gamma\\) is). Here's what a general rule __should not look like__:
{{< latex >}}
\{ x_1 : \text{string}, x_2 : \text{string} \} \vdash x_1+x_2 : \text{string}
{{< /latex >}}
{{< dialog >}}
{{< message "question" "reader" >}}
You pretty much just changed the font and
replaced \(x\) and \(y\) with \(x_1\) and \(x_2\).
{{< /message >}}
{{< message "answer" "Daniel" >}}
Yep! Now, the rule has metavariables which range over <em>any</em>
(object) variables, so the rule applies whenever any two variables
of type \(\text{string}\) are added. Remember, though, this rule
is not good. More on that below!
{{< /message >}}
{{< /dialog >}}
This rule is bad, and it should feel bad. Here are two reasons:
1. It only works for expressions like `x+y` or `a+b`, but not for
more complicated things like `(a+b)+(c+d)`. This is because
\\(x\_1\\) and \\(x\_2\\) are metavariables for variables, not for
arbitrary expressions, so the rule excludes additions whose operands _aren't_ plain variables.
2. It doesn't play well with other rules; it can't be the _only_
rule for addition of strings, since it doesn't work for
string literals (e.g., `"hello" + "world"` is out).
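To see both problems concretely, here's a sketch of a checker that implements _only_ this rule. The `Expr` representation and the names `badStringAdd`, `v`, and `add` are hypothetical, made up for this example.
```TypeScript
// Hypothetical types for this sketch.
type Type = "number" | "string";
type Env = Map<string, Type>;
type Expr =
    | { kind: "str"; value: string }
    | { kind: "var"; name: string }
    | { kind: "add"; left: Expr; right: Expr };

// The flawed rule: it applies only when both operands are variables
// that the environment maps to string.
function badStringAdd(env: Env, e: Expr): Type | undefined {
    if (e.kind !== "add") return undefined;
    const { left, right } = e;
    if (left.kind !== "var" || right.kind !== "var") return undefined;
    if (env.get(left.name) === "string" && env.get(right.name) === "string") {
        return "string";
    }
    return undefined; // the rule does not apply
}

// Small helpers for building expressions.
const v = (name: string): Expr => ({ kind: "var", name });
const add = (left: Expr, right: Expr): Expr => ({ kind: "add", left, right });

const strings: Env = new Map<string, Type>([
    ["a", "string"], ["b", "string"], ["c", "string"], ["d", "string"],
]);

// a+b: the rule applies, giving "string"
const simple = badStringAdd(strings, add(v("a"), v("b")));
// (a+b)+(c+d): the operands aren't variables, so this is undefined
const nested = badStringAdd(strings, add(add(v("a"), v("b")), add(v("c"), v("d"))));
// "hello"+"world": literals aren't variables either, so this is undefined
const literals = badStringAdd(strings, add({ kind: "str", value: "hello" }, { kind: "str", value: "world" }));
```
Only the first expression is accepted, even though all three are perfectly reasonable additions.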
The trouble is that this rule is trying to do too much; it's trying
to check the environment for variables, but it's _also_ trying to
specify the results of adding two strings. That's not how we
did it last time! In fact, when it came to numbers, we had two
rules. The first said that any number symbol had the \\(\\text{number}\\)
type. Previously, we wrote it as follows:
{{< todo >}}
Number vs int. Pick one, probably int.
{{< /todo >}}
{{< latex >}}
n : \text{number}
{{< /latex >}}
Another rule specified the type of addition, without caring how the
sub-expressions \\(e\_1\\) and \\(e\_2\\) were given _their_ types.
As long as they had type \\(\\text{number}\\), all was well.
{{< latex >}}
\frac{e_1 : \text{number} \quad e_2 : \text{number}}{e_1 + e_2 : \text{number}}
{{< /latex >}}
These rules are good, and we should keep them. Now, though, environments
are in play. Fortunately, the environment doesn't matter at all when it
comes to figuring out what the type of a symbol like `1` is -- it's always
a number! We can thus write the updated rule as follows. Leaving \\(\\Gamma\\)
unspecified means it can
stand for any environment.
{{< todo >}}
Probably just work in the fact that Gamma is another metavariable.
{{< /todo >}}
{{< latex >}}
\Gamma \vdash n : \text{number}
{{< /latex >}}
We can also translate the addition rule in a pretty straightforward
manner, by tacking \\(\\Gamma\\) onto every typing claim.
{{< latex >}}
\frac{\Gamma \vdash e_1 : \text{number} \quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1 + e_2 : \text{number}}
{{< /latex >}}
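As a quick sanity check, here is how these two updated rules combine to give a type to `(1+2)+3`. The derivation is just an instance of the rules above, with the same arbitrary \\(\\Gamma\\) threaded through every claim.
{{< latex >}}
\frac{\cfrac{\Gamma \vdash 1 : \text{number} \quad \Gamma \vdash 2 : \text{number}}{\Gamma \vdash 1 + 2 : \text{number}} \quad \Gamma \vdash 3 : \text{number}}{\Gamma \vdash (1 + 2) + 3 : \text{number}}
{{< /latex >}}
Since no variables appear in this expression, nothing is ever looked up in \\(\\Gamma\\); it is simply passed along unchanged.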
So we have a rule for number symbols like `1` or `2`, and we have
a rule for addition. All that's left is a rule for variables, like `x`
and `y`. This rule needs to make sure that a variable is defined,
and that it has a particular type. A variable is defined, and has a type,
if a pair \\(x : \\tau\\) is present in the environment \\(\\Gamma\\).
Thus, we can write the variable rule like this:
{{< latex >}}
\frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau}
{{< /latex >}}
Note that we're using the \\(\\tau\\) metavariable to range over any type;
this means the rule applies to (object) variables declared to have type
\\(\\text{number}\\), \\(\\text{string}\\), or anything else present in
our system. A single rule takes care of figuring out the types of _all_
variables.
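To tie the three rules together, here's a minimal sketch of how they might translate into a typechecker in TypeScript. None of the names come from the rules themselves: the `Expr` representation, the `Env` map, and the function `typeOf` are all assumptions made for this illustration, and a failed check is reported by throwing an error.
```TypeScript
// Hypothetical types for this sketch.
type Type = "number" | "string";
type Env = Map<string, Type>;
type Expr =
    | { kind: "num"; value: number }
    | { kind: "var"; name: string }
    | { kind: "add"; left: Expr; right: Expr };

function typeOf(env: Env, e: Expr): Type {
    switch (e.kind) {
        case "num":
            // Number rule: a number symbol has type number in any environment.
            return "number";
        case "var": {
            // Variable rule: the pair (x : tau) must be present in the environment.
            const t = env.get(e.name);
            if (t === undefined) throw new Error(`variable ${e.name} is not in the environment`);
            return t;
        }
        case "add": {
            // Addition rule: both operands are checked in the same environment.
            const left = typeOf(env, e.left);
            const right = typeOf(env, e.right);
            if (left === "number" && right === "number") return "number";
            throw new Error("no rule applies: + expects two numbers here");
        }
    }
}

// Usage: check x + 1 in the environment { x : number, s : string }.
const gamma: Env = new Map<string, Type>([["x", "number"], ["s", "string"]]);
const xPlusOne: Expr = {
    kind: "add",
    left: { kind: "var", name: "x" },
    right: { kind: "num", value: 1 },
};
const result = typeOf(gamma, xPlusOne); // "number"
```
Since only the number rules have been updated to use \\(\\Gamma\\) so far, this sketch rejects string addition; a rule for strings would slot in as another branch of the `add` case.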
{{< todo >}}
The rest of this, but mostly statements.
{{< /todo >}}