Add a draft of the variables post in the types series
This commit is contained in:
parent
c21757694e
commit
9192b870b6
334
content/blog/02_types_variables.md
Normal file
334
content/blog/02_types_variables.md
Normal file
@ -0,0 +1,334 @@
|
||||
---
|
||||
title: "Everything I Know About Types: Variables"
|
||||
date: 2022-08-28T19:05:31-07:00
|
||||
tags: ["Type Systems", "Programming Languages"]
|
||||
draft: true
|
||||
---
|
||||
|
||||
In the [previous article]({{< relref "01_types_basics.md" >}}), we started
|
||||
looking at types. We've looked at the types of strings (`"hello"`),
|
||||
numbers (`1`, `3.14`), and binary operations (`2*3`, `"hello" + "world"`).
|
||||
However, in pretty much all of these examples, one important thing has been
|
||||
missing: variables. We haven't mentioned typechecking code like `x*y`,
|
||||
and the rules we defined in the previous section will not work for such
|
||||
code! In this post, we'll change that.
|
||||
|
||||
Let's take a look at examples of variables in some common programming
|
||||
languages. If you come from a programming background, it's almost certain
|
||||
that you're comfortable with variables. Nevertheless, it can't hurt to
|
||||
see how they appear in many languages (hopefully, with enough variety, at
|
||||
least a little bit of this will be novel to you).
|
||||
|
||||
We've already seen variables in TypeScript. They're declared using `let`
|
||||
or `const`:
|
||||
|
||||
```TypeScript
|
||||
const x: number = 1; // type added just to be explicit
|
||||
let y = x + 1; // x is known to have type number, and thus so is x+1
|
||||
```
|
||||
|
||||
In Kotlin, we have `var` and `val`, for mutable and immutable variables,
|
||||
respectively:
|
||||
```Kotlin
|
||||
val x: String = "hello"; // once again, the type is optional
|
||||
var y = x + 1; // x is known to have type string, and string+int is a string.
|
||||
```
|
||||
|
||||
In Haskell, there's no keyword for declaring variables at the top level. We
|
||||
could just write:
|
||||
```Haskell
|
||||
x :: String -- optional, just to be consistent with previous examples.
|
||||
x = "hello"
|
||||
y = x ++ show 1 -- No automatic conversion from numbers to strings
|
||||
```
|
||||
|
||||
Haskell also has `let` expressions for creating variables inside expressions:
|
||||
```Haskell
|
||||
(let x = 1 in x + x) + x -- outer x is not known
|
||||
```
|
||||
An important thing to note about Haskell `let`-expressions (demonstrated in
|
||||
the above code), is that the variable `x` only exists within their `in`
|
||||
portion. Subsequent code does not get access to `x`, and that last code snippet
|
||||
will result in a type error. This brings us to an important concept, the
|
||||
_scope_ of a variable. A variable's scope is the part of the code where
|
||||
it is available. In the above snippet,
|
||||
{{< sidenote "right" "recursive-let-note" "the part after" >}}
|
||||
Technically, the scope of the variable <code>x</code> starts right
|
||||
after the <code>=</code>; Haskell let-expressions can be recursive.
|
||||
Using only the concepts we've covered so far, we can only use this
|
||||
for silly examples, like <code>let x = x + 1 in x</code>. This piece of
|
||||
code will run infinitely, since it will never be able to finish
|
||||
adding enough <code>1</code>s to find the value of <code>x</code>.
|
||||
However, there are more useful examples of this feature, such
|
||||
as defining temporary recursive functions, or creating infinite
|
||||
lists. We might see these later.
|
||||
{{< /sidenote >}} the `in` and
|
||||
before the `)` is the scope of `x`; referring to it anywhere else is an error.
|
||||
|
||||
In C and C-like languages, the scope
|
||||
of a variable usually continues the end of the _block_ (typically
|
||||
denoted with curly braces) in which it is defined. For instance,
|
||||
here's Rust:
|
||||
```Rust
|
||||
{
|
||||
let x = 1;
|
||||
let y = x + 1;
|
||||
// The scope of x ends here
|
||||
}
|
||||
// Cannot use x here
|
||||
{
|
||||
let x = "hello";
|
||||
// The scope of this x ends here
|
||||
}
|
||||
// Once again, cannot use x
|
||||
```
|
||||
Scopes don't have to be denoted using `{` or `}` -- Haskell, for instance,
|
||||
just has the `in`. In the Crystal programming language, scopes are usually
|
||||
started by some control-flow statement (like `if` or `while`) and ended by
|
||||
`end`.
|
||||
```Crystal
|
||||
if someCondition
|
||||
x = 1
|
||||
# The scope of x ends here
|
||||
else
|
||||
# Cannot use x here
|
||||
end
|
||||
# Crystal is a bit clever, so you can reference
|
||||
# x here, but its type is (Int32 | Nil) to indicate
|
||||
# that it may not have been set.
|
||||
```
|
||||
|
||||
So much for this little tour of variables "in the wild". There are some key
|
||||
aspects of variables that we can gather from the preceding examples:
|
||||
|
||||
1. A variable with the same name can have a different type depending
|
||||
on its location in the code (see the Rust and Crystal examples).
|
||||
2. The type of a variable cannot be guessed from its usage. For
|
||||
instance, both the TypeScript and Kotlin examples have `x+1`,
|
||||
but this expression is of type `number` in the former and `String`
|
||||
in the latter.
|
||||
3. A variable is not always available in every part of the code;
|
||||
it has a scope (Haskell, Rust, Crystal examples).
|
||||
4. In imperative languages, a statement can introduce a variable
|
||||
that subsequent statements can reference (TypeScript, Kotlin,
|
||||
Rust, Crystal).
|
||||
|
||||
To get started with type rules for variables, let's introduce
|
||||
another metavariable, \\(x\\) (along with \\(n\\) from before).
|
||||
Whereas \\(n\\) ranges over any number in our language, \\(x\\) ranges
|
||||
over any variable. It can be used as a stand-in for `x`, `y`, `myVar`, and so on.
|
||||
|
||||
The first property prevents us from writing type rules like the
|
||||
following, since we cannot always assume that a variable has type
|
||||
\\(\\text{number}\\) or \\(\\text{string}\\).
|
||||
{{< latex >}}
|
||||
x : \text{number}
|
||||
{{< /latex >}}
|
||||
|
||||
The second property prohibits us from trying to guess types of
|
||||
compound expressions without caring too much about the variables;
|
||||
that is, the following rule is invalid.
|
||||
{{< latex >}}
|
||||
x + 1 : \text{number}
|
||||
{{< /latex >}}
|
||||
|
||||
I won't list rules that are prohibited by the third property, but
|
||||
I do want to note that it prevents us from just having some global
|
||||
list of variables and types "out there", and referring to that in
|
||||
our rules. If all types were globally known, it wouldn't be possible
|
||||
for a variable to take on different types in different places.
|
||||
|
||||
With these constraints in mind, we have enough to start creating
|
||||
rules for expressions (but not statements yet; we'll get to that).
|
||||
The solution to our problem is to add a third "thing" to our rules:
|
||||
the _environment_, typically denoted using the Greek uppercase gamma,
|
||||
\\(\\Gamma\\). The environment is basically a list of pairs associating
|
||||
variables with their types. For instance, if in some situation
|
||||
the variables `x` and `y` were both declared to be of type `int`, we'd
|
||||
write that as follows.
|
||||
|
||||
{{< latex >}}
|
||||
\Gamma = \{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \}
|
||||
{{< /latex >}}
|
||||
|
||||
If, on the other hand, they both had type `string`, our equation would
|
||||
look as follows.
|
||||
|
||||
{{< latex >}}
|
||||
\Gamma = \{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \}
|
||||
{{< /latex >}}
|
||||
|
||||
{{< dialog >}}
|
||||
{{< message "question" "reader" >}}
|
||||
Hey, there's something odd with your typography; how come
|
||||
it says \(\texttt{x}\)? Earlier, you wrote \(x\) for variables.
|
||||
{{< /message >}}
|
||||
{{< message "answer" "Daniel" >}}
|
||||
Good catch, and that is no accident. As I mentioned earlier,
|
||||
\(x\) is a <em>metavariable</em>; it's part of our meta language,
|
||||
and it can stand for many different actual variables in the object
|
||||
language.<br>
|
||||
<br>
|
||||
On the other hand, \(\texttt{x}\) stands for a specific variable,
|
||||
<code>x</code> (it doesn't <em>literally</em> look like <code>x</code>
|
||||
because it's in a mathematical equation, and I can't style it using
|
||||
CSS).<br>
|
||||
<br>
|
||||
In general, \(\textit{italics}\) font is used for metavariables,
|
||||
like \(n\) and \(x\), which can stand for multiple expressions
|
||||
in the object language; on the other hand, \(\texttt{typewriter}\)
|
||||
font will be used for actual, concrete variables, such
|
||||
as \(\texttt{myVariable}\). In practice, the difference is
|
||||
usually clear from context, since concrete variables usually only show up in
|
||||
examples, whereas metavariables only appear in rules. So don't
|
||||
worry too much about spotting the differences in font.
|
||||
{{< /message >}}
|
||||
{{< /dialog >}}
|
||||
|
||||
Okay, so now we know how to write down environments. How does this
|
||||
help us achieve our goal of writing type rules for expressions with
|
||||
variables? Well, the benefits become apparent when we incorporate
|
||||
environments into our typing rules. So far, we've been using the following
|
||||
notation:
|
||||
|
||||
{{< latex >}}
|
||||
e : \tau
|
||||
{{< /latex >}}
|
||||
|
||||
This reads,
|
||||
|
||||
> The expression \\(e\\) [another metavariable, this one is used for
|
||||
all expressions] has type \\(\\tau\\) [also a metavariable, for
|
||||
types].
|
||||
|
||||
However, as we've seen, we can't make global claims like this when variables are
|
||||
involved, since the same expression may have different types depending
|
||||
on the situation. Now, we instead write:
|
||||
|
||||
{{< latex >}}
|
||||
\Gamma \vdash e : \tau
|
||||
{{< /latex >}}
|
||||
|
||||
This version reads,
|
||||
|
||||
> In the environment \\(\\Gamma\\), the expression \\(e\\) has type \\(\\tau\\).
|
||||
|
||||
And here's the difference. The new \\(\\Gamma\\) of ours captures this
|
||||
"depending on the situation" aspect of expressions with variables. It
|
||||
provides us with
|
||||
{{< sidenote "right" "context-note" "much-needed context." >}}
|
||||
In fact, \(\Gamma\) is sometimes called the typing context.
|
||||
{{< /sidenote >}} This version makes it clear that \\(x\\)
|
||||
isn't _always_ of type \\(\\tau\\), but only in the specific situation
|
||||
described by \\(\\Gamma\\). Using our first two-`int` environment,
|
||||
we can make the following (true) claim:
|
||||
|
||||
{{< latex >}}
|
||||
\{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \} \vdash \texttt{x}+\texttt{y} : \text{int}
|
||||
{{< /latex >}}
|
||||
|
||||
Which, in English, can be read as "when `x` and `y` are both integers,
|
||||
the expression `x+y` also results in an integer". The case for strings is similar:
|
||||
|
||||
{{< latex >}}
|
||||
\{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \} \vdash \texttt{x}+\texttt{y} : \text{string}
|
||||
{{< /latex >}}
|
||||
|
||||
This one, can be read "when `x` and `y` are both strings, the expression `x+y`
|
||||
also results in a string".
|
||||
|
||||
Okay, so now we've seen a couple of examples, but these examples are _not_ rules!
|
||||
They capture only specific situations (which we've "hard-coded" by specifying
|
||||
what \\(\\Gamma\\) is). Here's what a general rule __should not look like__:
|
||||
|
||||
{{< latex >}}
|
||||
\{ x_1 : \text{string}, x_2 : \text{string} \} \vdash x_1+x_2 : \text{string}
|
||||
{{< /latex >}}
|
||||
|
||||
{{< dialog >}}
|
||||
{{< message "question" "reader" >}}
|
||||
You pretty much just changed the font and
|
||||
replaced \(x\) and \(y\) with \(x_1\) and \(x_2\).
|
||||
{{< /message >}}
|
||||
{{< message "answer" "Daniel" >}}
|
||||
Yep! Now, the rule has metavariables which range over <em>any</em>
|
||||
(object) variables, so the rule applies whenever any two variables
|
||||
of type \(\text{string}\) are added. Remember, though, this rule
|
||||
is not good. More on that below!
|
||||
{{< /message >}}
|
||||
{{< /dialog >}}
|
||||
|
||||
This rule is bad, and it should feel bad. Here are two reasons:
|
||||
|
||||
1. It only works for expressions like `x+y` or `a+b`, but not for
|
||||
more complicated things like `(a+b)+(c+d)`. This is because
|
||||
by using \\(x\_1\\) and \\(x\_2\\), the metavariables for
|
||||
expressions, it rules out additions that _don't_ add expressions.
|
||||
2. It doesn't play well with other rules; it can't be the _only_
|
||||
rule for addition of integers, since it doesn't work for
|
||||
integer literals (i.e., `1+1` is out).
|
||||
|
||||
The trouble is that this rule is trying to do too much; it's trying
|
||||
to check the environment for variables, but it's _also_ trying to
|
||||
specify the results of adding two integers. That's not how we
|
||||
did it last time! In fact, when it came to numbers, we had two
|
||||
rules. The first said that any number symbol had the \\(\\text{number}\\)
|
||||
type. Previously, we wrote it as follows:
|
||||
|
||||
{{< todo >}}
|
||||
Number vs int. Pick one, probably int.
|
||||
{{< /todo >}}
|
||||
|
||||
{{< latex >}}
|
||||
n : \text{number}
|
||||
{{< /latex >}}
|
||||
|
||||
Another rule specified the type of addition, without caring how the
|
||||
sub-expressions \\(e\_1\\) and \\(e\_2\\) were given _their_ types.
|
||||
As long as they had type \\(\\text{number}\\), all was well.
|
||||
|
||||
{{< latex >}}
|
||||
\frac{e_1 : \text{number} \quad e_2 : \text{number}}{e_1 + e_2 : \text{number}}
|
||||
{{< /latex >}}
|
||||
|
||||
These rules are good, and we should keep them. Now, though, environments
|
||||
are in play. Fortunately, the environment doesn't matter at all when it
|
||||
comes to figuring out what the type of a symbol like `1` is -- it's always
|
||||
a number! We can thus write the updated rule as follows. Leaving \\(\\Gamma\\)
|
||||
unspecified means it can
|
||||
stand for any environment.
|
||||
{{< todo >}}
|
||||
Probably just work in the fact that Gamma is another metavariable.
|
||||
{{< /todo >}}
|
||||
|
||||
{{< latex >}}
|
||||
\Gamma \vdash n : \text{number}
|
||||
{{< /latex >}}
|
||||
|
||||
We can also translate the addition rule in a pretty straightforward
|
||||
manner, by tacking on \\(\\Gamma\\) for every typing claim.
|
||||
|
||||
{{< latex >}}
|
||||
\frac{\Gamma \vdash e_1 : \text{number} \quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1 + e_2 : \text{number}}
|
||||
{{< /latex >}}
|
||||
|
||||
So we have a rule for number symbols like `1` or `2`, and we have
|
||||
a rule for addition. All that's left is a rule for variables, like `x`
|
||||
and `y`. This rule needs to make sure that a variable is defined,
|
||||
and that it has a particular type. A variable is defined, and has a type,
|
||||
if a pair \\(x : \\tau\\) is present in the environment \\(\\Gamma\\).
|
||||
Thus, we can write the variable rule like this:
|
||||
|
||||
{{< latex >}}
|
||||
\frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau}
|
||||
{{< /latex >}}
|
||||
|
||||
Note that we're using the \\(\\tau\\) metavariable to range over any type;
|
||||
this means the rule applies to (object) variables declared to have type
|
||||
\\(\\text{number}\\), \\(\\text{string}\\), or anything else present in
|
||||
our system. A single rule takes care of figuring the types of _all_
|
||||
variables.
|
||||
|
||||
{{< todo >}}
|
||||
The rest of this, but mostly statements.
|
||||
{{< /todo >}}
|
Loading…
Reference in New Issue
Block a user