blog-static/content/blog/02_types_variables.md

15 KiB

title date tags series draft
Everything I Know About Types: Variables 2022-08-28T19:05:31-07:00
Type Systems
Programming Languages
Everything I Know About Types true

In the [previous article]({{< relref "01_types_basics.md" >}}), we started looking at types. We've looked at the types of strings ("hello"), numbers (1, 3.14), and binary operations (2*3, "hello" + "world"). However, in pretty much all of these examples, one important thing has been missing: variables. We haven't mentioned typechecking code like x*y, and the rules we defined in the previous section will not work for such code! In this post, we'll change that.

Let's take a look at examples of variables in some common programming languages. If you come from a programming background, it's almost certain that you're comfortable with variables. Nevertheless, it can't hurt to see how they appear in many languages (hopefully, with enough variety, at least a little bit of this will be novel to you).

We've already seen variables in TypeScript. They're declared using let or const:

const x: number = 1; // type added just to be explicit
let y = x + 1; // x is known to have type number, and thus so is x+1

In Kotlin, we have var and val, for mutable and immutable variables, respectively:

val x: String = "hello"; // once again, the type is optional
var y = x + 1; // x is known to have type string, and string+int is a string.

In Haskell, there's no keyword for declaring variables at the top level. We could just write:

x :: String -- optional, just to be consistent with previous examples.
x = "hello"
y = x ++ show 1 -- No automatic conversion from numbers to strings

Haskell also has let expressions for creating variables inside expressions:

(let x = 1 in x + x) + x -- outer x is not known

An important thing to note about Haskell let-expressions (demonstrated in the above code), is that the variable x only exists within their in portion. Subsequent code does not get access to x, and that last code snippet will result in a type error. This brings us to an important concept, the scope of a variable. A variable's scope is the part of the code where it is available. In the above snippet, {{< sidenote "right" "recursive-let-note" "the part after" >}} Technically, the scope of the variable x starts right after the =; Haskell let-expressions can be recursive. Using only the concepts we've covered so far, we can only use this for silly examples, like let x = x + 1 in x. This piece of code will run infinitely, since it will never be able to finish adding enough 1s to find the value of x. However, there are more useful examples of this feature, such as defining temporary recursive functions, or creating infinite lists. We might see these later. {{< /sidenote >}} the in and before the ) is the scope of x; referring to it anywhere else is an error.

In C and C-like languages, the scope of a variable usually continues the end of the block (typically denoted with curly braces) in which it is defined. For instance, here's Rust:

{
    let x = 1;
    let y = x + 1;
    // The scope of x ends here
}
// Cannot use x here
{
    let x = "hello";
    // The scope of this x ends here
}
// Once again, cannot use x

Scopes don't have to be denoted using { or } -- Haskell, for instance, just has the in. In the Crystal programming language, scopes are usually started by some control-flow statement (like if or while) and ended by end.

if someCondition
    x = 1
    # The scope of x ends here
else
    # Cannot use x here
end
# Crystal is a bit clever, so you can reference
# x here, but its type is (Int32 | Nil) to indicate
# that it may not have been set.

So much for this little tour of variables "in the wild". There are some key aspects of variables that we can gather from the preceding examples:

  1. A variable with the same name can have a different type depending on its location in the code (see the Rust and Crystal examples).
  2. The type of a variable cannot be guessed from its usage. For instance, both the TypeScript and Kotlin examples have x+1, but this expression is of type number in the former and String in the latter.
  3. A variable is not always available in every part of the code; it has a scope (Haskell, Rust, Crystal examples).
  4. In imperative languages, a statement can introduce a variable that subsequent statements can reference (TypeScript, Kotlin, Rust, Crystal).

To get started with type rules for variables, let's introduce another metavariable, \(x\) (along with \(n\) from before). Whereas \(n\) ranges over any number in our language, \(x\) ranges over any variable. It can be used as a stand-in for x, y, myVar, and so on.

Now, let's start by looking at versions of formal rules that are incorrect. The first property listed above prevents us from writing type rules like the following, since we cannot always assume that a variable has type \(\text{number}\) or \(\text{string}\) (it might have either, depending on where in the code it's used!). {{< latex >}} x : \text{number} {{< /latex >}}

The second property prohibits us from trying to guess types of compound expressions without caring too much about the variables; that is, the following rule is invalid. {{< latex >}} x + 1 : \text{number} {{< /latex >}}

I won't list rules that are prohibited by the third property, but I do want to note that it prevents us from just having some global list of variables and types "out there", and referring to that in our rules. If all types were globally known, it wouldn't be possible for a variable to take on different types in different places.

With these constraints in mind, we have enough to start creating rules for expressions (but not statements yet; we'll get to that). The way to work around the four constraints is to add a third "thing" to our rules: the environment, typically denoted using the Greek uppercase gamma, \(\Gamma\). Much like we avoided writing similar rules for every possible number by using \(n\) as a metavariable for any number, we will use \(\Gamma\) as a metavariable to stand in for any environment. What is an environment, though? It's basically a list of pairs associating variables with their types. For instance, if in some situation the variables x and y were both declared to be of type number, we'd write that as follows.

{{< latex >}} \Gamma = { \texttt{x} : \text{number}, \texttt{y} : \text{number} } {{< /latex >}}

If, on the other hand, they both had type string, our equation would look as follows.

{{< latex >}} \Gamma = { \texttt{x} : \text{string}, \texttt{y} : \text{string} } {{< /latex >}}

{{< dialog >}} {{< message "question" "reader" >}} Hey, there's something odd with your typography; how come it says \texttt{x}? Earlier, you wrote x for variables. {{< /message >}} {{< message "answer" "Daniel" >}} Good catch, and that is no accident. As I mentioned earlier, x is a metavariable; it's part of our meta language, and it can stand for many different actual variables in the object language.

On the other hand, \texttt{x} stands for a specific variable, x (it doesn't literally look like x because it's in a mathematical equation, and I can't style it using CSS).

In general, \textit{italics} font is used for metavariables, like n and x, which can stand for multiple expressions in the object language; on the other hand, \texttt{typewriter} font will be used for actual, concrete variables, such as \texttt{myVariable}. In practice, the difference is usually clear from context, since concrete variables usually only show up in examples, whereas metavariables only appear in rules. So don't worry too much about spotting the differences in font. {{< /message >}} {{< /dialog >}}

Okay, so now we know how to write down environments. How does this help us achieve our goal of writing type rules for expressions with variables? Well, the benefits become apparent when we incorporate environments into our typing rules. So far, we've been using the following notation:

{{< latex >}} e : \tau {{< /latex >}}

This reads,

The expression \(e\) [another metavariable, this one is used for all expressions] has type \(\tau\) [also a metavariable, for types].

However, as we've seen, we can't make global claims like this when variables are involved, since the same expression may have different types depending on the situation. Now, we instead write:

{{< latex >}} \Gamma \vdash e : \tau {{< /latex >}}

This version reads,

In the environment \(\Gamma\), the expression \(e\) has type \(\tau\).

And here's the difference. The new \(\Gamma\) of ours captures this "depending on the situation" aspect of expressions with variables. It provides us with {{< sidenote "right" "context-note" "much-needed context." >}} In fact, \Gamma is sometimes called the typing context. {{< /sidenote >}} This version makes it clear that \(e\) isn't always of type \(\tau\), but only in the specific situation described by \(\Gamma\). Using our first two-number environment, we can make the following (true) claim:

{{< latex >}} { \texttt{x} : \text{number}, \texttt{y} : \text{number} } \vdash \texttt{x}+\texttt{y} : \text{number} {{< /latex >}}

Which, in English, can be read as "when x and y are both numbers, the expression x+y also results in a number". The case for strings is similar:

{{< latex >}} { \texttt{x} : \text{string}, \texttt{y} : \text{string} } \vdash \texttt{x}+\texttt{y} : \text{string} {{< /latex >}}

This one, can be read "when x and y are both strings, the expression x+y also results in a string".

Okay, so now we've seen a couple of examples, but these examples are not rules! They capture only specific situations (which we've "hard-coded" by specifying what \(\Gamma\) is). Here's what a general rule should not look like:

{{< latex >}} { x_1 : \text{string}, x_2 : \text{string} } \vdash x_1+x_2 : \text{string} {{< /latex >}}

{{< dialog >}} {{< message "question" "reader" >}} You pretty much just changed the font and replaced x and y with x_1 and x_2. {{< /message >}} {{< message "answer" "Daniel" >}} Yep! Now, the rule has metavariables which range over any (object) variables, so the rule applies whenever any two variables of type \text{string} are added. Remember, though, this rule is not good. More on that below! {{< /message >}} {{< /dialog >}}

This rule is bad, and it should feel bad. Here are two reasons:

  1. It only works for expressions like x+y or a+b, but not for more complicated things like (a+b)+(c+d). This is because by using \(x_1\) and \(x_2\), the metavariables for variables, it rules out additions that don't add variables (like the middle + in the example).
  2. It doesn't play well with other rules; it can't be the only rule for addition of numbers, since it doesn't work for number literals (i.e., 1+1 is out).

The trouble is that this rule is trying to do too much; it's trying to check the environment for variables, but it's also trying to specify the results of adding two numbers. That's not how we did it last time! In fact, when it came to numbers, we had two rules. The first said that any number symbol had the \(\text{number}\) type. Previously, we wrote it as follows:

{{< latex >}} n : \text{number} {{< /latex >}}

Another rule specified the type of addition, without caring how the sub-expressions \(e_1\) and \(e_2\) were given their types. As long as they had type \(\text{number}\), all was well.

{{< latex >}} \frac{e_1 : \text{number} \quad e_2 : \text{number}}{e_1 + e_2 : \text{number}} {{< /latex >}}

So, instead of having a rule for "adding two number symbols", we had a rule for "adding" and a rule for "number symbols". That approach worked well because the rule for "adding" could be used to figure out the types of compount addition expressions, like (1+1)+(2+2), which are not "additions of number symbols". Taking inspiration from this past success, we want to similarly separate "adding two variables" into "variables" and "adding". We already have the latter, though, so all that's left is the former.

Before we get to that, though, we need to update the two rules we just saw above. These rules are good, and we should keep them. Now, though, environments are in play. Fortunately, the environment doesn't matter at all when it comes to figuring out what the type of a symbol like 1 is -- it's always a number! We can thus write the updated rule as follows. Leaving \(\Gamma\) unspecified means it can stand for any environment.

{{< latex >}} \Gamma \vdash n : \text{number} {{< /latex >}}

We can also translate the addition rule in a pretty straightforward manner, by tacking on \(\Gamma\) for every typing claim.

{{< latex >}} \frac{\Gamma \vdash e_1 : \text{number} \quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1 + e_2 : \text{number}} {{< /latex >}}

So we have a rule for number symbols like 1 or 2, and we have a rule for addition. All that's left is a rule for variables, like x and y. This rule needs to make sure that a variable is defined, and that it has a particular type. A variable is defined, and has a type, if a pair \(x : \tau\) is present in the environment \(\Gamma\). Thus, we can write the variable rule like this:

{{< latex >}} \frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau} {{< /latex >}}

Note that we're using the \(\tau\) metavariable to range over any type; this means the rule applies to (object) variables declared to have type \(\text{number}\), \(\text{string}\), or anything else present in our system. A single rule takes care of figuring the types of all variables.

{{< todo >}} The rest of this, but mostly statements. {{< /todo >}}

This Page at a Glance

Metavariables

Symbol Meaning
\(n\) Numbers
\(s\) Strings
\(e\) Expressions
\(x\) Variables
\(\tau\) Types
\(\Gamma\) Typing environments

Grammar

{{< block >}} {{< latex >}} \begin{array}{rcll} e & ::= & n & \text{(numbers)} \ & | & s & \text{(strings)} \ & | & x & \text{(variables)} \ & | & e+e & \text{(addition)} \end{array} {{< /latex >}} {{< /block >}}

Rules

{{< foldtable >}}

Rule Description
{{< latex >}}\Gamma \vdash n : \text{number} {{< /latex >}} Number literals have type \(\text{number}\)
{{< latex >}}\Gamma \vdash s : \text{string} {{< /latex >}} String literals have type \(\text{string}\)
{{< latex >}}\frac{x:\tau \in \Gamma}{\Gamma\vdash x : \tau}{{< /latex >}} Variables have whatever type is given to them by the environment
{{< latex >}}\frac{\Gamma \vdash e_1 : \text{string}\quad \Gamma \vdash e_2 : \text{string}}{\Gamma \vdash e_1+e_2 : \text{string}} {{< /latex >}} Adding strings gives a string
{{< latex >}}\frac{\Gamma \vdash e_1 : \text{number}\quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1+e_2 : \text{number}} {{< /latex >}} Adding numbers gives a number