Add a draft of the variables post in the types series

2022-08-29 21:45:53 -07:00
parent c21757694e
commit 9192b870b6
1 changed files with 334 additions and 0 deletions
--- a/content/blog/02_types_variables.md
+++ b/content/blog/02_types_variables.md
@@ -0,0 +1,334 @@
+---
+title: "Everything I Know About Types: Variables"
+date: 2022-08-28T19:05:31-07:00
+tags: ["Type Systems", "Programming Languages"]
+draft: true
+---
+
+In the [previous article]({{< relref "01_types_basics.md" >}}), we started
+looking at types. We've looked at the types of strings (`"hello"`),
+numbers (`1`, `3.14`), and binary operations (`2*3`, `"hello" + "world"`).
+However, in pretty much all of these examples, one important thing has been
+missing: variables. We haven't mentioned typechecking code like `x*y`,
+and the rules we defined in the previous section will not work for such
+code! In this post, we'll change that.
+
+Let's take a look at examples of variables in some common programming
+languages. If you come from a programming background, it's almost certain
+that you're comfortable with variables. Nevertheless, it can't hurt to
+see how they appear in many languages (hopefully, with enough variety, at
+least a little bit of this will be novel to you).
+
+We've already seen variables in TypeScript. They're declared using `let`
+or `const`:
+
+```TypeScript
+const x: number = 1; // type added just to be explicit
+let y = x + 1; // x is known to have type number, and thus so is x+1
+```
+
+In Kotlin, we have `var` and `val`, for mutable and immutable variables,
+respectively:
+```Kotlin
+val x: String = "hello"; // once again, the type is optional
+var y = x + 1; // x is known to have type string, and string+int is a string.
+```
+
+In Haskell, there's no keyword for declaring variables at the top level. We
+could just write:
+```Haskell
+x :: String -- optional, just to be consistent with previous examples.
+x = "hello"
+y = x ++ show 1 -- No automatic conversion from numbers to strings
+```
+
+Haskell also has `let` expressions for creating variables inside expressions:
+```Haskell
+(let x = 1 in x + x) + x -- outer x is not known
+```
+An important thing to note about Haskell `let`-expressions (demonstrated in
+the above code), is that the variable `x` only exists within their `in`
+portion. Subsequent code does not get access to `x`, and that last code snippet
+will result in a type error. This brings us to an important concept, the
+_scope_ of a variable. A variable's scope is the part of the code where
+it is available. In the above snippet,
+{{< sidenote "right" "recursive-let-note" "the part after" >}}
+Technically, the scope of the variable <code>x</code> starts right
+after the <code>=</code>; Haskell let-expressions can be recursive.
+Using only the concepts we've covered so far, we can only use this
+for silly examples, like <code>let x = x + 1 in x</code>. This piece of
+code will run infinitely, since it will never be able to finish
+adding enough <code>1</code>s to find the value of <code>x</code>.
+However, there are more useful examples of this feature, such
+as defining temporary recursive functions, or creating infinite
+lists. We might see these later. 
+{{< /sidenote >}} the `in` and
+before the `)` is the scope of `x`; referring to it anywhere else is an error.
+
+In C and C-like languages, the scope
+of a variable usually continues the end of the _block_ (typically 
+denoted with curly braces) in which it is defined. For instance,
+here's Rust:
+```Rust
+{
+    let x = 1;
+    let y = x + 1;
+    // The scope of x ends here
+}
+// Cannot use x here
+{
+    let x = "hello";
+    // The scope of this x ends here
+}
+// Once again, cannot use x
+```
+Scopes don't have to be denoted using `{` or `}` -- Haskell, for instance,
+just has the `in`. In the Crystal programming language, scopes are usually
+started by some control-flow statement (like `if` or `while`) and ended by
+`end`.
+```Crystal
+if someCondition
+    x = 1
+    # The scope of x ends here
+else
+    # Cannot use x here
+end
+# Crystal is a bit clever, so you can reference
+# x here, but its type is (Int32 | Nil) to indicate
+# that it may not have been set.
+```
+
+So much for this little tour of variables "in the wild". There are some key
+aspects of variables that we can gather from the preceding examples:
+
+1. A variable with the same name can have a different type depending
+   on its location in the code (see the Rust and Crystal examples).
+2. The type of a variable cannot be guessed from its usage. For
+   instance, both the TypeScript and Kotlin examples have `x+1`,
+   but this expression is of type `number` in the former and `String`
+   in the latter.
+3. A variable is not always available in every part of the code;
+   it has a scope (Haskell, Rust, Crystal examples).
+4. In imperative languages, a statement can introduce a variable
+   that subsequent statements can reference (TypeScript, Kotlin,
+   Rust, Crystal).
+
+To get started with type rules for variables, let's introduce
+another metavariable, \\(x\\) (along with \\(n\\) from before).
+Whereas \\(n\\) ranges over any number in our language, \\(x\\) ranges
+over any variable. It can be used as a stand-in for `x`, `y`, `myVar`, and so on.
+
+The first property prevents us from writing type rules like the
+following, since we cannot always assume that a variable has type
+\\(\\text{number}\\) or \\(\\text{string}\\).
+{{< latex >}}
+x : \text{number}
+{{< /latex >}}
+
+The second property prohibits us from trying to guess types of
+compound expressions without caring too much about the variables;
+that is, the following rule is invalid.
+{{< latex >}}
+x + 1 : \text{number}
+{{< /latex >}}
+
+I won't list rules that are prohibited by the third property, but
+I do want to note that it prevents us from just having some global
+list of variables and types "out there", and referring to that in
+our rules. If all types were globally known, it wouldn't be possible
+for a variable to take on different types in different places.
+
+With these constraints in mind, we have enough to start creating
+rules for expressions (but not statements yet; we'll get to that).
+The solution to our problem is to add a third "thing" to our rules:
+the _environment_, typically denoted using the Greek uppercase gamma,
+\\(\\Gamma\\). The environment is basically a list of pairs associating
+variables with their types. For instance, if in some situation
+the variables `x` and `y` were both declared to be of type `int`, we'd
+write that as follows.
+
+{{< latex >}}
+\Gamma = \{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \}
+{{< /latex >}}
+
+If, on the other hand, they both had type `string`, our equation would
+look as follows.
+
+{{< latex >}}
+\Gamma = \{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \}
+{{< /latex >}}
+
+{{< dialog >}}
+{{< message "question" "reader" >}}
+Hey, there's something odd with your typography; how come
+it says \(\texttt{x}\)? Earlier, you wrote \(x\) for variables.
+{{< /message >}}
+{{< message "answer" "Daniel" >}}
+Good catch, and that is no accident. As I mentioned earlier,
+\(x\) is a <em>metavariable</em>; it's part of our meta language,
+and it can stand for many different actual variables in the object
+language.<br>
+<br>
+On the other hand, \(\texttt{x}\) stands for a specific variable,
+<code>x</code> (it doesn't <em>literally</em> look like <code>x</code>
+because it's in a mathematical equation, and I can't style it using
+CSS).<br>
+<br>
+In general, \(\textit{italics}\) font is used for metavariables,
+like \(n\) and \(x\),  which can stand for multiple expressions
+in the object language; on the other hand, \(\texttt{typewriter}\)
+font will be used for actual, concrete variables, such
+as \(\texttt{myVariable}\). In practice, the difference is
+usually clear from context, since concrete variables usually only show up in
+examples, whereas metavariables only appear in rules. So don't
+worry too much about spotting the differences in font.
+{{< /message >}}
+{{< /dialog >}}
+
+Okay, so now we know how to write down environments. How does this
+help us achieve our goal of writing type rules for expressions with
+variables? Well, the benefits become apparent when we incorporate
+environments into our typing rules. So far, we've been using the following
+notation:
+
+{{< latex >}}
+e : \tau
+{{< /latex >}}
+
+This reads,
+
+> The expression \\(e\\) [another metavariable, this one is used for
+  all expressions] has type \\(\\tau\\) [also a metavariable, for
+  types].
+
+However, as we've seen, we can't make global claims like this when variables are
+involved, since the same expression may have different types depending
+on the situation. Now, we instead write:
+
+{{< latex >}}
+\Gamma \vdash e : \tau
+{{< /latex >}}
+
+This version reads,
+
+> In the environment \\(\\Gamma\\), the expression \\(e\\) has type \\(\\tau\\).
+
+And here's the difference. The new \\(\\Gamma\\) of ours captures this
+"depending on the situation" aspect of expressions with variables. It
+provides us with 
+{{< sidenote "right" "context-note" "much-needed context." >}}
+In fact, \(\Gamma\) is sometimes called the typing context.
+{{< /sidenote >}} This version makes it clear that \\(x\\)
+isn't _always_ of type \\(\\tau\\), but only in the specific situation
+described by \\(\\Gamma\\). Using our first two-`int` environment,
+we can make the following (true) claim:
+
+{{< latex >}}
+\{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \} \vdash \texttt{x}+\texttt{y} : \text{int}
+{{< /latex >}}
+
+Which, in English, can be read as "when `x` and `y` are both integers,
+the expression `x+y` also results in an integer". The case for strings is similar:
+
+{{< latex >}}
+\{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \} \vdash \texttt{x}+\texttt{y} : \text{string}
+{{< /latex >}}
+
+This one, can be read "when `x` and `y` are both strings, the expression `x+y`
+also results in a string".
+
+Okay, so now we've seen a couple of examples, but these examples are _not_ rules!
+They capture only specific situations (which we've "hard-coded" by specifying
+what \\(\\Gamma\\) is). Here's what a general rule __should not look like__:
+
+{{< latex >}}
+\{ x_1 : \text{string}, x_2 : \text{string} \} \vdash x_1+x_2 : \text{string}
+{{< /latex >}}
+
+{{< dialog >}}
+{{< message "question" "reader" >}}
+You pretty much just changed the font and
+replaced \(x\) and \(y\) with \(x_1\) and \(x_2\).
+{{< /message >}}
+{{< message "answer" "Daniel" >}}
+Yep! Now, the rule has metavariables which range over <em>any</em>
+(object) variables, so the rule applies whenever any two variables
+of type \(\text{string}\) are added. Remember, though, this rule
+is not good. More on that below!
+{{< /message >}}
+{{< /dialog >}}
+
+This rule is bad, and it should feel bad. Here are two reasons:
+
+1. It only works for expressions like `x+y` or `a+b`, but not for
+   more complicated things like `(a+b)+(c+d)`. This is because
+   by using \\(x\_1\\) and \\(x\_2\\), the metavariables for
+   expressions, it rules out additions that _don't_ add expressions.
+2. It doesn't play well with other rules; it can't be the _only_
+   rule for addition of integers, since it doesn't work for
+   integer literals (i.e., `1+1` is out).
+
+The trouble is that this rule is trying to do too much; it's trying
+to check the environment for variables, but it's _also_ trying to
+specify the results of adding two integers. That's not how we
+did it last time! In fact, when it came to numbers, we had two
+rules. The first said that any number symbol had the \\(\\text{number}\\)
+type. Previously, we wrote it as follows:
+
+{{< todo >}}
+Number vs int. Pick one, probably int.
+{{< /todo >}}
+
+{{< latex >}}
+n : \text{number}
+{{< /latex >}}
+
+Another rule specified the type of addition, without caring how the
+sub-expressions \\(e\_1\\) and \\(e\_2\\) were given _their_ types.
+As long as they had type \\(\\text{number}\\), all was well.
+
+{{< latex >}}
+\frac{e_1 : \text{number} \quad e_2 : \text{number}}{e_1 + e_2 : \text{number}}
+{{< /latex >}}
+
+These rules are good, and we should keep them. Now, though, environments
+are in play. Fortunately, the environment doesn't matter at all when it
+comes to figuring out what the type of a symbol like `1` is -- it's always
+a number! We can thus write the updated rule as follows. Leaving \\(\\Gamma\\)
+unspecified means it can
+stand for any environment.
+{{< todo >}}
+Probably just work in the fact that Gamma is another metavariable.
+{{< /todo >}}
+
+{{< latex >}}
+\Gamma \vdash n : \text{number}
+{{< /latex >}}
+
+We can also translate the addition rule in a pretty straightforward
+manner, by tacking on \\(\\Gamma\\) for every typing claim.
+
+{{< latex >}}
+\frac{\Gamma \vdash e_1 : \text{number} \quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1 + e_2 : \text{number}}
+{{< /latex >}}
+
+So we have a rule for number symbols like `1` or `2`, and we have
+a rule for addition. All that's left is a rule for variables, like `x`
+and `y`. This rule needs to make sure that a variable is defined,
+and that it has a particular type. A variable is defined, and has a type,
+if a pair \\(x : \\tau\\) is present in the environment \\(\\Gamma\\).
+Thus, we can write the variable rule like this:
+
+{{< latex >}}
+\frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau}
+{{< /latex >}}
+
+Note that we're using the \\(\\tau\\) metavariable to range over any type;
+this means the rule applies to (object) variables declared to have type
+\\(\\text{number}\\), \\(\\text{string}\\), or anything else present in
+our system. A single rule takes care of figuring the types of _all_
+variables.
+
+{{< todo >}}
+The rest of this, but mostly statements.
+{{< /todo >}}