From 9192b870b6253249821e5012d197078161c7ff3e Mon Sep 17 00:00:00 2001 From: Danila Fedorin Date: Mon, 29 Aug 2022 21:45:53 -0700 Subject: [PATCH] Add a draft of the variables post in the types series --- content/blog/02_types_variables.md | 334 +++++++++++++++++++++++++++++ 1 file changed, 334 insertions(+) create mode 100644 content/blog/02_types_variables.md diff --git a/content/blog/02_types_variables.md b/content/blog/02_types_variables.md new file mode 100644 index 0000000..2d93c65 --- /dev/null +++ b/content/blog/02_types_variables.md @@ -0,0 +1,334 @@ +--- +title: "Everything I Know About Types: Variables" +date: 2022-08-28T19:05:31-07:00 +tags: ["Type Systems", "Programming Languages"] +draft: true +--- + +In the [previous article]({{< relref "01_types_basics.md" >}}), we started +looking at types. We've looked at the types of strings (`"hello"`), +numbers (`1`, `3.14`), and binary operations (`2*3`, `"hello" + "world"`). +However, in pretty much all of these examples, one important thing has been +missing: variables. We haven't mentioned typechecking code like `x*y`, +and the rules we defined in the previous section will not work for such +code! In this post, we'll change that. + +Let's take a look at examples of variables in some common programming +languages. If you come from a programming background, it's almost certain +that you're comfortable with variables. Nevertheless, it can't hurt to +see how they appear in many languages (hopefully, with enough variety, at +least a little bit of this will be novel to you). + +We've already seen variables in TypeScript. They're declared using `let` +or `const`: + +```TypeScript +const x: number = 1; // type added just to be explicit +let y = x + 1; // x is known to have type number, and thus so is x+1 +``` + +In Kotlin, we have `var` and `val`, for mutable and immutable variables, +respectively: +```Kotlin +val x: String = "hello"; // once again, the type is optional +var y = x + 1; // x is known to have type string, and string+int is a string. +``` + +In Haskell, there's no keyword for declaring variables at the top level. We +could just write: +```Haskell +x :: String -- optional, just to be consistent with previous examples. +x = "hello" +y = x ++ show 1 -- No automatic conversion from numbers to strings +``` + +Haskell also has `let` expressions for creating variables inside expressions: +```Haskell +(let x = 1 in x + x) + x -- outer x is not known +``` +An important thing to note about Haskell `let`-expressions (demonstrated in +the above code), is that the variable `x` only exists within their `in` +portion. Subsequent code does not get access to `x`, and that last code snippet +will result in a type error. This brings us to an important concept, the +_scope_ of a variable. A variable's scope is the part of the code where +it is available. In the above snippet, +{{< sidenote "right" "recursive-let-note" "the part after" >}} +Technically, the scope of the variable x starts right +after the =; Haskell let-expressions can be recursive. +Using only the concepts we've covered so far, we can only use this +for silly examples, like let x = x + 1 in x. This piece of +code will run infinitely, since it will never be able to finish +adding enough 1s to find the value of x. +However, there are more useful examples of this feature, such +as defining temporary recursive functions, or creating infinite +lists. We might see these later. +{{< /sidenote >}} the `in` and +before the `)` is the scope of `x`; referring to it anywhere else is an error. + +In C and C-like languages, the scope +of a variable usually continues the end of the _block_ (typically +denoted with curly braces) in which it is defined. For instance, +here's Rust: +```Rust +{ + let x = 1; + let y = x + 1; + // The scope of x ends here +} +// Cannot use x here +{ + let x = "hello"; + // The scope of this x ends here +} +// Once again, cannot use x +``` +Scopes don't have to be denoted using `{` or `}` -- Haskell, for instance, +just has the `in`. In the Crystal programming language, scopes are usually +started by some control-flow statement (like `if` or `while`) and ended by +`end`. +```Crystal +if someCondition + x = 1 + # The scope of x ends here +else + # Cannot use x here +end +# Crystal is a bit clever, so you can reference +# x here, but its type is (Int32 | Nil) to indicate +# that it may not have been set. +``` + +So much for this little tour of variables "in the wild". There are some key +aspects of variables that we can gather from the preceding examples: + +1. A variable with the same name can have a different type depending + on its location in the code (see the Rust and Crystal examples). +2. The type of a variable cannot be guessed from its usage. For + instance, both the TypeScript and Kotlin examples have `x+1`, + but this expression is of type `number` in the former and `String` + in the latter. +3. A variable is not always available in every part of the code; + it has a scope (Haskell, Rust, Crystal examples). +4. In imperative languages, a statement can introduce a variable + that subsequent statements can reference (TypeScript, Kotlin, + Rust, Crystal). + +To get started with type rules for variables, let's introduce +another metavariable, \\(x\\) (along with \\(n\\) from before). +Whereas \\(n\\) ranges over any number in our language, \\(x\\) ranges +over any variable. It can be used as a stand-in for `x`, `y`, `myVar`, and so on. + +The first property prevents us from writing type rules like the +following, since we cannot always assume that a variable has type +\\(\\text{number}\\) or \\(\\text{string}\\). +{{< latex >}} +x : \text{number} +{{< /latex >}} + +The second property prohibits us from trying to guess types of +compound expressions without caring too much about the variables; +that is, the following rule is invalid. +{{< latex >}} +x + 1 : \text{number} +{{< /latex >}} + +I won't list rules that are prohibited by the third property, but +I do want to note that it prevents us from just having some global +list of variables and types "out there", and referring to that in +our rules. If all types were globally known, it wouldn't be possible +for a variable to take on different types in different places. + +With these constraints in mind, we have enough to start creating +rules for expressions (but not statements yet; we'll get to that). +The solution to our problem is to add a third "thing" to our rules: +the _environment_, typically denoted using the Greek uppercase gamma, +\\(\\Gamma\\). The environment is basically a list of pairs associating +variables with their types. For instance, if in some situation +the variables `x` and `y` were both declared to be of type `int`, we'd +write that as follows. + +{{< latex >}} +\Gamma = \{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \} +{{< /latex >}} + +If, on the other hand, they both had type `string`, our equation would +look as follows. + +{{< latex >}} +\Gamma = \{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \} +{{< /latex >}} + +{{< dialog >}} +{{< message "question" "reader" >}} +Hey, there's something odd with your typography; how come +it says \(\texttt{x}\)? Earlier, you wrote \(x\) for variables. +{{< /message >}} +{{< message "answer" "Daniel" >}} +Good catch, and that is no accident. As I mentioned earlier, +\(x\) is a metavariable; it's part of our meta language, +and it can stand for many different actual variables in the object +language.
+
+On the other hand, \(\texttt{x}\) stands for a specific variable, +x (it doesn't literally look like x +because it's in a mathematical equation, and I can't style it using +CSS).
+
+In general, \(\textit{italics}\) font is used for metavariables, +like \(n\) and \(x\), which can stand for multiple expressions +in the object language; on the other hand, \(\texttt{typewriter}\) +font will be used for actual, concrete variables, such +as \(\texttt{myVariable}\). In practice, the difference is +usually clear from context, since concrete variables usually only show up in +examples, whereas metavariables only appear in rules. So don't +worry too much about spotting the differences in font. +{{< /message >}} +{{< /dialog >}} + +Okay, so now we know how to write down environments. How does this +help us achieve our goal of writing type rules for expressions with +variables? Well, the benefits become apparent when we incorporate +environments into our typing rules. So far, we've been using the following +notation: + +{{< latex >}} +e : \tau +{{< /latex >}} + +This reads, + +> The expression \\(e\\) [another metavariable, this one is used for + all expressions] has type \\(\\tau\\) [also a metavariable, for + types]. + +However, as we've seen, we can't make global claims like this when variables are +involved, since the same expression may have different types depending +on the situation. Now, we instead write: + +{{< latex >}} +\Gamma \vdash e : \tau +{{< /latex >}} + +This version reads, + +> In the environment \\(\\Gamma\\), the expression \\(e\\) has type \\(\\tau\\). + +And here's the difference. The new \\(\\Gamma\\) of ours captures this +"depending on the situation" aspect of expressions with variables. It +provides us with +{{< sidenote "right" "context-note" "much-needed context." >}} +In fact, \(\Gamma\) is sometimes called the typing context. +{{< /sidenote >}} This version makes it clear that \\(x\\) +isn't _always_ of type \\(\\tau\\), but only in the specific situation +described by \\(\\Gamma\\). Using our first two-`int` environment, +we can make the following (true) claim: + +{{< latex >}} +\{ \texttt{x} : \text{int}, \texttt{y} : \text{int} \} \vdash \texttt{x}+\texttt{y} : \text{int} +{{< /latex >}} + +Which, in English, can be read as "when `x` and `y` are both integers, +the expression `x+y` also results in an integer". The case for strings is similar: + +{{< latex >}} +\{ \texttt{x} : \text{string}, \texttt{y} : \text{string} \} \vdash \texttt{x}+\texttt{y} : \text{string} +{{< /latex >}} + +This one, can be read "when `x` and `y` are both strings, the expression `x+y` +also results in a string". + +Okay, so now we've seen a couple of examples, but these examples are _not_ rules! +They capture only specific situations (which we've "hard-coded" by specifying +what \\(\\Gamma\\) is). Here's what a general rule __should not look like__: + +{{< latex >}} +\{ x_1 : \text{string}, x_2 : \text{string} \} \vdash x_1+x_2 : \text{string} +{{< /latex >}} + +{{< dialog >}} +{{< message "question" "reader" >}} +You pretty much just changed the font and +replaced \(x\) and \(y\) with \(x_1\) and \(x_2\). +{{< /message >}} +{{< message "answer" "Daniel" >}} +Yep! Now, the rule has metavariables which range over any +(object) variables, so the rule applies whenever any two variables +of type \(\text{string}\) are added. Remember, though, this rule +is not good. More on that below! +{{< /message >}} +{{< /dialog >}} + +This rule is bad, and it should feel bad. Here are two reasons: + +1. It only works for expressions like `x+y` or `a+b`, but not for + more complicated things like `(a+b)+(c+d)`. This is because + by using \\(x\_1\\) and \\(x\_2\\), the metavariables for + expressions, it rules out additions that _don't_ add expressions. +2. It doesn't play well with other rules; it can't be the _only_ + rule for addition of integers, since it doesn't work for + integer literals (i.e., `1+1` is out). + +The trouble is that this rule is trying to do too much; it's trying +to check the environment for variables, but it's _also_ trying to +specify the results of adding two integers. That's not how we +did it last time! In fact, when it came to numbers, we had two +rules. The first said that any number symbol had the \\(\\text{number}\\) +type. Previously, we wrote it as follows: + +{{< todo >}} +Number vs int. Pick one, probably int. +{{< /todo >}} + +{{< latex >}} +n : \text{number} +{{< /latex >}} + +Another rule specified the type of addition, without caring how the +sub-expressions \\(e\_1\\) and \\(e\_2\\) were given _their_ types. +As long as they had type \\(\\text{number}\\), all was well. + +{{< latex >}} +\frac{e_1 : \text{number} \quad e_2 : \text{number}}{e_1 + e_2 : \text{number}} +{{< /latex >}} + +These rules are good, and we should keep them. Now, though, environments +are in play. Fortunately, the environment doesn't matter at all when it +comes to figuring out what the type of a symbol like `1` is -- it's always +a number! We can thus write the updated rule as follows. Leaving \\(\\Gamma\\) +unspecified means it can +stand for any environment. +{{< todo >}} +Probably just work in the fact that Gamma is another metavariable. +{{< /todo >}} + +{{< latex >}} +\Gamma \vdash n : \text{number} +{{< /latex >}} + +We can also translate the addition rule in a pretty straightforward +manner, by tacking on \\(\\Gamma\\) for every typing claim. + +{{< latex >}} +\frac{\Gamma \vdash e_1 : \text{number} \quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1 + e_2 : \text{number}} +{{< /latex >}} + +So we have a rule for number symbols like `1` or `2`, and we have +a rule for addition. All that's left is a rule for variables, like `x` +and `y`. This rule needs to make sure that a variable is defined, +and that it has a particular type. A variable is defined, and has a type, +if a pair \\(x : \\tau\\) is present in the environment \\(\\Gamma\\). +Thus, we can write the variable rule like this: + +{{< latex >}} +\frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau} +{{< /latex >}} + +Note that we're using the \\(\\tau\\) metavariable to range over any type; +this means the rule applies to (object) variables declared to have type +\\(\\text{number}\\), \\(\\text{string}\\), or anything else present in +our system. A single rule takes care of figuring the types of _all_ +variables. + +{{< todo >}} +The rest of this, but mostly statements. +{{< /todo >}}