Compare commits
	
		
			31 Commits
		
	
	
		
			e063ff6aa5
			...
			1df315612a
		
	
	| Author | SHA1 | Date | |
|---|---|---|---|
| 1df315612a | |||
| 15beddf96b | |||
| 20d8b18a9b | |||
| 53ff0c39e4 | |||
| 357a3bef09 | |||
| 81eef51e88 | |||
| 10dfb2fe49 | |||
| ef76149112 | |||
| a6a330a78e | |||
| 7f4d0df366 | |||
| 3eddac0a89 | |||
| 6e7ac1c1ca | |||
| 68d9cf1274 | |||
| 5eb0d1548c | |||
| e543904995 | |||
| ffda1d3235 | |||
| b705aa217c | |||
| 6f20b17948 | |||
| 2fde7e5cf8 | |||
| bee06b6731 | |||
| d3fa7336a2 | |||
| 96545a899f | |||
| 6ef5ae2394 | |||
| 05a31dd4d4 | |||
| d9d5c8bf14 | |||
| 291a1f0178 | |||
| 2547b53aa2 | |||
| 409f8b7186 | |||
| 189422bf1e | |||
| 74daeee140 | |||
| befcd3cf98 | 
| @ -26,6 +26,13 @@ defaultContentLanguage = 'en' | ||||
|     endLevel = 4 | ||||
|     ordered = false | ||||
|     startLevel = 3 | ||||
|   [markup.goldmark] | ||||
|     [markup.goldmark.extensions] | ||||
|       [markup.goldmark.extensions.passthrough] | ||||
|         enable = true | ||||
|         [markup.goldmark.extensions.passthrough.delimiters] | ||||
|           block = [['\[', '\]'], ['$$', '$$']] | ||||
|           inline = [['\(', '\)']] | ||||
| 
 | ||||
| [languages] | ||||
|   [languages.en] | ||||
|  | ||||
| @ -58,7 +58,7 @@ Here's another inference rule, this time with some mathematical notation instead | ||||
| {n + 1 < m + 1} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This one reads, "if \\(n\\) is less than \\(m\\), then \\(n+1\\) is less than \\(m+1\\)". We can use inference | ||||
| This one reads, "if \(n\) is less than \(m\), then \(n+1\) is less than \(m+1\)". We can use inference | ||||
| rules to define various constructs. As an example, let's define what it means for a natural number to be even. | ||||
| It takes two rules: | ||||
| 
 | ||||
| @ -80,7 +80,7 @@ again we see that 4 is even, as well. We can continue this to determine that 6, | ||||
| are even too. Never in this process will we visit the numbers 1 or 3 or 5, and that's good - they're not even! | ||||
| 
 | ||||
| Let's now extend this notion to programming languages, starting with a simple arithmetic language. | ||||
| This language is made up of natural numbers and the \\(\square\\) operation, which represents the addition | ||||
| This language is made up of natural numbers and the \(\square\) operation, which represents the addition | ||||
| of two numbers. Again, we need two rules: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -93,14 +93,14 @@ of two numbers. Again, we need two rules: | ||||
| {e_1 \square e_2 \; \text{evaluates to} \; n_1 + n_2} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| First, let me explain myself. I used \\(\square\\) to demonstrate two important points. First, languages can be made of | ||||
| First, let me explain myself. I used \(\square\) to demonstrate two important points. First, languages can be made of | ||||
| any kind of characters we want; it's the rules that we define that give these languages meaning. | ||||
| Second, while \\(\square\\) is the addition operation _in our language_, \\(+\\) is the _mathematical addition operator_. | ||||
| Second, while \(\square\) is the addition operation _in our language_, \(+\) is the _mathematical addition operator_. | ||||
| They are not the same - we use the latter to define how the former works. | ||||
| 
 | ||||
| Finally, writing "evaluates to" gets quite tedious, especially for complex languages. Instead, | ||||
| PLT people use notation to make their semantics more concise. The symbol \\(\Downarrow\\) is commonly | ||||
| used to mean "evaluates to"; thus, \\(e \Downarrow v\\) reads "the expression \\(e\\) evaluates to the value \\(v\\). | ||||
| PLT people use notation to make their semantics more concise. The symbol \(\Downarrow\) is commonly | ||||
| used to mean "evaluates to"; thus, \(e \Downarrow v\) reads "the expression \(e\) evaluates to the value \(v\). | ||||
| Using this notation, our rules start to look like the following: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -147,8 +147,8 @@ Inductive tinylang : Type := | ||||
|     | box (e1 e2 : tinylang) : tinylang. | ||||
| ``` | ||||
| 
 | ||||
| This defines the two elements of our example language: `number n` corresponds to \\(n\\), and `box e1 e2` corresponds | ||||
| to \\(e_1 \square e_2\\). Finally, we define the inference rules: | ||||
| This defines the two elements of our example language: `number n` corresponds to \(n\), and `box e1 e2` corresponds | ||||
| to \(e_1 \square e_2\). Finally, we define the inference rules: | ||||
| 
 | ||||
| ```Coq {linenos=true} | ||||
| Inductive tinylang_sem : tinylang -> nat -> Prop := | ||||
| @ -158,7 +158,7 @@ Inductive tinylang_sem : tinylang -> nat -> Prop := | ||||
|         tinylang_sem (box e1 e2) (n1 + n2). | ||||
| ``` | ||||
| 
 | ||||
| When we wrote our rules earlier, by using arbitrary variables like \\(e_1\\) and \\(n_1\\), we implicitly meant | ||||
| When we wrote our rules earlier, by using arbitrary variables like \(e_1\) and \(n_1\), we implicitly meant | ||||
| that our rules work for _any_ number or expression. When writing Coq we have to make this assumption explicit | ||||
| by using `forall`. For instance, the rule on line 2 reads, "for any number `n`, the expression `n` evaluates to `n`". | ||||
| 
 | ||||
| @ -167,8 +167,8 @@ by using `forall`. For instance, the rule on line 2 reads, "for any number `n`, | ||||
| We've now written some example big-step operational semantics, both "on paper" and in Coq. Now, it's time to take a look at | ||||
| the specific semantics of the language from Day 8! Our language consists of a few parts. | ||||
| 
 | ||||
| First, there are three opcodes: \\(\texttt{jmp}\\), \\(\\texttt{nop}\\), and \\(\\texttt{add}\\). Opcodes, combined | ||||
| with an integer, make up an instruction. For example, the instruction \\(\\texttt{add} \\; 3\\) will increase the | ||||
| First, there are three opcodes: \(\texttt{jmp}\), \(\texttt{nop}\), and \(\texttt{add}\). Opcodes, combined | ||||
| with an integer, make up an instruction. For example, the instruction \(\texttt{add} \; 3\) will increase the | ||||
| content of the accumulator by three. Finally, a program consists of a sequence of instructions; They're separated | ||||
| by newlines in the puzzle input, but we'll instead separate them by semicolons. For example, here's a complete program. | ||||
| 
 | ||||
| @ -181,17 +181,17 @@ it will add 0 to the accumulator (keeping it the same), | ||||
| do nothing, and finally jump back to the beginning. At this point, it will try to run the addition instruction again, | ||||
| which is not allowed; thus, the program will terminate. | ||||
| 
 | ||||
| Did you catch that? The semantics of this language will require more information than just our program itself (which we'll denote by \\(p\\)). | ||||
| * First, to evaluate the program we will need a program counter, \\(\\textit{c}\\). This program counter | ||||
| Did you catch that? The semantics of this language will require more information than just our program itself (which we'll denote by \(p\)). | ||||
| * First, to evaluate the program we will need a program counter, \(\textit{c}\). This program counter | ||||
| will tell us the position of the instruction to be executed next. It can also point past the last instruction, | ||||
| which means our program terminated successfully.  | ||||
| * Next, we'll need the accumulator \\(a\\). Addition instructions can change the accumulator, and we will be interested | ||||
| * Next, we'll need the accumulator \(a\). Addition instructions can change the accumulator, and we will be interested | ||||
| in the number that ends up in the accumulator when our program finishes executing. | ||||
| * Finally, and more subtly, we'll need to keep track of the states we visited. For instance, | ||||
| in the course of evaluating our program above, we encounter the \\((c, a)\\) pair of \\((0, 0)\\) twice: once | ||||
| in the course of evaluating our program above, we encounter the \((c, a)\) pair of \((0, 0)\) twice: once | ||||
| at the beginning, and once at the end. However, whereas at the beginning we have not yet encountered the addition | ||||
| instruction, at the end we have, so the evaluation behaves differently. To make the proofs work better in Coq, | ||||
| we'll use a set \\(v\\) of | ||||
| we'll use a set \(v\) of | ||||
| {{< sidenote "right" "allowed-note" "allowed (valid) program counters (as opposed to visited program counters)." >}} | ||||
| Whereas the set of "visited" program counters keeps growing as our evaluation continues, | ||||
| the set of "allowed" program counters keeps shrinking. Because the "allowed" set never stops shrinking, | ||||
| @ -206,10 +206,10 @@ never changes; only the state does. So I propose this (rather unorthodox) notati | ||||
| (c, a, v) \Rightarrow_p (c', a', v') | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This reads, "after starting at program counter \\(c\\), accumulator \\(a\\), and set of valid addresses \\(v\\), | ||||
| the program \\(p\\) terminates with program counter \\(c'\\), accumulator \\(a'\\), and set of valid addresses \\(v'\\)". | ||||
| This reads, "after starting at program counter \(c\), accumulator \(a\), and set of valid addresses \(v\), | ||||
| the program \(p\) terminates with program counter \(c'\), accumulator \(a'\), and set of valid addresses \(v'\)". | ||||
| Before creating the inference rules for this evaluation relation, let's define the effect of evaluating a single | ||||
| instruction, using notation \\((c, a) \rightarrow_i (c', a')\\). An addition instruction changes the accumulator, | ||||
| instruction, using notation \((c, a) \rightarrow_i (c', a')\). An addition instruction changes the accumulator, | ||||
| and increases the program counter by 1. | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -240,8 +240,8 @@ is done evaluating, and is in a "failed" state. | ||||
| {(c, a, v) \Rightarrow_{p} (c, a, v)} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We use \\(\\text{length}(p)\\) to represent the number of instructions in \\(p\\). Note the second premise: | ||||
| even if our program counter \\(c\\) is not included in the valid set, if it's "past the end of the program", | ||||
| We use \(\text{length}(p)\) to represent the number of instructions in \(p\). Note the second premise: | ||||
| even if our program counter \(c\) is not included in the valid set, if it's "past the end of the program", | ||||
| the program terminates in an "ok" state. | ||||
| {{< sidenote "left" "avoid-c-note" "Here's a rule for terminating in the \"ok\" state:" >}} | ||||
| In the presented rule, we don't use the variable <code>c</code> all that much, and we know its concrete | ||||
| @ -281,20 +281,20 @@ our program can take a step, and continue evaluating from there. | ||||
| This is quite a rule. A lot of things need to work out for a program to evauate from a state that isn't | ||||
| currently the final state: | ||||
| 
 | ||||
| * The current program counter \\(c\\) must be valid. That is, it must be an element of \\(v\\). | ||||
| * This program counter must correspond to an instruction \\(i\\) in \\(p\\), which we write as \\(p[c] = i\\). | ||||
| * This instruction must be executed, changing our program counter from \\(c\\) to \\(c'\\) and our | ||||
| accumulator from \\(a\\) to \\(a'\\). The set of valid instructions will no longer include \\(c\\), | ||||
| and will become \\(v - \\{c\\}\\). | ||||
| * The current program counter \(c\) must be valid. That is, it must be an element of \(v\). | ||||
| * This program counter must correspond to an instruction \(i\) in \(p\), which we write as \(p[c] = i\). | ||||
| * This instruction must be executed, changing our program counter from \(c\) to \(c'\) and our | ||||
| accumulator from \(a\) to \(a'\). The set of valid instructions will no longer include \(c\), | ||||
| and will become \(v - \{c\}\). | ||||
| * Our program must then finish executing, starting at state | ||||
| \\((c', a', v - \\{c\\})\\), and ending in some (unknown) state \\((c'', a'', v'')\\). | ||||
| \((c', a', v - \{c\})\), and ending in some (unknown) state \((c'', a'', v'')\). | ||||
| 
 | ||||
| If all of these conditions are met, our program, starting at \\((c, a, v)\\), will terminate in the state \\((c'', a'', v'')\\). This third rule completes our semantics; a program being executed will keep running instructions using the third rule, until it finally | ||||
| If all of these conditions are met, our program, starting at \((c, a, v)\), will terminate in the state \((c'', a'', v'')\). This third rule completes our semantics; a program being executed will keep running instructions using the third rule, until it finally | ||||
| hits an invalid program counter (terminating with the first rule) or gets to the end of the program (terminating with the second rule). | ||||
| 
 | ||||
| #### Aside: Vectors and Finite \\(\mathbb{N}\\) | ||||
| #### Aside: Vectors and Finite \\(\\mathbb{N}\\) | ||||
| We'll be getting to the Coq implementation of our semantics soon, but before we do: | ||||
| what type should \\(c\\) be? It's entirely possible for an instruction like \\(\\texttt{jmp} \\; -10000\\) | ||||
| what type should \(c\) be? It's entirely possible for an instruction like \(\texttt{jmp} \; -10000\) | ||||
| to throw our program counter way before the first instruction of our program, so at first, it seems | ||||
| as though we should use an integer. But the prompt doesn't even specify what should happen in this | ||||
| case - it only says an instruction shouldn't be run twice. The "valid set", although it may help resolve | ||||
| @ -302,7 +302,7 @@ this debate, is our invention, and isn't part of the original specification. | ||||
| 
 | ||||
| There is, however, something we can infer from this problem. Since the problem of jumping "too far behind" or | ||||
| "too far ahead" is never mentioned, we can assume that _all jumps will lead either to an instruction, | ||||
| or right to the end of a program_. This means that \\(c\\) is a natural number, with | ||||
| or right to the end of a program_. This means that \(c\) is a natural number, with | ||||
| 
 | ||||
| {{< latex >}} | ||||
| 0 \leq c \leq \text{length}(p) | ||||
| @ -322,9 +322,9 @@ inference rules, let's present two rules that define such a number: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This is a variation of the [Peano encoding](https://wiki.haskell.org/Peano_numbers) of natural numbers. | ||||
| It reads as follows: zero (\\(Z\\)) is a finite natural number less than any positive natural number \\(n\\). Then, if a finite natural number | ||||
| \\(f\\) is less than \\(n\\), then adding one to that number (using the successor function \\(S\\)) | ||||
| will create a natural number less than \\(n+1\\). We encode this in Coq as follows | ||||
| It reads as follows: zero (\(Z\)) is a finite natural number less than any positive natural number \(n\). Then, if a finite natural number | ||||
| \(f\) is less than \(n\), then adding one to that number (using the successor function \(S\)) | ||||
| will create a natural number less than \(n+1\). We encode this in Coq as follows | ||||
| ([originally from here](https://coq.inria.fr/library/Coq.Vectors.Fin.html#t)): | ||||
| 
 | ||||
| ```Coq | ||||
| @ -333,9 +333,9 @@ Inductive t : nat -> Set := | ||||
|     | FS : forall {n}, t n -> t (S n). | ||||
| ``` | ||||
| 
 | ||||
| The `F1` constructor here is equivalent to our \\(Z\\), and `FS` is equivalent to our \\(S\\). | ||||
| To represent positive natural numbers \\(\\mathbb{N}^+\\), we simply take a regular natural | ||||
| number from \\(\mathbb{N}\\) and find its successor using `S` (simply adding 1). Again, we have | ||||
| The `F1` constructor here is equivalent to our \(Z\), and `FS` is equivalent to our \(S\). | ||||
| To represent positive natural numbers \(\mathbb{N}^+\), we simply take a regular natural | ||||
| number from \(\mathbb{N}\) and find its successor using `S` (simply adding 1). Again, we have | ||||
| to explicitly use `forall` in our type signatures. | ||||
| 
 | ||||
| We can use a similar technique to represent a list with a known number of elements, known | ||||
| @ -352,9 +352,9 @@ a vector: | ||||
| {(x::\textit{xs}) : \text{Vec} \; t \; (n+1)} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| These rules read: the empty list \\([]\\) is zero-length vector of any type \\(t\\). Then, | ||||
| if we take an element \\(x\\) of type \\(t\\), and an \\(n\\)-long vector \\(\textit{xs}\\) of \\(t\\), | ||||
| then we can prepend \\(x\\) to \\(\textit{xs}\\) and get an \\((n+1)\\)-long vector of \\(t\\). | ||||
| These rules read: the empty list \([]\) is zero-length vector of any type \(t\). Then, | ||||
| if we take an element \(x\) of type \(t\), and an \(n\)-long vector \(\textit{xs}\) of \(t\), | ||||
| then we can prepend \(x\) to \(\textit{xs}\) and get an \((n+1)\)-long vector of \(t\). | ||||
| In Coq, we write this as follows ([originally from here](https://coq.inria.fr/library/Coq.Vectors.VectorDef.html#t)): | ||||
| 
 | ||||
| ```Coq | ||||
| @ -363,32 +363,32 @@ Inductive t A : nat -> Type := | ||||
|     | cons : forall (h:A) (n:nat), t A n -> t A (S n). | ||||
| ``` | ||||
| 
 | ||||
| The `nil` constructor represents the empty list \\([]\\), and `cons` represents | ||||
| the operation of prepending an element (called `h` in the code and \\(x\\) in our inference rules) | ||||
| to another vector of length \\(n\\), which remains unnamed in the code but is called \\(\\textit{xs}\\) in our rules. | ||||
| The `nil` constructor represents the empty list \([]\), and `cons` represents | ||||
| the operation of prepending an element (called `h` in the code and \(x\) in our inference rules) | ||||
| to another vector of length \(n\), which remains unnamed in the code but is called \(\textit{xs}\) in our rules. | ||||
| 
 | ||||
| These two definitions work together quite well. For instance, suppose we have a vector of length \\(n\\). | ||||
| If we were to access its elements by indices starting at 0, we'd be allowed to access indices 0 through \\(n-1\\). | ||||
| These are precisely the values of the finite natural numbers less than \\(n\\), \\(\\text{Fin} \\; n \\). | ||||
| Thus, given such an index \\(\\text{Fin} \\; n\\) and a vector \\(\\text{Vec} \\; t \\; n\\), we are guaranteed | ||||
| These two definitions work together quite well. For instance, suppose we have a vector of length \(n\). | ||||
| If we were to access its elements by indices starting at 0, we'd be allowed to access indices 0 through \(n-1\). | ||||
| These are precisely the values of the finite natural numbers less than \(n\), \(\text{Fin} \; n \). | ||||
| Thus, given such an index \(\text{Fin} \; n\) and a vector \(\text{Vec} \; t \; n\), we are guaranteed | ||||
| to be able to retrieve the element at the given index! In our code, we will not have to worry about bounds checking. | ||||
| 
 | ||||
| Of course, if our program has \\(n\\) elements, our program counter will be a finite number less than \\(n+1\\), | ||||
| Of course, if our program has \(n\) elements, our program counter will be a finite number less than \(n+1\), | ||||
| since there's always the possibility of it pointing past the instructions, indicating that we've finished | ||||
| running the program. This leads to some minor complications: we can't safely access the program instruction | ||||
| at index \\(\\text{Fin} \\; (n+1)\\). We can solve this problem by considering two cases: | ||||
| at index \(\text{Fin} \; (n+1)\). We can solve this problem by considering two cases: | ||||
| either our index points one past the end of the program (in which case its value is exactly the finite | ||||
| representation of \\(n\\)), or it's less than \\(n\\), in which case we can "tighten" the upper bound, | ||||
| and convert that index into a \\(\\text{Fin} \\; n\\). We formalize it in a lemma: | ||||
| representation of \(n\)), or it's less than \(n\), in which case we can "tighten" the upper bound, | ||||
| and convert that index into a \(\text{Fin} \; n\). We formalize it in a lemma: | ||||
| 
 | ||||
| {{< codelines "Coq" "aoc-2020/day8.v" 80 82 >}} | ||||
| 
 | ||||
| There's a little bit of a gotcha here. Instead of translating our above statement literally, | ||||
| and returning a value that's the result of "tightening" our input `f`, we return a value | ||||
| `f'` that can be "weakened" to `f`. This is because "tightening" is not a total function -  | ||||
| it's not always possible to convert a \\(\\text{Fin} \\; (n+1)\\) into a \\(\\text{Fin} \\; n\\). | ||||
| However, "weakening" \\(\\text{Fin} \\; n\\) _is_ a total function, since a number less than \\(n\\) | ||||
| is, by the transitive property of a total order, also less than \\(n+1\\). | ||||
| it's not always possible to convert a \(\text{Fin} \; (n+1)\) into a \(\text{Fin} \; n\). | ||||
| However, "weakening" \(\text{Fin} \; n\) _is_ a total function, since a number less than \(n\) | ||||
| is, by the transitive property of a total order, also less than \(n+1\). | ||||
| 
 | ||||
| The Coq proof for this claim is as follows: | ||||
| 
 | ||||
| @ -409,11 +409,11 @@ we assume is nonzero, since there isn't a natural number less than zero). | ||||
| can itself be tightened. | ||||
|     * If it can't be tightened, then our smaller number is a finite representation of | ||||
|     `n-1`. This, in turn, means that adding one to it will be the finite representation | ||||
|     of `n` (if \\(x\\) is equal to \\(n-1\\), then \\(x+1\\) is equal to \\(n\\)). | ||||
|     * If it _can_ be tightened, then so can the successor (if \\(x\\) is less | ||||
|     than \\(n-1\\), then \\(x+1\\) is less than \\(n\\)). | ||||
|     of `n` (if \(x\) is equal to \(n-1\), then \(x+1\) is equal to \(n\)). | ||||
|     * If it _can_ be tightened, then so can the successor (if \(x\) is less | ||||
|     than \(n-1\), then \(x+1\) is less than \(n\)). | ||||
| 
 | ||||
| Next, let's talk about addition, specifically the kind of addition done by the \\(\\texttt{jmp}\\) instruction. | ||||
| Next, let's talk about addition, specifically the kind of addition done by the \(\texttt{jmp}\) instruction. | ||||
| We can always add an integer to a natural number, but we can at best guarantee that the result | ||||
| will be an integer. For instance, we can add `-1000` to `1`, and get `-999`, which is _not_ a natural | ||||
| number. We implement this kind of addition in a function called `jump_t`: | ||||
| @ -436,7 +436,7 @@ Now, suppose we wanted to write a function that _does_ return a valid program | ||||
| counter after adding the offset to it. Since it's possible for this function to fail | ||||
| (for instance, if the offset is very negative), it has to return `option (fin (S n))`. | ||||
| That is, this function may either fail (returning `None`) or succeed, returning | ||||
| `Some f`, where `f` is of type `fin (S n)`, aka \\(\\text{Fin} \\; (n + 1)\\). Here's | ||||
| `Some f`, where `f` is of type `fin (S n)`, aka \(\text{Fin} \; (n + 1)\). Here's | ||||
| the function in Coq (again, don't worry too much about the definition): | ||||
| 
 | ||||
| {{< codelines "Coq" "aoc-2020/day8.v" 61 61 >}} | ||||
| @ -464,7 +464,7 @@ The star `*` is used here to represent a [product type](https://en.wikipedia.org | ||||
| rather than arithmetic multiplication. Our state type accepts an argument, | ||||
| `n`, much like a finite natural number or a vector. In fact, this `n` is passed on | ||||
| to the state's program counter and set types. Rightly, a state for a program | ||||
| of length \\(n\\) will not be of the same type as a state for a program of length \\(n+1\\). | ||||
| of length \(n\) will not be of the same type as a state for a program of length \(n+1\). | ||||
| 
 | ||||
| An instruction is also a tuple, but this time containing only two elements: the opcode and | ||||
| the number. We write this as follows: | ||||
| @ -479,11 +479,11 @@ definition: | ||||
| {{< codelines "Coq" "aoc-2020/day8.v" 38 38 >}} | ||||
| 
 | ||||
| So far, so good! Finally, it's time to get started on the semantics themselves. | ||||
| We begin with the inductive definition of \\((\\rightarrow_i)\\). | ||||
| We begin with the inductive definition of \((\rightarrow_i)\). | ||||
| I think this is fairly straightforward. However, we do use | ||||
| `t` instead of \\(n\\) from the rules, and we use `FS` | ||||
| instead of \\(+1\\). Also, we make the formerly implicit | ||||
| assumption that \\(c+n\\) is valid explicit, by | ||||
| `t` instead of \(n\) from the rules, and we use `FS` | ||||
| instead of \(+1\). Also, we make the formerly implicit | ||||
| assumption that \(c+n\) is valid explicit, by | ||||
| providing a proof that `valid_jump_t pc t = Some pc'`. | ||||
| 
 | ||||
| {{< codelines "Coq" "aoc-2020/day8.v" 103 110 >}} | ||||
| @ -507,8 +507,8 @@ counter will be equal to the length of the program `n`, | ||||
| so we use `nat_to_fin n`. On the other hand, if the program | ||||
| terminates in as stuck state, it must be that it terminated | ||||
| at a program counter that points to an instruction. Thus, this | ||||
| program counter is actually a \\(\\text{Fin} \\; n\\), and not | ||||
| a \\(\\text{Fin} \\ (n+1)\\), and is not in the set of allowed program counters. | ||||
| program counter is actually a \(\text{Fin} \; n\), and not | ||||
| a \(\text{Fin} \ (n+1)\), and is not in the set of allowed program counters. | ||||
| We use the same "weakening" trick we saw earlier to represent | ||||
| this. | ||||
| 
 | ||||
| @ -518,8 +518,8 @@ Finally, we encode the three inference rules we came up with: | ||||
| 
 | ||||
| Notice that we fused two of the premises in the last rule. | ||||
| Instead of naming the instruction at the current program | ||||
| counter (by writing \\(p[c] = i\\)) and using it in another premise, we simply use | ||||
| `nth inp pc`, which corresponds to \\(p[c]\\) in our | ||||
| counter (by writing \(p[c] = i\)) and using it in another premise, we simply use | ||||
| `nth inp pc`, which corresponds to \(p[c]\) in our | ||||
| "paper" semantics. | ||||
| 
 | ||||
| Before we go on writing some actual proofs, we have | ||||
| @ -532,12 +532,12 @@ start off, we'll define the notion of a "valid instruction", which is guaranteed | ||||
| to keep the program counter in the correct range. | ||||
| There are a couple of ways to do this, but we'll use yet another definition based | ||||
| on inference rules. First, though, observe that the same instruction may be valid | ||||
| for one program, and invalid for another. For instance, \\(\\texttt{jmp} \\; 100\\) | ||||
| for one program, and invalid for another. For instance, \(\texttt{jmp} \; 100\) | ||||
| is perfectly valid for a program with thousands of instructions, but if it occurs | ||||
| in a program with only 3 instructions, it will certainly lead to disaster. Specifically, | ||||
| the validity of an instruction depends on the length of the program in which it resides, | ||||
| and the program counter at which it's encountered. | ||||
| Thus, we refine our idea of validity to "being valid for a program of length \\(n\\) at program counter \\(f\\)". | ||||
| Thus, we refine our idea of validity to "being valid for a program of length \(n\) at program counter \(f\)". | ||||
| For this, we can use the following two inference rules: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -550,14 +550,14 @@ For this, we can use the following two inference rules: | ||||
| {o \; t \; \text{valid for} \; n, c } | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| The first rule states that if a program has length \\(n\\), then \\(\\texttt{add}\\) is valid | ||||
| at any program counter whose value is less than \\(n\\). This is because running | ||||
| \\(\\texttt{add}\\) will increment the program counter \\(c\\) by 1, | ||||
| and thus, create a new program counter that's less than \\(n+1\\), | ||||
| The first rule states that if a program has length \(n\), then \(\texttt{add}\) is valid | ||||
| at any program counter whose value is less than \(n\). This is because running | ||||
| \(\texttt{add}\) will increment the program counter \(c\) by 1, | ||||
| and thus, create a new program counter that's less than \(n+1\), | ||||
| which, as we discussed above, is perfectly valid. | ||||
| 
 | ||||
| The second rule works for the other two instructions. It has an extra premise: | ||||
| the result of `jump_valid_t` (written as \\(J_v\\)) has to be \\(\\text{Some} \\; c'\\), | ||||
| the result of `jump_valid_t` (written as \(J_v\)) has to be \(\text{Some} \; c'\), | ||||
| that is, `jump_valid_t` must succeed. Note that we require this even for no-ops, | ||||
| since it later turns out that one of the them may be a jump after all. | ||||
| 
 | ||||
| @ -568,26 +568,26 @@ We encode the rules in Coq as follows: | ||||
| {{< codelines "Coq" "aoc-2020/day8.v" 152 157 >}} | ||||
| 
 | ||||
| Note that we have three rules instead of two. This is because we "unfolded" | ||||
| \\(o\\) from our second rule: rather than using set notation (or "or"), we | ||||
| \(o\) from our second rule: rather than using set notation (or "or"), we | ||||
| just generated two rules that vary in nothing but the operation involved. | ||||
| 
 | ||||
| Of course, we must have that every instruction in a program is valid. | ||||
| We don't really need inference rules for this, as much as a "forall" quantifier. | ||||
| A program \\(p\\) of length \\(n\\) is valid if the following holds: | ||||
| A program \(p\) of length \(n\) is valid if the following holds: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \forall (c : \text{Fin} \; n). p[c] \; \text{valid for} \; n, c | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| That is, for every possible in-bounds program counter \\(c\\), | ||||
| That is, for every possible in-bounds program counter \(c\), | ||||
| the instruction at the program counter is valid. We can now | ||||
| encode this in Coq, too: | ||||
| 
 | ||||
| {{< codelines "Coq" "aoc-2020/day8.v" 160 161 >}} | ||||
| 
 | ||||
| In the above, `n` is made implicit where possible. | ||||
| Since \\(c\\) (called `pc` in the code) is of type \\(\\text{Fin} \\; n\\), there's no | ||||
| need to write \\(n\\) _again_. The curly braces tell Coq to infer that | ||||
| Since \(c\) (called `pc` in the code) is of type \(\text{Fin} \; n\), there's no | ||||
| need to write \(n\) _again_. The curly braces tell Coq to infer that | ||||
| argument where possible. | ||||
| 
 | ||||
| ### Proving Termination | ||||
| @ -613,7 +613,7 @@ The first proof is rather simple. The claim is: | ||||
| > For our valid program, at any program counter `pc` | ||||
| and accumulator `acc`, there must exist another program | ||||
| counter `pc'` and accumulator `acc'` such that the | ||||
| instruction evaluation relation \\((\rightarrow_i)\\) | ||||
| instruction evaluation relation \((\rightarrow_i)\) | ||||
| connects the two. That is, valid addresses aside, | ||||
| we can always make a step. | ||||
| 
 | ||||
| @ -676,14 +676,14 @@ case analysis on `o`. | ||||
| There are three possible cases we have to consider, | ||||
| one for each type of instruction. | ||||
| 
 | ||||
| * If the instruction is \\(\\texttt{add}\\), we know | ||||
| * If the instruction is \(\texttt{add}\), we know | ||||
| that `pc' = pc + 1` and `acc' = acc + t0`. That is, | ||||
| the program counter is simply incremented, and the accumulator | ||||
| is modified with the number part of the instruction. | ||||
| * If the instruction is \\(\\texttt{nop}\\), the program | ||||
| * If the instruction is \(\texttt{nop}\), the program | ||||
| coutner will again be incremented (`pc' = pc + 1`), | ||||
| but the accumulator will stay the same, so `acc' = acc`. | ||||
| * If the instruction is \\(\\texttt{jmp}\\), things are | ||||
| * If the instruction is \(\texttt{jmp}\), things are | ||||
| more complicated. We must rely on the assumption | ||||
| that our input is valid, which tells us that adding | ||||
| `t0` to our `pc` will result in `Some f`, and not `None`. | ||||
| @ -737,7 +737,7 @@ counter that is not included in the valid set. | ||||
| to a "next" state. | ||||
| 
 | ||||
| Alternatively, we could say that one of the inference rules | ||||
| for \\((\\Rightarrow_p)\\) must apply. This is not the case if the input | ||||
| for \((\Rightarrow_p)\) must apply. This is not the case if the input | ||||
| is not valid, since, as I said | ||||
| before, an arbitrary input program can lead us to jump | ||||
| to a negative address (or to an address _way_ past the end of the program). | ||||
|  | ||||
| @ -55,31 +55,31 @@ patterns that a string has to match. We define regular expressions | ||||
| as follows: | ||||
| 
 | ||||
| * Any character is a regular expression that matches that character. Thus, | ||||
| \\(a\\) is a regular expression (from now shortened to regex) that matches | ||||
| \(a\) is a regular expression (from now shortened to regex) that matches | ||||
| the character 'a', and nothing else. | ||||
| * \\(r_1r_2\\), or the concatenation of \\(r_1\\) and \\(r_2\\), is | ||||
| a regular expression that matches anything matched by \\(r_1\\), followed | ||||
| by anything that matches \\(r_2\\). For instance, \\(ab\\), matches | ||||
| * \(r_1r_2\), or the concatenation of \(r_1\) and \(r_2\), is | ||||
| a regular expression that matches anything matched by \(r_1\), followed | ||||
| by anything that matches \(r_2\). For instance, \(ab\), matches | ||||
| the character 'a' followed by the character 'b' (thus matching "ab"). | ||||
| * \\(r_1|r_2\\) matches anything that is either matched by \\(r_1\\) or | ||||
| \\(r_2\\). Thus, \\(a|b\\) matches the character 'a' or the character 'b'. | ||||
| * \\(r_1?\\) matches either an empty string, or anything matched by \\(r_1\\). | ||||
| * \\(r_1+\\) matches one or more things matched by \\(r_1\\). So, | ||||
| \\(a+\\) matches "a", "aa", "aaa", and so on. | ||||
| * \\((r_1)\\) matches anything that matches \\(r_1\\). This is mostly used | ||||
| * \(r_1|r_2\) matches anything that is either matched by \(r_1\) or | ||||
| \(r_2\). Thus, \(a|b\) matches the character 'a' or the character 'b'. | ||||
| * \(r_1?\) matches either an empty string, or anything matched by \(r_1\). | ||||
| * \(r_1+\) matches one or more things matched by \(r_1\). So, | ||||
| \(a+\) matches "a", "aa", "aaa", and so on. | ||||
| * \((r_1)\) matches anything that matches \(r_1\). This is mostly used | ||||
| to group things together in more complicated expressions. | ||||
| * \\(.\\) matches any character. | ||||
| * \(.\) matches any character. | ||||
| 
 | ||||
| More powerful variations of regex also include an "any of" operator, \\([c_1c_2c_3]\\), | ||||
| which is equivalent to \\(c_1|c_2|c_3\\), and a "range" operator, \\([c_1-c_n]\\), which | ||||
| matches all characters in the range between \\(c_1\\) and \\(c_n\\), inclusive. | ||||
| More powerful variations of regex also include an "any of" operator, \([c_1c_2c_3]\), | ||||
| which is equivalent to \(c_1|c_2|c_3\), and a "range" operator, \([c_1-c_n]\), which | ||||
| matches all characters in the range between \(c_1\) and \(c_n\), inclusive. | ||||
| 
 | ||||
| Let's see some examples. An integer, such as 326, can be represented with \\([0-9]+\\). | ||||
| Let's see some examples. An integer, such as 326, can be represented with \([0-9]+\). | ||||
| This means, one or more characters between 0 or 9. Some (most) regex implementations | ||||
| have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable, | ||||
| have a special symbol for \([0-9]\), written as \(\setminus d\). A variable, | ||||
| starting with a lowercase letter and containing lowercase or uppercase letters after it, | ||||
| can be written as \\(\[a-z\]([a-zA-Z]+)?\\). Again, most regex implementations provide | ||||
| a special operator for \\((r_1+)?\\), written as \\(r_1*\\). | ||||
| can be written as \([a-z]([a-zA-Z]+)?\). Again, most regex implementations provide | ||||
| a special operator for \((r_1+)?\), written as \(r_1*\). | ||||
| 
 | ||||
| So how does one go about checking if a regular expression matches a string? An efficient way is to | ||||
| first construct a [state machine](https://en.wikipedia.org/wiki/Finite-state_machine). A type of state machine can be constructed from a regular expression | ||||
|  | ||||
| @ -143,7 +143,7 @@ def prog(xs): | ||||
|     return (state, total+left+right) | ||||
| ``` | ||||
| 
 | ||||
| Honestly, that's pretty clean. As clean as `left.reverse()` to allow for \\(O(1)\\) pop is. | ||||
| Honestly, that's pretty clean. As clean as `left.reverse()` to allow for \(O(1)\) pop is. | ||||
| What's really clean, however, is the implementation of mergesort in our language. | ||||
| It goes as follows: | ||||
| 
 | ||||
|  | ||||
| @ -198,7 +198,7 @@ but what we're looking for is something similar. Instead, we simply require a | ||||
| similar function for our signs. We call this function "[least upper bound](https://en.wikipedia.org/wiki/Least-upper-bound_property)", | ||||
| since it is the "least (most specific) | ||||
| element that's greater (less specific) than either `s1` or `s2`". Conventionally, | ||||
| this function is written as \\(a \\sqcup b\\) (or in our case, \\(s_1 \\sqcup s_2\\)). | ||||
| this function is written as \(a \sqcup b\) (or in our case, \(s_1 \sqcup s_2\)). | ||||
| We can define it for our signs so far using the following [Cayley table](https://en.wikipedia.org/wiki/Cayley_table). | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -212,7 +212,7 @@ We can define it for our signs so far using the following [Cayley table](https:/ | ||||
| \end{array} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| By using the above table, we can see that \\((+\ \\sqcup\ -)\ =\ ?\\) (aka `unknown`). | ||||
| By using the above table, we can see that \((+\ \sqcup\ -)\ =\ ?\) (aka `unknown`). | ||||
| This is correct; given the four signs we're working with, that's the most we can say. | ||||
| Let's explore the analogy to the `max` function a little bit more, by observing | ||||
| that this function has certain properties: | ||||
| @ -220,18 +220,18 @@ that this function has certain properties: | ||||
| * `max(a, a) = a`. The maximum of one number is just that number. | ||||
|   Mathematically, this property is called _idempotence_. Note that | ||||
|   by inspecting the diagonal of the above table, we can confirm that our | ||||
|   \\((\\sqcup)\\) function is idempotent. | ||||
|   \((\sqcup)\) function is idempotent. | ||||
| * `max(a, b) = max(b, a)`. If you're taking the maximum of two numbers, | ||||
|   it doesn't matter which one you consider first. This property is called | ||||
|   _commutativity_. Note that if you mirror the table along the diagonal, | ||||
|   it doesn't change; this shows that our \\((\\sqcup)\\) function is | ||||
|   it doesn't change; this shows that our \((\sqcup)\) function is | ||||
|   commutative. | ||||
| * `max(a, max(b, c)) = max(max(a, b), c)`. When you have three numbers, | ||||
|   and you're determining the maximum value, it doesn't matter which pair of | ||||
|   numbers you compare first. This property is called _associativity_. You | ||||
|   can use the table above to verify the \\((\\sqcup)\\) is associative, too. | ||||
|   can use the table above to verify the \((\sqcup)\) is associative, too. | ||||
| 
 | ||||
| A set that has a binary operation (like `max` or \\((\\sqcup)\\)) that | ||||
| A set that has a binary operation (like `max` or \((\sqcup)\)) that | ||||
| satisfies the above properties is called a [semilattice](https://en.wikipedia.org/wiki/Semilattice). In Agda, we can write this definition roughly as follows: | ||||
| 
 | ||||
| ```Agda | ||||
| @ -260,11 +260,24 @@ and the "least upper bound" function can be constructed from one another. | ||||
| As it turns out, the `min` function has very similar properties to `max`: | ||||
| it's idempotent, commutative, and associative. For a partial order like | ||||
| ours, the analog to `min` is "greatest lower bound", or "the largest value | ||||
| that's smaller than both inputs". Such a function is denoted as \\(a\\sqcap b\\). | ||||
| As for what it means, where \\(s_1 \\sqcup s_2\\) means "combine two signs where | ||||
| that's smaller than both inputs". Such a function is denoted as \(a\sqcap b\). | ||||
| As for what it means, where \(s_1 \sqcup s_2\) means "combine two signs where | ||||
| you don't know which one will be used" (like in an `if`/`else`),  | ||||
| \\(s_1 \\sqcap s_2\\) means "combine two signs where you know both of | ||||
| them to be true". For example, \\((+\ \\sqcap\ ?)\ =\ +\\), because a variable | ||||
| \(s_1 \sqcap s_2\) means "combine two signs where you know | ||||
| {{< sidenote "right" "or-join-note" "both of them to be true" -7 >}} | ||||
| If you're familiar with <a href="https://en.wikipedia.org/wiki/Boolean_algebra"> | ||||
| Boolean algebra</a>, this might look a little bit familiar to you. In fact, | ||||
| the symbol for "and" on booleans is \(\land\). Similarly, the symbol | ||||
| for "or" is \(\lor\). So, \(s_1 \sqcup s_2\) means "the sign is \(s_1\) or \(s_2\)", | ||||
| or "(the sign is \(s_1\)) \(\lor\) (the sign is \(s_2\))". Similarly, | ||||
| \(s_1 \sqcap s_2\) means "(the sign is \(s_1\)) \(\land\) (the sign is \(s_2\))". | ||||
| Don't these symbols look similar?<br> | ||||
| <br> | ||||
| In fact, booleans with \((\lor)\) and \((\land)\) satisfy the semilattice | ||||
| laws we've been discussing, and together form a lattice (to which I'm building | ||||
| to in the main body of the text). The same is true for the set union and | ||||
| intersection operations, \((\cup)\) and \((\cap)\). | ||||
| {{< /sidenote >}}". For example, \((+\ \sqcap\ ?)\ =\ +\), because a variable | ||||
| that's both "any sign" and "positive" must be positive. | ||||
| 
 | ||||
| There's just one hiccup: what's the greatest lower bound of `+` and `-`? | ||||
| @ -276,8 +289,8 @@ out that this "impossible" value is the least element of our set (we added | ||||
| it to be the lower bound of `+` and co., which in turn are less than `unknown`). | ||||
| Similarly, `unknown` is the largest element of our set, since it's greater | ||||
| than `+` and co, and transitively greater than `impossible`. In mathematics, | ||||
| it's not uncommon to define the least element as \\(\\bot\\) (read "bottom"), and the | ||||
| greatest element as \\(\\top\\) (read "top"). With that in mind, the | ||||
| it's not uncommon to define the least element as \(\bot\) (read "bottom"), and the | ||||
| greatest element as \(\top\) (read "top"). With that in mind, the | ||||
| following are the updated Cayley tables for our operations. | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -325,7 +338,7 @@ In Agda, we can therefore write a lattice as follows: | ||||
| ### Concrete Example: Natural Numbers | ||||
| 
 | ||||
| Since we've been talking about `min` and `max` as motivators for properties | ||||
| of \\((\\sqcap)\\) and \\((\\sqcup)\\), it might not be all that surprising | ||||
| of \((\sqcap)\) and \((\sqcup)\), it might not be all that surprising | ||||
| that natural numbers form a lattice with `min` and `max` as the two binary | ||||
| operations. In fact, the Agda standard library writes `min` as `_⊓_` and | ||||
| `max` as `_⊔_`! We can make use of the already-proven properties of these | ||||
|  | ||||
| @ -89,7 +89,7 @@ fixed set of types, and we are required to stick to names in that set. We have n | ||||
| duty when using mathematical notation. The main goal of a mathematical definition | ||||
| is not to run the code, or check if it's correct; it's to communicate something | ||||
| to others. As long as others understand what you mean, you can do whatever you want. | ||||
| I _chose_ to use the word \\(\\text{number}\\) to represent the type | ||||
| I _chose_ to use the word \(\text{number}\) to represent the type | ||||
| of numbers, mainly because it's _very_ clear what that means. A theorist writing | ||||
| a paper might cringe at the verbosity of such a convention. My goal, however, is | ||||
| to communicate things to _you_, dear reader, and I think it's best to settle for | ||||
| @ -120,8 +120,8 @@ TNumber @ type(lit(?n), number) <- num(?n); | ||||
| 
 | ||||
| {{< bergamot_exercise label="bergamot; tweaking notation" preset="notation-preset" id="exercise-1" >}} | ||||
| Bergamot, the interactive tool I've developed for doing exercises, supports | ||||
| customizing the notation for rules. Try changing the \\(:\\) symbol to | ||||
| the \\(\\sim\\) symbol (denoted in latex as `\sim`). | ||||
| customizing the notation for rules. Try changing the \(:\) symbol to | ||||
| the \(\sim\) symbol (denoted in latex as `\sim`). | ||||
| 
 | ||||
| To change the way that rules are rendered, click the "Presentation Rules" | ||||
| tab in the "Rules" section. There will be a lot there: I've added rules for | ||||
| @ -129,7 +129,7 @@ pretty-printing a fair amount of the standard programming languages notation. | ||||
| Scroll down to `LatexTypeBin`, and change `:` to | ||||
| `\\sim` on that line (the extra backslash is to handle string | ||||
| escaping). Now try typing numbers into the input box; you should see | ||||
| something like \\(1 \\sim \text{number} \\) | ||||
| something like \(1 \sim \text{number} \) | ||||
| {{< /bergamot_exercise >}} | ||||
| 
 | ||||
| One more thing. So far, we've only written down one claim: the value 1 is a number. | ||||
| @ -148,8 +148,8 @@ This is exactly what is done in PL. We'd write the following. | ||||
| n:\text{number} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| What's this \\(n\\)? First, recall that notation is up to us. I'm choosing to use the letter | ||||
| \\(n\\) to stand for "any value that is a number". We write a symbol, say what we want it to mean, | ||||
| What's this \(n\)? First, recall that notation is up to us. I'm choosing to use the letter | ||||
| \(n\) to stand for "any value that is a number". We write a symbol, say what we want it to mean, | ||||
| and we're done.  | ||||
| 
 | ||||
| {{< dialog >}} | ||||
| @ -172,21 +172,21 @@ by \(n\)) the type \(\text{number}\). | ||||
| {{< /dialog >}} | ||||
| 
 | ||||
| Actually, to be extra precise, we might want to be explicit about our claim | ||||
| that \\(n\\) is a number, rather than resorting to notational conventions. | ||||
| that \(n\) is a number, rather than resorting to notational conventions. | ||||
| To do so, we'd need to write something like the following: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \cfrac{n \in \texttt{Num}}{n : \text{number}} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Where \\(\\texttt{Num}\\) denotes the set of numbers in our syntax (`1`, `3.14`, etc.) | ||||
| Where \(\texttt{Num}\) denotes the set of numbers in our syntax (`1`, `3.14`, etc.) | ||||
| The stuff about the line is called a premise, and it's a simply a condition | ||||
| required for the rule to hold. The rule then says that \\(n\\) has type number -- | ||||
| but only if \\(n\\) is a numeric symbol in our language. We'll talk about premises | ||||
| required for the rule to hold. The rule then says that \(n\) has type number -- | ||||
| but only if \(n\) is a numeric symbol in our language. We'll talk about premises | ||||
| in more detail later on. | ||||
| 
 | ||||
| Having introduced this variable-like thing \\(n\\), we need to be careful. | ||||
| It's important to note that the letter \\(n\\) is | ||||
| Having introduced this variable-like thing \(n\), we need to be careful. | ||||
| It's important to note that the letter \(n\) is | ||||
| not a variable like `x` in our code snippets above. In fact, it's not at all part of the programming | ||||
| language we're discussing. Rather, it's kind of like a variable in our _rules_. | ||||
| 
 | ||||
| @ -194,26 +194,26 @@ This distinction comes up a lot. The thing is, the notation we're building up to | ||||
| kind of language. It's not meant for a computer to execute, mind you, but that's not a requirement | ||||
| for something to be language (ever heard of English?). The bottom line is, we have symbols with | ||||
| particular meanings, and there are rules to how they have to be written. The statement "1 is a number" | ||||
| must be written by first writing 1, then a colon, then \\(\text{number}\\). It's a language. | ||||
| must be written by first writing 1, then a colon, then \(\text{number}\). It's a language. | ||||
| 
 | ||||
| Really, then, we have two languages to think about: | ||||
| * The _object language_ is the programming language we're trying to describe and mathematically | ||||
|   formalize. This is the language that has variables like `x`, keywords like `let` and `const`, and so on. | ||||
| 
 | ||||
|   Some examples of our object language that we've seen so far are `1` and `2+3`. | ||||
|   In our mathematical notation, they look like \\(1\\) and \\(2+3\\). | ||||
|   In our mathematical notation, they look like \(1\) and \(2+3\). | ||||
| 
 | ||||
| * The _meta language_ is the notation we use to talk about our object language. It consists of | ||||
|   the various symbols we define, and is really just a system for communicating various things | ||||
|   (like type rules) to others. | ||||
| 
 | ||||
|   Expressions like \\(n \\in \\texttt{Num}\\) and \\(1 : \\text{number}\\) | ||||
|   Expressions like \(n \in \texttt{Num}\) and \(1 : \text{number}\) | ||||
|   are examples of our meta language. | ||||
| 
 | ||||
| Using this terminology, \\(n\\) is a variable in our meta language; this is commonly called | ||||
| a _metavariable_. A rule such as \\(n:\\text{number}\\) that contains metavariables isn't | ||||
| Using this terminology, \(n\) is a variable in our meta language; this is commonly called | ||||
| a _metavariable_. A rule such as \(n:\text{number}\) that contains metavariables isn't | ||||
| really a rule by itself; rather, it stands for a whole bunch of rules, one for each possible | ||||
| number that \\(n\\) can be. We call this a _rule schema_. | ||||
| number that \(n\) can be. We call this a _rule schema_. | ||||
| 
 | ||||
| Alright, that's enough theory for now. Let's go back to the real world. Working with | ||||
| plain old values like `1` gets boring quickly. There's not many programs you can write | ||||
| @ -273,9 +273,9 @@ const x: number = 1.1 + 1; // just fine! | ||||
| 
 | ||||
| That concludes the second round of real-world examples. Let's take a look at formalizing | ||||
| all of this mathematically. As a starting point, we can look at a rule that matches the TypeScript | ||||
| view of having only a single number type, \\(\\text{number}\\). This rule needs a little | ||||
| view of having only a single number type, \(\text{number}\). This rule needs a little | ||||
| bit "more" than the ones we've seen so far; we can't just blindly give things in the | ||||
| form \\(a+b\\) the type \\(\\text{number}\\) (what if we're adding strings?). For our | ||||
| form \(a+b\) the type \(\text{number}\) (what if we're adding strings?). For our | ||||
| rule to behave in the way we have in mind, it's necessary for us to add _premises_. | ||||
| Before I explain any further, let me show you the rule. | ||||
| 
 | ||||
| @ -283,7 +283,7 @@ Before I explain any further, let me show you the rule. | ||||
| \frac{e_1:\text{number}\quad e_2:\text{number}}{e_1+e_2:\text{number}} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| In the above (and elsewhere) we will use the metavariable \\(e\\) as a stand-in for | ||||
| In the above (and elsewhere) we will use the metavariable \(e\) as a stand-in for | ||||
| any _expression_ in our source language. In general, expressions are things such as `1`, | ||||
| `x`, `1.0+someFunction(y)`, and so on. In other words, they're things we can evaluate | ||||
| to a value. For the purposes of this article, though, we're only looking at basic | ||||
| @ -293,12 +293,12 @@ For the moment, we will avoid rules for checking _statements_ (like `let x = 5;` | ||||
| 
 | ||||
| Rules like the above consist of premises (above the line) and conclusions (below the line). | ||||
| The conclusion is the claim / fact that we can determine from the rule. In this specific case, | ||||
| the conclusion is that \\(e_1+e_2\\) has type \\(\\text{number}\\).  | ||||
| the conclusion is that \(e_1+e_2\) has type \(\text{number}\).  | ||||
| For this to be true, however, some conditions must be met; specifically, the sub-expressions | ||||
| \\(e_1\\) and \\(e_2\\) must themselves be of type \\(\\text{number}\\). These are the premises. | ||||
| \(e_1\) and \(e_2\) must themselves be of type \(\text{number}\). These are the premises. | ||||
| Reading in plain English, we could pronounce this rule as: | ||||
| 
 | ||||
| > If \\(e_1\\) and \\(e_2\\) have type \\(\\text{number}\\), then \\(e_1+e_2\\) has type \\(\\text{number}\\). | ||||
| > If \(e_1\) and \(e_2\) have type \(\text{number}\), then \(e_1+e_2\) has type \(\text{number}\). | ||||
| 
 | ||||
| Notice that we don't care what the left and right operands are (we say they can be any expression). | ||||
| We need not concern ourselves with how to compute _their_ type in this specific rule. Thus, the rule | ||||
| @ -315,7 +315,7 @@ Just to get some more practice, let's take a look at a rule for adding strings. | ||||
| 
 | ||||
| This rule is read as follows: | ||||
| 
 | ||||
| > If \\(e_1\\) and \\(e_2\\) have type \\(\\text{string}\\), then \\(e_1+e_2\\) has type \\(\\text{string}\\). | ||||
| > If \(e_1\) and \(e_2\) have type \(\text{string}\), then \(e_1+e_2\) has type \(\text{string}\). | ||||
| 
 | ||||
| {{< bergamot_preset name="string-preset" prompt="PromptConverter @ prompt(type(?term, ?t)) <- input(?term);" query="\"hello\"+\"world\"">}} | ||||
| TNumber @ type(lit(?n), number) <- num(?n); | ||||
| @ -336,15 +336,15 @@ Thus, the rule | ||||
| TNumber @ type(lit(?n), number) <- num(?n); | ||||
| ``` | ||||
| 
 | ||||
| Has one premise, that the term \\(n\\) is a number, and the conclusion is that | ||||
| a number literal has type \\(\\text{number}\\). The `num` condition | ||||
| is a Bergamot builtin, corresponding to our earlier notation of \\(n \\in \\texttt{Num}\\). | ||||
| Has one premise, that the term \(n\) is a number, and the conclusion is that | ||||
| a number literal has type \(\text{number}\). The `num` condition | ||||
| is a Bergamot builtin, corresponding to our earlier notation of \(n \in \texttt{Num}\). | ||||
| It holds for all numbers: it's always true that `num(1)`, `num(2)`, | ||||
| etc. The equivalent builtin for something being a string is `str`. | ||||
| 
 | ||||
| To edit the rules in Bergamot, click the "Editor" button in the "Rules" | ||||
| section. You will need to add two rules, just like we did for numbers: | ||||
| a rule for string literals (something like \\(\\texttt{\"Hello\"} : \\text{string}\\), | ||||
| a rule for string literals (something like \(\texttt{"Hello"} : \text{string}\), | ||||
| but more general) and for adding two strings together. I suggest naming | ||||
| these two rules `TString` and `TPlusS` respectively. | ||||
| 
 | ||||
| @ -353,8 +353,8 @@ expressions such as `"Hello"` and `"Hello" + "World"`. | ||||
| {{< /bergamot_exercise >}} | ||||
| 
 | ||||
| These rules generally work in other languages. Things get more complicated in languages like Java and Rust, | ||||
| where types for numbers are more precise (\\(\\text{int}\\) and \\(\\text{float}\\) instead of | ||||
| \\(\\text{number}\\)). In these languages, we need rules for both. | ||||
| where types for numbers are more precise (\(\text{int}\) and \(\text{float}\) instead of | ||||
| \(\text{number}\)). In these languages, we need rules for both. | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \frac{e_1:\text{int}\quad e_2:\text{int}}{e_1+e_2:\text{int}} | ||||
| @ -412,12 +412,12 @@ allow users to (for example) write numbers where the language expects strings, w | ||||
| understanding that the number will be automatically turned into a string. | ||||
| 
 | ||||
| To avoid having an explosion of various rules, we instead define the "converts to" | ||||
| relation, \\(\\tau_1 \\preceq \\tau_2\\), where \\(\\tau_1\\) and \\(\\tau_2\\) | ||||
| relation, \(\tau_1 \preceq \tau_2\), where \(\tau_1\) and \(\tau_2\) | ||||
| are types. To say that an integer can be automatically converted to a floating | ||||
| pointer number, we can write \\(\\text{integer} \\preceq \\text{float}\\). | ||||
| pointer number, we can write \(\text{integer} \preceq \text{float}\). | ||||
| Then, we add only a single additional rule to our language: `TConverts`. | ||||
| This rule says that we can treat an expression of type \\(\\tau_1\\) as | ||||
| an expression of type \\(\\tau_2\\), if the former can be converted to the | ||||
| This rule says that we can treat an expression of type \(\tau_1\) as | ||||
| an expression of type \(\tau_2\), if the former can be converted to the | ||||
| latter. | ||||
| 
 | ||||
| I have written some rules using these concepts. Input some expressions into | ||||
| @ -439,7 +439,7 @@ post. For the moment, we shall content ourselves with the tedious approach. | ||||
| Another thing to note is that we haven't yet seen rules for what programs are _incorrect_, | ||||
| and we never will. When formalizing type systems we rarely (if ever) explicitly enumerate | ||||
| cases that produce errors. Rather, we interpret the absence of matching rules to indicate | ||||
| that something is wrong. Since no rule has premises that match \\(e_1:\\text{float}\\) and \\(e_2:\\text{string}\\), | ||||
| that something is wrong. Since no rule has premises that match \(e_1:\text{float}\) and \(e_2:\text{string}\), | ||||
| we can infer that | ||||
|  {{< sidenote "right" "float-string-note" "given the rules so far," >}} | ||||
| I'm trying to be careful here, since adding a float to a string | ||||
| @ -494,16 +494,16 @@ Here's a quick summary of what we've covered: | ||||
|    or TypeScript. The _meta language_ is the language that we use | ||||
|    to reason and talk about the object language. Typically, this is | ||||
|    the language we use for writing down our rules. | ||||
| 3. The common type-theoretic notation for "expression \\(x\\) | ||||
|    has type \\(\\tau\\)" is \\(x : \\tau\\). | ||||
| 3. The common type-theoretic notation for "expression \(x\) | ||||
|    has type \(\tau\)" is \(x : \tau\). | ||||
| 4. In writing more complicated rules, we will frequently make use | ||||
|    of the inference rule notation, which looks something like | ||||
|    the following. | ||||
|    {{< latex >}} | ||||
|    \frac{P_1 \quad P_2 \quad ... \quad P_n}{P} | ||||
|    {{< /latex >}} | ||||
|    The above is read as "if \\(P_1\\) through \\(P_n\\) are | ||||
|    true, then \\(P\\) is also true." | ||||
|    The above is read as "if \(P_1\) through \(P_n\) are | ||||
|    true, then \(P\) is also true." | ||||
| 5. To support operators like `+` that can work on, say, both numbers | ||||
|    and strings, we provide inference rules for each such case. If this | ||||
|    gets cumbersome, we can introduce a system of _subtypes_ into our | ||||
| @ -539,9 +539,9 @@ and already be up-to-speed on a big chunk of the content. | ||||
| #### Metavariables | ||||
| | Symbol  | Meaning      | Syntactic Category    | | ||||
| |---------|--------------|-----------------------| | ||||
| | \\(n\\) |  Numbers     | \\(\\texttt{Num}\\)   | | ||||
| | \\(s\\) |  Strings     | \\(\\texttt{Str}\\)   | | ||||
| | \\(e\\) |  Expressions | \\(\\texttt{Expr}\\)  | | ||||
| | \(n\) |  Numbers     | \(\texttt{Num}\)   | | ||||
| | \(s\) |  Strings     | \(\texttt{Str}\)   | | ||||
| | \(e\) |  Expressions | \(\texttt{Expr}\)  | | ||||
| 
 | ||||
| #### Grammar | ||||
| {{< block >}} | ||||
| @ -558,8 +558,8 @@ and already be up-to-speed on a big chunk of the content. | ||||
| {{< foldtable >}} | ||||
| | Rule         | Description | | ||||
| |--------------|-------------| | ||||
| | {{< latex >}}\frac{n \in \texttt{Num}}{n : \text{number}} {{< /latex >}}| Number literals have type \\(\\text{number}\\) | | ||||
| | {{< latex >}}\frac{s \in \texttt{Str}}{s : \text{string}} {{< /latex >}}| String literals have type \\(\\text{string}\\) | | ||||
| | {{< latex >}}\frac{n \in \texttt{Num}}{n : \text{number}} {{< /latex >}}| Number literals have type \(\text{number}\) | | ||||
| | {{< latex >}}\frac{s \in \texttt{Str}}{s : \text{string}} {{< /latex >}}| String literals have type \(\text{string}\) | | ||||
| | {{< latex >}}\frac{e_1 : \text{string}\quad e_2 : \text{string}}{e_1+e_2 : \text{string}} {{< /latex >}}| Adding strings gives a string | | ||||
| | {{< latex >}}\frac{e_1 : \text{number}\quad e_2 : \text{number}}{e_1+e_2 : \text{number}} {{< /latex >}}| Adding numbers gives a number | | ||||
| 
 | ||||
|  | ||||
| @ -13,9 +13,9 @@ recognizing tokens. For instance, consider a simple language of a matching | ||||
| number of open and closed parentheses, like `()` and `((()))`. You can't | ||||
| write a regular expression for it! We resort to a wider class of languages, called | ||||
| __context free languages__. These languages are ones that are matched by __context free grammars__. | ||||
| A context free grammar is a list of rules in the form of \\(S \\rightarrow \\alpha\\), where | ||||
| \\(S\\) is a __nonterminal__ (conceptualy, a thing that expands into other things), and | ||||
| \\(\\alpha\\) is a sequence of nonterminals and terminals (a terminal is a thing that doesn't | ||||
| A context free grammar is a list of rules in the form of \(S \rightarrow \alpha\), where | ||||
| \(S\) is a __nonterminal__ (conceptualy, a thing that expands into other things), and | ||||
| \(\alpha\) is a sequence of nonterminals and terminals (a terminal is a thing that doesn't | ||||
| expand into other things; for us, this is a token). | ||||
| 
 | ||||
| Let's write a context free grammar (CFG from now on) to match our parenthesis language: | ||||
| @ -27,9 +27,9 @@ S & \rightarrow () | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| So, how does this work? We start with a "start symbol" nonterminal, which we usually denote as \\(S\\). Then, to get a desired string, | ||||
| So, how does this work? We start with a "start symbol" nonterminal, which we usually denote as \(S\). Then, to get a desired string, | ||||
| we replace a nonterminal with the sequence of terminals and nonterminals on the right of one of its rules. For instance, to get `()`, | ||||
| we start with \\(S\\) and replace it with the body of the second one of its rules. This gives us `()` right away. To get `((()))`, we | ||||
| we start with \(S\) and replace it with the body of the second one of its rules. This gives us `()` right away. To get `((()))`, we | ||||
| have to do a little more work: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -38,8 +38,8 @@ S \rightarrow (S) \rightarrow ((S)) \rightarrow ((())) | ||||
| 
 | ||||
| In practice, there are many ways of using a CFG to parse a programming language. Various parsing algorithms support various subsets | ||||
| of context free languages. For instance, top down parsers follow nearly exactly the structure that we had. They try to parse | ||||
| a nonterminal by trying to match each symbol in its body. In the rule \\(S \\rightarrow \\alpha \\beta \\gamma\\), it will | ||||
| first try to match \\(\\alpha\\), then \\(\\beta\\), and so on. If one of the three contains a nonterminal, it will attempt to parse | ||||
| a nonterminal by trying to match each symbol in its body. In the rule \(S \rightarrow \alpha \beta \gamma\), it will | ||||
| first try to match \(\alpha\), then \(\beta\), and so on. If one of the three contains a nonterminal, it will attempt to parse | ||||
| that nonterminal following the same strategy. However, this leaves a flaw - For instance, consider the grammar | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -49,8 +49,8 @@ S & \rightarrow a | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| A top down parser will start with \\(S\\). It will then try the first rule, which starts with \\(S\\). So, dutifully, it will | ||||
| try to parse __that__ \\(S\\). And to do that, it will once again try the first rule, and find that it starts with another \\(S\\)... | ||||
| A top down parser will start with \(S\). It will then try the first rule, which starts with \(S\). So, dutifully, it will | ||||
| try to parse __that__ \(S\). And to do that, it will once again try the first rule, and find that it starts with another \(S\)... | ||||
| This will never end, and the parser will get stuck. A grammar in which a nonterminal can appear in the beginning of one of its rules | ||||
| __left recursive__, and top-down parsers aren't able to handle those grammars. | ||||
| 
 | ||||
| @ -68,8 +68,8 @@ We see nothing interesting on the left side of the dot, so we move (or __shift__ | ||||
| a.aa | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Now, on the left side of the dot, we see something! In particular, we see the body of one of the rules for \\(S\\) (the second one). | ||||
| So we __reduce__ the thing on the left side of the dot, by replacing it with the left hand side of the rule (\\(S\\)): | ||||
| Now, on the left side of the dot, we see something! In particular, we see the body of one of the rules for \(S\) (the second one). | ||||
| So we __reduce__ the thing on the left side of the dot, by replacing it with the left hand side of the rule (\(S\)): | ||||
| 
 | ||||
| {{< latex >}} | ||||
| S.aa | ||||
| @ -94,7 +94,7 @@ start symbol, and nothing on the right of the dot, so we're done! | ||||
| In practice, we don't want to just match a grammar. That would be like saying "yup, this is our language". | ||||
| Instead, we want to create something called an __abstract syntax tree__, or AST for short. This tree | ||||
| captures the structure of our language, and is easier to work with than its textual representation. The structure | ||||
| of the tree we build will often mimic the structure of our grammar: a rule in the form \\(S \\rightarrow A B\\) | ||||
| of the tree we build will often mimic the structure of our grammar: a rule in the form \(S \rightarrow A B\) | ||||
| will result in a tree named "S", with two children corresponding the trees built for A and B. Since | ||||
| an AST captures the structure of the language, we'll be able to toss away some punctuation | ||||
| like `,` and `(`. These tokens will appear in our grammar, but we will tweak our parser to simply throw them away. Additionally, | ||||
| @ -109,8 +109,8 @@ For instance, for `3+2*6`, we want our tree to have `+` as the root, `3` as the | ||||
| Why? Because this tree represents "the addition of 3 and the result of multiplying 2 by 6". If we had `*` be the root, we'd have | ||||
| a tree representing "the multiplication of the result of adding 3 to 2 and 6", which is __not__ what our expression means. | ||||
| 
 | ||||
| So, with this in mind, we want our rule for __addition__ (represented with the nonterminal \\(A\_{add}\\), to be matched first, and | ||||
| for its children to be trees created by the multiplication rule, \\(A\_{mult}\\). So we write the following rules: | ||||
| So, with this in mind, we want our rule for __addition__ (represented with the nonterminal \(A_{add}\), to be matched first, and | ||||
| for its children to be trees created by the multiplication rule, \(A_{mult}\). So we write the following rules: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -120,7 +120,7 @@ A_{add} & \rightarrow A_{mult} | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| The first rule matches another addition, added to the result of a multiplication. Similarly, the second rule matches another addition, from which the result of a multiplication is then subtracted. We use the \\(A\_{add}\\) on the left side of \\(+\\) and \\(-\\) in the body | ||||
| The first rule matches another addition, added to the result of a multiplication. Similarly, the second rule matches another addition, from which the result of a multiplication is then subtracted. We use the \(A_{add}\) on the left side of \(+\) and \(-\) in the body | ||||
| because we want to be able to parse strings like `1+2+3+4`, which we want to view as `((1+2)+3)+4` (mostly because | ||||
| subtraction is [left-associative](https://en.wikipedia.org/wiki/Operator_associativity)). So, we want the top level | ||||
| of the tree to be the rightmost `+` or `-`, since that means it will be the "last" operation. You may be asking, | ||||
| @ -139,7 +139,7 @@ A_{mult} & \rightarrow P | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| P, in this case, is an application (remember, application has higher precedence than any binary operator). | ||||
| Once again, if there's no `*` or `\`, we simply fall through to a \\(P\\) nonterminal, representing application. | ||||
| Once again, if there's no `*` or `\`, we simply fall through to a \(P\) nonterminal, representing application. | ||||
| 
 | ||||
| Application is refreshingly simple: | ||||
| 
 | ||||
| @ -150,7 +150,7 @@ P & \rightarrow B | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| An application is either only one "thing" (represented with \\(B\\), for base), such as a number or an identifier, | ||||
| An application is either only one "thing" (represented with \(B\), for base), such as a number or an identifier, | ||||
| or another application followed by a thing. | ||||
| 
 | ||||
| We now need to define what a "thing" is. As we said before, it's a number, or an identifier. We also make a parenthesized | ||||
| @ -166,7 +166,7 @@ B & \rightarrow C | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| What's the last \\(C\\)? We also want a "thing" to be a case expression. Here are the rules for that: | ||||
| What's the last \(C\)? We also want a "thing" to be a case expression. Here are the rules for that: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -181,15 +181,15 @@ L_L & \rightarrow \epsilon | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| \\(L\_B\\) is the list of branches in our case expression. \\(R\\) is a single branch, which is in the | ||||
| form `Pattern -> Expression`. \\(N\\) is a pattern, which we will for now define to be either a variable name | ||||
| (\\(\\text{lowerVar}\\)), or a constructor with some arguments. The arguments of a constructor will be | ||||
| lowercase names, and a list of those arguments will be represented with \\(L\_L\\). One of the bodies | ||||
| of this nonterminal is just the character \\(\\epsilon\\), which just means "nothing". | ||||
| \(L_B\) is the list of branches in our case expression. \(R\) is a single branch, which is in the | ||||
| form `Pattern -> Expression`. \(N\) is a pattern, which we will for now define to be either a variable name | ||||
| (\(\text{lowerVar}\)), or a constructor with some arguments. The arguments of a constructor will be | ||||
| lowercase names, and a list of those arguments will be represented with \(L_L\). One of the bodies | ||||
| of this nonterminal is just the character \(\epsilon\), which just means "nothing". | ||||
| We use this because a constructor can have no arguments (like Nil). | ||||
| 
 | ||||
| We can use these grammar rules to represent any expression we want. For instance, let's try `3+(multiply 2 6)`, | ||||
| where multiply is a function that, well, multiplies. We start with \\(A_{add}\\): | ||||
| where multiply is a function that, well, multiplies. We start with \(A_{add}\): | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -227,15 +227,15 @@ L_U & \rightarrow \epsilon | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| That's a lot of rules! \\(T\\) is the "top-level declaration rule. It matches either | ||||
| That's a lot of rules! \(T\) is the "top-level declaration rule. It matches either | ||||
| a function or a data definition. A function definition consists of the keyword "defn", | ||||
| followed by a function name (starting with a lowercase letter), followed by a list of | ||||
| parameters, represented by \\(L\_P\\). | ||||
| parameters, represented by \(L_P\). | ||||
| 
 | ||||
| A data type definition consists of the name of the data type (starting with an uppercase letter), | ||||
| and a list \\(L\_D\\) of data constructors \\(D\\). There must be at least one data constructor in this list, | ||||
| and a list \(L_D\) of data constructors \(D\). There must be at least one data constructor in this list, | ||||
| so we don't use the empty string rule here. A data constructor is simply an uppercase variable representing | ||||
| a constructor of the data type, followed by a list \\(L\_U\\) of zero or more uppercase variables (representing | ||||
| a constructor of the data type, followed by a list \(L_U\) of zero or more uppercase variables (representing | ||||
| the types of the arguments of the constructor). | ||||
| 
 | ||||
| Finally, we want one or more of these declarations in a valid program: | ||||
| @ -266,7 +266,7 @@ Next, observe that there's | ||||
| a certain symmetry between our parser and our scanner. In our scanner, we mixed the theoretical idea of a regular expression | ||||
| with an __action__, a C++ code snippet to be executed when a regular expression is matched. This same idea is present | ||||
| in the parser, too. Each rule can produce a value, which we call a __semantic value__. For type safety, we allow | ||||
| each nonterminal and terminal to produce only one type of semantic value. For instance, all rules for \\(A_{add}\\) must | ||||
| each nonterminal and terminal to produce only one type of semantic value. For instance, all rules for \(A_{add}\) must | ||||
| produce an expression. We specify the type of each nonterminal and using `%type` directives. The types of terminals | ||||
| are specified when they're declared. | ||||
| 
 | ||||
|  | ||||
| @ -115,14 +115,14 @@ aspects of variables that we can gather from the preceding examples: | ||||
|    Rust, Crystal). | ||||
| 
 | ||||
| To get started with type rules for variables, let's introduce | ||||
| another metavariable, \\(x\\) (along with \\(n\\) from before). | ||||
| Whereas \\(n\\) ranges over any number in our language, \\(x\\) ranges | ||||
| another metavariable, \(x\) (along with \(n\) from before). | ||||
| Whereas \(n\) ranges over any number in our language, \(x\) ranges | ||||
| over any variable. It can be used as a stand-in for `x`, `y`, `myVar`, and so on. | ||||
| 
 | ||||
| Now, let's start by looking at versions of formal rules that are | ||||
| __incorrect__. The first property listed above prevents us from writing type rules like the | ||||
| following, since we cannot always assume that a variable has type | ||||
| \\(\\text{number}\\) or \\(\\text{string}\\) (it might have either, | ||||
| \(\text{number}\) or \(\text{string}\) (it might have either, | ||||
| depending on where in the code it's used!). | ||||
| {{< latex >}} | ||||
| x : \text{number} | ||||
| @ -145,8 +145,8 @@ With these constraints in mind, we have enough to start creating | ||||
| rules for expressions (but not statements yet; we'll get to that). | ||||
| The way to work around the four constraints is to add a third "thing" to our rules: | ||||
| the _environment_, typically denoted using the Greek uppercase gamma, | ||||
| \\(\\Gamma\\). Much like we avoided writing similar rules for every possible | ||||
| number by using \\(n\\) as a metavariable for _any_ number, we will use \\(\\Gamma\\) | ||||
| \(\Gamma\). Much like we avoided writing similar rules for every possible | ||||
| number by using \(n\) as a metavariable for _any_ number, we will use \(\Gamma\) | ||||
| as a metavariable to stand in for _any_ environment. What is an environment, | ||||
| though? It's basically a list of pairs associating | ||||
| variables with their types. For instance, if in some situation | ||||
| @ -203,8 +203,8 @@ e : \tau | ||||
| 
 | ||||
| This reads, | ||||
| 
 | ||||
| > The expression \\(e\\) [another metavariable, this one is used for | ||||
|   all expressions] has type \\(\\tau\\) [also a metavariable, for | ||||
| > The expression \(e\) [another metavariable, this one is used for | ||||
|   all expressions] has type \(\tau\) [also a metavariable, for | ||||
|   types]. | ||||
| 
 | ||||
| However, as we've seen, we can't make global claims like this when variables are | ||||
| @ -217,16 +217,16 @@ on the situation. Now, we instead write: | ||||
| 
 | ||||
| This version reads, | ||||
| 
 | ||||
| > In the environment \\(\\Gamma\\), the expression \\(e\\) has type \\(\\tau\\). | ||||
| > In the environment \(\Gamma\), the expression \(e\) has type \(\tau\). | ||||
| 
 | ||||
| And here's the difference. The new \\(\\Gamma\\) of ours captures this | ||||
| And here's the difference. The new \(\Gamma\) of ours captures this | ||||
| "depending on the situation" aspect of expressions with variables. It | ||||
| provides us with  | ||||
| {{< sidenote "right" "context-note" "much-needed context." >}} | ||||
| In fact, \(\Gamma\) is sometimes called the typing context. | ||||
| {{< /sidenote >}} This version makes it clear that \\(e\\) | ||||
| isn't _always_ of type \\(\\tau\\), but only in the specific situation | ||||
| described by \\(\\Gamma\\). Using our first two-`number` environment, | ||||
| {{< /sidenote >}} This version makes it clear that \(e\) | ||||
| isn't _always_ of type \(\tau\), but only in the specific situation | ||||
| described by \(\Gamma\). Using our first two-`number` environment, | ||||
| we can make the following (true) claim: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -245,7 +245,7 @@ also results in a string". | ||||
| 
 | ||||
| Okay, so now we've seen a couple of examples, but these examples are _not_ rules! | ||||
| They capture only specific situations (which we've "hard-coded" by specifying | ||||
| what \\(\\Gamma\\) is). Here's what a general rule __should not look like__: | ||||
| what \(\Gamma\) is). Here's what a general rule __should not look like__: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \{ x_1 : \text{string}, x_2 : \text{string} \} \vdash x_1+x_2 : \text{string} | ||||
| @ -268,7 +268,7 @@ This rule is bad, and it should feel bad. Here are two reasons: | ||||
| 
 | ||||
| 1. It only works for expressions like `x+y` or `a+b`, but not for | ||||
|    more complicated things like `(a+b)+(c+d)`. This is because | ||||
|    by using \\(x\_1\\) and \\(x\_2\\), the metavariables for | ||||
|    by using \(x_1\) and \(x_2\), the metavariables for | ||||
|    variables, it rules out additions that _don't_ add variables | ||||
|    (like the middle `+` in the example). | ||||
| 2. It doesn't play well with other rules; it can't be the _only_ | ||||
| @ -279,7 +279,7 @@ The trouble is that this rule is trying to do too much; it's trying | ||||
| to check the environment for variables, but it's _also_ trying to | ||||
| specify the results of adding two numbers. That's not how we | ||||
| did it last time! In fact, when it came to numbers, we had two | ||||
| rules. The first said that any number symbol had the \\(\\text{number}\\) | ||||
| rules. The first said that any number symbol had the \(\text{number}\) | ||||
| type. Previously, we wrote it as follows: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -287,8 +287,8 @@ n : \text{number} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Another rule specified the type of addition, without caring how the | ||||
| sub-expressions \\(e\_1\\) and \\(e\_2\\) were given _their_ types. | ||||
| As long as they had type \\(\\text{number}\\), all was well. | ||||
| sub-expressions \(e_1\) and \(e_2\) were given _their_ types. | ||||
| As long as they had type \(\text{number}\), all was well. | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \frac{e_1 : \text{number} \quad e_2 : \text{number}}{e_1 + e_2 : \text{number}} | ||||
| @ -307,7 +307,7 @@ Before we get to that, though, we need to update the two rules we | ||||
| just saw above. These rules are good, and we should keep them. Now, though, environments | ||||
| are in play. Fortunately, the environment doesn't matter at all when it | ||||
| comes to figuring out what the type of a symbol like `1` is -- it's always | ||||
| a number! We can thus write the updated rule as follows. Leaving \\(\\Gamma\\) | ||||
| a number! We can thus write the updated rule as follows. Leaving \(\Gamma\) | ||||
| unspecified means it can | ||||
| stand for any environment. | ||||
| 
 | ||||
| @ -316,7 +316,7 @@ stand for any environment. | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We can also translate the addition rule in a pretty straightforward | ||||
| manner, by tacking on \\(\\Gamma\\) for every typing claim. | ||||
| manner, by tacking on \(\Gamma\) for every typing claim. | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \frac{\Gamma \vdash e_1 : \text{number} \quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1 + e_2 : \text{number}} | ||||
| @ -326,16 +326,16 @@ So we have a rule for number symbols like `1` or `2`, and we have | ||||
| a rule for addition. All that's left is a rule for variables, like `x` | ||||
| and `y`. This rule needs to make sure that a variable is defined, | ||||
| and that it has a particular type. A variable is defined, and has a type, | ||||
| if a pair \\(x : \\tau\\) is present in the environment \\(\\Gamma\\). | ||||
| if a pair \(x : \tau\) is present in the environment \(\Gamma\). | ||||
| Thus, we can write the variable rule like this: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \frac{x : \tau \in \Gamma}{\Gamma \vdash x : \tau} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Note that we're using the \\(\\tau\\) metavariable to range over any type; | ||||
| Note that we're using the \(\tau\) metavariable to range over any type; | ||||
| this means the rule applies to (object) variables declared to have type | ||||
| \\(\\text{number}\\), \\(\\text{string}\\), or anything else present in | ||||
| \(\text{number}\), \(\text{string}\), or anything else present in | ||||
| our system. A single rule takes care of figuring the types of _all_ | ||||
| variables. | ||||
| 
 | ||||
| @ -347,12 +347,12 @@ The rest of this, but mostly statements. | ||||
| #### Metavariables | ||||
| | Symbol  | Meaning      | | ||||
| |---------|--------------| | ||||
| | \\(n\\) |  Numbers     | | ||||
| | \\(s\\) |  Strings     | | ||||
| | \\(e\\) |  Expressions | | ||||
| | \\(x\\) |  Variables | | ||||
| | \\(\\tau\\) |  Types | | ||||
| | \\(\\Gamma\\) |  Typing environments | | ||||
| | \(n\) |  Numbers     | | ||||
| | \(s\) |  Strings     | | ||||
| | \(e\) |  Expressions | | ||||
| | \(x\) |  Variables | | ||||
| | \(\tau\) |  Types | | ||||
| | \(\Gamma\) |  Typing environments | | ||||
| 
 | ||||
| #### Grammar | ||||
| {{< block >}} | ||||
| @ -370,8 +370,8 @@ The rest of this, but mostly statements. | ||||
| {{< foldtable >}} | ||||
| | Rule         | Description | | ||||
| |--------------|-------------| | ||||
| | {{< latex >}}\Gamma \vdash n : \text{number} {{< /latex >}}| Number literals have type \\(\\text{number}\\) | | ||||
| | {{< latex >}}\Gamma \vdash s : \text{string} {{< /latex >}}| String literals have type \\(\\text{string}\\) | | ||||
| | {{< latex >}}\Gamma \vdash n : \text{number} {{< /latex >}}| Number literals have type \(\text{number}\) | | ||||
| | {{< latex >}}\Gamma \vdash s : \text{string} {{< /latex >}}| String literals have type \(\text{string}\) | | ||||
| | {{< latex >}}\frac{x:\tau \in \Gamma}{\Gamma\vdash x : \tau}{{< /latex >}}| Variables have whatever type is given to them by the environment | | ||||
| | {{< latex >}}\frac{\Gamma \vdash e_1 : \text{string}\quad \Gamma \vdash e_2 : \text{string}}{\Gamma \vdash e_1+e_2 : \text{string}} {{< /latex >}}| Adding strings gives a string | | ||||
| | {{< latex >}}\frac{\Gamma \vdash e_1 : \text{number}\quad \Gamma \vdash e_2 : \text{number}}{\Gamma \vdash e_1+e_2 : \text{number}} {{< /latex >}}| Adding numbers gives a number | | ||||
|  | ||||
| @ -159,8 +159,8 @@ the form: | ||||
| \frac{A_1 \ldots A_n} {B_1 \ldots B_m} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This reads, "given that the premises \\(A\_1\\) through \\(A\_n\\) are true, | ||||
| it holds that the conclusions \\(B\_1\\) through \\(B\_m\\) are true". | ||||
| This reads, "given that the premises \(A_1\) through \(A_n\) are true, | ||||
| it holds that the conclusions \(B_1\) through \(B_m\) are true". | ||||
| 
 | ||||
| For example, we can have the following inference rule: | ||||
| 
 | ||||
| @ -173,18 +173,18 @@ For example, we can have the following inference rule: | ||||
| Since you wear a jacket when it's cold, and it's cold, we can conclude | ||||
| that you will wear a jacket. | ||||
| 
 | ||||
| When talking about type systems, it's common to represent a type with \\(\\tau\\). | ||||
| When talking about type systems, it's common to represent a type with \(\tau\). | ||||
| The letter, which is the greek character "tau", is used as a placeholder for | ||||
| some __concrete type__. It's kind of like a template, to be filled in | ||||
| with an actual value. When we plug in an actual value into a rule containing | ||||
| \\(\\tau\\), we say we are __instantiating__ it. Similarly, we will use | ||||
| \\(e\\) to serve as a placeholder for an expression (matched by our | ||||
| \\(A\_{add}\\) grammar rule from part 2). Next, we have the typing relation, | ||||
| written as \\(e:\\tau\\). This says that "expression \\(e\\) has the type | ||||
| \\(\\tau\\)". | ||||
| \(\tau\), we say we are __instantiating__ it. Similarly, we will use | ||||
| \(e\) to serve as a placeholder for an expression (matched by our | ||||
| \(A_{add}\) grammar rule from part 2). Next, we have the typing relation, | ||||
| written as \(e:\tau\). This says that "expression \(e\) has the type | ||||
| \(\tau\)". | ||||
| 
 | ||||
| Alright, this is enough to get us started with some typing rules. | ||||
| Let's start with one for numbers. If we define \\(n\\) to mean | ||||
| Let's start with one for numbers. If we define \(n\) to mean | ||||
| "any expression that is a just a number, like 3, 2, 6, etc.", | ||||
| we can write the typing rule as follows: | ||||
| 
 | ||||
| @ -206,30 +206,30 @@ Now, let's move on to the rule for function application: | ||||
| 
 | ||||
| This rule includes everything we've said before: | ||||
| the thing being applied has to have a function type | ||||
| (\\(\\tau\_1 \\rightarrow \\tau\_2\\)), and | ||||
| (\(\tau_1 \rightarrow \tau_2\)), and | ||||
| the expression the function is applied to | ||||
| has to have the same type \\(\\tau\_1\\) as the | ||||
| has to have the same type \(\tau_1\) as the | ||||
| left type of the function. | ||||
| 
 | ||||
| It's the variable rule that forces us to adjust our notation. | ||||
| Our rules don't take into account the context that we've | ||||
| already discussed, and thus, we can't bring | ||||
| in any outside information. Let's fix that! It's convention | ||||
| to use the symbol \\(\\Gamma\\) for the context. We then | ||||
| add notation to say, "using the context \\(\\Gamma\\), | ||||
| we can deduce that \\(e\\) has type \\(\\tau\\)". We will | ||||
| write this as \\(\\Gamma \\vdash e : \\tau\\). | ||||
| to use the symbol \(\Gamma\) for the context. We then | ||||
| add notation to say, "using the context \(\Gamma\), | ||||
| we can deduce that \(e\) has type \(\tau\)". We will | ||||
| write this as \(\Gamma \vdash e : \tau\). | ||||
| 
 | ||||
| But what __is__ our context? We can think of it | ||||
| as a mapping from variable names to their known types. We | ||||
| can represent such a mapping using a set of pairs | ||||
| in the form \\(x : \\tau\\), where \\(x\\) represents | ||||
| in the form \(x : \tau\), where \(x\) represents | ||||
| a variable name.  | ||||
| 
 | ||||
| Since \\(\\Gamma\\) is just a regular set, we can | ||||
| write \\(x : \\tau \\in \\Gamma\\), meaning that | ||||
| in the current context, it is known that \\(x\\) | ||||
| has the type \\(\\tau\\). | ||||
| Since \(\Gamma\) is just a regular set, we can | ||||
| write \(x : \tau \in \Gamma\), meaning that | ||||
| in the current context, it is known that \(x\) | ||||
| has the type \(\tau\). | ||||
| 
 | ||||
| Let's update our rules with this new addition. | ||||
| 
 | ||||
| @ -283,19 +283,19 @@ Let's first take a look at the whole case expression rule: | ||||
| This is a lot more complicated than the other rules we've seen, and we've used some notation | ||||
| that we haven't seen before. Let's take this step by step: | ||||
| 
 | ||||
| 1. \\(e : \\tau\\), in this case, means that the expression between `case` and `of`, is of type \\(\\tau\\). | ||||
| 2. \\(\\text{matcht}(\\tau, p\_i) = b\_i\\) means that the pattern \\(p\_i\\) can match a value of type | ||||
| \\(\\tau\\), producing additional type pairs \\(b\_i\\). We need \\(b\_i\\) because a pattern | ||||
| such as `Cons x xs` will introduce new type information, namely \\(\text{x} : \text{Int}\\) and \\(\text{xs} : \text{List}\\). | ||||
| 3. \\(\\Gamma,b\_i \\vdash e\_i : \\tau\_c\\) means that each individual branch can be deduced to have the type | ||||
| \\(\\tau\_c\\), using the previously existing context \\(\\Gamma\\), with the addition of \\(b\_i\\), the new type information. | ||||
| 4. Finally, the conclusion is that the case expression, if all the premises are met, is of type \\(\\tau\_c\\). | ||||
| 1. \(e : \tau\), in this case, means that the expression between `case` and `of`, is of type \(\tau\). | ||||
| 2. \(\text{matcht}(\tau, p_i) = b_i\) means that the pattern \(p_i\) can match a value of type | ||||
| \(\tau\), producing additional type pairs \(b_i\). We need \(b_i\) because a pattern | ||||
| such as `Cons x xs` will introduce new type information, namely \(\text{x} : \text{Int}\) and \(\text{xs} : \text{List}\). | ||||
| 3. \(\Gamma,b_i \vdash e_i : \tau_c\) means that each individual branch can be deduced to have the type | ||||
| \(\tau_c\), using the previously existing context \(\Gamma\), with the addition of \(b_i\), the new type information. | ||||
| 4. Finally, the conclusion is that the case expression, if all the premises are met, is of type \(\tau_c\). | ||||
| 
 | ||||
| For completeness, let's add rules for \\(\\text{matcht}(\\tau, p\_i) = b\_i\\). We'll need two: one for | ||||
| For completeness, let's add rules for \(\text{matcht}(\tau, p_i) = b_i\). We'll need two: one for | ||||
| the "basic" pattern, which always matches the value and binds a variable to it, and one | ||||
| for a constructor pattern, that matches a constructor and its parameters. | ||||
| 
 | ||||
| Let's define \\(v\\) to be a variable name in the context of a pattern. For the basic pattern: | ||||
| Let's define \(v\) to be a variable name in the context of a pattern. For the basic pattern: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \frac | ||||
| @ -303,7 +303,7 @@ Let's define \\(v\\) to be a variable name in the context of a pattern. For the | ||||
| {\text{matcht}(\tau, v) = \{v : \tau \}} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| For the next rule, let's define \\(c\\) to be a constructor name. The rule for the constructor pattern, then, is: | ||||
| For the next rule, let's define \(c\) to be a constructor name. The rule for the constructor pattern, then, is: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \frac | ||||
| @ -312,8 +312,8 @@ For the next rule, let's define \\(c\\) to be a constructor name. The rule for t | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This rule means that whenever we have a pattern in the form of a constructor applied to | ||||
| \\(n\\) variable names, if the constructor takes \\(n\\) arguments of types \\(\\tau\_1\\) | ||||
| through \\(\\tau\_n\\), then the each variable will have a corresponding type. | ||||
| \(n\) variable names, if the constructor takes \(n\) arguments of types \(\tau_1\) | ||||
| through \(\tau_n\), then the each variable will have a corresponding type. | ||||
| 
 | ||||
| We didn't include lambda expressions in our syntax, and thus we won't need typing rules for them, | ||||
| so it actually seems like we're done with the first draft of our type rules. | ||||
| @ -399,8 +399,8 @@ we're binding is the same as the string we're binding it to | ||||
| We now have a unification algorithm, but we still | ||||
| need to implement our rules. Our rules | ||||
| usually include three things: an environment | ||||
| \\(\\Gamma\\), an expression \\(e\\), | ||||
| and a type \\(\\tau\\). We will | ||||
| \(\Gamma\), an expression \(e\), | ||||
| and a type \(\tau\). We will | ||||
| represent this as a method on `ast`, our struct | ||||
| for an expression tree. This | ||||
| method will take an environment and return | ||||
| @ -413,7 +413,7 @@ to a type. So naively, we can implement this simply | ||||
| using an `std::map`. But observe | ||||
| that we only extend the environment in one case so far: | ||||
| a case expression. In a case expression, we have the base | ||||
| envrionment \\(\\Gamma\\), and for each branch, | ||||
| envrionment \(\Gamma\), and for each branch, | ||||
| we extend it with the bindings produced by | ||||
| the pattern match. Each branch receives a modified | ||||
| copy of the original environment, one that | ||||
| @ -459,7 +459,7 @@ We start with with a signature inside `ast`: | ||||
| ``` | ||||
| virtual type_ptr typecheck(type_mgr& mgr, const type_env& env) const; | ||||
| ``` | ||||
| We also implement the \\(\\text{matchp}\\) function | ||||
| We also implement the \(\text{matchp}\) function | ||||
| as a method `match` on `pattern` with the following signature: | ||||
| ``` | ||||
| virtual void match(type_ptr t, type_mgr& mgr, type_env& env) const; | ||||
|  | ||||
| @ -147,19 +147,19 @@ We will follow the same notation as Simon Peyton Jones in | ||||
| [his book](https://www.microsoft.com/en-us/research/wp-content/uploads/1992/01/student.pdf) | ||||
| , which was my source of truth when implementing my compiler. The machine | ||||
| will be executing instructions that we give it, and as such, it must have | ||||
| an instruction queue, which we will reference as \\(i\\). We will write | ||||
| \\(x:i\\) to mean "an instruction queue that starts with | ||||
| an instruction x and ends with instructions \\(i\\)". A stack machine | ||||
| obviously needs to have a stack - we will call it \\(s\\), and will | ||||
| adopt a similar notation to the instruction queue: \\(a\_1, a\_2, a\_3 : s\\) | ||||
| will mean "a stack with the top values \\(a\_1\\), \\(a\_2\\), and \\(a\_3\\), | ||||
| and remaining instructions \\(s\\)". Finally, as we said, our stack | ||||
| machine has a dump, which we will write as \\(d\\). On this dump, | ||||
| an instruction queue, which we will reference as \(i\). We will write | ||||
| \(x:i\) to mean "an instruction queue that starts with | ||||
| an instruction x and ends with instructions \(i\)". A stack machine | ||||
| obviously needs to have a stack - we will call it \(s\), and will | ||||
| adopt a similar notation to the instruction queue: \(a_1, a_2, a_3 : s\) | ||||
| will mean "a stack with the top values \(a_1\), \(a_2\), and \(a_3\), | ||||
| and remaining instructions \(s\)". Finally, as we said, our stack | ||||
| machine has a dump, which we will write as \(d\). On this dump, | ||||
| we will push not only the current stack, but also the current | ||||
| instructions that we are executing, so we may resume execution | ||||
| later. We will write \\(\\langle i, s \\rangle : d\\) to mean | ||||
| "a dump with instructions \\(i\\) and stack \\(s\\) on top, | ||||
| followed by instructions and stacks in \\(d\\)". | ||||
| later. We will write \(\langle i, s \rangle : d\) to mean | ||||
| "a dump with instructions \(i\) and stack \(s\) on top, | ||||
| followed by instructions and stacks in \(d\)". | ||||
| 
 | ||||
| There's one more thing the G-machine will have that we've not yet discussed at all, | ||||
| and it's needed because of the following quip earlier in the post: | ||||
| @ -179,14 +179,14 @@ its address the same, but change the value on the heap. | ||||
| This way, all trees that reference the node we change become updated, | ||||
| without us having to change them - their child address remains the same, | ||||
| but the child has now been updated. We represent the heap | ||||
| using \\(h\\). We write \\(h[a : v]\\) to say "the address \\(a\\) points | ||||
| to value \\(v\\) in the heap \\(h\\)". Now you also know why we used | ||||
| the letter \\(a\\) when describing values on the stack - the stack contains | ||||
| using \(h\). We write \(h[a : v]\) to say "the address \(a\) points | ||||
| to value \(v\) in the heap \(h\)". Now you also know why we used | ||||
| the letter \(a\) when describing values on the stack - the stack contains | ||||
| addresses of (or pointers to) tree nodes. | ||||
| 
 | ||||
| _Compiling Functional Languages: a tutorial_ also keeps another component | ||||
| of the G-machine, the __global map__, which maps function names to addresses of nodes | ||||
| that represent them. We'll stick with this, and call this global map \\(m\\). | ||||
| that represent them. We'll stick with this, and call this global map \(m\). | ||||
| 
 | ||||
| Finally, let's talk about what kind of nodes our trees will be made of. | ||||
| We don't have to include every node that we've defined as a subclass of | ||||
| @ -218,7 +218,7 @@ First up is __PushInt__: | ||||
| 
 | ||||
| Let's go through this. We start with an instruction queue | ||||
| with `PushInt n` on top. We allocate a new `NInt` with the | ||||
| number `n` on the heap at address \\(a\\). We then push | ||||
| number `n` on the heap at address \(a\). We then push | ||||
| the address of the `NInt` node on top of the stack. Next, | ||||
| __PushGlobal__: | ||||
| 
 | ||||
| @ -251,7 +251,7 @@ __Push__: | ||||
| {{< /gmachine >}} | ||||
| 
 | ||||
| We define this instruction to work if and only if there exists an address | ||||
| on the stack at offset \\(n\\). We take the value at that offset, and | ||||
| on the stack at offset \(n\). We take the value at that offset, and | ||||
| push it onto the stack again. This can be helpful for something like | ||||
| `f x x`, where we use the same tree twice. Speaking of that - let's | ||||
| define an instruction to combine two nodes into an application: | ||||
| @ -274,11 +274,11 @@ that is an `NApp` node, with its two children being the nodes we popped off. | ||||
| Finally, we push it onto the stack. | ||||
| 
 | ||||
| Let's try use these instructions to get a feel for it. In | ||||
| order to conserve space, let's use \\(\\text{G}\\) for PushGlobal, | ||||
| \\(\\text{I}\\) for PushInt, and \\(\\text{A}\\) for PushApp. | ||||
| order to conserve space, let's use \(\text{G}\) for PushGlobal, | ||||
| \(\text{I}\) for PushInt, and \(\text{A}\) for PushApp. | ||||
| Let's say we want to construct a graph for `double 326`. We'll | ||||
| use the instructions \\(\\text{I} \; 326\\), \\(\\text{G} \; \\text{double}\\), | ||||
| and \\(\\text{A}\\). Let's watch these instructions play out: | ||||
| use the instructions \(\text{I} \; 326\), \(\text{G} \; \text{double}\), | ||||
| and \(\text{A}\). Let's watch these instructions play out: | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| [\text{I} \; 326, \text{G} \; \text{double}, \text{A}] & \quad s \quad & d \quad & h \quad & m[\text{double} : a_d] \\ | ||||
| @ -346,13 +346,13 @@ code for the global function: | ||||
|     {{< /gmachine_inner >}} | ||||
| {{< /gmachine >}} | ||||
| 
 | ||||
| In this rule, we used a general rule for \\(a\_k\\), in which \\(k\\) is any number | ||||
| between 1 and \\(n-1\\). We also expect the `NGlobal` node to contain two parameters, | ||||
| \\(n\\) and \\(c\\). \\(n\\) is the arity of the function (the number of arguments | ||||
| it expects), and \\(c\\) are the instructions to construct the function's tree. | ||||
| In this rule, we used a general rule for \(a_k\), in which \(k\) is any number | ||||
| between 1 and \(n-1\). We also expect the `NGlobal` node to contain two parameters, | ||||
| \(n\) and \(c\). \(n\) is the arity of the function (the number of arguments | ||||
| it expects), and \(c\) are the instructions to construct the function's tree. | ||||
| 
 | ||||
| The attentive reader will have noticed a catch: we kept \\(a\_{n-1}\\) on the stack! | ||||
| This once again goes back to replacing a node in-place. \\(a\_{n-1}\\) is the address of the "root" of the | ||||
| The attentive reader will have noticed a catch: we kept \(a_{n-1}\) on the stack! | ||||
| This once again goes back to replacing a node in-place. \(a_{n-1}\) is the address of the "root" of the | ||||
| whole expression we're simplifying. Thus, to replace the value at this address, we need to keep | ||||
| the address until we have something to replace it with. | ||||
| 
 | ||||
| @ -451,7 +451,7 @@ and define it to contain a mapping from tags to instructions | ||||
| to be executed for a value of that tag. For instance, | ||||
| if the constructor `Nil` has tag 0, and `Cons` has tag 1, | ||||
| the mapping for the case expression of a length function | ||||
| could be written as \\([0 \\rightarrow [\\text{PushInt} \; 0], 1 \\rightarrow [\\text{PushGlobal} \; \\text{length}, ...] ]\\). | ||||
| could be written as \([0 \rightarrow [\text{PushInt} \; 0], 1 \rightarrow [\text{PushGlobal} \; \text{length}, ...] ]\). | ||||
| Let's define the rule for it: | ||||
| 
 | ||||
| {{< gmachine "Jump" >}} | ||||
| @ -475,7 +475,7 @@ creating a final graph. We then continue to reduce this final | ||||
| graph. But we've left the function parameters on the stack! | ||||
| This is untidy. We define a __Slide__ instruction, | ||||
| which keeps the address at the top of the stack, but gets | ||||
| rid of the next \\(n\\) addresses: | ||||
| rid of the next \(n\) addresses: | ||||
| 
 | ||||
| {{< gmachine "Slide" >}} | ||||
|     {{< gmachine_inner "Before">}} | ||||
|  | ||||
| @ -13,9 +13,9 @@ this G-machine. We will define a __compilation scheme__, | ||||
| which will be a set of rules that tell us how to | ||||
| translate programs in our language into G-machine instructions. | ||||
| To mirror _Implementing Functional Languages: a tutorial_, we'll | ||||
| call this compilation scheme \\(\\mathcal{C}\\), and write it | ||||
| as \\(\\mathcal{C} ⟦e⟧ = i\\), meaning "the expression \\(e\\) | ||||
| compiles to the instructions \\(i\\)". | ||||
| call this compilation scheme \(\mathcal{C}\), and write it | ||||
| as \(\mathcal{C} ⟦e⟧ = i\), meaning "the expression \(e\) | ||||
| compiles to the instructions \(i\)". | ||||
| 
 | ||||
| To follow our route from the typechecking, let's start | ||||
| with compiling expressions that are numbers. It's pretty easy: | ||||
| @ -37,7 +37,7 @@ we want to compile it last: | ||||
| \mathcal{C} ⟦e_1 \; e_2⟧ = \mathcal{C} ⟦e_2⟧ ⧺ \mathcal{C} ⟦e_1⟧ ⧺ [\text{MkApp}] | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Here, we used the \\(⧺\\) operator to represent the concatenation of two | ||||
| Here, we used the \(⧺\) operator to represent the concatenation of two | ||||
| lists. Otherwise, this should be pretty intutive - we first run the instructions | ||||
| to create the parameter, then we run the instructions to create the function, | ||||
| and finally, we combine them using MkApp. | ||||
| @ -49,17 +49,17 @@ define our case expression compilation scheme to match. However, | ||||
| we still need to know __where__ on the stack each variable is, | ||||
| and this changes as the stack is modified. | ||||
| 
 | ||||
| To accommodate for this, we define an environment, \\(\\rho\\), | ||||
| To accommodate for this, we define an environment, \(\rho\), | ||||
| to be a partial function mapping variable names to thier | ||||
| offsets on the stack. We write \\(\\rho = [x \\rightarrow n, y \\rightarrow m]\\) | ||||
| to say "the environment \\(\\rho\\) maps variable \\(x\\) to stack offset \\(n\\), | ||||
| and variable \\(y\\) to stack offset \\(m\\)". We also write \\(\\rho \\; x\\) to | ||||
| say "look up \\(x\\) in \\(\\rho\\)", since \\(\\rho\\) is a function. Finally, | ||||
| offsets on the stack. We write \(\rho = [x \rightarrow n, y \rightarrow m]\) | ||||
| to say "the environment \(\rho\) maps variable \(x\) to stack offset \(n\), | ||||
| and variable \(y\) to stack offset \(m\)". We also write \(\rho \; x\) to | ||||
| say "look up \(x\) in \(\rho\)", since \(\rho\) is a function. Finally, | ||||
| to help with the ever-changing stack, we define an augmented environment | ||||
| \\(\\rho^{+n}\\), such that \\(\\rho^{+n} \\; x = \\rho \\; x + n\\). In words, | ||||
| this basically means "\\(\\rho^{+n}\\) has all the variables from \\(\\rho\\), | ||||
| but their addresses are incremented by \\(n\\)". We now pass \\(\\rho\\) | ||||
| in to \\(\\mathcal{C}\\) together with the expression \\(e\\). Let's | ||||
| \(\rho^{+n}\), such that \(\rho^{+n} \; x = \rho \; x + n\). In words, | ||||
| this basically means "\(\rho^{+n}\) has all the variables from \(\rho\), | ||||
| but their addresses are incremented by \(n\)". We now pass \(\rho\) | ||||
| in to \(\mathcal{C}\) together with the expression \(e\). Let's | ||||
| rewrite our first two rules. For numbers: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -72,8 +72,8 @@ For function application: | ||||
| \mathcal{C} ⟦e_1 \; e_2⟧ \; \rho = \mathcal{C} ⟦e_2⟧ \; \rho \; ⧺  \;\mathcal{C} ⟦e_1⟧ \; \rho^{+1} \; ⧺ \; [\text{MkApp}] | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Notice how in that last rule, we passed in \\(\\rho^{+1}\\) when compiling the function's expression. This is because | ||||
| the result of running the instructions for \\(e\_2\\) will have left on the stack the function's parameter. Whatever | ||||
| Notice how in that last rule, we passed in \(\rho^{+1}\) when compiling the function's expression. This is because | ||||
| the result of running the instructions for \(e_2\) will have left on the stack the function's parameter. Whatever | ||||
| was at the top of the stack (and thus, had index 0), is now the second element from the top (address 1). The | ||||
| same is true for all other things that were on the stack. So, we increment the environment accordingly. | ||||
| 
 | ||||
| @ -84,7 +84,7 @@ With the environment, the variable rule is simple: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| One more thing. If we run across a function name, we want to | ||||
| use PushGlobal rather than Push. Defining \\(f\\) to be a name | ||||
| use PushGlobal rather than Push. Defining \(f\) to be a name | ||||
| of a global function, we capture this using the following rule: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -93,8 +93,8 @@ of a global function, we capture this using the following rule: | ||||
| 
 | ||||
| Now it's time for us to compile case expressions, but there's a bit of | ||||
| an issue - our case expressions branches don't map one-to-one with | ||||
| the \\(t \\rightarrow i\_t\\) format of the Jump instruction. | ||||
| This is because we allow for name patterns in the form \\(x\\), | ||||
| the \(t \rightarrow i_t\) format of the Jump instruction. | ||||
| This is because we allow for name patterns in the form \(x\), | ||||
| which can possibly match more than one tag. Consider this | ||||
| rather useless example: | ||||
| 
 | ||||
| @ -120,8 +120,8 @@ Effectively, we'll be performing [desugaring](https://en.wikipedia.org/wiki/Synt | ||||
| 
 | ||||
| Now, on to defining the compilation rules for case expressions. | ||||
| It's helpful to define compiling a single branch of a case expression | ||||
| separately. For a branch in the form \\(t \\; x\_1 \\; x\_2 \\; ... \\; x\_n \\rightarrow \text{body}\\), | ||||
| we define a compilation scheme \\(\\mathcal{A}\\) as follows: | ||||
| separately. For a branch in the form \(t \; x_1 \; x_2 \; ... \; x_n \rightarrow \text{body}\), | ||||
| we define a compilation scheme \(\mathcal{A}\) as follows: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -133,15 +133,15 @@ t \rightarrow [\text{Split} \; n] \; ⧺ \; \mathcal{C}⟦\text{body}⟧ \; \rho | ||||
| 
 | ||||
| First, we run Split - the node on the top of the stack is a packed constructor, | ||||
| and we want access to its member variables, since they can be referenced by | ||||
| the branch's body via \\(x\_i\\). For the same reason, we must make sure to include | ||||
| \\(x\_1\\) through \\(x\_n\\) in our environment. Furthermore, since the split values now occupy the stack, | ||||
| we have to offset our environment by \\(n\\) before adding bindings to our new variables. | ||||
| Doing all these things gives us \\(\\rho'\\), which we use to compile the body, placing | ||||
| the branch's body via \(x_i\). For the same reason, we must make sure to include | ||||
| \(x_1\) through \(x_n\) in our environment. Furthermore, since the split values now occupy the stack, | ||||
| we have to offset our environment by \(n\) before adding bindings to our new variables. | ||||
| Doing all these things gives us \(\rho'\), which we use to compile the body, placing | ||||
| the resulting instructions after Split. This leaves us with the desired graph on top of | ||||
| the stack - the only thing left to do is to clean up the stack of the unpacked values, | ||||
| which we do using Slide. | ||||
| 
 | ||||
| Notice that we didn't just create instructions - we created a mapping from the tag \\(t\\) | ||||
| Notice that we didn't just create instructions - we created a mapping from the tag \(t\) | ||||
| to the instructions that correspond to it. | ||||
| 
 | ||||
| Now, it's time for compiling the whole case expression. We first want | ||||
| @ -155,7 +155,7 @@ is captured by the following rule: | ||||
| \mathcal{C} ⟦e⟧ \; \rho \; ⧺ [\text{Eval}, \text{Jump} \; [\mathcal{A} ⟦\text{alt}_1⟧ \; \rho, ..., \mathcal{A} ⟦\text{alt}_n⟧ \; \rho]] | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This works because \\(\\mathcal{A}\\) creates not only instructions, | ||||
| This works because \(\mathcal{A}\) creates not only instructions, | ||||
| but also a tag mapping. We simply populate our Jump instruction such mappings | ||||
| resulting from compiling each branch. | ||||
| 
 | ||||
| @ -177,7 +177,7 @@ as always, you can look at the full project source code, which is | ||||
| freely available for each post in the series. | ||||
| 
 | ||||
| We can now envision a method on the `ast` struct that takes an environment  | ||||
| (just like our compilation scheme takes the environment \\(\\rho\\\)), | ||||
| (just like our compilation scheme takes the environment \(\rho\)), | ||||
| and compiles the `ast`. Rather than returning a vector | ||||
| of instructions (which involves copying, unless we get some optimization kicking in), | ||||
| we'll pass a reference to a vector to our method. The method will then place the generated | ||||
| @ -188,7 +188,7 @@ from a variable? A naive solution would be to take a list or map of | ||||
| global functions as a third parameter to our `compile` method. | ||||
| But there's an easier way! We know that the program passed type checking. | ||||
| This means that every referenced variable exists. From then, the situation is easy - | ||||
| if actual variable names are kept in the environment, \\(\\rho\\), then whenever | ||||
| if actual variable names are kept in the environment, \(\rho\), then whenever | ||||
| we see a variable that __isn't__ in the current environment, it must be a function name. | ||||
| 
 | ||||
| Having finished contemplating out method, it's time to define a signature: | ||||
|  | ||||
| @ -47,7 +47,7 @@ some of our notation from the [typechecking]({{< relref "03_compiler_typecheckin | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| But using our rules so far, such a thing is impossible, since there is no way for | ||||
| \\(\text{Int}\\) to be unified with \\(\text{Bool}\\). We need a more powerful | ||||
| \(\text{Int}\) to be unified with \(\text{Bool}\). We need a more powerful | ||||
| set of rules to describe our program's types. | ||||
| 
 | ||||
| 
 | ||||
| @ -72,23 +72,23 @@ Rule|Name and Description | ||||
| \frac | ||||
|     {x:\sigma \in \Gamma} | ||||
|     {\Gamma \vdash x:\sigma} | ||||
| {{< /latex >}}| __Var__: If the variable \\(x\\) is known to have some polymorphic type \\(\\sigma\\) then an expression consisting only of that variable is of that type. | ||||
| {{< /latex >}}| __Var__: If the variable \(x\) is known to have some polymorphic type \(\sigma\) then an expression consisting only of that variable is of that type. | ||||
| {{< latex >}} | ||||
| \frac | ||||
|     {\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \quad \Gamma \vdash e_2 : \tau_1} | ||||
|     {\Gamma \vdash e_1 \; e_2 : \tau_2} | ||||
| {{< /latex >}}| __App__: If an expression \\(e\_1\\), which is a function from monomorphic type \\(\\tau\_1\\) to another monomorphic type \\(\\tau\_2\\), is applied to an argument \\(e\_2\\) of type \\(\\tau\_1\\), then the result is of type \\(\\tau\_2\\). | ||||
| {{< /latex >}}| __App__: If an expression \(e_1\), which is a function from monomorphic type \(\tau_1\) to another monomorphic type \(\tau_2\), is applied to an argument \(e_2\) of type \(\tau_1\), then the result is of type \(\tau_2\). | ||||
| {{< latex >}} | ||||
| \frac | ||||
|     {\Gamma \vdash e : \tau \quad \text{matcht}(\tau, p_i) = b_i | ||||
|         \quad \Gamma,b_i \vdash e_i : \tau_c} | ||||
|     {\Gamma \vdash \text{case} \; e \; \text{of} \; | ||||
|         \{ (p_1,e_1) \ldots (p_n, e_n) \} : \tau_c } | ||||
| {{< /latex >}}| __Case__: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \\(\\tau\\) and each branch \\((p\_i, e\_i)\\) is of the same type \\(\\tau\_c\\) when the pattern \\(p\_i\\) works with type \\(\\tau\\) producing extra bindings \\(b\_i\\), the whole case expression is of type \\(\\tau\_c\\). | ||||
| {{< /latex >}}| __Case__: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \(\tau\) and each branch \((p_i, e_i)\) is of the same type \(\tau_c\) when the pattern \(p_i\) works with type \(\tau\) producing extra bindings \(b_i\), the whole case expression is of type \(\tau_c\). | ||||
| {{< latex >}} | ||||
| \frac{\Gamma \vdash e : \sigma' \quad \sigma' \sqsubseteq \sigma} | ||||
|     {\Gamma \vdash e : \sigma} | ||||
| {{< /latex >}}| __Inst (New)__: If type \\(\\sigma\\) is an instantiation (or specialization) of type \\(\\sigma\'\\) then an expression of type \\(\\sigma\'\\) is also an expression of type \\(\\sigma\\). | ||||
| {{< /latex >}}| __Inst (New)__: If type \(\sigma\) is an instantiation (or specialization) of type \(\sigma'\) then an expression of type \(\sigma'\) is also an expression of type \(\sigma\). | ||||
| {{< latex >}} | ||||
| \frac | ||||
|     {\Gamma \vdash e : \sigma \quad \alpha \not \in \text{free}(\Gamma)} | ||||
| @ -96,29 +96,29 @@ Rule|Name and Description | ||||
| {{< /latex >}}| __Gen (New)__: If an expression has a type with free variables, this rule allows us generalize it to allow all possible types to be used for these free variables. | ||||
| 
 | ||||
| Here, there is a distinction between different forms of types. First, there are | ||||
| monomorphic types, or __monotypes__, \\(\\tau\\), which are types such as \\(\\text{Int}\\), | ||||
| \\(\\text{Int} \\rightarrow \\text{Bool}\\), \\(a \\rightarrow b\\) | ||||
| monomorphic types, or __monotypes__, \(\tau\), which are types such as \(\text{Int}\), | ||||
| \(\text{Int} \rightarrow \text{Bool}\), \(a \rightarrow b\) | ||||
| and so on. These types are what we've been working with so far. Each of them | ||||
| represents one (hence, "mono-"), concrete type. This is obvious in the case | ||||
| of \\(\\text{Int}\\) and \\(\\text{Int} \\rightarrow \\text{Bool}\\), but | ||||
| for \\(a \\rightarrow b\\) things are slightly less clear. Does it really | ||||
| represent a single type, if we can put an arbtirary thing for \\(a\\)? | ||||
| The answer is "yes"! Although \\(a\\) is not currently known, it stands | ||||
| of \(\text{Int}\) and \(\text{Int} \rightarrow \text{Bool}\), but | ||||
| for \(a \rightarrow b\) things are slightly less clear. Does it really | ||||
| represent a single type, if we can put an arbtirary thing for \(a\)? | ||||
| The answer is "yes"! Although \(a\) is not currently known, it stands | ||||
| in place of another monotype, which is yet to be determined. | ||||
| 
 | ||||
| So, how do we represent polymorphic types, like that of \\(\\text{if}\\)? | ||||
| So, how do we represent polymorphic types, like that of \(\text{if}\)? | ||||
| We borrow some more notation from mathematics, and use the "forall" quantifier: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \text{if} : \forall a \; . \; \text{Bool} \rightarrow a \rightarrow a \rightarrow a | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This basically says, "the type of \\(\\text{if}\\) is \\(\\text{Bool} \\rightarrow a \\rightarrow a \\rightarrow a\\) | ||||
| for all possible \\(a\\)". This new expression using "forall" is what we call a type scheme, or a polytype \\(\\sigma\\). | ||||
| This basically says, "the type of \(\text{if}\) is \(\text{Bool} \rightarrow a \rightarrow a \rightarrow a\) | ||||
| for all possible \(a\)". This new expression using "forall" is what we call a type scheme, or a polytype \(\sigma\). | ||||
| For simplicity, we only allow "forall" to be at the front of a polytype. That is, expressions like | ||||
| \\(a \\rightarrow \\forall b \\; . \\; b \\rightarrow b\\) are not valid polytypes as far as we're concerned. | ||||
| \(a \rightarrow \forall b \; . \; b \rightarrow b\) are not valid polytypes as far as we're concerned. | ||||
| 
 | ||||
| It's key to observe that only some of the typing rules in the above table use polytypes (\\(\\sigma\\)). Whereas expressions | ||||
| It's key to observe that only some of the typing rules in the above table use polytypes (\(\sigma\)). Whereas expressions | ||||
| consisting of a single variable can be polymorphically typed, this is not true for function applications and case expressions. | ||||
| In fact, according to our rules, there is no way to introduce a polytype anywhere into our system! | ||||
| 
 | ||||
| @ -127,8 +127,8 @@ this is called __Let-Polymorphism__, which means that only in `let`/`in` express | ||||
| be given a polymorphic type. We, on the other hand, do not (yet) have `let`/`in` expressions, so our polymorphism | ||||
| is further limited: only functions (and data type constructors) can be polymorphically typed. | ||||
| 
 | ||||
| Let's talk about what __Inst__ does, and what "\\(\\sqsubseteq\\)" means. | ||||
| First, why don't we go ahead and write the formal inference rule for \\(\\sqsubseteq\\): | ||||
| Let's talk about what __Inst__ does, and what "\(\sqsubseteq\)" means. | ||||
| First, why don't we go ahead and write the formal inference rule for \(\sqsubseteq\): | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \frac | ||||
| @ -137,21 +137,21 @@ First, why don't we go ahead and write the formal inference rule for \\(\\sqsubs | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| In my opinion, this is one of the more confusing inference rules we have to deal with in Hindley-Milner. | ||||
| In principle, though, it's not too hard to understand. \\(\\sigma\' \\sqsubseteq \\sigma\\) says "\\(\\sigma\'\\) | ||||
| is more general than \\(\\sigma\\)". Alternatively, we can interpret it as "\\(\\sigma\\) is an instance of \\(\\sigma\'\\)". | ||||
| In principle, though, it's not too hard to understand. \(\sigma' \sqsubseteq \sigma\) says "\(\sigma'\) | ||||
| is more general than \(\sigma\)". Alternatively, we can interpret it as "\(\sigma\) is an instance of \(\sigma'\)". | ||||
| 
 | ||||
| What does it mean for one polytype to be more general than another? Intuitively, we might say that \\(\forall a \\; . \\; a \\rightarrow a\\) is | ||||
| more general than \\(\\text{Int} \\rightarrow \\text{Int}\\), because the former type can represent the latter, and more. Formally, | ||||
| we define this in terms of __substitutions__. A substitution \\(\\{\\alpha \\mapsto \\tau \\}\\) replaces a variable | ||||
| \\(\\alpha\\) with a monotype \\(\\tau\\). If we can use a substitution to convert one type into another, then the first | ||||
| What does it mean for one polytype to be more general than another? Intuitively, we might say that \(\forall a \; . \; a \rightarrow a\) is | ||||
| more general than \(\text{Int} \rightarrow \text{Int}\), because the former type can represent the latter, and more. Formally, | ||||
| we define this in terms of __substitutions__. A substitution \(\{\alpha \mapsto \tau \}\) replaces a variable | ||||
| \(\alpha\) with a monotype \(\tau\). If we can use a substitution to convert one type into another, then the first | ||||
| type (the one on which the substitution was performed) is more general than the resulting type. In our previous example, | ||||
| since we can apply the substitution \\(\\{a \\mapsto \\text{Int}\\}\\) to get \\(\\text{Int} \\rightarrow \\text{Int}\\) | ||||
| from \\(\\forall a \\; . \\; a \\rightarrow a\\), the latter type is more general; using our notation, | ||||
| \\(\\forall a \\; . \\; a \\rightarrow a \\sqsubseteq \\text{Int} \\rightarrow \\text{Int}\\). | ||||
| since we can apply the substitution \(\{a \mapsto \text{Int}\}\) to get \(\text{Int} \rightarrow \text{Int}\) | ||||
| from \(\forall a \; . \; a \rightarrow a\), the latter type is more general; using our notation, | ||||
| \(\forall a \; . \; a \rightarrow a \sqsubseteq \text{Int} \rightarrow \text{Int}\). | ||||
| 
 | ||||
| That's pretty much all that the rule says, really. But what about the whole thing with \\(\\beta\\) and \\(\\text{free}\\)? The reason | ||||
| That's pretty much all that the rule says, really. But what about the whole thing with \(\beta\) and \(\text{free}\)? The reason | ||||
| for that part of the rule is that, in principle, we can substitute polytypes into polytypes. However, we can't | ||||
| do this using \\(\\{ \\alpha \\mapsto \\sigma \\}\\). Consider, for example: | ||||
| do this using \(\{ \alpha \mapsto \sigma \}\). Consider, for example: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \{ a \mapsto \forall b \; . \; b \rightarrow b \} \; \text{Bool} \rightarrow a \rightarrow a \stackrel{?}{=} | ||||
| @ -159,9 +159,9 @@ do this using \\(\\{ \\alpha \\mapsto \\sigma \\}\\). Consider, for example: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Hmm, this is not good. Didn't we agree that we only want quantifiers at the front of our types? Instead, to make that substitution, | ||||
| we only substitute the monotype \\(b \\rightarrow b\\), and then add the \\(\\forall b\\) at the front. But | ||||
| to do this, we must make sure that \\(b\\) doesn't occur anywhere in the original type | ||||
| \\(\forall a \\; . \\; \\text{Bool} \\rightarrow a \\rightarrow a\\) (otherwise we can accidentally generalize | ||||
| we only substitute the monotype \(b \rightarrow b\), and then add the \(\forall b\) at the front. But | ||||
| to do this, we must make sure that \(b\) doesn't occur anywhere in the original type | ||||
| \(\forall a \; . \; \text{Bool} \rightarrow a \rightarrow a\) (otherwise we can accidentally generalize | ||||
| too much). So then, our concrete inference rule is as follows: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -176,9 +176,9 @@ too much). So then, our concrete inference rule is as follows: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| In the above rule we: | ||||
| 1. Replaced \\(a\\) with \\(b \\rightarrow b\\), getting rid of \\(a\\) in the quantifier. | ||||
| 2. Observed that \\(b\\) is not a free variable in the original type, and thus can be generalized. | ||||
| 3. Added the generalization for \\(b\\) to the front of the resulting type. | ||||
| 1. Replaced \(a\) with \(b \rightarrow b\), getting rid of \(a\) in the quantifier. | ||||
| 2. Observed that \(b\) is not a free variable in the original type, and thus can be generalized. | ||||
| 3. Added the generalization for \(b\) to the front of the resulting type. | ||||
| 
 | ||||
| Now, __Inst__ just allows us to perform specialization / substitution as many times | ||||
| as we need to get to the type we want. | ||||
| @ -188,12 +188,12 @@ as we need to get to the type we want. | ||||
| Alright, now we have all these rules. How does this change our typechecking algorithm? | ||||
| How about the following: | ||||
| 
 | ||||
| 1. To every declared function, assign the type \\(a \\rightarrow ... \\rightarrow y \\rightarrow z\\), | ||||
| 1. To every declared function, assign the type \(a \rightarrow ... \rightarrow y \rightarrow z\), | ||||
| where | ||||
| {{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function," >}} | ||||
| Of course, there can be more or less than 25 arguments to any function. This is just a generalization; | ||||
| we use as many input types as are needed. | ||||
| {{< /sidenote >}} and \\(z\\) is the function's | ||||
| {{< /sidenote >}} and \(z\) is the function's | ||||
| return type. | ||||
| 2. We typecheck each declared function, using the __Var__, __Case__, __App__, and __Inst__ rules. | ||||
| 3. Whatever type variables we don't fill in, we assume can be filled in with any type, | ||||
| @ -214,15 +214,15 @@ defn testTwo = { if True 0 1 } | ||||
| 
 | ||||
| If we go through and typecheck them top-to-bottom, the following happens: | ||||
| 
 | ||||
| 1. We start by assuming \\(\\text{if} : a \\rightarrow b \\rightarrow c \\rightarrow d\\), | ||||
| \\(\\text{testOne} : e\\) and \\(\\text{testTwo} : f\\). | ||||
| 2. We look at `if`. We infer the type of `c` to be \\(\\text{Bool}\\), and thus, \\(a = \\text{Bool}\\). | ||||
| We also infer that \\(b = c\\), since they occur in two branches of the same case expression. | ||||
| Finally, we infer that \\(c = d\\), since whatever the case expression returns becomes the return | ||||
| value of the function. Thus, we come out knowing that \\(\\text{if} : \\text{Bool} \\rightarrow b | ||||
| \\rightarrow b \\rightarrow b\\). | ||||
| 3. Now, since we never figured out \\(b\\), we use __Gen__: \\(\\text{if} : \\forall b \\; . \\; | ||||
| \\text{Bool} \\rightarrow b \\rightarrow b \\rightarrow b\\). Like we'd want, `if` works with | ||||
| 1. We start by assuming \(\text{if} : a \rightarrow b \rightarrow c \rightarrow d\), | ||||
| \(\text{testOne} : e\) and \(\text{testTwo} : f\). | ||||
| 2. We look at `if`. We infer the type of `c` to be \(\text{Bool}\), and thus, \(a = \text{Bool}\). | ||||
| We also infer that \(b = c\), since they occur in two branches of the same case expression. | ||||
| Finally, we infer that \(c = d\), since whatever the case expression returns becomes the return | ||||
| value of the function. Thus, we come out knowing that \(\text{if} : \text{Bool} \rightarrow b | ||||
| \rightarrow b \rightarrow b\). | ||||
| 3. Now, since we never figured out \(b\), we use __Gen__: \(\text{if} : \forall b \; . \; | ||||
| \text{Bool} \rightarrow b \rightarrow b \rightarrow b\). Like we'd want, `if` works with | ||||
| all types, as long as both its inputs are of the same type. | ||||
| 4. When we typecheck the body of `testOne`, we use __Inst__ to turn the above type for `if` | ||||
| into a single, monomorphic instance. Then, type inference proceeds as before, and all is well. | ||||
| @ -231,15 +231,15 @@ and all is well again. | ||||
| 
 | ||||
| So far, so good. But what if we started from the bottom, and went to the top? | ||||
| 
 | ||||
| 1. We start by assuming \\(\\text{if} : a \\rightarrow b \\rightarrow c \\rightarrow d\\), | ||||
| \\(\\text{testOne} : e\\) and \\(\\text{testTwo} : f\\). | ||||
| 2. We look at `testTwo`. We infer that \\(a = \\text{Bool}\\) (since | ||||
| we pass in `True` to `if`). We also infer that \\(b = \\text{Int}\\), and that \\(c = \\text{Int}\\). | ||||
| 1. We start by assuming \(\text{if} : a \rightarrow b \rightarrow c \rightarrow d\), | ||||
| \(\text{testOne} : e\) and \(\text{testTwo} : f\). | ||||
| 2. We look at `testTwo`. We infer that \(a = \text{Bool}\) (since | ||||
| we pass in `True` to `if`). We also infer that \(b = \text{Int}\), and that \(c = \text{Int}\). | ||||
| Not yet sure of the return type of `if`, this is where we stop. We are left with | ||||
| the knowledge that \\(f = d\\) (because the return type of `if` is the return type of `testTwo`), | ||||
| but that's about it. Since \\(f\\) is no longer free, we don't generalize, and conclude that \\(\text{testTwo} : f\\). | ||||
| 3. We look at `testOne`. We infer that \\(a = \\text{Bool}\\) (already known). We also infer | ||||
| that \\(b = \\text{Bool}\\), and that \\(c = \\text{Bool}\\). But wait a minute! This is not right. | ||||
| the knowledge that \(f = d\) (because the return type of `if` is the return type of `testTwo`), | ||||
| but that's about it. Since \(f\) is no longer free, we don't generalize, and conclude that \(\text{testTwo} : f\). | ||||
| 3. We look at `testOne`. We infer that \(a = \text{Bool}\) (already known). We also infer | ||||
| that \(b = \text{Bool}\), and that \(c = \text{Bool}\). But wait a minute! This is not right. | ||||
| We are back to where we started, with a unification error! | ||||
| 
 | ||||
| What went wrong? I claim that we typechecked the functions that _used_ `if` before we typechecked `if` itself, | ||||
| @ -257,11 +257,11 @@ A transitive closure of a vertex in a graph is the list of all vertices reachabl | ||||
| from that original vertex. Check out the <a href="https://en.wikipedia.org/wiki/Transitive_closure#In_graph_theory"> | ||||
| Wikipedia page on this</a>. | ||||
| {{< /sidenote >}} | ||||
| of the function dependencies. We define a function \\(f\\) to be dependent on another function \\(g\\) | ||||
| if \\(f\\) calls \\(g\\). The transitive closure will help us find functions that are related indirectly. | ||||
| For instance, if \\(g\\) also depends on \\(h\\), then the transitive closure of \\(f\\) will  | ||||
| include \\(h\\), even if \\(f\\) directly doesn't use \\(h\\). | ||||
| 2. We isolate groups of mutually dependent functions. If \\(f\\) depends on \\(g\\) and \\(g\\) depends \\(f\\), | ||||
| of the function dependencies. We define a function \(f\) to be dependent on another function \(g\) | ||||
| if \(f\) calls \(g\). The transitive closure will help us find functions that are related indirectly. | ||||
| For instance, if \(g\) also depends on \(h\), then the transitive closure of \(f\) will  | ||||
| include \(h\), even if \(f\) directly doesn't use \(h\). | ||||
| 2. We isolate groups of mutually dependent functions. If \(f\) depends on \(g\) and \(g\) depends \(f\), | ||||
| they are placed in one group. We then construct a dependency graph __of these groups__. | ||||
| 3. We compute a topological order of the group graph. This helps us typecheck the dependencies | ||||
| of functions before checking the functions themselves. In our specific case, this would ensure | ||||
| @ -271,7 +271,7 @@ in a group. | ||||
| 4. We typecheck the function groups, and functions within them, following the above topological order. | ||||
| 
 | ||||
| To find the transitive closure of a graph, we can use [Warshall's Algorithm](https://cs.winona.edu/lin/cs440/ch08-2.pdf). | ||||
| This algorithm, with complexity \\(O(|V|^3)\\), goes as follows: | ||||
| This algorithm, with complexity \(O(|V|^3)\), goes as follows: | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| & A, R^{(i)} \in \mathbb{B}^{n \times n} \\ | ||||
| @ -285,12 +285,12 @@ This algorithm, with complexity \\(O(|V|^3)\\), goes as follows: | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| In the above notation, \\(R^{(i)}\\) is the \\(i\\)th matrix \\(R\\), and \\(A\\) is the adjacency | ||||
| matrix of the graph in question. All matrices in the algorithm are from \\(\\mathbb{B}^{n \times n}\\), | ||||
| the set of \\(n\\) by \\(n\\) boolean matrices. Once this algorithm is complete, we get as output a | ||||
| transitive closure adjacency matrix \\(R^{(n)}\\). Mutually dependent functions will be pretty easy to | ||||
| isolate from this matrix. If \\(R^{(n)}[i,j]\\) and \\(R^{(n)}[j,i]\\), then the functions represented by vertices | ||||
| \\(i\\) and \\(j\\) depend on each other. | ||||
| In the above notation, \(R^{(i)}\) is the \(i\)th matrix \(R\), and \(A\) is the adjacency | ||||
| matrix of the graph in question. All matrices in the algorithm are from \(\mathbb{B}^{n \times n}\), | ||||
| the set of \(n\) by \(n\) boolean matrices. Once this algorithm is complete, we get as output a | ||||
| transitive closure adjacency matrix \(R^{(n)}\). Mutually dependent functions will be pretty easy to | ||||
| isolate from this matrix. If \(R^{(n)}[i,j]\) and \(R^{(n)}[j,i]\), then the functions represented by vertices | ||||
| \(i\) and \(j\) depend on each other. | ||||
| 
 | ||||
| Once we've identified the groups, and | ||||
| {{< sidenote "right" "group-graph-note" "constructed a group graph," >}} | ||||
| @ -596,7 +596,7 @@ a value of a non-existent type), but a mature compiler should prevent this from | ||||
| On the other hand, here are the steps for function definitions: | ||||
| 
 | ||||
| 1. Find the free variables of each function to create the ordered list of groups as described above. | ||||
| 2. Within each group, insert a general function type (like \\(a \\rightarrow b \\rightarrow c\\)) | ||||
| 2. Within each group, insert a general function type (like \(a \rightarrow b \rightarrow c\)) | ||||
| into the environment for each function. | ||||
| 3. Within each group (in the same pass) run typechecking | ||||
| (including polymorphism, using the rules as described above). | ||||
| @ -707,8 +707,8 @@ is also updated to use topological ordering: | ||||
| 
 | ||||
| The above code uses the yet-unexplained `generalize` method. What's going on? | ||||
| 
 | ||||
| Observe that the __Var__ rule of the Hindley-Milner type system says that a variable \\(x\\) | ||||
| can have a __polytype__ in the environment \\(\\Gamma\\). Our `type_ptr` can only represent monotypes, | ||||
| Observe that the __Var__ rule of the Hindley-Milner type system says that a variable \(x\) | ||||
| can have a __polytype__ in the environment \(\Gamma\). Our `type_ptr` can only represent monotypes, | ||||
| so we must change what `type_env` associates with names to a new struct for representing polytypes, | ||||
| which we will call `type_scheme`. The `type_scheme` struct, just like the formal definition of | ||||
| a polytype, contains zero or more "forall"-quantified type variables, followed by a monotype which | ||||
|  | ||||
| @ -42,11 +42,11 @@ empty. | ||||
| 
 | ||||
| Let's talk about `List` itself, now. I suggest that we ponder the following table: | ||||
| 
 | ||||
| \\(\\text{List}\\)|\\(\\text{Cons}\\) | ||||
| \(\text{List}\)|\(\text{Cons}\) | ||||
| ----|---- | ||||
| \\(\\text{List}\\) is not a type; it must be followed up with arguments, like \\(\\text{List} \\; \\text{Int}\\).|\\(\\text{Cons}\\) is not a list; it must be followed up with arguments, like \\(\\text{Cons} \\; 3 \\; \\text{Nil}\\). | ||||
| \\(\\text{List} \\; \\text{Int}\\) is in its simplest form.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is in its simplest form. | ||||
| \\(\\text{List} \\; \\text{Int}\\) is a type.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is a value of type \\(\\text{List} \\; \\text{Int}\\). | ||||
| \(\text{List}\) is not a type; it must be followed up with arguments, like \(\text{List} \; \text{Int}\).|\(\text{Cons}\) is not a list; it must be followed up with arguments, like \(\text{Cons} \; 3 \; \text{Nil}\). | ||||
| \(\text{List} \; \text{Int}\) is in its simplest form.|\(\text{Cons} \; 3 \; \text{Nil}\) is in its simplest form. | ||||
| \(\text{List} \; \text{Int}\) is a type.|\(\text{Cons} \; 3 \; \text{Nil}\) is a value of type \(\text{List} \; \text{Int}\). | ||||
| 
 | ||||
| I hope that the similarities are quite striking. I claim that | ||||
| `List` is quite similar to a constructor `Cons`, except that it occurs | ||||
| @ -74,18 +74,18 @@ for functional programming) or <a href="https://coq.inria.fr/">Coq</a> (to see h | ||||
| propositions and proofs can be encoded in a dependently typed language). | ||||
| {{< /sidenote >}} | ||||
| our type constructors will only take zero or more types as input, and produce | ||||
| a type as output. In this case, writing \\(\\text{Type}\\) becomes quite repetitive, | ||||
| and we will adopt the convention of writing \\(*\\) instead. The types of such | ||||
| a type as output. In this case, writing \(\text{Type}\) becomes quite repetitive, | ||||
| and we will adopt the convention of writing \(*\) instead. The types of such | ||||
| constructors are called [kinds](https://en.wikipedia.org/wiki/Kind_(type_theory)). | ||||
| Let's look at a few examples, just to make sure we're on the same page: | ||||
| 
 | ||||
| * The kind of \\(\\text{Bool}\\) is \\(*\\): it does not accept any | ||||
| * The kind of \(\text{Bool}\) is \(*\): it does not accept any | ||||
| type arguments, and is a type in its own right. | ||||
| * The kind of \\(\\text{List}\\) is \\(*\\rightarrow *\\): it takes | ||||
| * The kind of \(\text{List}\) is \(*\rightarrow *\): it takes | ||||
| one argument (the type of the things inside the list), and creates | ||||
| a type from it. | ||||
| * If we define a pair as `data Pair a b = { MkPair a b }`, then its | ||||
| kind is \\(* \\rightarrow * \\rightarrow *\\), because it requires | ||||
| kind is \(* \rightarrow * \rightarrow *\), because it requires | ||||
| two parameters. | ||||
| 
 | ||||
| As one final observation, we note that effectively, all we're doing is | ||||
| @ -94,24 +94,24 @@ type. | ||||
| 
 | ||||
| Let's now enumerate all the possible forms that (mono)types can take in our system: | ||||
| 
 | ||||
| 1. A type can be a placeholder, like \\(a\\), \\(b\\), etc. | ||||
| 1. A type can be a placeholder, like \(a\), \(b\), etc. | ||||
| 2. A type can be a type constructor, applied to | ||||
| {{< sidenote "right" "zero-more-note" "zero ore more arguments," >}} | ||||
| It is convenient to treat regular types (like \(\text{Bool}\)) as | ||||
| type constructors of arity 0 (that is, type constructors with kind \(*\)). | ||||
| In effect, they take zero arguments and produce types (themselves). | ||||
| {{< /sidenote >}} such as \\(\\text{List} \\; \\text{Int}\\) or \\(\\text{Bool}\\). | ||||
| 3. A function from one type to another, like \\(\\text{List} \\; a \\rightarrow \\text{Int}\\). | ||||
| {{< /sidenote >}} such as \(\text{List} \; \text{Int}\) or \(\text{Bool}\). | ||||
| 3. A function from one type to another, like \(\text{List} \; a \rightarrow \text{Int}\). | ||||
| 
 | ||||
| Polytypes (type schemes) in our system can be all of the above, but may also include a "forall" | ||||
| quantifier at the front, generalizing the type (like \\(\\forall a \\; . \\; \\text{List} \\; a \\rightarrow \\text{Int}\\)). | ||||
| quantifier at the front, generalizing the type (like \(\forall a \; . \; \text{List} \; a \rightarrow \text{Int}\)). | ||||
| 
 | ||||
| Let's start implementing all of this. Why don't we start with the change to the syntax of our language? | ||||
| We have complicated the situation quite a bit. Let's take a look at the _old_ grammar | ||||
| for data type declarations (this is going back as far as [part 2]({{< relref "02_compiler_parsing.md" >}})). | ||||
| Here, \\(L\_D\\) is the nonterminal for the things that go between the curly braces of a data type | ||||
| declaration, \\(D\\) is the nonterminal representing a single constructor definition, | ||||
| and \\(L\_U\\) is a list of zero or more uppercase variable names: | ||||
| Here, \(L_D\) is the nonterminal for the things that go between the curly braces of a data type | ||||
| declaration, \(D\) is the nonterminal representing a single constructor definition, | ||||
| and \(L_U\) is a list of zero or more uppercase variable names: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -127,7 +127,7 @@ This grammar was actually too simple even for our monomorphically typed language | ||||
| Since functions are not represented using a single uppercase variable, it wasn't possible for us | ||||
| to define constructors that accept as arguments anything other than integers and user-defined | ||||
| data types. Now, we also need to modify this grammar to allow for constructor applications (which can be nested). | ||||
| To do all of these things, we will define a new nonterminal, \\(Y\\), for types: | ||||
| To do all of these things, we will define a new nonterminal, \(Y\), for types: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -136,8 +136,8 @@ Y & \rightarrow N | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We make it right-recursive (because the \\(\\rightarrow\\) operator is right-associative). Next, we define | ||||
| a nonterminal for all types _except_ those constructed with the arrow, \\(N\\). | ||||
| We make it right-recursive (because the \(\rightarrow\) operator is right-associative). Next, we define | ||||
| a nonterminal for all types _except_ those constructed with the arrow, \(N\). | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -148,15 +148,15 @@ N & \rightarrow ( Y ) | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| The first of the above rules allows a type to be a constructor applied to zero or more arguments | ||||
| (generated by \\(L\_Y\\)). The second rule allows a type to be a placeholder type variable. Finally, | ||||
| (generated by \(L_Y\)). The second rule allows a type to be a placeholder type variable. Finally, | ||||
| the third rule allows for any type (including functions, again) to occur between parentheses. | ||||
| This is so that higher-order functions, like \\((a \rightarrow b) \rightarrow a \rightarrow a \\), | ||||
| This is so that higher-order functions, like \((a \rightarrow b) \rightarrow a \rightarrow a \), | ||||
| can be represented. | ||||
| 
 | ||||
| Unfortunately, the definition of \\(L\_Y\\) is not as straightforward as we imagine. We could define | ||||
| it as just a list of \\(Y\\) nonterminals, but this would make the grammar ambigous: something | ||||
| Unfortunately, the definition of \(L_Y\) is not as straightforward as we imagine. We could define | ||||
| it as just a list of \(Y\) nonterminals, but this would make the grammar ambigous: something | ||||
| like `List Maybe Int` could be interpreted as "`List`, applied to types `Maybe` and `Int`", and | ||||
| "`List`, applied to type `Maybe Int`". To avoid this, we define a "type list element" \\(Y'\\), | ||||
| "`List`, applied to type `Maybe Int`". To avoid this, we define a "type list element" \(Y'\), | ||||
| which does not take arguments: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -167,7 +167,7 @@ Y' & \rightarrow ( Y ) | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We then make \\(L\_Y\\) a list of \\(Y'\\): | ||||
| We then make \(L_Y\) a list of \(Y'\): | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -177,7 +177,7 @@ L_Y & \rightarrow \epsilon | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Finally, we update the rules for the data type declaration, as well as for a single | ||||
| constructor. In these new rules, we use \\(L\_T\\) to mean a list of type variables. | ||||
| constructor. In these new rules, we use \(L_T\) to mean a list of type variables. | ||||
| The rules are as follows: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -336,7 +336,7 @@ it will be once the type manager generates its first type variable, and things w | ||||
| wanted type constructors to be monomorphic (but generic, with type variables) we'd need to internally | ||||
| instnatiate fresh type variables for every user-defined type variable, and substitute them appropriately. | ||||
| {{< /sidenote >}}  | ||||
| as we have discussed above with \\(\\text{Nil}\\) and \\(\\text{Cons}\\). | ||||
| as we have discussed above with \(\text{Nil}\) and \(\text{Cons}\). | ||||
| To accomodate for this, we also add all type variables to the "forall" quantifier | ||||
| of a new type scheme, whose monotype is our newly assembled function type. This | ||||
| type scheme is what we store as the type of the constructor. | ||||
|  | ||||
| @ -150,7 +150,7 @@ our expression language, which makes case analysis very difficult. | ||||
| An obvious thing to do with an expression is to evaluate it. This will be | ||||
| important for our proofs, because it will establish a connection between | ||||
| expressions (created via `Expr`) and actual Agda objects that we need to | ||||
| reason about at the end of the day. The notation \\(\\llbracket e \\rrbracket\\) | ||||
| reason about at the end of the day. The notation \(\llbracket e \rrbracket\) | ||||
| is commonly used in PL circles for evaluation (it comes from | ||||
| [Denotational Semantics](https://en.wikipedia.org/wiki/Denotational_semantics)). | ||||
| Thus, my Agda evaluation function is written as follows: | ||||
| @ -188,8 +188,8 @@ the structure of these cases. Thus, examples include: | ||||
| * **Automatic derivation of function properties:** suppose you're interested | ||||
|   in working with continuous functions. You also know that the addition, | ||||
|   subtraction, and multiplication of two functions preserves continuity. | ||||
|   Of course, the constant function \\(x \\mapsto c\\) and the identity function | ||||
|   \\(x \\mapsto x\\) are continuous too. You may define an expression data type | ||||
|   Of course, the constant function \(x \mapsto c\) and the identity function | ||||
|   \(x \mapsto x\) are continuous too. You may define an expression data type | ||||
|   that has cases for these operations. Then, your evaluation function could | ||||
|   transform the expression into a plain function, and a proof on the | ||||
|   structure of the expression can be used to verify the resulting function's | ||||
|  | ||||
| @ -36,9 +36,9 @@ but whenever I read a paper, my eyes search for the rules first and foremost. | ||||
| 
 | ||||
| But to those just starting their PL journey, inference rules can be quite cryptic | ||||
| -- I know they were to me! The first level of difficulty are the symbols: we have | ||||
| lots of Greek (\\(\\Gamma\\) and \\(\\Delta\\) for environments, \\(\\tau\\) and perhaps \\(\\sigma\\) | ||||
| for types), and the occasional mathematical symbol (the "entails" symbol \\(\\vdash\\) is the most | ||||
| common, but for operational semantics we can have \\(\\leadsto\\) and \\(\\Downarrow\\)). | ||||
| lots of Greek (\(\Gamma\) and \(\Delta\) for environments, \(\tau\) and perhaps \(\sigma\) | ||||
| for types), and the occasional mathematical symbol (the "entails" symbol \(\vdash\) is the most | ||||
| common, but for operational semantics we can have \(\leadsto\) and \(\Downarrow\)). | ||||
| If you don't know what they mean, or if you're still getting used to them, symbols | ||||
| in judgements are difficult enough to parse. | ||||
| 
 | ||||
| @ -49,8 +49,8 @@ Inductive Constructions is a doozy, for instance. | ||||
| 
 | ||||
| {{< figure src="CIC.png" caption="The `match` inference rule from [Introduction to the Calculus of Inductive Constructions](https://inria.hal.science/hal-01094195/document) by Christine Paulin-Mohring" class="fullwide" >}} | ||||
| 
 | ||||
| Just look at the metavariables! We have \\(\\textit{pars}\\), \\(t_1\\) through \\(t_p\\),  | ||||
| \\(x_1\\) through \\(x_n\\), plain \\(x\\), and at least two other sets of variables. Not | ||||
| Just look at the metavariables! We have \(\textit{pars}\), \(t_1\) through \(t_p\),  | ||||
| \(x_1\) through \(x_n\), plain \(x\), and at least two other sets of variables. Not | ||||
| only this, but the rule requires at least some familiarity with [GADTs](https://en.wikipedia.org/wiki/Generalized_algebraic_data_type) to understand | ||||
| completely. | ||||
| 
 | ||||
| @ -85,8 +85,8 @@ Let me show you both, and try to explain the two. First, here's the wrong way: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This says that the type of adding two _variables_ of type `string` is still `string`. | ||||
| Here, \\(\\Gamma\\) is a _context_, which keeps track of which variable has what | ||||
| type. Writing \\(x : \\text{string} \\in \\Gamma\\) is the same as saying | ||||
| Here, \(\Gamma\) is a _context_, which keeps track of which variable has what | ||||
| type. Writing \(x : \text{string} \in \Gamma\) is the same as saying | ||||
| "we know the variable `x` has type `string`". The whole rule reads, | ||||
| 
 | ||||
| > If the variables `x` and `y` both have type `string`, | ||||
| @ -301,8 +301,8 @@ expression is rendered__. For one, this is a losing battle: we can't possibly | ||||
| keep up with all the notation that people use in PL literature, and even if | ||||
| we focused ourselves on only "beginner" notation, there wouldn't be one way to do it! | ||||
| Different PL papers and texts use slightly different variations of notation. | ||||
| For instance, I render my pairs as \\((a, b)\\), but the very first screenshot | ||||
| in this post demonstrates a PL paper that writes pairs as \\(\\langle a, b \\rangle\\). | ||||
| For instance, I render my pairs as \((a, b)\), but the very first screenshot | ||||
| in this post demonstrates a PL paper that writes pairs as \(\langle a, b \rangle\). | ||||
| Neither way (as far as I know!) is right or wrong. But if we hardcode one, | ||||
| we lose the ability to draw the other. | ||||
| 
 | ||||
|  | ||||
| @ -48,7 +48,7 @@ _expression_ in a programming language (like those in the form `fact(n)`) | ||||
| or a value in that same programming language (like `5`). | ||||
| 
 | ||||
| Dealing with values is rather simple. Most languages have finite numbers, | ||||
| usually with \\(2^{32}\\) values, which have type `int`, | ||||
| usually with \(2^{32}\) values, which have type `int`, | ||||
| `i32`, or something in a similar vein. Most languages also have | ||||
| strings, of which there are as many as you have memory to contain, | ||||
| and which have the type `string`, `String`, or occasionally | ||||
| @ -129,20 +129,20 @@ terminate; that is the [halting problem](https://en.wikipedia.org/wiki/Halting_p | ||||
| So, what do we do? | ||||
| 
 | ||||
| It turns out to be convenient -- formally -- to treat the result of a diverging computation | ||||
| as its own value. This value is usually called 'bottom', and written as \\(\\bot\\). | ||||
| as its own value. This value is usually called 'bottom', and written as \(\bot\). | ||||
| Since in most programming languages, you can write a nonterminating expression or | ||||
| function of any type, this 'bottom' is included in _all_ types. So in fact, the | ||||
| possible values of `unsigned int` are \\(\\bot, 0, 1, 2, ...\\) and so on. | ||||
| As you may have by now guessed, the same is true for a boolean: we have \\(\\bot\\), `true`, and `false`. | ||||
| possible values of `unsigned int` are \(\bot, 0, 1, 2, ...\) and so on. | ||||
| As you may have by now guessed, the same is true for a boolean: we have \(\bot\), `true`, and `false`. | ||||
| 
 | ||||
| ### Haskell and Bottom | ||||
| You may be thinking: | ||||
| 
 | ||||
| > Now he's done it; he's gone off the deep end with all that programming language | ||||
| theory. Tell me, Daniel, where the heck have you ever encountered \\(\\bot\\) in | ||||
| theory. Tell me, Daniel, where the heck have you ever encountered \(\bot\) in | ||||
| code? This question was for a software engineering interview, after all! | ||||
| 
 | ||||
| You're right; I haven't _specifically_ seen the symbol \\(\\bot\\) in my time | ||||
| You're right; I haven't _specifically_ seen the symbol \(\bot\) in my time | ||||
| programming. But I have frequently used an equivalent notation for the same idea: | ||||
| `undefined`. In fact, here's a possible definition of `undefined` in Haskell: | ||||
| 
 | ||||
| @ -152,8 +152,8 @@ undefined = undefined | ||||
| 
 | ||||
| Just like `meaningOfLife`, this is a divergent computation! What's more is that | ||||
| the type of this computation is, in Haskell, `a`. More explicitly -- and retreating | ||||
| to more mathematical notation -- we can write this type as: \\(\\forall \\alpha . \\alpha\\). | ||||
| That is, for any type \\(\\alpha\\), `undefined` has that type! This means | ||||
| to more mathematical notation -- we can write this type as: \(\forall \alpha . \alpha\). | ||||
| That is, for any type \(\alpha\), `undefined` has that type! This means | ||||
| `undefined` can take on _any_ type, and so, we can write: | ||||
| 
 | ||||
| ```Haskell | ||||
| @ -187,7 +187,7 @@ expression. What you're doing is a kind of academic autofellatio. | ||||
| 
 | ||||
| Alright, I can accept this criticism. Perhaps just calling a nonterminating | ||||
| function a value _is_ far-fetched (even though in [denotational semantics](https://en.wikipedia.org/wiki/Denotational_semantics) | ||||
| we _do_ extend types with \\(\\bot\\)). But denotational semantics are not | ||||
| we _do_ extend types with \(\bot\)). But denotational semantics are not | ||||
| the only place where types are implicitly extend with an extra value; | ||||
| let's look at Java. | ||||
| 
 | ||||
| @ -294,7 +294,7 @@ question. Its purpose can be one of two things: | ||||
| 
 | ||||
| * The interviewer expected a long-form response such as this one. | ||||
| This is a weird expectation for a software engineering candidate - | ||||
| how does knowing about \\(\\bot\\), `undefined`, or `null` help in | ||||
| how does knowing about \(\bot\), `undefined`, or `null` help in | ||||
| creating software, especially if this information is irrelevant | ||||
| to the company's language of choice? | ||||
| * The interviewer expected the simple answer. In that case, | ||||
|  | ||||
| @ -17,13 +17,13 @@ and that's what I reached for when making this attempt. | ||||
| #### Expressions and Intrinsics | ||||
| This is mostly the easy part. A UCC expression is one of three things: | ||||
| 
 | ||||
| * An "intrinsic", written \\(i\\), which is akin to a built-in function or command. | ||||
| * A "quote", written \\([e]\\), which takes a UCC expression \\(e\\) and moves it onto the stack (UCC is stack-based). | ||||
| * A composition of several expressions, written \\(e_1\\ e_2\\ \\ldots\\ e_n\\), which effectively evaluates them in order. | ||||
| * An "intrinsic", written \(i\), which is akin to a built-in function or command. | ||||
| * A "quote", written \([e]\), which takes a UCC expression \(e\) and moves it onto the stack (UCC is stack-based). | ||||
| * A composition of several expressions, written \(e_1\ e_2\ \ldots\ e_n\), which effectively evaluates them in order. | ||||
| 
 | ||||
| This is straightforward to define in Coq, but I'm going to make a little simplifying change. | ||||
| Instead of making "composition of \\(n\\) expressions" a core language feature, I'll only | ||||
| allow "composition of \\(e_1\\) and \\(e_2\\)", written \\(e_1\\ e_2\\). This change does not | ||||
| Instead of making "composition of \(n\) expressions" a core language feature, I'll only | ||||
| allow "composition of \(e_1\) and \(e_2\)", written \(e_1\ e_2\). This change does not | ||||
| in any way reduce the power of the language; we can still | ||||
| {{< sidenote "right" "assoc-note" "write \(e_1\ e_2\ \ldots\ e_n\) as \((e_1\ e_2)\ \ldots\ e_n\)." >}} | ||||
| The same expression can, of course, be written as \(e_1\ \ldots\ (e_{n-1}\ e_n)\). | ||||
| @ -42,10 +42,10 @@ of an inductive data type in Coq. | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 12 15 >}} | ||||
| 
 | ||||
| Why do we need `e_int`? We do because a token like \\(\\text{swap}\\) can be viewed | ||||
| as belonging to the set of intrinsics \\(i\\), or the set of expressions, \\(e\\). While writing | ||||
| Why do we need `e_int`? We do because a token like \(\text{swap}\) can be viewed | ||||
| as belonging to the set of intrinsics \(i\), or the set of expressions, \(e\). While writing | ||||
| down the rules in mathematical notation, what exactly the token means is inferred from context - clearly | ||||
| \\(\\text{swap}\\ \\text{drop}\\) is an expression built from two other expressions. In statically-typed | ||||
| \(\text{swap}\ \text{drop}\) is an expression built from two other expressions. In statically-typed | ||||
| functional languages like Coq or Haskell, however, the same expression can't belong to two different, | ||||
| arbitrary types. Thus, to turn an intrinsic into an expression, we need to wrap it up in a constructor, | ||||
| which we called `e_int` here. Other than that, `e_quote` accepts as argument another expression, `e` (the | ||||
| @ -69,12 +69,12 @@ Inductive value := | ||||
|     | v_quot (e : expr). | ||||
| ``` | ||||
| 
 | ||||
| Then, `v_quot (e_int swap)` would be the Coq translation of the expression \\([\\text{swap}]\\).  | ||||
| Then, `v_quot (e_int swap)` would be the Coq translation of the expression \([\text{swap}]\).  | ||||
| However, I didn't decide on this approach for two reasons: | ||||
| 
 | ||||
| * There are now two ways to write a quoted expression: either `v_quote e` to represent | ||||
|   a quoted expression that is a value, or `e_quote e` to represent a quoted expression | ||||
|   that is just an expression. In the extreme case, the value \\([[e]]\\) would | ||||
|   that is just an expression. In the extreme case, the value \([[e]]\) would | ||||
|   be represented by `v_quote (e_quote e)` - two different constructors for the same concept, | ||||
|   in the same expression! | ||||
| * When formalizing the lambda calculus, | ||||
| @ -88,9 +88,9 @@ but also such that here is no way to prove `IsValue (e_int swap)`, since _that_ | ||||
| a value. But what does "provable" mean, here? | ||||
| 
 | ||||
| By the [Curry-Howard correspondence](https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence), | ||||
| a predicate is just a function that takes _something_ and returns a type. Thus, if \\(\\text{Even}\\) | ||||
| is a predicate, then \\(\\text{Even}\\ 3\\) is actually a type. Since \\(\\text{Even}\\) takes | ||||
| numbers in, it is a predicate on numbers. Our \\(\\text{IsValue}\\) predicate will be a predicate | ||||
| a predicate is just a function that takes _something_ and returns a type. Thus, if \(\text{Even}\) | ||||
| is a predicate, then \(\text{Even}\ 3\) is actually a type. Since \(\text{Even}\) takes | ||||
| numbers in, it is a predicate on numbers. Our \(\text{IsValue}\) predicate will be a predicate | ||||
| on expressions, instead. In Coq, we can write this as: | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 19 19 >}} | ||||
| @ -178,8 +178,8 @@ Although the stacks are written in reverse order (which is just a consequence of | ||||
| I hope that the correspondence is fairly clear. If it's not, try reading this rule out loud: | ||||
| 
 | ||||
| > The rule `Sem_swap` says that for every two values `v` and `v'`, and for any stack `vs`, | ||||
|   evaluating `swap` in the original stack `v' :: v :: vs`, aka \\(\\langle V, v, v'\\rangle\\), | ||||
|   results in a final stack `v :: v' :: vs`, aka \\(\\langle V, v', v\\rangle\\). | ||||
|   evaluating `swap` in the original stack `v' :: v :: vs`, aka \(\langle V, v, v'\rangle\), | ||||
|   results in a final stack `v :: v' :: vs`, aka \(\langle V, v', v\rangle\). | ||||
| 
 | ||||
| With that in mind, here's a definition of a predicate `Sem_int`, the evaluation predicate | ||||
| for intrinsics: | ||||
| @ -189,8 +189,8 @@ for intrinsics: | ||||
| Hey, what's all this with `v_quote` and `projT1`? It's just a little bit of bookkeeping. | ||||
| Given a value -- a pair of an expression `e` and a proof `IsValue e` -- the function `projT1` | ||||
| just returns the expression `e`. That is, it's basically a way of converting a value back into | ||||
| an expression. The function `v_quote` takes us in the other direction: given an expression \\(e\\), | ||||
| it constructs a quoted expression \\([e]\\), and combines it with a proof that the newly constructed | ||||
| an expression. The function `v_quote` takes us in the other direction: given an expression \(e\), | ||||
| it constructs a quoted expression \([e]\), and combines it with a proof that the newly constructed | ||||
| quote is a value. | ||||
| 
 | ||||
| The above two function in combination help us define the `quote` intrinsic, which | ||||
| @ -222,8 +222,8 @@ true and false. Why don't we do the same thing? | ||||
| {{< foldtable >}} | ||||
| |UCC Spec| Coq encoding | | ||||
| |---|----| | ||||
| |\\(\\text{false}\\)=\\([\\text{drop}]\\)| {{< codelines "Coq" "dawn/Dawn.v" 41 42 >}} | ||||
| |\\(\\text{true}\\)=\\([\\text{swap} \\ \\text{drop}]\\)| {{< codelines "Coq" "dawn/Dawn.v" 44 45 >}} | ||||
| |\(\text{false}\)=\([\text{drop}]\)| {{< codelines "Coq" "dawn/Dawn.v" 41 42 >}} | ||||
| |\(\text{true}\)=\([\text{swap} \ \text{drop}]\)| {{< codelines "Coq" "dawn/Dawn.v" 44 45 >}} | ||||
| 
 | ||||
| Let's try prove that these two work as intended. | ||||
| 
 | ||||
| @ -234,38 +234,38 @@ I invite you to take a look at the "shape" of the proof: | ||||
| 
 | ||||
| * After the initial use of `intros`, which brings the variables `v`, `v`, and `vs` into | ||||
|   scope, we start by applying `Sem_e_comp`. Intuitively, this makes sense - at the top level, | ||||
|   our expression, \\(\\text{false}\\ \\text{apply}\\), | ||||
|   is a composition of two other expressions, \\(\\text{false}\\) and \\(\\text{apply}\\). | ||||
|   our expression, \(\text{false}\ \text{apply}\), | ||||
|   is a composition of two other expressions, \(\text{false}\) and \(\text{apply}\). | ||||
|   Because of this, we need to use the rule from our semantics that corresponds to composition. | ||||
| * The composition rule requires that we describe the individual effects on the stack of the | ||||
|   two constituent expressions (recall that the first expression takes us from the initial stack `v1` | ||||
|   to some intermediate stack `v2`, and the second expression takes us from that stack `v2` to the | ||||
|   final stack `v3`). Thus, we have two "bullet points": | ||||
|   * The first expression, \\(\\text{false}\\), is just a quoted expression. Thus, the rule | ||||
|   * The first expression, \(\text{false}\), is just a quoted expression. Thus, the rule | ||||
|     `Sem_e_quote` applies, and the contents of the quote are puhsed onto the stack. | ||||
|   * The second expression, \\(\\text{apply}\\), is an intrinsic, so we need to use the rule | ||||
|   * The second expression, \(\text{apply}\), is an intrinsic, so we need to use the rule | ||||
|     `Sem_e_int`, which handles the intrinsic case. This, in turn, requires that we show | ||||
|     the effect of the intrinsic itself; the `apply` intrinsic evaluates the quoted expression | ||||
|     on the stack. | ||||
|     The quoted expression contains the body of false, or \\(\\text{drop}\\). This is | ||||
|     once again an intrinsic, so we use `Sem_e_int`; the intrinsic in question is \\(\\text{drop}\\), | ||||
|     The quoted expression contains the body of false, or \(\text{drop}\). This is | ||||
|     once again an intrinsic, so we use `Sem_e_int`; the intrinsic in question is \(\text{drop}\), | ||||
|     so the `Sem_drop` rule takes care of that. | ||||
| 
 | ||||
| Following these steps, we arrive at the fact that evaluating `false` on the stack simply drops the top | ||||
| element, as specified. The proof for \\(\\text{true}\\) is very similar in spirit: | ||||
| element, as specified. The proof for \(\text{true}\) is very similar in spirit: | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 55 63 >}} | ||||
| 
 | ||||
| We can also formalize the \\(\\text{or}\\) operator: | ||||
| We can also formalize the \(\text{or}\) operator: | ||||
| 
 | ||||
| {{< foldtable >}} | ||||
| |UCC Spec| Coq encoding | | ||||
| |---|----| | ||||
| |\\(\\text{or}\\)=\\(\\text{clone}\\ \\text{apply}\\)| {{< codelines "Coq" "dawn/Dawn.v" 65 65 >}} | ||||
| |\(\text{or}\)=\(\text{clone}\ \text{apply}\)| {{< codelines "Coq" "dawn/Dawn.v" 65 65 >}} | ||||
| 
 | ||||
| We can write two top-level proofs about how this works: the first says that \\(\\text{or}\\), | ||||
| when the first argument is \\(\\text{false}\\), just returns the second argument (this is in agreement | ||||
| with the truth table, since \\(\\text{false}\\) is the identity element of \\(\\text{or}\\)). | ||||
| We can write two top-level proofs about how this works: the first says that \(\text{or}\), | ||||
| when the first argument is \(\text{false}\), just returns the second argument (this is in agreement | ||||
| with the truth table, since \(\text{false}\) is the identity element of \(\text{or}\)). | ||||
| The proof proceeds much like before: | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 67 73 >}} | ||||
| @ -276,19 +276,19 @@ Because of this, in this proof writing `apply Sem_apply...` is the same | ||||
| as `apply Sem_apply. apply Sem_e_int`. Since the `Sem_e_int` rule is used a lot, this makes for a | ||||
| very convenient shorthand. | ||||
| 
 | ||||
| Similarly, we prove that \\(\\text{or}\\) applied to \\(\\text{true}\\) always returns \\(\\text{true}\\). | ||||
| Similarly, we prove that \(\text{or}\) applied to \(\text{true}\) always returns \(\text{true}\). | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 75 83 >}} | ||||
| 
 | ||||
| Finally, the specific facts (like \\(\\text{false}\\ \\text{or}\\ \\text{false}\\) evaluating to \\(\\text{false}\\)) | ||||
| Finally, the specific facts (like \(\text{false}\ \text{or}\ \text{false}\) evaluating to \(\text{false}\)) | ||||
| can be expressed using our two new proofs, `or_false_v` and `or_true`. | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 85 88 >}} | ||||
| 
 | ||||
| ### Derived Expressions | ||||
| #### Quotes | ||||
| The UCC specification defines \\(\\text{quote}_n\\) to make it more convenient to quote | ||||
| multiple terms. For example, \\(\\text{quote}_2\\) composes and quotes the first two values | ||||
| The UCC specification defines \(\text{quote}_n\) to make it more convenient to quote | ||||
| multiple terms. For example, \(\text{quote}_2\) composes and quotes the first two values | ||||
| on the stack. This is defined in terms of other UCC expressions as follows: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -300,11 +300,11 @@ We can write this in Coq as follows: | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 90 94 >}} | ||||
| 
 | ||||
| This definition diverges slightly from the one given in the UCC specification; particularly, | ||||
| UCC's spec mentions that \\(\\text{quote}_n\\) is only defined for \\(n \\geq 1\\).However, | ||||
| UCC's spec mentions that \(\text{quote}_n\) is only defined for \(n \geq 1\).However, | ||||
| this means that in our code, we'd have to somehow handle the error that would arise if the | ||||
| term \\(\\text{quote}\_0\\) is used. Instead, I defined `quote_n n` to simply mean | ||||
| \\(\\text{quote}\_{n+1}\\); thus, in Coq, no matter what `n` we use, we will have a valid | ||||
| expression, since `quote_n 0` will simply correspond to \\(\\text{quote}_1 = \\text{quote}\\). | ||||
| term \(\text{quote}_0\) is used. Instead, I defined `quote_n n` to simply mean | ||||
| \(\text{quote}_{n+1}\); thus, in Coq, no matter what `n` we use, we will have a valid | ||||
| expression, since `quote_n 0` will simply correspond to \(\text{quote}_1 = \text{quote}\). | ||||
| 
 | ||||
| We can now attempt to prove that this definition is correct by ensuring that the examples given | ||||
| in the specification are valid. We may thus write, | ||||
| @ -315,7 +315,7 @@ We used a new tactic here, `repeat`, but overall, the structure of the proof is | ||||
| the definition of `quote_n` consists of many intrinsics, and we apply the corresponding rules | ||||
| one-by-one until we arrive at the final stack. Writing this proof was kind of boring, since | ||||
| I just had to see which intrinsic is being used in each step, and then write a line of `apply` | ||||
| code to handle that intrinsic. This gets worse for \\(\\text{quote}_3\\): | ||||
| code to handle that intrinsic. This gets worse for \(\text{quote}_3\): | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 108 122 >}} | ||||
| 
 | ||||
| @ -325,10 +325,10 @@ to write proofs like this itself. Here's what I came up with: | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 124 136 >}} | ||||
| 
 | ||||
| You don't have to understand the details, but in brief, this checks what kind of proof | ||||
| we're asking Coq to do (for instance, if we're trying to prove that a \\(\\text{swap}\\) | ||||
| we're asking Coq to do (for instance, if we're trying to prove that a \(\text{swap}\) | ||||
| instruction has a particular effect), and tries to apply a corresponding semantic rule. | ||||
| Thus, it will try `Sem_swap` if the expression is \\(\\text{swap}\\),  | ||||
| `Sem_clone` if the expression is \\(\\text{clone}\\), and so on. Then, the two proofs become: | ||||
| Thus, it will try `Sem_swap` if the expression is \(\text{swap}\),  | ||||
| `Sem_clone` if the expression is \(\text{clone}\), and so on. Then, the two proofs become: | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 138 144 >}} | ||||
| 
 | ||||
| @ -345,7 +345,7 @@ Or, in Coq, | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/Dawn.v" 148 149 >}} | ||||
| 
 | ||||
| This is the trick to how \\(\\text{rotate}_n\\) works: it creates a quote of \\(n\\) reordered and composed | ||||
| This is the trick to how \(\text{rotate}_n\) works: it creates a quote of \(n\) reordered and composed | ||||
| values on the stack, and then evaluates that quote. Since evaluating each value | ||||
| just places it on the stack, these values end up back on the stack, in the same order that they | ||||
| were in the quote. When writing the proof, `solve_basic ()` gets us almost all the way to the | ||||
| @ -357,7 +357,7 @@ placed back on the stack. | ||||
| 
 | ||||
| ### `e_comp` is Associative | ||||
| When composing three expressions, which way of inserting parentheses is correct? | ||||
| Is it \\((e_1\\ e_2)\\ e_3\\)? Or is it \\(e_1\\ (e_2\\ e_3)\\)? Well, both! | ||||
| Is it \((e_1\ e_2)\ e_3\)? Or is it \(e_1\ (e_2\ e_3)\)? Well, both! | ||||
| Expression composition is associative, which means that the order of the parentheses | ||||
| doesn't matter. We state this in the following theorem, which says that the two | ||||
| ways of writing the composition, if they evaluate to anything, evaluate to the same thing. | ||||
| @ -372,8 +372,8 @@ says: | ||||
| 
 | ||||
| I think that UCC is definitely getting there: formally defining the semantics outlined | ||||
| on the page was quite straightforward. We can now have complete confidence in the behavior | ||||
| of \\(\\text{true}\\), \\(\\text{false}\\), \\(\\text{or}\\), \\(\\text{quote}_n\\) and | ||||
| \\(\\text{rotate}_n\\). The proof of associativity is also enough to possibly argue for simplifying | ||||
| of \(\text{true}\), \(\text{false}\), \(\text{or}\), \(\text{quote}_n\) and | ||||
| \(\text{rotate}_n\). The proof of associativity is also enough to possibly argue for simplifying | ||||
| the core calculus' syntax even more. All of this we got from an official source, with only | ||||
| a little bit of tweaking to get from the written description of the language to code! I'm | ||||
| looking forward to reading the next post about the _multistack_ concatenative calculus. | ||||
|  | ||||
| @ -52,7 +52,7 @@ I replaced instances of `projT1` with instances of `value_to_expr`. | ||||
| At the end of my previous article, we ended up with a Coq encoding of the big-step | ||||
| [operational semantics](https://en.wikipedia.org/wiki/Operational_semantics) | ||||
| of UCC, as well as some proofs of correctness about the derived forms from | ||||
| the article (like \\(\\text{quote}_3\\) and \\(\\text{rotate}_3\\)). The trouble | ||||
| the article (like \(\text{quote}_3\) and \(\text{rotate}_3\)). The trouble | ||||
| is, despite having our operational semantics, we can't make our Coq | ||||
| code run anything. This is for several reasons: | ||||
| 
 | ||||
| @ -97,10 +97,10 @@ We can now write a function that tries to execute a single step given an express | ||||
| {{< codelines "Coq" "dawn/DawnEval.v" 11 27 >}} | ||||
| 
 | ||||
| Most intrinsics, by themselves, complete after just one step. For instance, a program | ||||
| consisting solely of \\(\\text{swap}\\) will either fail (if the stack doesn't have enough | ||||
| consisting solely of \(\text{swap}\) will either fail (if the stack doesn't have enough | ||||
| values), or it will swap the top two values and be done. We list only "correct" cases, | ||||
| and resort to a "catch-all" case on line 26 that returns an error. The one multi-step | ||||
| intrinsic is \\(\\text{apply}\\), which can evaluate an arbitrary expression from the stack. | ||||
| intrinsic is \(\text{apply}\), which can evaluate an arbitrary expression from the stack. | ||||
| In this case, the "one step" consists of popping the quoted value from the stack; the | ||||
| "remaining program" is precisely the expression that was popped. | ||||
| 
 | ||||
| @ -109,10 +109,10 @@ expression on the stack). The really interesting case is composition. Expression | ||||
| are evaluated left-to-right, so we first determine what kind of step the left expression (`e1`) | ||||
| can take. We may need more than one step to finish up with `e1`, so there's a good chance it | ||||
| returns a "rest program" `e1'` and a stack `vs'`. If this happens, to complete evaluation of | ||||
| \\(e_1\\ e_2\\), we need to first finish evaluating \\(e_1'\\), and then evaluate \\(e_2\\). | ||||
| Thus, the "rest of the program" is \\(e_1'\\ e_2\\), or `e_comp e1' e2`. On the other hand, | ||||
| \(e_1\ e_2\), we need to first finish evaluating \(e_1'\), and then evaluate \(e_2\). | ||||
| Thus, the "rest of the program" is \(e_1'\ e_2\), or `e_comp e1' e2`. On the other hand, | ||||
| if `e1` finished evaluating, we still need to evaluate `e2`, so our "rest of the program" | ||||
| is \\(e_2\\), or `e2`. If evaluating `e1` led to an error, then so did evaluating `e_comp e1 e2`, | ||||
| is \(e_2\), or `e2`. If evaluating `e1` led to an error, then so did evaluating `e_comp e1 e2`, | ||||
| and we return `err`. | ||||
| 
 | ||||
| ### Extracting Code | ||||
| @ -280,9 +280,9 @@ eval_chain nil (e_comp (e_comp true false) or) (true :: nil) | ||||
| ``` | ||||
| 
 | ||||
| This is the type of sequences (or chains) of steps corresponding to the | ||||
| evaluation of the program \\(\\text{true}\\ \\text{false}\\ \\text{or}\\), | ||||
| evaluation of the program \(\text{true}\ \text{false}\ \text{or}\), | ||||
| starting in an empty stack and evaluating to a stack with only | ||||
| \\(\\text{true}\\) on top. Thus to say that an expression evaluates to some | ||||
| \(\text{true}\) on top. Thus to say that an expression evaluates to some | ||||
| final stack `vs'`, in _some unknown number of steps_, it's sufficient to write: | ||||
| 
 | ||||
| ```Coq | ||||
| @ -366,9 +366,9 @@ data types, `Sem_expr` and `Sem_int`, each of which contains the other. | ||||
| Regular proofs by induction, which work on only one of the data types, | ||||
| break down because we can't make claims "by induction" about the _other_ | ||||
| type. For example, while going by induction on `Sem_expr`, we run into | ||||
| issues in the `e_int` case while handling \\(\\text{apply}\\). We know | ||||
| issues in the `e_int` case while handling \(\text{apply}\). We know | ||||
| a single step -- popping the value being run from the stack. But what then? | ||||
| The rule for \\(\\text{apply}\\) in `Sem_int` contains another `Sem_expr` | ||||
| The rule for \(\text{apply}\) in `Sem_int` contains another `Sem_expr` | ||||
| which confirms that the quoted value properly evaluates. But this other | ||||
| `Sem_expr` isn't directly a child of the "bigger" `Sem_expr`, and so we | ||||
| don't get an inductive hypothesis about it. We know nothing about it; we're stuck. | ||||
| @ -399,7 +399,7 @@ ways an intrinsic can be evaluated. Most intrinsics are quite boring; | ||||
| in our evaluator, they only need a single step, and their semantics | ||||
| rules guarantee that the stack has the exact kind of data that the evaluator | ||||
| expects. We dismiss such cases with `apply chain_final; auto`. The only | ||||
| time this doesn't work is when we encounter the \\(\\text{apply}\\) intrinsic; | ||||
| time this doesn't work is when we encounter the \(\text{apply}\) intrinsic; | ||||
| in that case, however, we can simply use the first theorem we proved. | ||||
| 
 | ||||
| #### A Quick Aside: Automation Using `Ltac2` | ||||
| @ -498,7 +498,7 @@ is used to run code for each one of these goals. In the non-empty list case, we | ||||
| apart its tail as necessary by recursively calling `destruct_n`. | ||||
| 
 | ||||
| That's pretty much it! We can now use our tactic from Coq like any other. Rewriting | ||||
| our earlier proof about \\(\\text{swap}\\), we now only need to handle the valid case: | ||||
| our earlier proof about \(\text{swap}\), we now only need to handle the valid case: | ||||
| 
 | ||||
| {{< codelines "Coq" "dawn/DawnEval.v" 248 254 >}} | ||||
| 
 | ||||
| @ -625,8 +625,8 @@ We now have a verified UCC evaluator in Haskell. What next? | ||||
|    as syntax sugar in our parser. Using a variable would be the same as simply | ||||
|    pasting in the variable's definition. This is pretty much what the Dawn article | ||||
|    seems to be doing. | ||||
| 3. We could prove more things. Can we confirm, once and for all, the correctness of \\(\\text{quote}_n\\), | ||||
|    for _any_ \\(n\\)? Is there is a generalized way of converting inductive data types into a UCC encoding? | ||||
| 3. We could prove more things. Can we confirm, once and for all, the correctness of \(\text{quote}_n\), | ||||
|    for _any_ \(n\)? Is there is a generalized way of converting inductive data types into a UCC encoding? | ||||
|    Or could we perhaps formally verify the following comment from Lobsters: | ||||
| 
 | ||||
|    > with the encoding of natural numbers given, n+1 contains the definition of n duplicated two times. | ||||
|  | ||||
| @ -7,7 +7,7 @@ tags: ["Coq"] | ||||
| --- | ||||
| 
 | ||||
| Recently, I wrote an article on [Formalizing Dawn's Core Calculus in Coq]({{< relref "./coq_dawn.md" >}}). | ||||
| One of the proofs (specifically, correctness of \\(\\text{quote}_3\\)) was the best candidate I've ever | ||||
| One of the proofs (specifically, correctness of \(\text{quote}_3\)) was the best candidate I've ever | ||||
| encountered for proof automation. I knew that proof automation was possible from the second book of Software | ||||
| Foundations, [Programming Language Foundations](https://softwarefoundations.cis.upenn.edu/plf-current/index.html). | ||||
| I went there to learn more, and started my little journey in picking up Coq's `Ltac2`. | ||||
|  | ||||
| @ -193,9 +193,9 @@ duplicate results (it's easier to check for duplicate definitions if you | ||||
| know a symbol is only returned from a search once). So, another feature of | ||||
| Dyno's scope search is an additional bitfield of what to _exclude_, which we | ||||
| set to be the previous search's filter. So if the first search looked for | ||||
| symbols matching description \\(A\\), and the second search is supposed to | ||||
| look for symbols matching description \\(B\\), __then really we do a search | ||||
| for \\(A \\land \\lnot B\\) (that is, \\(A\\) and not \\(B\\))__. | ||||
| symbols matching description \(A\), and the second search is supposed to | ||||
| look for symbols matching description \(B\), __then really we do a search | ||||
| for \(A \land \lnot B\) (that is, \(A\) and not \(B\))__. | ||||
| 
 | ||||
| {{< dialog >}} | ||||
| {{< message "question" "reader" >}} | ||||
| @ -612,7 +612,7 @@ set of flags in the `curFilter` bitfield. | ||||
| ### Modeling Search State | ||||
| 
 | ||||
| Next, I needed to model the behavior the I described earlier: searching for | ||||
| \\(A \\land \\lnot B\\), and taking the intersection of past searches | ||||
| \(A \land \lnot B\), and taking the intersection of past searches | ||||
| when running subsequent searches. | ||||
| 
 | ||||
| Dyno implemented this roughly as follows: | ||||
| @ -622,7 +622,7 @@ Dyno implemented this roughly as follows: | ||||
| 2. When a scope was searched for the first time, its `curFilter` / search | ||||
|    bitfield was stored into the mapping. | ||||
| 3. When a scope was searched after that, the previously-stored flags in the | ||||
|    mapping were excluded (that's the \\(A\\land\\lnot B\\) behavior), and | ||||
|    mapping were excluded (that's the \(A\land\lnot B\) behavior), and | ||||
|    the bitfield in the mapping was updated to be the intersection of | ||||
|    `curFilter` and the stored flags. | ||||
| 
 | ||||
| @ -723,7 +723,7 @@ should have? | ||||
| The `possibleState`, `updateOrSet`, and `excludeBitfield` | ||||
| lines encode the fact that a search occurred for `fs`. This must be a valid | ||||
| search, and the search state must be modified appropriately. Furthermore, | ||||
| at the time this search takes place, to make the \\(\\lnot B\\) portion of | ||||
| at the time this search takes place, to make the \(\lnot B\) portion of | ||||
| the algorithm work, the bitfield `exclude1` will be set based on the | ||||
| previous search state. | ||||
| 
 | ||||
|  | ||||
| @ -64,7 +64,7 @@ If we kept going with this process infinitely, we'd eventually have what we need | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| But hey, the stuff inside the first set of parentheses is still an infinite sequence of applications | ||||
| of the function \\(\\text{lengthF}\\), and we have just defined this to be \\(\\text{length}\\). Thus, | ||||
| of the function \(\text{lengthF}\), and we have just defined this to be \(\text{length}\). Thus, | ||||
| we can rewrite the above equation as: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -77,7 +77,7 @@ such a fixed point. It's defined like this: | ||||
| 
 | ||||
| {{< codelines "Haskell" "catamorphisms/Cata.hs" 16 16 >}} | ||||
| 
 | ||||
| This definition is as declarative as can be; `fix` returns the \\(x\\) such that \\(x = f(x)\\). With | ||||
| This definition is as declarative as can be; `fix` returns the \(x\) such that \(x = f(x)\). With | ||||
| this, we finally write: | ||||
| 
 | ||||
| {{< codelines "Haskell" "catamorphisms/Cata.hs" 18 18 >}} | ||||
| @ -95,7 +95,7 @@ This is a valid criticism, so I'd like to briefly talk about how `fix` is used i | ||||
| lambda calculus. | ||||
| 
 | ||||
| In the untyped typed lambda calculus, we can just define a term that behaves like `fix` does. The | ||||
| most common definition is the \\(Y\\) combinator, defined as follows: | ||||
| most common definition is the \(Y\) combinator, defined as follows: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| Y = \lambda f. (\lambda x. f (x x)) (\lambda x. f (x x )) | ||||
| @ -107,7 +107,7 @@ When applied to a function, this combinator goes through the following evaluatio | ||||
| Y f = f (Y f) = f (f (Y f)) =\ ... | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This is the exact sort of infinite series of function applications that we saw above with \\(\\text{lengthF}\\). | ||||
| This is the exact sort of infinite series of function applications that we saw above with \(\text{lengthF}\). | ||||
| 
 | ||||
| ### Recursive Data Types | ||||
| We have now seen how we can rewrite a recursive function as a fixed point of some non-recursive function. | ||||
| @ -152,7 +152,7 @@ Looking past the constructors and accessors, we might write the above in pseudo- | ||||
| newtype Fix f = f (Fix f) | ||||
| ``` | ||||
| 
 | ||||
| This is just like the lambda calculus \\(Y\\) combinator above! Unfortunately, we _do_ have to | ||||
| This is just like the lambda calculus \(Y\) combinator above! Unfortunately, we _do_ have to | ||||
| deal with the cruft induced by the constructors here. Thus, to write down the list `[1,2,3]` | ||||
| using `MyListF`, we'd have to produce the following: | ||||
| 
 | ||||
| @ -163,7 +163,7 @@ This is actually done in practice when using some approaches to help address the | ||||
| it's quite unpleasant to write code in this way, so we'll set it aside. | ||||
| 
 | ||||
| Let's go back to our infinite chain of type applications. We've a similar pattern before,  | ||||
| with \\(\\text{length}\\) and \\(\\text{lengthF}\\). Just like we did then, it seems like | ||||
| with \(\text{length}\) and \(\text{lengthF}\). Just like we did then, it seems like | ||||
| we might be able to write something like the following: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -193,8 +193,8 @@ The two mutual inverses \(f\) and \(g\) fall out of the definition of the <code> | ||||
| data type! If we didn't have to deal with the constructor cruft, this would be more | ||||
| ergonomic than writing our own <code>myIn</code> and <code>myOut</code> functions. | ||||
| {{< /sidenote >}} | ||||
| that take you from one type to the other (and vice versa), such that applying \\(f\\) after \\(g\\), | ||||
| or \\(g\\) after \\(f\\), gets you right back where you started. That is, \\(f\\) and \\(g\\) | ||||
| that take you from one type to the other (and vice versa), such that applying \(f\) after \(g\), | ||||
| or \(g\) after \(f\), gets you right back where you started. That is, \(f\) and \(g\) | ||||
| need to be each other's inverses. For our specific case, let's call the two functions `myOut` | ||||
| and `myIn` (I'm matching the naming in [this paper](https://maartenfokkinga.github.io/utwente/mmf91m.pdf)). They are not hard to define: | ||||
| 
 | ||||
|  | ||||
| @ -9,7 +9,7 @@ For Oregon State University's CS 381, Programming Language Fundamentals, Haskell | ||||
| 
 | ||||
| ### Prelude: Language Server Protocol | ||||
| 
 | ||||
| Language Server Protocol (LSP) is an attempt to simplify the code analysis feature in various IDEs. Before LSP, each individual IDE had to provide its own functionality for analyzing user code, and couldn't re-use the code analysis features of another IDE. Thus, for every IDE, for every language, a new code analyzer had to be developed. For \\(m\\) IDEs and \\(n\\) languages, there would need to be \\(m \times n\\) code analyzers. This is less than ideal. LSP solves this problem by moving the task of analyzing code into an external process, called a _Language Server_. The language server examines code, finds errors and warnings, and figures out the names of the various variables and functions that the source file contains. It then communicates this information to a _Language Client_, which is usually the IDE. The language client is responsible for presenting the information to the user in any way it deems fit. | ||||
| Language Server Protocol (LSP) is an attempt to simplify the code analysis feature in various IDEs. Before LSP, each individual IDE had to provide its own functionality for analyzing user code, and couldn't re-use the code analysis features of another IDE. Thus, for every IDE, for every language, a new code analyzer had to be developed. For \(m\) IDEs and \(n\) languages, there would need to be \(m \times n\) code analyzers. This is less than ideal. LSP solves this problem by moving the task of analyzing code into an external process, called a _Language Server_. The language server examines code, finds errors and warnings, and figures out the names of the various variables and functions that the source file contains. It then communicates this information to a _Language Client_, which is usually the IDE. The language client is responsible for presenting the information to the user in any way it deems fit. | ||||
| 
 | ||||
| Because LSP is a _protocol_, any server is able to communicate with any client. Once someone has written a language client for, say, Haskell, this client can be used by any IDE that supports LSP, which means that IDE instantly gains Haskell support. Similarly, once an IDE works as an LSP client, it can use any language server, and thus is immediately able to support all languages for which there is a language client. | ||||
| 
 | ||||
|  | ||||
| @ -40,7 +40,7 @@ starting with integers. | ||||
| 
 | ||||
| #### Integers | ||||
| Addition is an associative binary operation. Furthermore, it's well-known that adding zero to a | ||||
| number leaves that number intact: \\(0+n = n + 0 = n\\). So we might define a `Monoid` instance for | ||||
| number leaves that number intact: \(0+n = n + 0 = n\). So we might define a `Monoid` instance for | ||||
| numbers as follows. Note that we actually provide `(<>)` via the `Semigroup` class, | ||||
| which _just_ requires the associative binary operation, and serves as a superclass for `Monoid`. | ||||
| 
 | ||||
| @ -54,7 +54,7 @@ instance Monoid Int where | ||||
| 
 | ||||
| Cool and good. But hey, there are other binary operations on integers! What about | ||||
| multiplication? It is also associative, and again it is well-known that multiplying | ||||
| anything by one leaves that number as it was: \\(1\*n = n\*1 = n\\). The corresponding | ||||
| anything by one leaves that number as it was: \(1\*n = n\*1 = n\). The corresponding | ||||
| `Monoid` instance would be something like the following: | ||||
| 
 | ||||
| ```Haskell | ||||
|  | ||||
| @ -135,7 +135,7 @@ Just like in our previous examples, we simply replace `f` with `double` during a | ||||
| We've now created our multiply-by-four function! We can create other functions the same way. For instance, if we had a function `halve`, we could create a function to divide a number by 4 by applying our "apply-twice" function to it. | ||||
| 
 | ||||
| ### Church Encoded Integers | ||||
| We can represent numbers using just abstraction and application. This is called Church encoding. In church encoding, a number is a function that takes another function, and applies it that many times to a value. 0 would take a function and a value, and apply it no times, returning the value it was given. 1 would take a function `f` and a value `v`, and return the result of applying that function `f` one time: `f v`. We've already seen 2! It's a function that applies a function to a value twice, `f (f v)`. Let's write these numbers in lambda calculus: 0 becomes \\(\lambda \ f \ . \ \lambda \ x \ . \ x\\), 1 becomes \\(\lambda \ f \ . \ \lambda \ x \ . \ f \ x\\), and 2 becomes the familiar \\(\lambda \ f \ . \lambda \ x \ . \ f \ (f \ x)\\). | ||||
| We can represent numbers using just abstraction and application. This is called Church encoding. In church encoding, a number is a function that takes another function, and applies it that many times to a value. 0 would take a function and a value, and apply it no times, returning the value it was given. 1 would take a function `f` and a value `v`, and return the result of applying that function `f` one time: `f v`. We've already seen 2! It's a function that applies a function to a value twice, `f (f v)`. Let's write these numbers in lambda calculus: 0 becomes \(\lambda \ f \ . \ \lambda \ x \ . \ x\), 1 becomes \(\lambda \ f \ . \ \lambda \ x \ . \ f \ x\), and 2 becomes the familiar \(\lambda \ f \ . \lambda \ x \ . \ f \ (f \ x)\). | ||||
| 
 | ||||
| Now, let's try represent addition. Addition of two numbers `a` and `b` would be done by taking a function `f` and applying it the first number of times, and then applying it the second number more times. Since addition must take in numbers `a` and `b`, which are functions of two variables, and return a number, we will end up with | ||||
| something like this: | ||||
|  | ||||
| @ -49,7 +49,7 @@ in common with each other and with the early iterations of my website - | ||||
| they use client-side rendering for static content. In my opinion, this is absurd. | ||||
| Every time that you visit a website that uses MathJax or KaTeX, any mathematical | ||||
| notation arrives to your machine in the form of LaTex markup. For instance, | ||||
| \\(e^{-\\frac{x}{2}}\\) looks like `e^{-\frac{x}{2}}`. The rendering | ||||
| \(e^{-\frac{x}{2}}\) looks like `e^{-\frac{x}{2}}`. The rendering | ||||
| software (MathJax or KaTeX) then takes this markup and converts it into | ||||
| HTML and CSS that your browser can display. Just like my old website, | ||||
| all of this happens __every time the page is loaded__. This isn't an uncommon | ||||
|  | ||||
| @ -76,15 +76,15 @@ If a number doesn't cleanly divide another (we're sticking to integers here), | ||||
| what's left behind is the remainder. For instance, dividing 7 by 3 leaves us with a remainder 1. | ||||
| On the other hand, if the remainder is zero, then that means that our dividend is divisible by the | ||||
| divisor (what a mouthful). In mathematics, we typically use | ||||
| \\(a|b\\) to say \\(a\\) divides \\(b\\), or, as we have seen above, that the remainder of dividing | ||||
| \\(b\\) by \\(a\\) is zero. | ||||
| \(a|b\) to say \(a\) divides \(b\), or, as we have seen above, that the remainder of dividing | ||||
| \(b\) by \(a\) is zero. | ||||
| 
 | ||||
| Working with remainders actually comes up pretty frequently in discrete math. A well-known | ||||
| example I'm aware of is the [RSA algorithm](https://en.wikipedia.org/wiki/RSA_(cryptosystem)), | ||||
| which works with remainders resulting from dividing by a product of two large prime numbers. | ||||
| But what's a good way to write, in numbers and symbols, the claim that "\\(a\\) divides \\(b\\) | ||||
| with remainder \\(r\\)"? Well, we know that dividing yields a quotient (possibly zero) and a remainder | ||||
| (also possibly zero). Let's call the quotient \\(q\\). | ||||
| But what's a good way to write, in numbers and symbols, the claim that "\(a\) divides \(b\) | ||||
| with remainder \(r\)"? Well, we know that dividing yields a quotient (possibly zero) and a remainder | ||||
| (also possibly zero). Let's call the quotient \(q\). | ||||
| {{< sidenote "right" "r-less-note" "Then, we know that when dividing \(b\) by \(a\) we have:" >}} | ||||
| It's important to point out that for the equation in question to represent division | ||||
| with quotient \(q\) and remainder \(r\), it must be that \(r\) is less than \(a\). | ||||
| @ -111,7 +111,7 @@ larger than \(q\). | ||||
| 
 | ||||
| We only really care about the remainder here, not the quotient, since it's the remainder | ||||
| that determines if something is divisible or not. From the form of the second equation, we can | ||||
| deduce that \\(b-r\\) is divisible by \\(a\\) (it's literally equal to \\(a\\) times \\(q\\), | ||||
| deduce that \(b-r\) is divisible by \(a\) (it's literally equal to \(a\) times \(q\), | ||||
| so it must be divisible). Thus, we can write: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -134,7 +134,7 @@ section](#adding-two-congruences) to see how it works): | ||||
| a+c \equiv b+d\ (\text{mod}\ k). | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Multiplying both sides by the same number (call it \\(n\\)) also works (once | ||||
| Multiplying both sides by the same number (call it \(n\)) also works (once | ||||
| again, you can find the proof in [this section below](#multiplying-both-sides-of-a-congruence)). | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -151,7 +151,7 @@ we have: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| From this, we can deduce that multiplying by 10, when it comes to remainders from dividing by 3, | ||||
| is the same as multiplying by 1. We can clearly see this by multiplying both sides by \\(n\\). | ||||
| is the same as multiplying by 1. We can clearly see this by multiplying both sides by \(n\). | ||||
| In our notation: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -166,8 +166,8 @@ Multiplying by that number is _also_ equivalent to multiplying by 1! | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We can put this to good use. Let's take a large number that's divisible by 3. This number | ||||
| will be made of multiple digits, like \\(d_2d_1d_0\\). Note that I do __not__ mean multiplication | ||||
| here, but specifically that each \\(d_i\\) is a number between 0 and 9 in a particular place | ||||
| will be made of multiple digits, like \(d_2d_1d_0\). Note that I do __not__ mean multiplication | ||||
| here, but specifically that each \(d_i\) is a number between 0 and 9 in a particular place | ||||
| in the number -- it's a digit. Now, we can write: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -178,11 +178,11 @@ in the number -- it's a digit. Now, we can write: | ||||
| \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We have just found that \\(d_2+d_1+d_0 \\equiv 0\\ (\\text{mod}\ 3)\\), or that the sum of the digits | ||||
| We have just found that \(d_2+d_1+d_0 \equiv 0\ (\text{mod}\ 3)\), or that the sum of the digits | ||||
| is also divisible by 3. The logic we use works in the other direction, too: if the sum of the digits | ||||
| is divisible, then so is the actual number. | ||||
| 
 | ||||
| There's only one property of the number 3 we used for this reasoning: that \\(10 \\equiv 1\\ (\\text{mod}\\ 3)\\). But it so happens that there's another number that has this property: 9. This means | ||||
| There's only one property of the number 3 we used for this reasoning: that \(10 \equiv 1\ (\text{mod}\ 3)\). But it so happens that there's another number that has this property: 9. This means | ||||
| that to check if a number is divisible by _nine_, we can also check if the sum of the digits is | ||||
| divisible by 9. Try it on 18, 27, 81, and 198. | ||||
| 
 | ||||
| @ -210,7 +210,7 @@ does this work, and why does it always loop around? Why don't we ever spiral far | ||||
| from the center? | ||||
| 
 | ||||
| First, let's take a closer look at our sequence of multiples. Suppose we're working with multiples | ||||
| of some number \\(n\\). Let's write \\(a_i\\) for the \\(i\\)th multiple. Then, we end up with: | ||||
| of some number \(n\). Let's write \(a_i\) for the \(i\)th multiple. Then, we end up with: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{aligned} | ||||
| @ -224,9 +224,9 @@ of some number \\(n\\). Let's write \\(a_i\\) for the \\(i\\)th multiple. Then, | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This is actually called an [arithmetic sequence](https://mathworld.wolfram.com/ArithmeticProgression.html); | ||||
| for each multiple, the number increases by \\(n\\). | ||||
| for each multiple, the number increases by \(n\). | ||||
| 
 | ||||
| Here's a first seemingly trivial point: at some time, the remainder of \\(a_i\\) will repeat. | ||||
| Here's a first seemingly trivial point: at some time, the remainder of \(a_i\) will repeat. | ||||
| There are only so many remainders when dividing by nine: specifically, the only possible remainders | ||||
| are the numbers 0 through 8. We can invoke the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle) and say that after 9 multiples, we will have to have looped. Another way | ||||
| of seeing this is as follows: | ||||
| @ -240,7 +240,7 @@ of seeing this is as follows: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| The 10th multiple is equivalent to n, and will thus have the same remainder. The looping may | ||||
| happen earlier: the simplest case is if we pick 9 as our \\(n\\), in which case the remainder | ||||
| happen earlier: the simplest case is if we pick 9 as our \(n\), in which case the remainder | ||||
| will always be 0. | ||||
| 
 | ||||
| Repeating remainders alone do not guarantee that we will return to the center. The repeating sequence 1,2,3,4 | ||||
| @ -280,15 +280,15 @@ Okay, so we want to avoid cycles with lengths divisible by four. What does it me | ||||
|     \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| If we could divide both sides by \\(k\\), we could go one more step: | ||||
| If we could divide both sides by \(k\), we could go one more step: | ||||
| 
 | ||||
| {{< latex >}} | ||||
|     n \equiv 0\ (\text{mod}\ 9) \\ | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| That is, \\(n\\) would be divisible by 9! This would contradict our choice of \\(n\\) to be | ||||
| between 2 and 8. What went wrong? Turns out, it's that last step: we can't always divide by \\(k\\). | ||||
| Some values of \\(k\\) are special, and it's only _those_ values that can serve as cycle lengths | ||||
| That is, \(n\) would be divisible by 9! This would contradict our choice of \(n\) to be | ||||
| between 2 and 8. What went wrong? Turns out, it's that last step: we can't always divide by \(k\). | ||||
| Some values of \(k\) are special, and it's only _those_ values that can serve as cycle lengths | ||||
| without causing a contradiction. So, what are they? | ||||
| 
 | ||||
| They're values that have a common factor with 9 (an incomplete explanation is in | ||||
| @ -314,8 +314,8 @@ we want to avoid those numbers that would allow for cycles of length 4 (or of a | ||||
| If we didn't avoid them, we might run into infinite loops, where our pencil might end up moving | ||||
| further and further from the center. | ||||
| 
 | ||||
| Actually, let's revisit that. When we were playing with paths of length \\(k\\) while dividing by 9, | ||||
| we noted that the only _possible_ values of \\(k\\) are those that share a common factor with 9, | ||||
| Actually, let's revisit that. When we were playing with paths of length \(k\) while dividing by 9, | ||||
| we noted that the only _possible_ values of \(k\) are those that share a common factor with 9, | ||||
| specifically 3, 6 and 9. But that's not quite as strong as it could be: try as you might, but | ||||
| you will not find a cycle of length 6 when dividing by 9. The same is true if we pick 6 instead of 9, | ||||
| and try to find a cycle of length 4. Even though 4 _does_ have a common factor with 6, and thus | ||||
| @ -326,7 +326,7 @@ So what is it that _really_ determines if there can be cycles or not? | ||||
| Let's do some more playing around. What are the actual cycle lengths when we divide by 9? | ||||
| For all but two numbers, the cycle lengths are 9. The two special numbers are 6 and 3, and they end up | ||||
| with a cycle length of 3. From this, we can say that the cycle length seems to depend on whether or | ||||
| not our \\(n\\) has any common factors with the divisor. | ||||
| not our \(n\) has any common factors with the divisor. | ||||
| 
 | ||||
| Let's explore this some more with a different divisor, say 12. We fill find that 8 has a cycle length | ||||
| of 3, 7 has a cycle length of 12, 9 has a cycle length of 4. What's | ||||
| @ -340,50 +340,50 @@ for the length of a cycle: | ||||
| k = \frac{d}{\text{gcd}(d,n)} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Where \\(d\\) is our divisor, which has been 9 until just recently, and \\(\\text{gcd}(d,n)\\) | ||||
| is the greatest common factor of \\(d\\) and \\(n\\). This equation is in agreement | ||||
| with our experiment for \\(d = 9\\), too. Why might this be? Recall that sequences with | ||||
| period \\(k\\) imply the following congruence: | ||||
| Where \(d\) is our divisor, which has been 9 until just recently, and \(\text{gcd}(d,n)\) | ||||
| is the greatest common factor of \(d\) and \(n\). This equation is in agreement | ||||
| with our experiment for \(d = 9\), too. Why might this be? Recall that sequences with | ||||
| period \(k\) imply the following congruence: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| kn \equiv 0\ (\text{mod}\ d) | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Here I've replaced 9 with \\(d\\), since we're trying to make it work for _any_ divisor, not just 9. | ||||
| Now, suppose the greatest common divisor of \\(n\\) and \\(d\\) is some number \\(f\\). Then, | ||||
| since this number divides \\(n\\) and \\(d\\), we can write \\(n=fm\\) for some \\(m\\), and | ||||
| \\(d=fg\\) for some \\(g\\). We can rewrite our congruence as follows: | ||||
| Here I've replaced 9 with \(d\), since we're trying to make it work for _any_ divisor, not just 9. | ||||
| Now, suppose the greatest common divisor of \(n\) and \(d\) is some number \(f\). Then, | ||||
| since this number divides \(n\) and \(d\), we can write \(n=fm\) for some \(m\), and | ||||
| \(d=fg\) for some \(g\). We can rewrite our congruence as follows: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| kfm \equiv 0\ (\text{mod}\ fg) | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We can simplify this a little bit. Recall that what this congruence really means is that the | ||||
| difference of \\(kfm\\) and \\(0\\), which is just \\(kfm\\), is divisible by \\(fg\\): | ||||
| difference of \(kfm\) and \(0\), which is just \(kfm\), is divisible by \(fg\): | ||||
| 
 | ||||
| {{< latex >}} | ||||
| fg|kfm | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| But if \\(fg\\) divides \\(kfm\\), it must be that \\(g\\) divides \\(km\\)! This, in turn, means | ||||
| But if \(fg\) divides \(kfm\), it must be that \(g\) divides \(km\)! This, in turn, means | ||||
| we can write: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| g|km | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Can we distill this statement even further? It turns out that we can. Remember that we got \\(g\\) | ||||
| and \\(m\\) by dividing \\(d\\) and \\(n\\) by their greatest common factor, \\(f\\). This, in | ||||
| turn, means that \\(g\\) and \\(m\\) have no more common factors that aren't equal to 1 (see | ||||
| [this section below](#numbers-divided-by-their-textgcd-have-no-common-factors)). From this, in turn, we can deduce that \\(m\\) is not | ||||
| relevant to \\(g\\) dividing \\(km\\), and we get: | ||||
| Can we distill this statement even further? It turns out that we can. Remember that we got \(g\) | ||||
| and \(m\) by dividing \(d\) and \(n\) by their greatest common factor, \(f\). This, in | ||||
| turn, means that \(g\) and \(m\) have no more common factors that aren't equal to 1 (see | ||||
| [this section below](#numbers-divided-by-their-textgcd-have-no-common-factors)). From this, in turn, we can deduce that \(m\) is not | ||||
| relevant to \(g\) dividing \(km\), and we get: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| g|k | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| That is, we get that \\(k\\) must be divisible by \\(g\\). Recall that we got \\(g\\) by dividing | ||||
| \\(d\\) by \\(f\\), which is our largest common factor -- aka \\(\\text{gcd}(d,n)\\). We can thus | ||||
| That is, we get that \(k\) must be divisible by \(g\). Recall that we got \(g\) by dividing | ||||
| \(d\) by \(f\), which is our largest common factor -- aka \(\text{gcd}(d,n)\). We can thus | ||||
| write: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -391,26 +391,26 @@ write: | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Let's stop and appreciate this result. We have found a condition that is required for a sequnce | ||||
| of remainders from dividing by \\(d\\) (which was 9 in the original problem) to repeat after \\(k\\) | ||||
| numbers. Furthermore, all of our steps can be performed in reverse, which means that if a \\(k\\) | ||||
| of remainders from dividing by \(d\) (which was 9 in the original problem) to repeat after \(k\) | ||||
| numbers. Furthermore, all of our steps can be performed in reverse, which means that if a \(k\) | ||||
| matches this conditon, we can work backwards and determine that a sequence of numbers has | ||||
| to repeat after \\(k\\) steps. | ||||
| to repeat after \(k\) steps. | ||||
| 
 | ||||
| Multiple \\(k\\)s will match this condition, and that's not surprising. If a sequence repeats after 5 steps, | ||||
| Multiple \(k\)s will match this condition, and that's not surprising. If a sequence repeats after 5 steps, | ||||
| it also repeats after 10, 15, and so on. We're interested in the first time our sequences repeat after | ||||
| taking any steps, which means we have to pick the smallest possible non-zero value of \\(k\\). The smallest | ||||
| number divisible by \\(d/\\text{gcd}(d,n)\\) is \\(d/\\text{gcd}(d,n)\\) itself. We thus confirm | ||||
| taking any steps, which means we have to pick the smallest possible non-zero value of \(k\). The smallest | ||||
| number divisible by \(d/\text{gcd}(d,n)\) is \(d/\text{gcd}(d,n)\) itself. We thus confirm | ||||
| our hypothesis: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| k = \frac{d}{\text{gcd}(d,n)} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Lastly, recall that our patterns would spiral away from the center whenever a \\(k\\) is a multiple of 4. Now that we know what | ||||
| \\(k\\) is, we can restate this as "\\(d/\\text{gcd}(d,n)\\) is divisible by 4". But if we pick | ||||
| \\(n=d-1\\), the greatest common factor has to be \\(1\\) (see [this section below](#divisors-of-n-and-n-1)), so we can even further simplify this "\\(d\\) is divisible by 4". | ||||
| Lastly, recall that our patterns would spiral away from the center whenever a \(k\) is a multiple of 4. Now that we know what | ||||
| \(k\) is, we can restate this as "\(d/\text{gcd}(d,n)\) is divisible by 4". But if we pick | ||||
| \(n=d-1\), the greatest common factor has to be \(1\) (see [this section below](#divisors-of-n-and-n-1)), so we can even further simplify this "\(d\) is divisible by 4". | ||||
| Thus, we can state simply that any divisor divisible by 4 is off-limits, as it will induce loops. | ||||
| For example, pick \\(d=4\\). Running our algorithm | ||||
| For example, pick \(d=4\). Running our algorithm | ||||
| {{< sidenote "right" "constructive-note" "for \(n=d-1=3\)," >}} | ||||
| Did you catch that? From our work above, we didn't just find a condition that would prevent spirals; | ||||
| we also found the precise number that would result in a spiral if this condition were violated! | ||||
| @ -426,7 +426,7 @@ spiral: | ||||
| 
 | ||||
| {{< figure src="pattern_3_4.svg" caption="Spiral generated by the number 3 with divisor 4." class="tiny" alt="Spiral generated by the number 3 by summing digits." >}} | ||||
| 
 | ||||
| Let's try again. Pick \\(d=8\\); then, for \\(n=d-1=7\\), we also get a spiral: | ||||
| Let's try again. Pick \(d=8\); then, for \(n=d-1=7\), we also get a spiral: | ||||
| 
 | ||||
| {{< figure src="pattern_7_8.svg" caption="Spiral generated by the number 7 with divisor 8." class="tiny" alt="Spiral generated by the number 7 by summing digits." >}} | ||||
| 
 | ||||
| @ -455,17 +455,17 @@ Finally, base-30: | ||||
| ### Generalizing to Arbitrary Numbers of Directions | ||||
| What if we didn't turn 90 degrees each time? What, if, instead, we turned 120 degrees (so that | ||||
| turning 3 times, not 4, would leave you facing the same direction you started)? We can pretty easily | ||||
| do that, too. Let's call this number of turns \\(c\\). Up until now, we had \\(c=4\\). | ||||
| do that, too. Let's call this number of turns \(c\). Up until now, we had \(c=4\). | ||||
| 
 | ||||
| First, let's update our condition. Before, we had "\\(d\\) cannot be divisible by 4". Now, | ||||
| we aren't constraining ourselves to only 4, but rather using a generic variable \\(c\\). | ||||
| We then end up with "\\(d\\) cannot be divisible by \\(c\\)". For instance, suppose we kept | ||||
| First, let's update our condition. Before, we had "\(d\) cannot be divisible by 4". Now, | ||||
| we aren't constraining ourselves to only 4, but rather using a generic variable \(c\). | ||||
| We then end up with "\(d\) cannot be divisible by \(c\)". For instance, suppose we kept | ||||
| our divisor as 9 for the time being, but started turning 3 times instead of 4. This | ||||
| violates our divisibility condtion, and we once again end up with a spiral: | ||||
| 
 | ||||
| {{< figure src="pattern_8_9_t3.svg" caption="Pattern generated by the number 8 in base 10 while turning 3 times." class="tiny" alt="Pattern generated by the number 3 by summing digits and turning 120 degrees." >}} | ||||
| 
 | ||||
| If, on the other hand, we pick \\(d=8\\) and \\(c=3\\), we get patterns for all numbers just like we hoped. | ||||
| If, on the other hand, we pick \(d=8\) and \(c=3\), we get patterns for all numbers just like we hoped. | ||||
| Here's one such pattern: | ||||
| 
 | ||||
| {{< figure src="pattern_7_8_t3.svg" caption="Pattern generated by the number 7 in base 9 while turning 3 times." class="tiny" alt="Pattern generated by the number 7 by summing digits in base 9 and turning 120 degrees." >}} | ||||
| @ -529,7 +529,7 @@ up facing up again. If we turned 480 degrees (that is, two turns of 240 degrees | ||||
| 360 can be safely ignored, since it puts us where we started; only the 120 degrees that remain | ||||
| are needed to figure out our final bearing. In short, the final direction we're facing is | ||||
| the remainder from dividing by 360. We already know how to formulate this using modular arithmetic: | ||||
| if we turn \\(t\\) degrees \\(k\\) times, and end up at final bearing (remainder) \\(b\\), this | ||||
| if we turn \(t\) degrees \(k\) times, and end up at final bearing (remainder) \(b\), this | ||||
| is captured by: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -544,18 +544,18 @@ Of course, if we end up facing the same way we started, we get the familiar equi | ||||
| 
 | ||||
| Even though the variables in this equivalence mean different things now than they did last | ||||
| time we saw it, the mathematical properties remain the same. For instance, we can say that | ||||
| after \\(360/\\text{gcd}(360, t)\\) turns, we'll end up facing the way that we started. | ||||
| after \(360/\text{gcd}(360, t)\) turns, we'll end up facing the way that we started. | ||||
| 
 | ||||
| So far, so good. What I don't like about this, though, is that we have all of these | ||||
| numbers of degrees all over our equations: 72 degrees, 144 degrees, and so forth. However, | ||||
| something like 73 degrees (if there are five possible directions) is just not a valid bearing, | ||||
| and nor is 71. We have so many possible degrees (360 of them, to be exact), but we're only | ||||
| using a handful! That's wasteful. Instead, observe that for \\(c\\) possible turns, | ||||
| the smallest possible turn angle is \\(360/c\\). Let's call this angle \\(\\theta\\) (theta). | ||||
| Now, notice that we always turn in multiples of \\(\\theta\\): a single turn moves us \\(\\theta\\) | ||||
| degrees, two turns move us \\(2\\theta\\) degrees, and so on. If we define \\(r\\) to be | ||||
| using a handful! That's wasteful. Instead, observe that for \(c\) possible turns, | ||||
| the smallest possible turn angle is \(360/c\). Let's call this angle \(\theta\) (theta). | ||||
| Now, notice that we always turn in multiples of \(\theta\): a single turn moves us \(\theta\) | ||||
| degrees, two turns move us \(2\theta\) degrees, and so on. If we define \(r\) to be | ||||
| the number of turns that we find ourselves rotated by after a single cycle, | ||||
| we have \\(t=r\\theta\\), and our turning equation can be written as: | ||||
| we have \(t=r\theta\), and our turning equation can be written as: | ||||
| 
 | ||||
| {{< latex >}} | ||||
|     kr\theta \equiv 0\ (\text{mod}\ c\theta) | ||||
| @ -570,58 +570,58 @@ Now, once again, recall that the above equivalence is just notation for the foll | ||||
|     \end{aligned} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| And finally, observing that \\(kr=kr-0\\), we have: | ||||
| And finally, observing that \(kr=kr-0\), we have: | ||||
| 
 | ||||
| {{< latex >}} | ||||
|     kr \equiv 0\ (\text{mod}\ c) | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| This equivalence says the same thing as our earlier one; however, instead of being in terms | ||||
| of degrees, it's in terms of the number of turns \\(c\\) and the turns-per-cycle \\(r\\). | ||||
| Now, recall once again that the smallest number of steps \\(k>0\\) for which this equivalence holds is | ||||
| \\(k = c/\\text{gcd}(c,r)\\). | ||||
| of degrees, it's in terms of the number of turns \(c\) and the turns-per-cycle \(r\). | ||||
| Now, recall once again that the smallest number of steps \(k>0\) for which this equivalence holds is | ||||
| \(k = c/\text{gcd}(c,r)\). | ||||
| 
 | ||||
| We're close now: we have a sequence of \\(k\\) steps that will lead us back to the beginning. | ||||
| What's left is to show that these \\(k\\) steps are evenly distributed throughout our circle, | ||||
| We're close now: we have a sequence of \(k\) steps that will lead us back to the beginning. | ||||
| What's left is to show that these \(k\) steps are evenly distributed throughout our circle, | ||||
| which is the key property that makes it possible for us to make a polygon out of them (and | ||||
| thus end up back where we started). | ||||
| 
 | ||||
| To show this, say that we have a largest common divisor \\(f=\\text{gcd}(c,r)\\), and that \\(c=fe\\) and \\(r=fs\\). We can once again "divide through" by \\(f\\), and | ||||
| To show this, say that we have a largest common divisor \(f=\text{gcd}(c,r)\), and that \(c=fe\) and \(r=fs\). We can once again "divide through" by \(f\), and | ||||
| get: | ||||
| 
 | ||||
| {{< latex >}} | ||||
|     ks \equiv 0\ (\text{mod}\ e) | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Now, we know that \\(\\text{gcd}(e,s)=1\\) ([see this section below](#numbers-divided-by-their-textgcd-have-no-common-factors)), and thus: | ||||
| Now, we know that \(\text{gcd}(e,s)=1\) ([see this section below](#numbers-divided-by-their-textgcd-have-no-common-factors)), and thus: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| k = e/\text{gcd}(e,s) = e | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| That is, our cycle will repeat after \\(e\\) remainders. But wait, we've only got \\(e\\) possible | ||||
| remainders: the numbers \\(0\\) through \\(e-1\\)! Thus, for a cycle to repeat after \\(e\\) remainders, | ||||
| all possible remainders must occur. For a concrete example, take \\(e=5\\); our remainders will | ||||
| be the set \\(\\{0,1,2,3,4\\}\\). Now, let's "multiply back through" | ||||
| by \\(f\\): | ||||
| That is, our cycle will repeat after \(e\) remainders. But wait, we've only got \(e\) possible | ||||
| remainders: the numbers \(0\) through \(e-1\)! Thus, for a cycle to repeat after \(e\) remainders, | ||||
| all possible remainders must occur. For a concrete example, take \(e=5\); our remainders will | ||||
| be the set \(\{0,1,2,3,4\}\). Now, let's "multiply back through" | ||||
| by \(f\): | ||||
| 
 | ||||
| {{< latex >}} | ||||
|     kfs \equiv 0\ (\text{mod}\ fe) | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| We still have \\(e\\) possible remainders, but this time they are multiplied by \\(f\\). | ||||
| For example, taking \\(e\\) to once again be equal to \\(5\\), we have the set of possible remainders | ||||
| \\(\\{0, f, 2f, 3f, 4f\\}\\). The important bit is that these remainders are all evenly spaced, and | ||||
| that space between them is \\(f=\\text{gcd}(c,r)\\). | ||||
| We still have \(e\) possible remainders, but this time they are multiplied by \(f\). | ||||
| For example, taking \(e\) to once again be equal to \(5\), we have the set of possible remainders | ||||
| \(\{0, f, 2f, 3f, 4f\}\). The important bit is that these remainders are all evenly spaced, and | ||||
| that space between them is \(f=\text{gcd}(c,r)\). | ||||
| 
 | ||||
| Let's recap: we have confirmed that for \\(c\\) possible turns (4 in our original formulation), | ||||
| and \\(r\\) turns at a time, we will always loop after \\(k=c/\\text{gcd}(c,r)\\) steps, | ||||
| evenly spaced out at \\(\\text{gcd}(c,r)\\) turns. No specific properties from \\(c\\) or \\(r\\) | ||||
| Let's recap: we have confirmed that for \(c\) possible turns (4 in our original formulation), | ||||
| and \(r\) turns at a time, we will always loop after \(k=c/\text{gcd}(c,r)\) steps, | ||||
| evenly spaced out at \(\text{gcd}(c,r)\) turns. No specific properties from \(c\) or \(r\) | ||||
| are needed for this to work. Finally, recall from the previous | ||||
| section that \\(r\\) is zero (and thus, our pattern breaks down) whenever the divisor \\(d\\) (9 in our original formulation) is itself | ||||
| divisible by \\(c\\). And so, __as long as we pick a system with \\(c\\) possible directions | ||||
| and divisor \\(d\\), we will always loop back and create a pattern as long as \\(c\\nmid d\\) (\\(c\\) | ||||
| does not divide \\(d\\))__. | ||||
| section that \(r\) is zero (and thus, our pattern breaks down) whenever the divisor \(d\) (9 in our original formulation) is itself | ||||
| divisible by \(c\). And so, __as long as we pick a system with \(c\) possible directions | ||||
| and divisor \(d\), we will always loop back and create a pattern as long as \(c\nmid d\) (\(c\) | ||||
| does not divide \(d\))__. | ||||
| 
 | ||||
| Let's try it out! There's a few pictures below. When reading the captions, keep in mind that the _base_ | ||||
| is one more than the _divisor_ (we started with numbers in the usual base 10, but divided by 9). | ||||
| @ -652,12 +652,12 @@ up a lot of space (and were interrupting the flow of the explanation). They are | ||||
| ### Referenced Proofs | ||||
| 
 | ||||
| #### Adding Two Congruences | ||||
| __Claim__: If for some numbers \\(a\\), \\(b\\), \\(c\\), \\(d\\), and \\(k\\), we have | ||||
| \\(a \\equiv b\\ (\\text{mod}\\ k)\\) and \\(c \\equiv d\\ (\\text{mod}\\ k)\\), then | ||||
| it's also true that \\(a+c \\equiv b+d\\ (\\text{mod}\\ k)\\). | ||||
| __Claim__: If for some numbers \(a\), \(b\), \(c\), \(d\), and \(k\), we have | ||||
| \(a \equiv b\ (\text{mod}\ k)\) and \(c \equiv d\ (\text{mod}\ k)\), then | ||||
| it's also true that \(a+c \equiv b+d\ (\text{mod}\ k)\). | ||||
| 
 | ||||
| __Proof__: By definition, we have \\(k|(a-b)\\) and \\(k|(c-d)\\). This, in turn, means | ||||
| that for some \\(i\\) and \\(j\\), \\(a-b=ik\\) and \\(c-d=jk\\). Add both sides to get: | ||||
| __Proof__: By definition, we have \(k|(a-b)\) and \(k|(c-d)\). This, in turn, means | ||||
| that for some \(i\) and \(j\), \(a-b=ik\) and \(c-d=jk\). Add both sides to get: | ||||
| {{< latex >}} | ||||
|     \begin{aligned} | ||||
|      & (a-b)+(c-d) = ik+jk \\ | ||||
| @ -666,63 +666,63 @@ that for some \\(i\\) and \\(j\\), \\(a-b=ik\\) and \\(c-d=jk\\). Add both sides | ||||
|     \Rightarrow\ & a+c \equiv b+d\ (\text{mod}\ k) \\ | ||||
|     \end{aligned} | ||||
| {{< /latex >}} | ||||
| \\(\\blacksquare\\) | ||||
| \(\blacksquare\) | ||||
| 
 | ||||
| #### Multiplying Both Sides of a Congruence | ||||
| __Claim__: If for some numbers \\(a\\), \\(b\\), \\(n\\) and \\(k\\), we have | ||||
| \\(a \\equiv b\\ (\\text{mod}\\ k)\\) then we also have that \\(an \\equiv bn\\ (\\text{mod}\\ k)\\). | ||||
| __Claim__: If for some numbers \(a\), \(b\), \(n\) and \(k\), we have | ||||
| \(a \equiv b\ (\text{mod}\ k)\) then we also have that \(an \equiv bn\ (\text{mod}\ k)\). | ||||
| 
 | ||||
| __Proof__: By definition, we have \\(k|(a-b)\\). Since multiplying \\(a-b\\) but \\(n\\) cannot | ||||
| make it _not_ divisible by \\(k\\), we also have \\(k|\\left[n(a-b)\\right]\\). Distributing | ||||
| \\(n\\), we have \\(k|(na-nb)\\). By definition, this means \\(na\\equiv nb\\ (\\text{mod}\\ k)\\). | ||||
| __Proof__: By definition, we have \(k|(a-b)\). Since multiplying \(a-b\) but \(n\) cannot | ||||
| make it _not_ divisible by \(k\), we also have \(k|\left[n(a-b)\right]\). Distributing | ||||
| \(n\), we have \(k|(na-nb)\). By definition, this means \(na\equiv nb\ (\text{mod}\ k)\). | ||||
| 
 | ||||
| \\(\\blacksquare\\) | ||||
| \(\blacksquare\) | ||||
| 
 | ||||
| #### Invertible Numbers \\(\\text{mod}\\ d\\) Share no Factors with \\(d\\) | ||||
| __Claim__: A number \\(k\\) is only invertible (can be divided by) in \\(\\text{mod}\\ d\\) if \\(k\\) | ||||
| and \\(d\\) share no common factors (except 1). | ||||
| __Claim__: A number \(k\) is only invertible (can be divided by) in \(\text{mod}\ d\) if \(k\) | ||||
| and \(d\) share no common factors (except 1). | ||||
| 
 | ||||
| __Proof__: Write \\(\\text{gcd}(k,d)\\) for the greatest common factor divisor of \\(k\\) and \\(d\\). | ||||
| Another important fact (not proven here, but see something [like this](https://sharmaeklavya2.github.io/theoremdep/nodes/number-theory/gcd/gcd-is-min-lincomb.html)), is that if \\(\\text{gcd}(k,d) = r\\), | ||||
| then the smallest possible number that can be made by adding and subtracting \\(k\\)s and \\(d\\)s | ||||
| is \\(r\\). That is, for some \\(i\\) and \\(j\\), the smallest possible positive value of \\(ik + jd\\) is \\(r\\). | ||||
| __Proof__: Write \(\text{gcd}(k,d)\) for the greatest common factor divisor of \(k\) and \(d\). | ||||
| Another important fact (not proven here, but see something [like this](https://sharmaeklavya2.github.io/theoremdep/nodes/number-theory/gcd/gcd-is-min-lincomb.html)), is that if \(\text{gcd}(k,d) = r\), | ||||
| then the smallest possible number that can be made by adding and subtracting \(k\)s and \(d\)s | ||||
| is \(r\). That is, for some \(i\) and \(j\), the smallest possible positive value of \(ik + jd\) is \(r\). | ||||
| 
 | ||||
| Now, note that \\(d \\equiv 0\\ (\\text{mod}\\ d)\\). Multiplying both sides by \\(j\\), get | ||||
| \\(jd\\equiv 0\\ (\\text{mod}\\ d)\\). This, in turn, means that the smallest possible | ||||
| value of \\(ik+jd \\equiv ik\\) is \\(r\\). If \\(r\\) is bigger than 1 (i.e., if | ||||
| \\(k\\) and \\(d\\) have common factors), then we can't pick \\(i\\) such that \\(ik\\equiv1\\), | ||||
| since we know that \\(r>1\\) is the least possible value we can make. There is therefore no | ||||
| multiplicative inverse to \\(k\\). Alternatively worded, we cannot divide by \\(k\\). | ||||
| Now, note that \(d \equiv 0\ (\text{mod}\ d)\). Multiplying both sides by \(j\), get | ||||
| \(jd\equiv 0\ (\text{mod}\ d)\). This, in turn, means that the smallest possible | ||||
| value of \(ik+jd \equiv ik\) is \(r\). If \(r\) is bigger than 1 (i.e., if | ||||
| \(k\) and \(d\) have common factors), then we can't pick \(i\) such that \(ik\equiv1\), | ||||
| since we know that \(r>1\) is the least possible value we can make. There is therefore no | ||||
| multiplicative inverse to \(k\). Alternatively worded, we cannot divide by \(k\). | ||||
| 
 | ||||
| \\(\\blacksquare\\) | ||||
| \(\blacksquare\) | ||||
| 
 | ||||
| #### Numbers Divided by Their \\(\\text{gcd}\\) Have No Common Factors | ||||
| __Claim__: For any two numbers \\(a\\) and \\(b\\) and their largest common factor \\(f\\), | ||||
| if \\(a=fc\\) and \\(b=fd\\), then \\(c\\) and \\(d\\) have no common factors other than 1 (i.e., | ||||
| \\(\\text{gcd}(c,d)=1\\)). | ||||
| __Claim__: For any two numbers \(a\) and \(b\) and their largest common factor \(f\), | ||||
| if \(a=fc\) and \(b=fd\), then \(c\) and \(d\) have no common factors other than 1 (i.e., | ||||
| \(\text{gcd}(c,d)=1\)). | ||||
| 
 | ||||
| __Proof__: Suppose that \\(c\\) and \\(d\\) do have sommon factor, \\(e\\neq1\\). In that case, we have | ||||
| \\(c=ei\\) and \\(d=ej\\) for some \\(i\\) and \\(j\\). Then, we have \\(a=fei\\), and \\(b=fej\\). | ||||
| From this, it's clear that both \\(a\\) and \\(b\\) are divisible by \\(fe\\). Since \\(e\\) | ||||
| is greater than \\(1\\), \\(fe\\) is greater than \\(f\\). But our assumptions state that | ||||
| \\(f\\) is the greatest common divisor of \\(a\\) and \\(b\\)! We have arrived at a contradiction. | ||||
| __Proof__: Suppose that \(c\) and \(d\) do have sommon factor, \(e\neq1\). In that case, we have | ||||
| \(c=ei\) and \(d=ej\) for some \(i\) and \(j\). Then, we have \(a=fei\), and \(b=fej\). | ||||
| From this, it's clear that both \(a\) and \(b\) are divisible by \(fe\). Since \(e\) | ||||
| is greater than \(1\), \(fe\) is greater than \(f\). But our assumptions state that | ||||
| \(f\) is the greatest common divisor of \(a\) and \(b\)! We have arrived at a contradiction. | ||||
| 
 | ||||
| Thus, \\(c\\) and \\(d\\) cannot have a common factor other than 1. | ||||
| Thus, \(c\) and \(d\) cannot have a common factor other than 1. | ||||
| 
 | ||||
| \\(\\blacksquare\\) | ||||
| \(\blacksquare\) | ||||
| 
 | ||||
| #### Divisors of \\(n\\) and \\(n-1\\). | ||||
| __Claim__: For any \\(n\\), \\(\\text{gcd}(n,n-1)=1\\). That is, \\(n\\) and \\(n-1\\) share | ||||
| __Claim__: For any \(n\), \(\text{gcd}(n,n-1)=1\). That is, \(n\) and \(n-1\) share | ||||
| no common divisors. | ||||
| 
 | ||||
| __Proof__: Suppose some number \\(f\\) divides both \\(n\\) and \\(n-1\\). | ||||
| In that case, we can write \\(n=af\\), and \\((n-1)=bf\\) for some \\(a\\) and \\(b\\). | ||||
| __Proof__: Suppose some number \(f\) divides both \(n\) and \(n-1\). | ||||
| In that case, we can write \(n=af\), and \((n-1)=bf\) for some \(a\) and \(b\). | ||||
| Subtracting one equation from the other: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| 1 = (a-b)f | ||||
| {{< /latex >}} | ||||
| But this means that 1 is divisible by \\(f\\)! That's only possible if \\(f=1\\). Thus, the only | ||||
| number that divides \\(n\\) and \\(n-1\\) is 1; that's our greatest common factor. | ||||
| But this means that 1 is divisible by \(f\)! That's only possible if \(f=1\). Thus, the only | ||||
| number that divides \(n\) and \(n-1\) is 1; that's our greatest common factor. | ||||
| 
 | ||||
| \\(\\blacksquare\\) | ||||
| \(\blacksquare\) | ||||
|  | ||||
| @ -58,7 +58,7 @@ two binomials! Here's the corresponding multiplication: | ||||
| 
 | ||||
| It's not just binomials that correspond to our combining paths between cities. | ||||
| We can represent any combination of trips of various lengths as a polynomial. | ||||
| Each term \\(ax^n\\) represents \\(a\\) trips of length \\(n\\). As we just | ||||
| Each term \(ax^n\) represents \(a\) trips of length \(n\). As we just | ||||
| saw, multiplying two polynomials corresponds to "sequencing" the trips they | ||||
| represent -- matching each trip in one with each of the trips in the other, | ||||
| and totaling them up. | ||||
| @ -66,8 +66,8 @@ and totaling them up. | ||||
| What about adding polynomials, what does that correspond to? The answer there | ||||
| is actually quite simple: if two polynomials both represent (distinct) lists of | ||||
| trips from A to B, then adding them just combines the list. If I know one trip | ||||
| that takes two hours (\\(x^2\\)) and someone else knows a shortcut (\\(x\\\)), | ||||
| then we can combine that knowledge (\\(x^2+x\\)). | ||||
| that takes two hours (\(x^2\)) and someone else knows a shortcut (\(x\)), | ||||
| then we can combine that knowledge (\(x^2+x\)). | ||||
| 
 | ||||
| {{< dialog >}} | ||||
| {{< message "question" "reader" >}} | ||||
| @ -119,16 +119,16 @@ example should do. | ||||
| 
 | ||||
| The first thing we do is _distribute_ the multiplication over the addition, on | ||||
| the left. We then do that again, on the right this time. After this, we finally | ||||
| get some terms, but they aren't properly grouped together; an \\(x\\) is at the | ||||
| front, and a \\(-x\\) is at the very back. We use the fact that addition is | ||||
| _commutative_ (\\(a+b=b+a\\)) and _associative_ (\\(a+(b+c)=(a+b)+c\\)) to | ||||
| rearrange the equation, grouping the \\(x\\) and its negation together. This | ||||
| gives us \\((1-1)x=0x=0\\). That last step is important: we've used the fact | ||||
| get some terms, but they aren't properly grouped together; an \(x\) is at the | ||||
| front, and a \(-x\) is at the very back. We use the fact that addition is | ||||
| _commutative_ (\(a+b=b+a\)) and _associative_ (\(a+(b+c)=(a+b)+c\)) to | ||||
| rearrange the equation, grouping the \(x\) and its negation together. This | ||||
| gives us \((1-1)x=0x=0\). That last step is important: we've used the fact | ||||
| that multiplication by zero gives zero. Another important property (though | ||||
| we didn't use it here) is that multiplication has to be associative, too. | ||||
| 
 | ||||
| So, what if we didn't use numbers, but rather any _thing_ with two | ||||
| operations, one kind of like \\((\\times)\\) and one kind of like \\((+)\\)? | ||||
| operations, one kind of like \((\times)\) and one kind of like \((+)\)? | ||||
| 
 | ||||
| {{< dialog >}} | ||||
| {{< message "question" "reader" >}} | ||||
| @ -167,8 +167,8 @@ that "things with addition and multiplication that work in the way we | ||||
| described" have an established name in math - they're called semirings. | ||||
| 
 | ||||
| A __semiring__ is a set equipped with two operations, one called | ||||
| "multiplicative" (and thus carrying the symbol \\(\\times)\\) and one | ||||
| called "additive" (and thus written as \\(+\\)). Both of these operations | ||||
| "multiplicative" (and thus carrying the symbol \(\times)\) and one | ||||
| called "additive" (and thus written as \(+\)). Both of these operations | ||||
| need to have an "identity element". The identity element for multiplication | ||||
| is usually | ||||
| {{< sidenote "right" "written-as-note" "written as \(1\)," >}} | ||||
| @ -180,10 +180,10 @@ other more "esoteric" things, using numbers to stand for special elements | ||||
| seems to help use intuition. | ||||
| {{< /sidenote >}} | ||||
| and the identity element for addition is written | ||||
| as \\(0\\). Furthermore, a few equations hold. I'll present them in groups. | ||||
| First, multiplication is associative and multiplying by \\(1\\) does nothing; | ||||
| as \(0\). Furthermore, a few equations hold. I'll present them in groups. | ||||
| First, multiplication is associative and multiplying by \(1\) does nothing; | ||||
| in mathematical terms, the set forms a [monoid](https://mathworld.wolfram.com/Monoid.html) | ||||
| with multiplication and \\(1\\). | ||||
| with multiplication and \(1\). | ||||
| {{< latex >}} | ||||
| \begin{array}{cl} | ||||
|     (a\times b)\times c = a\times(b\times c) & \text{(multiplication associative)}\\ | ||||
| @ -191,9 +191,9 @@ with multiplication and \\(1\\). | ||||
| \end{array} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| Similarly, addition is associative and adding \\(0\\) does nothing. | ||||
| Similarly, addition is associative and adding \(0\) does nothing. | ||||
| Addition must also be commutative; in other words, the set forms a | ||||
| commutative monoid with addition and \\(0\\). | ||||
| commutative monoid with addition and \(0\). | ||||
| {{< latex >}} | ||||
| \begin{array}{cl} | ||||
|     (a+b)+c = a+(b+c) & \text{(addition associative)}\\ | ||||
| @ -219,9 +219,9 @@ another in a particular number of hours. There are, however, other semirings | ||||
| we can use that yield interesting results, even though we continue to add | ||||
| and multiply polynomials. | ||||
| 
 | ||||
| One last thing before we look at other semirings: given a semiring \\(R\\), | ||||
| the polynomials using that \\(R\\), and written in terms of the variable | ||||
| \\(x\\), are denoted as \\(R[x]\\). | ||||
| One last thing before we look at other semirings: given a semiring \(R\), | ||||
| the polynomials using that \(R\), and written in terms of the variable | ||||
| \(x\), are denoted as \(R[x]\). | ||||
| 
 | ||||
| 
 | ||||
| #### The Semiring of Booleans, \\(\\mathbb{B}\\) | ||||
| @ -239,7 +239,7 @@ is true, and false otherwise. | ||||
| \end{array} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| For addition, the identity element -- our \\(0\\) -- is \\(\\text{false}\\). | ||||
| For addition, the identity element -- our \(0\) -- is \(\text{false}\). | ||||
| 
 | ||||
| Correspondingly, multiplication is the "and" operation (aka `&&`), in which the | ||||
| result is false if either operand is false, and true otherwise. | ||||
| @ -252,23 +252,23 @@ result is false if either operand is false, and true otherwise. | ||||
| \end{array} | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| For multiplication, the identity element -- the \\(1\\) -- is \\(\\text{true}\\). | ||||
| For multiplication, the identity element -- the \(1\) -- is \(\text{true}\). | ||||
| 
 | ||||
| It's not hard to see that _both_ operations are commutative - the first and | ||||
| second equations for addition, for instance, can be combined to get | ||||
| \\(\\text{true}+b=b+\\text{true}\\), and the third equation clearly shows | ||||
| \(\text{true}+b=b+\text{true}\), and the third equation clearly shows | ||||
| commutativity when both operands are false. The other properties are | ||||
| easy enough to verify by simple case analysis (there are 8 cases to consider). | ||||
| The set of booleans is usually denoted as \\(\\mathbb{B}\\), which means | ||||
| polynomials using booleans are denoted by \\(\\mathbb{B}[x]\\). | ||||
| The set of booleans is usually denoted as \(\mathbb{B}\), which means | ||||
| polynomials using booleans are denoted by \(\mathbb{B}[x]\). | ||||
| 
 | ||||
| Let's try some examples. We can't count how many ways there are to get from | ||||
| A to B in a certain number of hours anymore: booleans aren't numbers! | ||||
| Instead, what we _can_ do is track _whether or not_ there is a way to get | ||||
| from A to B in a certain number of hours (call it \\(n\\)). If we can, | ||||
| we write that as \\(\text{true}\ x^n = 1x^n = x^n\\). If we can't, we write | ||||
| that as \\(\\text{false}\ x^n = 0x^n = 0\\). The polynomials corresponding | ||||
| to our introductory problem are \\(x^2+x^1\\) and \\(x^3+x^2\\). Multiplying | ||||
| from A to B in a certain number of hours (call it \(n\)). If we can, | ||||
| we write that as \(\text{true}\ x^n = 1x^n = x^n\). If we can't, we write | ||||
| that as \(\text{false}\ x^n = 0x^n = 0\). The polynomials corresponding | ||||
| to our introductory problem are \(x^2+x^1\) and \(x^3+x^2\). Multiplying | ||||
| them out gives: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -306,7 +306,7 @@ that we lost information, rather than gained it, but switching to | ||||
| boolean polynomials: we can always recover a boolean polynomial from the | ||||
| natural number one, but not the other way around. | ||||
| {{< /sidenote >}} | ||||
| (which were \\(\\mathbb{N}[x]\\), polynomials over natural numbers \\(\\mathbb{N} = \\{ 0, 1, 2, ... \\}\\)), so it's unclear why we'd prefer them. However, | ||||
| (which were \(\mathbb{N}[x]\), polynomials over natural numbers \(\mathbb{N} = \{ 0, 1, 2, ... \}\)), so it's unclear why we'd prefer them. However, | ||||
| we're just warming up - there are more interesting semirings for us to | ||||
| consider! | ||||
| 
 | ||||
| @ -316,12 +316,12 @@ equivalent". If we're giving directions, though, we might benefit | ||||
| from knowing not just that there _is_ a way, but what roads that | ||||
| way is made up of! | ||||
| 
 | ||||
| To this end, we define the set of paths, \\(\\Pi\\). This set will consist | ||||
| of the empty path (which we will denote \\(\\circ\\), why not?), street | ||||
| names (e.g. \\(\\text{Mullholland Dr.}\\) or \\(\\text{Sunset Blvd.}\\)), and | ||||
| concatenations of paths, written using \\(\\rightarrow\\). For instance, | ||||
| a path that first takes us on \\(\\text{Highway}\\) and then on | ||||
| \\(\\text{Exit 4b}\\) will be written as: | ||||
| To this end, we define the set of paths, \(\Pi\). This set will consist | ||||
| of the empty path (which we will denote \(\circ\), why not?), street | ||||
| names (e.g. \(\text{Mullholland Dr.}\) or \(\text{Sunset Blvd.}\)), and | ||||
| concatenations of paths, written using \(\rightarrow\). For instance, | ||||
| a path that first takes us on \(\text{Highway}\) and then on | ||||
| \(\text{Exit 4b}\) will be written as: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \text{Highway}\rightarrow\text{Exit 4b} | ||||
| @ -329,7 +329,7 @@ a path that first takes us on \\(\\text{Highway}\\) and then on | ||||
| 
 | ||||
| Furthermore, it's not too much of a stretch to say that adding an empty path | ||||
| to the front or the back of another path doesn't change it. If we use | ||||
| the letter \\(\\pi\\) to denote a path, this means the following equation: | ||||
| the letter \(\pi\) to denote a path, this means the following equation: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \circ \rightarrow \pi = \pi = \pi \rightarrow \circ | ||||
| @ -347,8 +347,8 @@ different ways to get from one place to another. This is an excellent | ||||
| use case for sets! | ||||
| 
 | ||||
| Our next semiring will be that of _sets of paths_. Some example elements | ||||
| of this semiring are \\(\\varnothing\\), also known as the empty set, | ||||
| \\(\\{\\circ\\}\\), the set containing only the empty path, and the set | ||||
| of this semiring are \(\varnothing\), also known as the empty set, | ||||
| \(\{\circ\}\), the set containing only the empty path, and the set | ||||
| containing a path via the highway, and another path via the suburbs: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -364,16 +364,16 @@ A + B \triangleq A \cup B | ||||
| {{< /latex >}} | ||||
| 
 | ||||
| It's well known (and not hard to verify) that set union is commutative | ||||
| and associative. The additive identity \\(0\\) is simply the empty set | ||||
| \\(\\varnothing\\). Intuitively, adding "no paths" to another set of | ||||
| and associative. The additive identity \(0\) is simply the empty set | ||||
| \(\varnothing\). Intuitively, adding "no paths" to another set of | ||||
| paths doesn't add anything, and thus leaves that other set unchanged. | ||||
| 
 | ||||
| Multiplication is a little bit more interesting, and uses the path | ||||
| concatenation operation we defined earlier. We will use this | ||||
| operation to describe path sequencing; given two sets of paths, | ||||
| \\(A\\) and \\(B\\), we'll create a new set of paths | ||||
| consisting of each path from \\(A\\) concatenated with each | ||||
| path from \\(B\\): | ||||
| \(A\) and \(B\), we'll create a new set of paths | ||||
| consisting of each path from \(A\) concatenated with each | ||||
| path from \(B\): | ||||
| 
 | ||||
| {{< latex >}} | ||||
| A \times B \triangleq \{ a \rightarrow b\ |\ a \in A, b \in B \} | ||||
| @ -394,8 +394,8 @@ A \times (B \times C) & = & \{ a \rightarrow (b \rightarrow c)\ |\ a \in A, b \i | ||||
| What's the multiplicative identity? Well, since multiplication concatenates | ||||
| all the combinations of paths from two sets, we could try making a set of | ||||
| elements that don't do anything when concatenating. Sound familiar? It should, | ||||
| that's \\(\\circ\\), the empty path element! We thus define our multiplicative | ||||
| identity as \\(\\{\\circ\\}\\), and verify that it is indeed the identity: | ||||
| that's \(\circ\), the empty path element! We thus define our multiplicative | ||||
| identity as \(\{\circ\}\), and verify that it is indeed the identity: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{gathered} | ||||
| @ -409,11 +409,11 @@ sets of paths, either; I won't do that here, though. Finally, let's take | ||||
| a look at an example. Like before, we'll try make one that corresponds to | ||||
| our introductory description of paths from A to B and from B to C. Now we need | ||||
| to be a little bit creative, and come up with names for all these different | ||||
| roads between our hypothetical cities. Let's say that \\(\\text{Highway A}\\) | ||||
| and \\(\\text{Highway B}\\) are the two paths from A to B that take two hours | ||||
| each, and then \\(\\text{Shortcut}\\) is the path that takes one hour. As for | ||||
| paths from B to C, let's just call them \\(\\text{Long}\\) for the three-hour | ||||
| path, and \\(\\text{Short}\\) for the two-hour path. Our two polynomials | ||||
| roads between our hypothetical cities. Let's say that \(\text{Highway A}\) | ||||
| and \(\text{Highway B}\) are the two paths from A to B that take two hours | ||||
| each, and then \(\text{Shortcut}\) is the path that takes one hour. As for | ||||
| paths from B to C, let's just call them \(\text{Long}\) for the three-hour | ||||
| path, and \(\text{Short}\) for the two-hour path. Our two polynomials | ||||
| are then: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| @ -440,12 +440,12 @@ I only have one last semiring left to show you. It's a fun semiring though, | ||||
| as even its name might suggest: we'll take a look at a _tropical semiring_. | ||||
| 
 | ||||
| In this semiring, we go back to numbers; particularly, real numbers (e.g., | ||||
| \\(1.34\\), \\(163\\), \\(e\\), that kind of thing). We even use addition -- | ||||
| \(1.34\), \(163\), \(e\), that kind of thing). We even use addition -- | ||||
| sort of. In the tropical semiring, addition serves as the _multiplicative_ | ||||
| operation! This is even confusing to write, so I'm going to switch up notation; | ||||
| in the rest of this section, I'll use \\(\\otimes\\) to represent the | ||||
| multiplicative operation in semirings, and \\(\\oplus\\) to represent the | ||||
| additive one. The symbols \\(\\times\\) and \\(+\\) will be used to represent | ||||
| in the rest of this section, I'll use \(\otimes\) to represent the | ||||
| multiplicative operation in semirings, and \(\oplus\) to represent the | ||||
| additive one. The symbols \(\times\) and \(+\) will be used to represent | ||||
| the regular operations on real numbers. With that, the operations on our | ||||
| tropical semiring over real numbers are defined as follows: | ||||
| 
 | ||||
| @ -461,17 +461,17 @@ the duration of the trip, you'd like to track the distance you must travel for | ||||
| each route (shorter routes do sometimes have more traffic!). Let's watch what | ||||
| happens when we add and multiply polynomials over this semiring. | ||||
| When we add terms with the same power but different coefficients, like | ||||
| \\(ax\oplus bx\\), we end up with a term \\(\min(a,b)x\\). In other words, | ||||
| \(ax\oplus bx\), we end up with a term \(\min(a,b)x\). In other words, | ||||
| for each trip duration, we pick the shortest length. When we multiply two | ||||
| polynomials, like \\(ax\otimes bx\\), we get \\((a+b)x\\); in other words, | ||||
| polynomials, like \(ax\otimes bx\), we get \((a+b)x\); in other words, | ||||
| when sequencing two trips, we add up the distances to get the combined | ||||
| distance, just like we'd expect. | ||||
| 
 | ||||
| We can, of course, come up with a polynomial to match our initial example. | ||||
| Say that the trips from A to B are represented by \\(2.0x^2\oplus1.5x\\\) (the | ||||
| shortest two-hour trip is \\(2\\) units of distance long, and the one-hour | ||||
| trip is \\(1.5\\) units long), and that the trips from B to C are represented | ||||
| by \\(4.0x^3\oplus1.0x^2\\). Multiplying the two polynomials out gives: | ||||
| Say that the trips from A to B are represented by \(2.0x^2\oplus1.5x\) (the | ||||
| shortest two-hour trip is \(2\) units of distance long, and the one-hour | ||||
| trip is \(1.5\) units long), and that the trips from B to C are represented | ||||
| by \(4.0x^3\oplus1.0x^2\). Multiplying the two polynomials out gives: | ||||
| 
 | ||||
| {{< latex >}} | ||||
| \begin{array}{rcl} | ||||
| @ -484,7 +484,7 @@ The only time we used the additive operation in this case was to pick between | ||||
| two trips of equal druation but different length (two-hour trip from A to B | ||||
| followed by a two-hour trip from B to C, or one-hour trip from A to C followed | ||||
| by a three-hour trip from B to C). The first trip wins out, since it requires | ||||
| only \\(3.0\\) units of distance. | ||||
| only \(3.0\) units of distance. | ||||
| 
 | ||||
| ### Anything but Routes | ||||
| So far, all I've done can be reduced to variations on a theme: keeping track | ||||
|  | ||||
| @ -48,22 +48,22 @@ concrete syntax. How about something like: | ||||
| 
 | ||||
| Let's informally define the meanings of each of the described commands: | ||||
| 
 | ||||
| 1. \\(\\text{Pop} \\; n\\): Removes the top \\(n\\) elements from the stack. | ||||
| 2. \\(\\text{Slide} \\; n \\): Removes the top \\(n\\) elements __after the first element on the stack__. | ||||
| 1. \(\text{Pop} \; n\): Removes the top \(n\) elements from the stack. | ||||
| 2. \(\text{Slide} \; n \): Removes the top \(n\) elements __after the first element on the stack__. | ||||
| The first element is not removed. | ||||
| 2. \\(\\text{Offset} \\; n \\): Pushes an element from the stack onto the stack, again. When \\(n=0\\), | ||||
| the top element is pushed, when \\(n=1\\), the second element is pushed, and so on. | ||||
| 3. \\(\\text{Eq}\\): Compares two numbers on top of the stack for equality. The numbers are removed, | ||||
| 2. \(\text{Offset} \; n \): Pushes an element from the stack onto the stack, again. When \(n=0\), | ||||
| the top element is pushed, when \(n=1\), the second element is pushed, and so on. | ||||
| 3. \(\text{Eq}\): Compares two numbers on top of the stack for equality. The numbers are removed, | ||||
| and replaced with a boolean indicating whether or not they are equal. | ||||
| 4. \\(\\text{PushI} \\; i \\): Pushes an integer \\(i\\) onto the stack. | ||||
| 5. \\(\\text{Add}\\): Adds two numbers on top of the stack. The two numbers are removed, | ||||
| 4. \(\text{PushI} \; i \): Pushes an integer \(i\) onto the stack. | ||||
| 5. \(\text{Add}\): Adds two numbers on top of the stack. The two numbers are removed, | ||||
| and replaced with their sum. | ||||
| 6. \\(\\text{Mul}\\): Multiplies two numbers on top of the stack. The two numbers are removed, | ||||
| 6. \(\text{Mul}\): Multiplies two numbers on top of the stack. The two numbers are removed, | ||||
| and replaced with their product. | ||||
| 7. \\(\\textbf{if}\\)/\\(\\textbf{else}\\): Runs the first list of commands if the boolean "true" is | ||||
| 7. \(\textbf{if}\)/\(\textbf{else}\): Runs the first list of commands if the boolean "true" is | ||||
| on top of the stack, and the second list of commands if the boolean is "false". | ||||
| 8. \\(\\textbf{func}\\): pushes a function with the given commands onto the stack.  | ||||
| 9. \\(\\text{Call}\\): calls the function at the top of the stack. The function is removed, | ||||
| 8. \(\textbf{func}\): pushes a function with the given commands onto the stack.  | ||||
| 9. \(\text{Call}\): calls the function at the top of the stack. The function is removed, | ||||
| and its body is then executed. | ||||
| 
 | ||||
| Great! Let's now write some dummy programs in our language (and switch to code blocks | ||||
| @ -106,7 +106,7 @@ of all the computational chaos. We will adopt calling conventions. | ||||
| When I say calling convention, I mean that every time we call a function, we do it in a | ||||
| methodical way. There are many possible such methods, but I propose the following: | ||||
| 
 | ||||
| 1. Since \\(\\text{Call}\\) requires that the function you're calling is at the top | ||||
| 1. Since \(\text{Call}\) requires that the function you're calling is at the top | ||||
| of the stack, we stick with that. | ||||
| 2. If the function expects arguments, we push them on the stack right before the function. The | ||||
| first argument of the function should be second from the top of the stack (i.e., | ||||
| @ -132,7 +132,7 @@ that it is called correctly, of course -- it will receive an integer | ||||
| on top of the stack. That may not, and likely will not, be the only thing on the stack. | ||||
| However, to stick by convention 4, we pretend that the stack is empty, and that | ||||
| trying to manipulate it will result in an error. So, we can start by imagining | ||||
| an empty stack, with an integer \\(x\\) on top: | ||||
| an empty stack, with an integer \(x\) on top: | ||||
| 
 | ||||
| {{< stack >}} | ||||
|     {{< stack_element >}}{{< /stack_element >}} | ||||
| @ -141,7 +141,7 @@ an empty stack, with an integer \\(x\\) on top: | ||||
|     {{< stack_element >}}\(x\){{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| Then, \\(\\text{PushI} \\; 0\\) will push 0 onto the stack: | ||||
| Then, \(\text{PushI} \; 0\) will push 0 onto the stack: | ||||
| 
 | ||||
| {{< stack >}} | ||||
|     {{< stack_element >}}{{< /stack_element >}} | ||||
| @ -150,7 +150,7 @@ Then, \\(\\text{PushI} \\; 0\\) will push 0 onto the stack: | ||||
|     {{< stack_element >}}\(x\){{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| \\(\\text{Slide} \\; 1\\) will then remove the 1 element after the top element: \\(x\\). | ||||
| \(\text{Slide} \; 1\) will then remove the 1 element after the top element: \(x\). | ||||
| We end up with the following stack: | ||||
| 
 | ||||
| {{< stack >}} | ||||
| @ -176,14 +176,14 @@ The function must be on top of the stack, as per the semantics of our language | ||||
| (and, I suppose, convention 1). Because of this, we have to push it last. | ||||
| It only takes one argument, which we push on the stack first (so that it ends up | ||||
| below the function, as per convention 2). When both are pushed, we use | ||||
| \\(\\text{Call}\\) to execute the function, which will proceed as we've seen above. | ||||
| \(\text{Call}\) to execute the function, which will proceed as we've seen above. | ||||
| 
 | ||||
| ### Get Ahold of Yourself! | ||||
| How should a function call itself? The fact that functions reside on the stack, | ||||
| and can therefore be manipulated in the same way as any stack elements. This | ||||
| opens up an opportunity for us: we can pass the function as an argument | ||||
| to itself! Then, when it needs to make a recursive call, all it must do | ||||
| is \\(\\text{Offset}\\) itself onto the top of the stack, then \\(\\text{Call}\\), | ||||
| is \(\text{Offset}\) itself onto the top of the stack, then \(\text{Call}\), | ||||
| and voila!  | ||||
| 
 | ||||
| Talk is great, of course, but talking doesn't give us any examples. Let's | ||||
| @ -191,15 +191,15 @@ walk through an example of writing a recursive function this way. Let's | ||||
| try [factorial](https://en.wikipedia.org/wiki/Factorial)! | ||||
| 
 | ||||
| The "easy" implementation of factorial is split into two cases: | ||||
| the base case, when \\(0! = 1\\) is computed, and the recursive case, | ||||
| in which we multiply the input number \\(n\\) by the result | ||||
| of computing factorial for \\(n-1\\). Accordingly, we will use | ||||
| the \\(\\textbf{if}\\)/\\(\\text{else}\\) command. We will | ||||
| the base case, when \(0! = 1\) is computed, and the recursive case, | ||||
| in which we multiply the input number \(n\) by the result | ||||
| of computing factorial for \(n-1\). Accordingly, we will use | ||||
| the \(\textbf{if}\)/\(\text{else}\) command. We will | ||||
| make our function take two arguments, with the number input | ||||
| as the first ("top") argument, and the function itself as | ||||
| the second argument. Importantly, we do not want to destroy the input | ||||
| number by running \\(\\text{Eq}\\) directly on it. Instead, | ||||
| we first copy it using \\(\\text{Offset} \\; 0\\), then | ||||
| number by running \(\text{Eq}\) directly on it. Instead, | ||||
| we first copy it using \(\text{Offset} \; 0\), then | ||||
| compare it to 0: | ||||
| 
 | ||||
| ``` | ||||
| @ -218,7 +218,7 @@ on the stack: | ||||
|     {{< stack_element >}}factorial{{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| Then, \\(\\text{Offset} \\; 0\\) duplicates the first argument | ||||
| Then, \(\text{Offset} \; 0\) duplicates the first argument | ||||
| (the number): | ||||
| 
 | ||||
| {{< stack >}} | ||||
| @ -237,7 +237,7 @@ Next, 0 is pushed onto the stack: | ||||
|     {{< stack_element >}}factorial{{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| Finally, \\(\\text{Eq}\\) performs the equality check: | ||||
| Finally, \(\text{Eq}\) performs the equality check: | ||||
| 
 | ||||
| {{< stack >}} | ||||
|     {{< stack_element >}}{{< /stack_element >}} | ||||
| @ -265,7 +265,7 @@ As before, we push the desired answer onto the stack: | ||||
|     {{< stack_element >}}factorial{{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| Then, to follow convention 3, we must get rid of the arguments. We do this by using \\(\\text{Slide}\\): | ||||
| Then, to follow convention 3, we must get rid of the arguments. We do this by using \(\text{Slide}\): | ||||
| 
 | ||||
| {{< stack >}} | ||||
|     {{< stack_element >}}{{< /stack_element >}} | ||||
| @ -274,7 +274,7 @@ Then, to follow convention 3, we must get rid of the arguments. We do this by us | ||||
|     {{< stack_element >}}1{{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| Great! The \\(\\textbf{if}\\) branch is now done, and we're left with the correct answer on the stack. | ||||
| Great! The \(\textbf{if}\) branch is now done, and we're left with the correct answer on the stack. | ||||
| Excellent! | ||||
| 
 | ||||
| It's the recursive case that's more interesting. To make the recursive call, we must carefully | ||||
| @ -294,7 +294,7 @@ The result is as follows: | ||||
|     {{< stack_element >}}factorial{{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| Next, we must compute \\(n-1\\). This is pretty standard stuff: | ||||
| Next, we must compute \(n-1\). This is pretty standard stuff: | ||||
| 
 | ||||
| ``` | ||||
| Offset 1 | ||||
| @ -303,7 +303,7 @@ Add | ||||
| ``` | ||||
| 
 | ||||
| Why these three instructions? Well, with the function now on the top of the stack, the number argument is somewhat | ||||
| buried, and thus, we need to use \\(\\text{Offset} \\; 1\\) to get to it: | ||||
| buried, and thus, we need to use \(\text{Offset} \; 1\) to get to it: | ||||
| 
 | ||||
| {{< stack >}} | ||||
|     {{< stack_element >}}\(n\){{< /stack_element >}} | ||||
| @ -347,7 +347,7 @@ Call | ||||
| 
 | ||||
| If the function behaves as promised, this will remove the top 3 elements | ||||
| from the stack. The top element, which is the function itself, will | ||||
| be removed by the \\(\\text{Call}\\) operator. The two next two elements | ||||
| be removed by the \(\text{Call}\) operator. The two next two elements | ||||
| will be removed from the stack and replaced with the result of the function | ||||
| as per convention 2. The rest of the stack will remain untouched as | ||||
| per convention 4. We thus expect the stack to look as follows: | ||||
| @ -368,7 +368,7 @@ Mul | ||||
| Slide 1 | ||||
| ``` | ||||
| 
 | ||||
| The multiplication leaves us with \\(n(n-1)! = n!\\) on top of the stack, | ||||
| The multiplication leaves us with \(n(n-1)! = n!\) on top of the stack, | ||||
| and the function argument below it: | ||||
| 
 | ||||
| {{< stack >}} | ||||
| @ -378,7 +378,7 @@ and the function argument below it: | ||||
|     {{< stack_element >}}factorial{{< /stack_element >}} | ||||
| {{< /stack >}} | ||||
| 
 | ||||
| We then use \\(\\text{Slide}\\) so that only the factorial is on the | ||||
| We then use \(\text{Slide}\) so that only the factorial is on the | ||||
| stack, satisfying convention 3: | ||||
| 
 | ||||
| {{< stack >}} | ||||
| @ -410,7 +410,7 @@ if { | ||||
| } | ||||
| ``` | ||||
| 
 | ||||
| We can now invoke this function to compute \\(5!\\) as follows: | ||||
| We can now invoke this function to compute \(5!\) as follows: | ||||
| 
 | ||||
| ``` | ||||
| func { ... } | ||||
|  | ||||
| @ -41,7 +41,7 @@ The root node of this tree is 5, its left child is 2, and its right child is 7. | ||||
| 
 | ||||
| In this tree, the root node is 1, and the right child is 2. None of the nodes have a left child. Looking through this tree would be as slow as looking through a list - we'd have to look through all the numbers before we find 9. No good. | ||||
| 
 | ||||
| __Although the average efficiency of a Binary Search Tree is \\(O(\log n)\\), meaning that for a tree with \\(n\\) nodes, it will only need to examine up to \\(logn\\) of these nodes, it can nonetheless be as inefficient as \\(O(n)\\), meaning that it will have to examine every node in the tree.__ | ||||
| __Although the average efficiency of a Binary Search Tree is \(O(\log n)\), meaning that for a tree with \(n\) nodes, it will only need to examine up to \(logn\) of these nodes, it can nonetheless be as inefficient as \(O(n)\), meaning that it will have to examine every node in the tree.__ | ||||
| 
 | ||||
| This isn't good enough, and many clever algorithms have been invented to speed up the lookup of the tree by making sure that it remains _balanced_ - that is, it _isn't_ arranged like a simple list. Some of these algorithms include [Red-Black Trees](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree), [AVL Trees](https://en.wikipedia.org/wiki/AVL_tree), and, of course, B-Trees. | ||||
| 
 | ||||
| @ -71,7 +71,7 @@ The "nodes" in the BTreeDB are called "blocks" and are one of three types - "ind | ||||
| 
 | ||||
| To be able to read a Starbound BTree, the first thing that needs to be done is to read the general information about the tree. For this, we read the very beginning of the file, called the _header_. [GitHub user blixt has made an excellent table of values that make up the header](https://github.com/blixt/py-starbound/blob/master/FORMATS.md#btreedb5). The ones that matter to us are `Whether to use root node #2 instead` (hereby referred to as "swap", `Root node #1 block index` and `Root node #2 block index`. We also need to know the key size and block size of the B-Tree to be able to jump around in the file. | ||||
| 
 | ||||
| Once the header has been parsed (this is an exercise left up to the reader), the next step is to find the root node. This is following exactly the general lookup algorithm for a B-Tree. The index in the file (by bytes) of a block is \\(headerSize + blockSize \times n\\), where \\(n\\) is the block number. Depending on the value of `swap` (whether to use the second root node), we use either the index of root node 1 or 2 for \\(n\\), and move to that position. | ||||
| Once the header has been parsed (this is an exercise left up to the reader), the next step is to find the root node. This is following exactly the general lookup algorithm for a B-Tree. The index in the file (by bytes) of a block is \(headerSize + blockSize \times n\), where \(n\) is the block number. Depending on the value of `swap` (whether to use the second root node), we use either the index of root node 1 or 2 for \(n\), and move to that position. | ||||
| 
 | ||||
| Next, we proceed to identify the node that we've found. The first two bytes in that node are either the ASCII values of 'F', 'L', or 'I', representing, you guessed it, "Free", "Leaf", or "Index". If the node is an index node, we need to search it for the next node we need to examine. To do so, we first read two values from the node, two 32-bit integers. The first is the number of keys in the block, and the next is the block number of the first child node. | ||||
| 
 | ||||
| @ -89,7 +89,7 @@ As you can see, the number of children is one more than the number of keys, and | ||||
| 
 | ||||
| Simply, if the value we're searching for is bigger than the first key only, we go to the second child, if it's bigger than the second key, we go to the third child, etc. If our value is not bigger than any of the keys, we go to the 1st child. After we move to the index of the new child, we once again examine its type, and if it's still "II", we repeat the process. | ||||
| 
 | ||||
| Once we reach a "leaf" node, we're very close. After the two ASCII characters describing its type, the leaf node will contain a 32-bit integer describing the number of key-data pairs it has. Each key-data pair is made up of the key, a variable integer describing the length of the data, and the data itself. We examine one pair after another, carefully making sure that we don't exceed the end of the block, located at \\(headerSize + blockSize \times (n + 1) - 4\\). The reason that we subtract four from this equation is to save space for the address of the next block. As I mentioned above, leaf nodes, if their data is bigger than the block size, contain the block number of the next leaf node to which we can continue if we reach the end of the current leaf. If we do reach the end of the current leaf, we read the 32-bit integer telling us the number of the next block, jump to its index, and, after reading the first two bytes to ensure it's actually a leaf, continue reading our data. Once that's done, voila! We have our bytes. | ||||
| Once we reach a "leaf" node, we're very close. After the two ASCII characters describing its type, the leaf node will contain a 32-bit integer describing the number of key-data pairs it has. Each key-data pair is made up of the key, a variable integer describing the length of the data, and the data itself. We examine one pair after another, carefully making sure that we don't exceed the end of the block, located at \(headerSize + blockSize \times (n + 1) - 4\). The reason that we subtract four from this equation is to save space for the address of the next block. As I mentioned above, leaf nodes, if their data is bigger than the block size, contain the block number of the next leaf node to which we can continue if we reach the end of the current leaf. If we do reach the end of the current leaf, we read the 32-bit integer telling us the number of the next block, jump to its index, and, after reading the first two bytes to ensure it's actually a leaf, continue reading our data. Once that's done, voila! We have our bytes. | ||||
| 
 | ||||
| If we're using this method to read a Starbound world file, we need to also know that the data is zlib-compressed. I won't go far into the details, as it appears like both Python and Java provide libraries for the decompression of such data. | ||||
| 
 | ||||
|  | ||||
| @ -130,16 +130,16 @@ to values of the types. | ||||
| To get settled into this idea, let's look at a few 'well-known' examples: | ||||
| 
 | ||||
| * `(A,B)`, the tuple of two types `A` and `B` is equivalent to the | ||||
| proposition \\(A \land B\\), which means \\(A\\) and \\(B\\). Intuitively, | ||||
| to provide a proof of \\(A \land B\\), we have to provide the proofs of | ||||
| \\(A\\) and \\(B\\). | ||||
| proposition \(A \land B\), which means \(A\) and \(B\). Intuitively, | ||||
| to provide a proof of \(A \land B\), we have to provide the proofs of | ||||
| \(A\) and \(B\). | ||||
| * `Either A B`, which contains one of `A` or `B`, is equivalent | ||||
| to the proposition \\(A \lor B\\), which means \\(A\\) or \\(B\\). | ||||
| Intuitively, to provide a proof that either \\(A\\) or \\(B\\) | ||||
| to the proposition \(A \lor B\), which means \(A\) or \(B\). | ||||
| Intuitively, to provide a proof that either \(A\) or \(B\) | ||||
| is true, we need to provide one of them. | ||||
| * `A -> B`, the type of a function from `A` to `B`, is equivalent | ||||
| to the proposition \\(A \rightarrow B\\), which reads \\(A\\) | ||||
| implies \\(B\\). We can think of a function `A -> B` as creating | ||||
| to the proposition \(A \rightarrow B\), which reads \(A\) | ||||
| implies \(B\). We can think of a function `A -> B` as creating | ||||
| a proof of `B` given a proof of `A`.  | ||||
| 
 | ||||
| Now, consider Idris' unit type `()`: | ||||
| @ -150,7 +150,7 @@ data () = () | ||||
| 
 | ||||
| This type takes no arguments, and there's only one way to construct | ||||
| it. We can create a value of type `()` at any time, by just writing `()`. | ||||
| This type is equivalent to \\(\\text{true}\\): only one proof of it exists, | ||||
| This type is equivalent to \(\text{true}\): only one proof of it exists, | ||||
| and it requires no premises. It just is. | ||||
| 
 | ||||
| Consider also the type `Void`, which too is present in Idris: | ||||
| @ -163,7 +163,7 @@ data Void = -- Nothing | ||||
| The type `Void` has no constructors: it's impossible | ||||
| to create a value of this type, and therefore, it's  | ||||
| impossible to create a proof of `Void`. Thus, as you may have guessed, `Void` | ||||
| is equivalent to \\(\\text{false}\\). | ||||
| is equivalent to \(\text{false}\). | ||||
| 
 | ||||
| Finally, we get to a more complicated example: | ||||
| 
 | ||||
|  | ||||
		Loading…
	
		Reference in New Issue
	
	Block a user