Start work on the draft about semantics

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-08-18 15:05:39 -10:00 · 2024-08-18 15:05:39 -10:00 · 3e97fdcfea
commit 3e97fdcfea
parent 40007c427e
1 changed files with 199 additions and 0 deletions
--- a/content/blog/05_spa_agda_semantics.md
+++ b/content/blog/05_spa_agda_semantics.md
@ -0,0 +1,199 @@
+---
+title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
+series: "Static Program Analysis in Agda"
+description: "In this post, I define the language that well serve as the object of our vartious analyses"
+date: 2024-08-10T17:37:43-07:00
+tags: ["Agda", "Programming Languages"]
+draft: true
+---
+
+In the previous several posts, I've formalized the notion of lattices, which
+are an essential ingredient to formalizing the analyses in Anders Møller's
+lecutre notes. However, there can be no program analysis without a program
+to analyze! In this post, I will define the (very simple) language that we
+will be analyzing. An essential aspect of the language is its
+[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science), which
+simply speaking explains what each feature of the language does. At the end
+of the previous article, I gave the following _inference rule_ which defined
+(partially) how the `if`-`else` statement in the language works.
+
+{{< latex >}}
+\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
+     {\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
+{{< /latex >}}
+
+Like I mentioned then, this rule reads as follows:
+
+> If the condition of an `if`-`else` statement evaluates to a nonzero value,
+> then to evaluate the statement, you evaluate its `then` branch.
+
+Another similar --- but crucially, not equivlalent -- rule is the following:
+
+{{< latex >}}
+\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
+     {\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
+{{< /latex >}}
+
+This time, the English interpretation of the rule is as follows:
+
+> If the condition of an `if`-`else` statement evaluates to one,
+> then to evaluate the statement, you evaluate its `then` branch.
+
+These rules are certainly not equivalent. For instance, the former allows
+the "then" branch to be executed when the condition is `2` two; however, in
+the latter, the value of the conditional must be `1`. If our analysis were
+_flow-sensitive_ (our first few will not be), then this difference would change
+its output when determining the signs of the following program:
+
+```
+x = 2
+if x {
+  y = - 1
+} else {
+  y = 1
+}
+```
+
+Using the first, more "relaxed" rule, the condition would be considered "true",
+and the sign of `y` would be `-`. On the other hand, using the second,
+"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
+I am showing a flow-sensitive analysis (one that can understand control flow
+and make more speciifc predictions); for our simplest analyses, we will not
+be aiming for flow-sensitivity. There is plenty of work to do even then.
+
+The point of showing these two distinct rules is that we need to be very precise
+about how the language will behave, because our analyses depend on that behavior.
+
+Let's not get ahead of ourselves, though. I've motivated the need for semantics,
+but there is much groundwork to be laid before we delve into the precise rules
+of our language. After all, to define the language's semantics, we need to
+have a language.
+
+### The Syntax of Our Simple Language
+
+I've shown a couple of examples our our language now, and there won't be that
+much more to it. We can start with _expressions_: things that evaluate to
+something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
+specific language, the precise set of possible expressions can be given
+by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
+
+{{< latex >}}
+\begin{array}{rcll}
+    e & ::= & x & \text{(variables)} \\
+      &  |  & z & \text{(integer literals)} \\
+      &  |  & e + e & \text{(addition)} \\
+      &  |  & e - e & \text{(subtraction)}
+\end{array}
+{{< /latex >}}
+
+The above can be read as follows:
+
+> An expression \(e\) is one of the following things:
+> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
+>    which could be `x` or `y` in our program code; specifically, \(x\) is
+>    a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
+> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
+> 3. The addition of two other expressions [which could themselves be additions etc.].
+> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].
+
+Since expressions can be nested within other expressions --- which is necessary
+to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
+is one of the elements of the grammar above (variable, addition, etc.). If
+a node contains sub-expressions (like addition and subtraction do), then
+these sub-expressions form sub-trees of the given node. This data structure
+is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
+
+Notably, though `2-(x+y)` has parentheses, our grammar above does not include
+include them as a case. The reason for this is that the structure of an
+abstract syntax tree is sufficient to encode the order in which the operations
+should be evaluated.
+
+{{< todo >}}
+Probably two drawings of differently-associated ASTs here.
+{{< /todo >}}
+
+To an Agda programmer, the one-of-four-things definition above should read
+quite similarly to the definition of an algebraic data type. Indeed, this
+is how we can encode the abstract syntax tree of expressions:
+
+{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
+
+The only departure from the grammar above is that I had to invent constructors
+for the variable and integer cases, since Agda doesn't support implicit coercions.
+This adds a little bit of extra overhead, requiring, for example, that we write
+numbers as `# 42` instead of `42`.
+
+Having defined expressions, the next thing on the menu is _statements_. Unlike
+expressions, which just produce values, statements "do something"; an example
+of a statement might be the following Python line:
+
+```Python
+print("Hello, world!")
+```
+
+The `print` function doesn't produce any value, but it does perform an action;
+it prints its argument to the console!
+
+For the formalization, it turns out to be convenient to separate "simple"
+statements from "complex" ones. Pragmatically speaking, the difference is that
+between the "simple" and the "complex" is control flow; simple statements
+will be guaranteed to always execute one-by-one, without any decisions or jumps.
+The reason for this will become clearer in subsequent posts; I will foreshadow
+a bit by saying that consecutive simple statements can be placed into a single
+[basic block](https://en.wikipedia.org/wiki/Basic_block).
+
+The following is a group of three simple statements:
+
+```
+x = 1
+y = x + 2
+noop
+```
+
+These will always be executed in the same order, exactly once. Here, `noop`
+is a conveneint type of statement that simply does nothing.
+
+On the other hand, the following statement is not simple:
+
+```
+while x {
+  x = x - 1
+}
+```
+
+it's not simple because it makes decisions about how the code should be executed;
+if `x` is nonzero, it will try executing the statement in the body of the loop
+(`x = x - 1`). Otherwise, it would skip evaluating that statement, and carry on
+with subsequent code.
+
+I first define simple statements using the `BasicStmt` type:
+
+{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
+
+Complex statements are just called `Stmt`; they include loops, conditionals and
+sequences (
+{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
+The standard notation for sequencing in imperative languages is
+\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
+and I couldn't find any passable symbolic alternatives.
+{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\)).
+Complex statements subsume simple statements, which I model using the constructor
+`⟨_⟩`.
+
+{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
+
+For an example of using this encoding, take the following simple program:
+
+```
+var = 1
+if var {
+  x = 1
+}
+```
+
+The Agda version is:
+
+{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
+
+Notice how we used `noop` to express the fact that the `else` branch of the
+conditional does nothing.