diff --git a/content/blog/05_spa_agda_semantics.md b/content/blog/05_spa_agda_semantics.md new file mode 100644 index 0000000..33aa7ad --- /dev/null +++ b/content/blog/05_spa_agda_semantics.md @@ -0,0 +1,199 @@ +--- +title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language" +series: "Static Program Analysis in Agda" +description: "In this post, I define the language that well serve as the object of our vartious analyses" +date: 2024-08-10T17:37:43-07:00 +tags: ["Agda", "Programming Languages"] +draft: true +--- + +In the previous several posts, I've formalized the notion of lattices, which +are an essential ingredient to formalizing the analyses in Anders Møller's +lecutre notes. However, there can be no program analysis without a program +to analyze! In this post, I will define the (very simple) language that we +will be analyzing. An essential aspect of the language is its +[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science), which +simply speaking explains what each feature of the language does. At the end +of the previous article, I gave the following _inference rule_ which defined +(partially) how the `if`-`else` statement in the language works. + +{{< latex >}} +\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2} + {\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2} +{{< /latex >}} + +Like I mentioned then, this rule reads as follows: + +> If the condition of an `if`-`else` statement evaluates to a nonzero value, +> then to evaluate the statement, you evaluate its `then` branch. + +Another similar --- but crucially, not equivlalent -- rule is the following: + +{{< latex >}} +\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2} + {\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2} +{{< /latex >}} + +This time, the English interpretation of the rule is as follows: + +> If the condition of an `if`-`else` statement evaluates to one, +> then to evaluate the statement, you evaluate its `then` branch. + +These rules are certainly not equivalent. For instance, the former allows +the "then" branch to be executed when the condition is `2` two; however, in +the latter, the value of the conditional must be `1`. If our analysis were +_flow-sensitive_ (our first few will not be), then this difference would change +its output when determining the signs of the following program: + +``` +x = 2 +if x { + y = - 1 +} else { + y = 1 +} +``` + +Using the first, more "relaxed" rule, the condition would be considered "true", +and the sign of `y` would be `-`. On the other hand, using the second, +"stricter" rule, the sign of `y` would be `+`. I stress that in this case, +I am showing a flow-sensitive analysis (one that can understand control flow +and make more speciifc predictions); for our simplest analyses, we will not +be aiming for flow-sensitivity. There is plenty of work to do even then. + +The point of showing these two distinct rules is that we need to be very precise +about how the language will behave, because our analyses depend on that behavior. + +Let's not get ahead of ourselves, though. I've motivated the need for semantics, +but there is much groundwork to be laid before we delve into the precise rules +of our language. After all, to define the language's semantics, we need to +have a language. + +### The Syntax of Our Simple Language + +I've shown a couple of examples our our language now, and there won't be that +much more to it. We can start with _expressions_: things that evaluate to +something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our +specific language, the precise set of possible expressions can be given +by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar): + +{{< latex >}} +\begin{array}{rcll} + e & ::= & x & \text{(variables)} \\ + & | & z & \text{(integer literals)} \\ + & | & e + e & \text{(addition)} \\ + & | & e - e & \text{(subtraction)} +\end{array} +{{< /latex >}} + +The above can be read as follows: + +> An expression \(e\) is one of the following things: +> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable, +> which could be `x` or `y` in our program code; specifically, \(x\) is +> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).] +> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.]. +> 3. The addition of two other expressions [which could themselves be additions etc.]. +> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.]. + +Since expressions can be nested within other expressions --- which is necessary +to allow complicated code like `2-(x+y)` above --- they form a tree. Each node +is one of the elements of the grammar above (variable, addition, etc.). If +a node contains sub-expressions (like addition and subtraction do), then +these sub-expressions form sub-trees of the given node. This data structure +is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree). + +Notably, though `2-(x+y)` has parentheses, our grammar above does not include +include them as a case. The reason for this is that the structure of an +abstract syntax tree is sufficient to encode the order in which the operations +should be evaluated. + +{{< todo >}} +Probably two drawings of differently-associated ASTs here. +{{< /todo >}} + +To an Agda programmer, the one-of-four-things definition above should read +quite similarly to the definition of an algebraic data type. Indeed, this +is how we can encode the abstract syntax tree of expressions: + +{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}} + +The only departure from the grammar above is that I had to invent constructors +for the variable and integer cases, since Agda doesn't support implicit coercions. +This adds a little bit of extra overhead, requiring, for example, that we write +numbers as `# 42` instead of `42`. + +Having defined expressions, the next thing on the menu is _statements_. Unlike +expressions, which just produce values, statements "do something"; an example +of a statement might be the following Python line: + +```Python +print("Hello, world!") +``` + +The `print` function doesn't produce any value, but it does perform an action; +it prints its argument to the console! + +For the formalization, it turns out to be convenient to separate "simple" +statements from "complex" ones. Pragmatically speaking, the difference is that +between the "simple" and the "complex" is control flow; simple statements +will be guaranteed to always execute one-by-one, without any decisions or jumps. +The reason for this will become clearer in subsequent posts; I will foreshadow +a bit by saying that consecutive simple statements can be placed into a single +[basic block](https://en.wikipedia.org/wiki/Basic_block). + +The following is a group of three simple statements: + +``` +x = 1 +y = x + 2 +noop +``` + +These will always be executed in the same order, exactly once. Here, `noop` +is a conveneint type of statement that simply does nothing. + +On the other hand, the following statement is not simple: + +``` +while x { + x = x - 1 +} +``` + +it's not simple because it makes decisions about how the code should be executed; +if `x` is nonzero, it will try executing the statement in the body of the loop +(`x = x - 1`). Otherwise, it would skip evaluating that statement, and carry on +with subsequent code. + +I first define simple statements using the `BasicStmt` type: + +{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}} + +Complex statements are just called `Stmt`; they include loops, conditionals and +sequences ( +{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}} +The standard notation for sequencing in imperative languages is +\(s_1; s_2\). However, Agda gives special meaning to the semicolon, +and I couldn't find any passable symbolic alternatives. +{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\)). +Complex statements subsume simple statements, which I model using the constructor +`⟨_⟩`. + +{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}} + +For an example of using this encoding, take the following simple program: + +``` +var = 1 +if var { + x = 1 +} +``` + +The Agda version is: + +{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}} + +Notice how we used `noop` to express the fact that the `else` branch of the +conditional does nothing.