Start work on the draft about semantics
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
parent
40007c427e
commit
3e97fdcfea
199
content/blog/05_spa_agda_semantics.md
Normal file
199
content/blog/05_spa_agda_semantics.md
Normal file
|
@ -0,0 +1,199 @@
|
|||
---
|
||||
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
|
||||
series: "Static Program Analysis in Agda"
|
||||
description: "In this post, I define the language that well serve as the object of our vartious analyses"
|
||||
date: 2024-08-10T17:37:43-07:00
|
||||
tags: ["Agda", "Programming Languages"]
|
||||
draft: true
|
||||
---
|
||||
|
||||
In the previous several posts, I've formalized the notion of lattices, which
|
||||
are an essential ingredient to formalizing the analyses in Anders Møller's
|
||||
lecutre notes. However, there can be no program analysis without a program
|
||||
to analyze! In this post, I will define the (very simple) language that we
|
||||
will be analyzing. An essential aspect of the language is its
|
||||
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science), which
|
||||
simply speaking explains what each feature of the language does. At the end
|
||||
of the previous article, I gave the following _inference rule_ which defined
|
||||
(partially) how the `if`-`else` statement in the language works.
|
||||
|
||||
{{< latex >}}
|
||||
\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
|
||||
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
||||
{{< /latex >}}
|
||||
|
||||
Like I mentioned then, this rule reads as follows:
|
||||
|
||||
> If the condition of an `if`-`else` statement evaluates to a nonzero value,
|
||||
> then to evaluate the statement, you evaluate its `then` branch.
|
||||
|
||||
Another similar --- but crucially, not equivlalent -- rule is the following:
|
||||
|
||||
{{< latex >}}
|
||||
\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
|
||||
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
||||
{{< /latex >}}
|
||||
|
||||
This time, the English interpretation of the rule is as follows:
|
||||
|
||||
> If the condition of an `if`-`else` statement evaluates to one,
|
||||
> then to evaluate the statement, you evaluate its `then` branch.
|
||||
|
||||
These rules are certainly not equivalent. For instance, the former allows
|
||||
the "then" branch to be executed when the condition is `2` two; however, in
|
||||
the latter, the value of the conditional must be `1`. If our analysis were
|
||||
_flow-sensitive_ (our first few will not be), then this difference would change
|
||||
its output when determining the signs of the following program:
|
||||
|
||||
```
|
||||
x = 2
|
||||
if x {
|
||||
y = - 1
|
||||
} else {
|
||||
y = 1
|
||||
}
|
||||
```
|
||||
|
||||
Using the first, more "relaxed" rule, the condition would be considered "true",
|
||||
and the sign of `y` would be `-`. On the other hand, using the second,
|
||||
"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
|
||||
I am showing a flow-sensitive analysis (one that can understand control flow
|
||||
and make more speciifc predictions); for our simplest analyses, we will not
|
||||
be aiming for flow-sensitivity. There is plenty of work to do even then.
|
||||
|
||||
The point of showing these two distinct rules is that we need to be very precise
|
||||
about how the language will behave, because our analyses depend on that behavior.
|
||||
|
||||
Let's not get ahead of ourselves, though. I've motivated the need for semantics,
|
||||
but there is much groundwork to be laid before we delve into the precise rules
|
||||
of our language. After all, to define the language's semantics, we need to
|
||||
have a language.
|
||||
|
||||
### The Syntax of Our Simple Language
|
||||
|
||||
I've shown a couple of examples our our language now, and there won't be that
|
||||
much more to it. We can start with _expressions_: things that evaluate to
|
||||
something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
|
||||
specific language, the precise set of possible expressions can be given
|
||||
by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
|
||||
|
||||
{{< latex >}}
|
||||
\begin{array}{rcll}
|
||||
e & ::= & x & \text{(variables)} \\
|
||||
& | & z & \text{(integer literals)} \\
|
||||
& | & e + e & \text{(addition)} \\
|
||||
& | & e - e & \text{(subtraction)}
|
||||
\end{array}
|
||||
{{< /latex >}}
|
||||
|
||||
The above can be read as follows:
|
||||
|
||||
> An expression \(e\) is one of the following things:
|
||||
> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
|
||||
> which could be `x` or `y` in our program code; specifically, \(x\) is
|
||||
> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
|
||||
> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
|
||||
> 3. The addition of two other expressions [which could themselves be additions etc.].
|
||||
> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].
|
||||
|
||||
Since expressions can be nested within other expressions --- which is necessary
|
||||
to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
|
||||
is one of the elements of the grammar above (variable, addition, etc.). If
|
||||
a node contains sub-expressions (like addition and subtraction do), then
|
||||
these sub-expressions form sub-trees of the given node. This data structure
|
||||
is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
|
||||
|
||||
Notably, though `2-(x+y)` has parentheses, our grammar above does not include
|
||||
include them as a case. The reason for this is that the structure of an
|
||||
abstract syntax tree is sufficient to encode the order in which the operations
|
||||
should be evaluated.
|
||||
|
||||
{{< todo >}}
|
||||
Probably two drawings of differently-associated ASTs here.
|
||||
{{< /todo >}}
|
||||
|
||||
To an Agda programmer, the one-of-four-things definition above should read
|
||||
quite similarly to the definition of an algebraic data type. Indeed, this
|
||||
is how we can encode the abstract syntax tree of expressions:
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
|
||||
|
||||
The only departure from the grammar above is that I had to invent constructors
|
||||
for the variable and integer cases, since Agda doesn't support implicit coercions.
|
||||
This adds a little bit of extra overhead, requiring, for example, that we write
|
||||
numbers as `# 42` instead of `42`.
|
||||
|
||||
Having defined expressions, the next thing on the menu is _statements_. Unlike
|
||||
expressions, which just produce values, statements "do something"; an example
|
||||
of a statement might be the following Python line:
|
||||
|
||||
```Python
|
||||
print("Hello, world!")
|
||||
```
|
||||
|
||||
The `print` function doesn't produce any value, but it does perform an action;
|
||||
it prints its argument to the console!
|
||||
|
||||
For the formalization, it turns out to be convenient to separate "simple"
|
||||
statements from "complex" ones. Pragmatically speaking, the difference is that
|
||||
between the "simple" and the "complex" is control flow; simple statements
|
||||
will be guaranteed to always execute one-by-one, without any decisions or jumps.
|
||||
The reason for this will become clearer in subsequent posts; I will foreshadow
|
||||
a bit by saying that consecutive simple statements can be placed into a single
|
||||
[basic block](https://en.wikipedia.org/wiki/Basic_block).
|
||||
|
||||
The following is a group of three simple statements:
|
||||
|
||||
```
|
||||
x = 1
|
||||
y = x + 2
|
||||
noop
|
||||
```
|
||||
|
||||
These will always be executed in the same order, exactly once. Here, `noop`
|
||||
is a conveneint type of statement that simply does nothing.
|
||||
|
||||
On the other hand, the following statement is not simple:
|
||||
|
||||
```
|
||||
while x {
|
||||
x = x - 1
|
||||
}
|
||||
```
|
||||
|
||||
it's not simple because it makes decisions about how the code should be executed;
|
||||
if `x` is nonzero, it will try executing the statement in the body of the loop
|
||||
(`x = x - 1`). Otherwise, it would skip evaluating that statement, and carry on
|
||||
with subsequent code.
|
||||
|
||||
I first define simple statements using the `BasicStmt` type:
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
|
||||
|
||||
Complex statements are just called `Stmt`; they include loops, conditionals and
|
||||
sequences (
|
||||
{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
|
||||
The standard notation for sequencing in imperative languages is
|
||||
\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
|
||||
and I couldn't find any passable symbolic alternatives.
|
||||
{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\)).
|
||||
Complex statements subsume simple statements, which I model using the constructor
|
||||
`⟨_⟩`.
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
|
||||
|
||||
For an example of using this encoding, take the following simple program:
|
||||
|
||||
```
|
||||
var = 1
|
||||
if var {
|
||||
x = 1
|
||||
}
|
||||
```
|
||||
|
||||
The Agda version is:
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
|
||||
|
||||
Notice how we used `noop` to express the fact that the `else` branch of the
|
||||
conditional does nothing.
|
Loading…
Reference in New Issue
Block a user