Compare commits

...

2 Commits

Author SHA1 Message Date
697f083237 Reference more future posts in SPA intro
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-08-18 15:05:55 -10:00
3e97fdcfea Start work on the draft about semantics
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-08-18 15:05:39 -10:00
2 changed files with 208 additions and 1 deletion


@@ -88,7 +88,12 @@ I'd like to cover the following major topics, spending a couple of posts on each
which is not at all compiler-centric. We need to connect these two, showing
that the CFGs we produce "make sense" for our language, and that given
CFGs that make sense, our analysis produces results that match the language's
execution.
execution. To do so, I write about the language and its semantics
in {{< draftlink "Part 5: Our Programming Language" "05_spa_agda_semantics" >}},
then about building control flow graphs for the language in
{{< draftlink "Part 6: Control Flow Graphs" "06_spa_agda_cfg" >}}.
I then write about combining these two representations in
{{< draftlink "Part 7: Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
### Navigation
@@ -98,3 +103,6 @@ Here are the posts that I've written so far for this series:
* {{< draftlink "Combining Lattices" "02_spa_agda_combining_lattices" >}}
* {{< draftlink "Lattices of Finite Height" "03_spa_agda_fixed_height" >}}
* {{< draftlink "The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}
* {{< draftlink "Our Programming Language" "05_spa_agda_semantics" >}}
* {{< draftlink "Control Flow Graphs" "06_spa_agda_cfg" >}}.
* {{< draftlink "Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.


@@ -0,0 +1,199 @@
---
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
series: "Static Program Analysis in Agda"
description: "In this post, I define the language that will serve as the object of our various analyses"
date: 2024-08-10T17:37:43-07:00
tags: ["Agda", "Programming Languages"]
draft: true
---
In the previous several posts, I've formalized the notion of lattices, which
are an essential ingredient in formalizing the analyses in Anders Møller's
lecture notes. However, there can be no program analysis without a program
to analyze! In this post, I will define the (very simple) language that we
will be analyzing. An essential aspect of the language is its
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)), which,
simply speaking, explains what each feature of the language does. At the end
of the previous article, I gave the following _inference rule_ which defined
(partially) how the `if`-`else` statement in the language works.
{{< latex >}}
\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
{{< /latex >}}
Like I mentioned then, this rule reads as follows:
> If the condition of an `if`-`else` statement evaluates to a nonzero value,
> then to evaluate the statement, you evaluate its `then` branch.
Another similar --- but, crucially, not equivalent --- rule is the following:
{{< latex >}}
\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
{{< /latex >}}
This time, the English interpretation of the rule is as follows:
> If the condition of an `if`-`else` statement evaluates to one,
> then to evaluate the statement, you evaluate its `then` branch.
These rules are certainly not equivalent. For instance, the former allows
the "then" branch to be executed when the condition is `2` two; however, in
the latter, the value of the conditional must be `1`. If our analysis were
_flow-sensitive_ (our first few will not be), then this difference would change
its output when determining the signs of the following program:
```
x = 2
if x {
    y = - 1
} else {
    y = 1
}
```
Using the first, more "relaxed" rule, the condition would be considered "true",
and the sign of `y` would be `-`. On the other hand, using the second,
"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
I am showing a flow-sensitive analysis (one that can understand control flow
and make more specific predictions); for our simplest analyses, we will not
be aiming for flow-sensitivity. There is plenty of work to do even then.
The point of showing these two distinct rules is that we need to be very precise
about how the language will behave, because our analyses depend on that behavior.
Let's not get ahead of ourselves, though. I've motivated the need for semantics,
but there is much groundwork to be laid before we delve into the precise rules
of our language. After all, to define the language's semantics, we need to
have a language.
### The Syntax of Our Simple Language
I've shown a couple of examples of our language now, and there won't be that
much more to it. We can start with _expressions_: things that evaluate to
something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
specific language, the precise set of possible expressions can be given
by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
{{< latex >}}
\begin{array}{rcll}
e & ::= & x & \text{(variables)} \\
& | & z & \text{(integer literals)} \\
& | & e + e & \text{(addition)} \\
& | & e - e & \text{(subtraction)}
\end{array}
{{< /latex >}}
The above can be read as follows:
> An expression \(e\) is one of the following things:
> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
> which could be `x` or `y` in our program code; specifically, \(x\) is
> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
> 3. The addition of two other expressions [which could themselves be additions etc.].
> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].
Since expressions can be nested within other expressions --- which is necessary
to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
is one of the elements of the grammar above (variable, addition, etc.). If
a node contains sub-expressions (like addition and subtraction do), then
these sub-expressions form sub-trees of the given node. This data structure
is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
Notably, though `2-(x+y)` has parentheses, our grammar above does not
include them as a case. The reason for this is that the structure of an
abstract syntax tree is sufficient to encode the order in which the operations
should be evaluated.
{{< todo >}}
Probably two drawings of differently-associated ASTs here.
{{< /todo >}}
To an Agda programmer, the one-of-four-things definition above should read
quite similarly to the definition of an algebraic data type. Indeed, this
is how we can encode the abstract syntax tree of expressions:
{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
The only departure from the grammar above is that I had to invent constructors
for the variable and integer cases, since Agda doesn't support implicit coercions.
This adds a little bit of extra overhead, requiring, for example, that we write
numbers as `# 42` instead of `42`.
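To make this concrete, here is a small self-contained sketch of what such a data type and the encoding of `2-(x+y)` could look like. This is only an illustration: apart from `#_`, the constructor names below are placeholders and need not match the actual definitions in `Language/Base.agda`.

```agda
open import Data.Integer using (ℤ; +_)
open import Data.String using (String)

-- Illustrative approximation of the expression AST. Only #_ is a name
-- confirmed by the prose above; `_, _+_ and _-_ are placeholders.
data Expr : Set where
  `_  : String → Expr        -- variables
  #_  : ℤ → Expr             -- integer literals, e.g. # (+ 42)
  _+_ : Expr → Expr → Expr   -- addition
  _-_ : Expr → Expr → Expr   -- subtraction

-- The expression 2-(x+y) as an abstract syntax tree.
twoMinusSum : Expr
twoMinusSum = (# (+ 2)) - ((` "x") + (` "y"))

-- A differently-associated tree, (2-x)+y: same tokens, different
-- structure, so the AST itself needs no parentheses.
sumOfDifference : Expr
sumOfDifference = ((# (+ 2)) - (` "x")) + (` "y")
```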
Having defined expressions, the next thing on the menu is _statements_. Unlike
expressions, which just produce values, statements "do something"; an example
of a statement might be the following Python line:
```Python
print("Hello, world!")
```
The `print` function doesn't produce any value, but it does perform an action;
it prints its argument to the console!
For the formalization, it turns out to be convenient to separate "simple"
statements from "complex" ones. Pragmatically speaking, the difference between
the "simple" and the "complex" is control flow; simple statements
will be guaranteed to always execute one-by-one, without any decisions or jumps.
The reason for this will become clearer in subsequent posts; I will foreshadow
a bit by saying that consecutive simple statements can be placed into a single
[basic block](https://en.wikipedia.org/wiki/Basic_block).
The following is a group of three simple statements:
```
x = 1
y = x + 2
noop
```
These will always be executed in the same order, exactly once. Here, `noop`
is a convenient type of statement that simply does nothing.
On the other hand, the following statement is not simple:
```
while x {
    x = x - 1
}
```
It's not simple because it makes decisions about how the code should be executed;
if `x` is nonzero, it will try executing the statement in the body of the loop
(`x = x - 1`). Otherwise, it will skip evaluating that statement and carry on
with subsequent code.
I first define simple statements using the `BasicStmt` type:
{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
Complex statements are just called `Stmt`; they include loops, conditionals and
sequences (
{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
The standard notation for sequencing in imperative languages is
\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
and I couldn't find any passable symbolic alternatives.
{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\)).
Complex statements subsume simple statements, which I model using the constructor
`⟨_⟩`.
{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
For an example of using this encoding, take the following simple program:
```
var = 1
if var {
    x = 1
}
```
The Agda version is:
{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
Notice how we used `noop` to express the fact that the `else` branch of the
conditional does nothing.
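For comparison, here is how the same program might look when written against the illustrative sketch from this post, rather than the actual definitions in `Main.agda`; `⟨ noop ⟩` again stands in for the empty `else` branch:

```agda
-- The example program, encoded with the placeholder constructors above.
exampleProgram : Stmt
exampleProgram =
  ⟨ "var" ← (# (+ 1)) ⟩ then
  (if (` "var") then ⟨ "x" ← (# (+ 1)) ⟩ else ⟨ noop ⟩)
```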