title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
series: "Static Program Analysis in Agda"
description: "In this post, I define the language that will serve as the object of our various analyses"
date: 2024-11-03T17:50:27-08:00
tags: ["Agda", "Programming Languages"]
custom_js: ["parser.js"]
bergamot:
  render_presets:

Notably, though `2-(x+y)` has parentheses, our grammar above does not
include them as a case. The reason for this is that the structure of an
abstract syntax tree is sufficient to encode the order in which the operations
should be evaluated. Since I lack a nice way of drawing ASTs, I will use
an ASCII drawing to show an example.

```
Expression: 2 - (x+y)

  (-)
  / \
 2  (+)
    / \
   x   y


Expression: (2-x) + y

    (+)
    / \
  (-)  y
  / \
 2   x
```

Above, in the first AST, `(+)` is a child of the `(-)` node, which means
that it's a sub-expression. As a result, that subexpression is evaluated first,
before evaluating `(-)`, and so, the AST represents `2-(x+y)`. In the other
example, `(-)` is a child of `(+)`, and is therefore evaluated first. The resulting
association encoded by that AST is `(2-x)+y`.
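
To make the tree structure concrete, here is how the two trees above could be
written as nested constructor applications of a small expression datatype. This
is only a sketch: the constructor names `lit`, `var`, `plus`, and `minus` are
placeholders of mine (chosen to echo the widget terms further down), not
necessarily the definitions used in the rest of the series.

```agda
open import Data.Nat using (ℕ)
open import Data.String using (String)

-- A hypothetical expression type, with one constructor per case of the grammar.
data Expr : Set where
  lit   : ℕ → Expr
  var   : String → Expr
  plus  : Expr → Expr → Expr
  minus : Expr → Expr → Expr

-- 2 - (x+y): the (+) node is a child of the (-) node.
twoMinusSum : Expr
twoMinusSum = minus (lit 2) (plus (var "x") (var "y"))

-- (2-x) + y: the (-) node is a child of the (+) node.
diffPlusY : Expr
diffPlusY = plus (minus (lit 2) (var "x")) (var "y")
```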

To an Agda programmer, the one-of-four-things definition above should read
quite similarly to the definition of an algebraic data type. Indeed, this

{{< /latex >}}

Now, on to the actual rules for how to evaluate expressions. Most simply,
integer literals like `1` just evaluate to themselves.

{{< latex >}}
\frac{n \in \text{Int}}{\rho, n \Downarrow n}

`x`, `y`, and `z` are pre-defined for your convenience.

{{< bergamot_widget id="expr-widget" query="" prompt="eval(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?v)" modes="Expression:Expression" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
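
Before moving on, it might help to see a non-relational version of what the
widget is doing. The hidden `EnvTake` and `EnvSkip` rules are ordinary
environment lookup (take the binding if the name matches, otherwise keep
searching), and `EvalNum`/`EvalVar` are the literal and variable cases of
evaluation. Here is a small functional sketch in Agda; the names, the use of
natural numbers as values (so subtraction truncates), and the choice to default
missing variables to `0` are all simplifications of mine, not the series'
definitions.

```agda
open import Data.Nat using (ℕ; _+_; _∸_)
open import Data.String using (String)
open import Data.String.Properties using (_≟_)
open import Data.List using (List; []; _∷_)
open import Data.Product using (_×_; _,_)
open import Relation.Nullary using (yes; no)

-- An environment is a list of bindings; "extend" is just cons.
Env : Set
Env = List (String × ℕ)

-- EnvTake: the head binding matches.  EnvSkip: it doesn't, so look deeper.
-- Missing variables default to 0, purely to keep this sketch total.
lookupVar : Env → String → ℕ
lookupVar []            _ = 0
lookupVar ((y , v) ∷ ρ) x with x ≟ y
... | yes _ = v
... | no  _ = lookupVar ρ x

data Expr : Set where
  lit   : ℕ → Expr
  var   : String → Expr
  plus  : Expr → Expr → Expr
  minus : Expr → Expr → Expr

-- EvalNum: literals evaluate to themselves.  EvalVar: look the variable up.
eval : Env → Expr → ℕ
eval ρ (lit n)       = n
eval ρ (var x)       = lookupVar ρ x
eval ρ (plus e₁ e₂)  = eval ρ e₁ + eval ρ e₂
eval ρ (minus e₁ e₂) = eval ρ e₁ ∸ eval ρ e₂
```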

#### Simple Statements
The main difference between formalizing (simple and "normal") statements and
formalizing expressions is that statements modify the environment. If `x` has
one value, writing `x = x + 1` will
certainly change that value. On the other hand, statements don't produce values.
So, we will be writing claims like \(\rho_1 , \textit{bs} \Rightarrow \rho_2\)
to say that the basic statement \(\textit{bs}\), when starting in environment
\(\rho_1\), will produce environment \(\rho_2\). Here's an example:

sort of variable assignment, like `x = x + 1`.

{{< bergamot_widget id="basic-stmt-widget" query="" prompt="stepbasic(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Basic Statement:Basic Statement" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
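
For basic statements, the same trick works: a functional sketch of the
`stepbasic` judgment \(\rho_1 , \textit{bs} \Rightarrow \rho_2\). The module
below is parameterized over whatever expression type and evaluator are in
scope, and `BasicStmt`, `_:=_`, and `stepBasic` are hypothetical names of mine.
The series itself translates the inference rules directly into Agda (see the
`Semantics.agda` excerpt later in this post), so treat this purely as an
intuition aid.

```agda
open import Data.Nat using (ℕ)
open import Data.String using (String)
open import Data.List using (List; _∷_)
open import Data.Product using (_×_; _,_)

-- Hypothetical module parameters: some expression type and an evaluator for it.
module BasicStep (Expr : Set) (eval : List (String × ℕ) → Expr → ℕ) where

  Env : Set
  Env = List (String × ℕ)

  data BasicStmt : Set where
    noop : BasicStmt
    _:=_ : String → Expr → BasicStmt

  -- noop leaves the environment unchanged; an assignment x = e extends the
  -- environment with the value of e, mirroring extend(ρ, x, v) in the widget.
  stepBasic : Env → BasicStmt → Env
  stepBasic ρ noop     = ρ
  stepBasic ρ (x := e) = (x , eval ρ e) ∷ ρ
```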

additional premise that requires the truthiness of the resulting value \(v\).
The rule for evaluating a conditional with a "false" branch is very similar.

{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 \quad \rho_1, s_2 \Rightarrow \rho_2}
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
{{< /latex >}}

Now that we have rules for conditional statements, it will be surprisingly easy
to define the rules for `while` loops. A `while` loop will also have two rules,
one for when its condition is truthy and one for when it's falsey. However,
unlike the "false" case of a conditional, a `while` loop whose condition is
falsey does nothing, leaving the environment unchanged:

{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 }
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_1 }
{{< /latex >}}

The trickiest rule is for when the condition of a `while` loop is true.
We evaluate the body once, starting in environment \(\rho_1\) and finishing
in \(\rho_2\), but then we're not done. In fact, we have to go back to the top,
and check the condition again, starting over. As a result, we include another
premise, which tells us that evaluating the loop starting at \(\rho_2\)
eventually ends in state \(\rho_3\). This encodes the "rest of the iterations"
in addition to the one we just performed. The environment \(\rho_3\) is our
final state, so that's what we use in the rule's conclusion.

{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1 , s \Rightarrow \rho_2 \quad \rho_2 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{{< /latex >}}
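
To see how the two `while` rules fit together, here is a sketch of the
derivation for a tiny loop, `while (x) { x = x - 1 }`, started in an
environment \(\rho_1\) in which `x` is `1`. The body runs once, producing an
environment \(\rho_2\) in which `x` is `0`; the recursive premise is then
discharged by the "false" rule. The concrete environments are my own example,
not anything from the series.

{{< latex >}}
\frac
{ \rho_1 , x \Downarrow 1 \quad 1 \neq 0 \quad \rho_1 , x = x - 1 \Rightarrow \rho_2 \quad \dfrac{ \rho_2 , x \Downarrow 0 \quad 0 = 0 }{ \rho_2 , \textbf{while}\ x\ \{ x = x - 1 \}\ \Rightarrow \rho_2 } }
{ \rho_1 , \textbf{while}\ x\ \{ x = x - 1 \}\ \Rightarrow \rho_2 }
{{< /latex >}}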

And that's it! We have now seen every rule that defines the little object language
I've been using for my Agda work. Below is a Bergamot widget that implements
these rules. Try the following program, which computes the `x`th power of two,
and stores it in `y`:

```
x = 5; y = 1; while (x) { y = y + y; x = x - 1 }
```
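
If you'd like to check the arithmetic before running it, here is the sequence
of environments the program passes through, showing only `x` and `y` (a trace
of my own, just for intuition):

```
x = 5, y = 1    (after the two assignments)
x = 4, y = 2
x = 3, y = 4
x = 2, y = 8
x = 1, y = 16
x = 0, y = 32   (the condition is now falsey, so the loop stops; 32 = 2^5)
```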

{{< bergamot_widget id="stmt-widget" query="" prompt="step(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Statement:Statement" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
StepLiftBasic @ step(?rho_1, ?s, ?rho_2) <- stepbasic(?rho_1, ?s, ?rho_2);
}
section "" {
StepIfTrue @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s_1, ?rho_2);
StepIfFalse @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), eq(?v, 0), step(?rho_1, ?s_2, ?rho_2);
StepWhileTrue @ step(?rho_1, while(?e, ?s), ?rho_3) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s, ?rho_2), step(?rho_2, while(?e, ?s), ?rho_3);
StepWhileFalse @ step(?rho_1, while(?e, ?s), ?rho_1) <- eval(?rho_1, ?e, ?v), eq(?v, 0);
StepSeq @ step(?rho_1, seq(?s_1, ?s_2), ?rho_3) <- step(?rho_1, ?s_1, ?rho_2), step(?rho_2, ?s_2, ?rho_3);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}

As with all the other rules we've seen, the mathematical notation above can
be directly translated into Agda:

{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 47 64 >}}

### Semantics as Ground Truth

Prior to this post, we had been talking about using lattices and monotone
functions for program analysis. The key problem with using this framework to
define analyses is that there are many monotone functions that produce complete
nonsense; their output is, at best, unrelated to the program they're supposed
to analyze. We don't want to write such functions, since having incorrect
information about the programs in question is unhelpful.

What does it mean for a function to produce correct information, though?
In the context of sign analysis, it would mean that if we say a variable `x` is `+`,
then evaluating the program will leave us in a state in which `x` is positive.
The semantics we defined in this post give us the "evaluating the program" piece.
They establish what the programs _actually_ do, and we can use this ground
truth when checking that our analyses are correct. In subsequent posts, I will
prove the exact property I informally stated above: __for the program analyses
we define, things they "claim" about our program will match what actually happens
when executing the program using our semantics__.
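
For the sign-analysis example, the shape of that property is roughly the
following; this is only my informal sketch of the statement, and the precise
formulation (in particular, where in the program each claim attaches) comes
later in the series.

{{< latex >}}
\text{if the analysis claims } x \text{ is } (+) \text{ after } s \text{, and } \rho_1 , s \Rightarrow \rho_2 \text{, then } \rho_2(x) > 0
{{< /latex >}}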

A piece of the puzzle still remains: how are we going to use the monotone
functions we've been talking so much about? We need to figure out what to feed
to our analyses before we can prove their correctness.

I have an answer to that question: we will be using _control flow graphs_ (CFGs).
These are another program representation, one that's more commonly found in
compilers. I will show what they look like in the next post. I hope to see you
there!