Compare commits

...

7 Commits

Author SHA1 Message Date
9fc2d16fb8 Adjust dates of part 4 and part 5 to match real publication dates
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-11-03 17:51:13 -08:00
f00c69f02c Edit and publish SPA part 5 2024-11-03 17:50:11 -08:00
4fc1191d13 Proofread and publish part 4
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-11-03 17:37:44 -08:00
951aafc90a Stop using 'symeq'
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-11-03 17:12:39 -08:00
ee13409b33 Add a visualization of two ASTs
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-11-03 17:06:19 -08:00
615aeb72da Finish most of part 5 of SPA series
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-11-03 16:58:06 -08:00
3be67ca4c8 Use unification-based 'eq' for numbers and symbols
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-11-03 16:05:04 -08:00
3 changed files with 125 additions and 27 deletions

View File

@ -7,10 +7,12 @@ SelectHead @ select(cons([?t, ?v], ?rest), ?default, ?v) <- ?t;
SelectTail @ select(cons([?t, ?v], ?rest), ?default, ?found) <- not(?t), select(?rest, ?default, ?found);
SelectEmpty @ select(nil, ?default, ?default) <-;
Eq @ eq(?x, ?x) <-;
ParenthAssocLeft @ parenthassoc(?a_i, left, right) <-;
ParenthAssocRight @ parenthassoc(?a_i, right, left) <-;
ParenthAssocNone @ parenthassoc(?a_i, none, ?pos) <-;
ParenthAssocNeq @ parenthassoc(?a_i, ?a_o, ?pos) <- not(symeq(?a_i, ?a_o));
ParenthAssocNeq @ parenthassoc(?a_i, ?a_o, ?pos) <- not(eq(?a_i, ?a_o));
Parenth @ parenth(?inner, ?outer, ?pos, ?strin, ?strout) <-
prec(?inner, ?p_i, ?a_i), prec(?outer, ?p_o, ?a_o),

View File

@ -2,14 +2,14 @@
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 4: The Fixed-Point Algorithm"
series: "Static Program Analysis in Agda"
description: "In this post, I give a top-level overview of my work on formally verified static analyses"
date: 2024-08-10T17:37:42-07:00
date: 2024-11-03T17:50:26-08:00
tags: ["Agda", "Programming Languages"]
draft: true
---
In the preivous post we looked at lattices of finite height, which are a crucial
ingredient to our static analyses. In this post, I will describe the specific
algorithm that makes use of these lattices that will be at the core of it all.
algorithm that makes use of these lattices; this algorithm will be at the core
of this series.
Lattice-based static analyses tend to operate by iteratively combining facts
from the program into new ones. For instance, when analyzing `y = 1 + 2`, we
@ -79,7 +79,7 @@ arrange the ones we've seen so far into a chain:
Each time we fail to find a fixed point, we add one element to our chain, growing
it. But if our lattice \(L\) has a finite height, that means eventually this
process will have to stop; the chain can't grow forever; eventually, we will
process will have to stop; the chain can't grow forever. Eventually, we will
have to find a value such that \(v = f(v)\). Thus, our algorithm is guaranteed
to terminate, and give a fixed point.
@ -119,7 +119,7 @@ it provides just a bit more gas than the maximum-length chain, which means that
if the gas is exhausted, we've certainly arrived at a contradiction. It also
provides an initial chain onto which `doStep` will keep tacking on new inequalities
as it finds them. Since we haven't found any yet, this is the single-element
chain of \(\bot\). The last thing is does is set up the reursion invariant
chain of \(\bot\). The last thing is does is set up the recursion invariant
(that the sum of the gas and the chain length is constant), and provides
a proof that \(\bot \le f(\bot)\). This function always returns a fixed point.
@ -147,11 +147,11 @@ the lattice, we have \(\bot \le b\).
{{< latex >}}
\begin{array}{ccccccccr}
& & \bot & \le & & & b \quad \implies & \text{(monotonicity of}\ f \text{)}\\
& & f(\bot) & \le & f(b) & = & b \quad \implies & \text{(} b\ \text{is a fixed point, monotonicity of}\ f \text{)}\\
& & f^2(\bot) & \le & f(b) & = & b \quad \implies & \text{(} b\ \text{is a fixed point, monotonicity of}\ f \text{)}\\
& & \bot & \le & & & b & \quad \implies & \text{(monotonicity of}\ f \text{)}\\
& & f(\bot) & \le & f(b) & = & b & \quad \implies & \text{(} b\ \text{is a fixed point, monotonicity of}\ f \text{)}\\
& & f^2(\bot) & \le & f(b) & = & b & \quad \implies & \text{(} b\ \text{is a fixed point, monotonicity of}\ f \text{)}\\
\\
& & \vdots & & \vdots & & \vdots & \\
& & \vdots & & \vdots & & \vdots & & \\
\\
a & = & f^k(\bot) & \le & f(b) & = & b &
\end{array}
@ -159,7 +159,7 @@ a & = & f^k(\bot) & \le & f(b) & = & b &
Because of the monotonicity of \(f\), each time we apply it, it preserves the
less-than relationship that started with \(\bot \le b\). Doing that \(k\) times,
we verify that \(a\) is our fixed point.
we verify that \(a\) is our least fixed point.
To convince Agda of this proof, we once again get in an argument with the termination
checker, which ends the same way it did last time: with us using the notion of 'gas'

View File

@ -2,9 +2,8 @@
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
series: "Static Program Analysis in Agda"
description: "In this post, I define the language that well serve as the object of our vartious analyses"
date: 2024-08-10T17:37:43-07:00
date: 2024-11-03T17:50:27-08:00
tags: ["Agda", "Programming Languages"]
draft: true
custom_js: ["parser.js"]
bergamot:
render_presets:
@ -117,11 +116,31 @@ is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_synta
Notably, though `2-(x+y)` has parentheses, our grammar above does not include
include them as a case. The reason for this is that the structure of an
abstract syntax tree is sufficient to encode the order in which the operations
should be evaluated.
should be evaluated. Since I lack a nice way of drawing ASTs, I will use
an ASCII drawing to show an example.
{{< todo >}}
Probably two drawings of differently-associated ASTs here.
{{< /todo >}}
```
Expression: 2 - (x+y)
(-)
/ \
2 (+)
/ \
x y
Expression: (2-x) + y
(+)
/ \
(-) y
/ \
2 x
```
Above, in the first AST, `(+)` is a child of the `(-)` node, which means
that it's a sub-expression. As a result, that subexpression is evaluated first,
before evaluating `(-)`, and so, the AST expresents `2-(x+y)`. In the other
example, `(-)` is a child of `(+)`, and is therefore evaluated first. The resulting
association encoded by that AST is `(2-x)+y`.
To an Agda programmer, the one-of-four-things definition above should read
quite similarly to the definition of an algebraic data type. Indeed, this
@ -249,7 +268,7 @@ thus be written as follows:
{{< /latex >}}
Now, on to the actual rules for how to evaluate expressions. Most simply,
integer literals `1` just evaluate to themselves.
integer literals like `1` just evaluate to themselves.
{{< latex >}}
\frac{n \in \text{Int}}{\rho, n \Downarrow n}
@ -285,6 +304,9 @@ the exact rules that were used in computing a particular value. The variables
`x`, `y`, and `z` are pre-defined for your convenience.
{{< bergamot_widget id="expr-widget" query="" prompt="eval(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?v)" modes="Expression:Expression" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
@ -295,7 +317,7 @@ section "" {
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(symeq(?x, ?y));
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
@ -313,7 +335,7 @@ I showed above.
#### Simple Statements
The main difference between formalizing (simple and "normal") statements is
that they modify the environment. If `x` has one value, writing `x = x + 1` will
certainly change that value. On the other hands, statements don't produce values.
certainly change that value. On the other hand, statements don't produce values.
So, we will be writing claims like \(\rho_1 , \textit{bs} \Rightarrow \rho_2\)
to say that the basic statement \(\textit{bs}\), when starting in environment
\(\rho_1\), will produce environment \(\rho_2\). Here's an example:
@ -349,6 +371,9 @@ a Bergamot instance, this time with simple statements. Try out `noop` or some
sort of variable assignment, like `x = x + 1`.
{{< bergamot_widget id="basic-stmt-widget" query="" prompt="stepbasic(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Basic Statement:Basic Statement" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
@ -363,7 +388,7 @@ section "" {
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(symeq(?x, ?y));
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
@ -412,14 +437,52 @@ to evaluate the _expression_ that serves as the condition. We then had an
additional premise that requires the truthiness of the resulting value \(v\).
The rule for evaluating a conditional with a "false" branch is very similar.
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 \quad \rho_1, s_2 \Rightarrow \rho_2}
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
{{< /latex >}}
Now that we have rules for conditional statements, it will be surprisingly easy
to define the rules for `while` loops. A `while` loop will also have two rules,
one for when its condition is truthy and one for when it's falsey. However,
unlike the "false" case, a while loop will do nothing, leaving the environment
unchanged:
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 }
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_1 }
{{< /latex >}}
The trickiest rule is for when the condition of a `while` loop is true.
We evaluate the body once, starting in environment \(\rho_1\) and finishing
in \(\rho_2\), but then we're not done. In fact, we have to go back to the top,
and check the condition again, starting over. As a result, we include another
premise, that tells us that evaluating the loop starting at \(\rho_2\), we
eventually end in state \(\rho_3\). This encodes the "rest of the iterations"
in addition to the one we just performed. The environment \(\rho_3\) is our
final state, so that's what we use in the rule's conclusion.
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1 , s \Rightarrow \rho_2 \quad \rho_2 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{{< /latex >}}
And that's it! We have now seen every rule that defines the little object language
I've been using for my Agda work. Below is a Bergamot widget that implements
these rules. Try the following program, which computes the `x`th power of two,
and stores it in `y`:
```
x = 5; y = 1; while (x) { y = y + y; x = x - 1 }
```
{{< bergamot_widget id="stmt-widget" query="" prompt="step(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Statement:Statement" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
@ -440,14 +503,47 @@ hidden section "" {
StepLiftBasic @ step(?rho_1, ?s, ?rho_2) <- stepbasic(?rho_1, ?s, ?rho_2);
}
section "" {
StepIfTrue @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), not(numeq(?v, 0)), step(?rho_1, ?s_1, ?rho_2);
StepIfFalse @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), numeq(?v, 0), step(?rho_1, ?s_2, ?rho_2);
StepWhileTrue @ step(?rho_1, while(?e, ?s), ?rho_3) <- eval(?rho_1, ?e, ?v), not(numeq(?v, 0)), step(?rho_1, ?s, ?rho_2), step(?rho_2, while(?e, ?s), ?rho_3);
StepWhileFalse @ step(?rho_1, while(?e, ?s), ?rho_1) <- eval(?rho_1, ?e, ?v), numeq(?v, 0);
StepIfTrue @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s_1, ?rho_2);
StepIfFalse @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), eq(?v, 0), step(?rho_1, ?s_2, ?rho_2);
StepWhileTrue @ step(?rho_1, while(?e, ?s), ?rho_3) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s, ?rho_2), step(?rho_2, while(?e, ?s), ?rho_3);
StepWhileFalse @ step(?rho_1, while(?e, ?s), ?rho_1) <- eval(?rho_1, ?e, ?v), eq(?v, 0);
StepSeq @ step(?rho_1, seq(?s_1, ?s_2), ?rho_3) <- step(?rho_1, ?s_1, ?rho_2), step(?rho_2, ?s_2, ?rho_3);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(symeq(?x, ?y));
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
As with all the other rules we've seen, the mathematical notation above can
be directly translated into Agda:
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 47 64 >}}
### Semantics as Ground Truth
Prior to this post, we had been talking about using lattices and monotone
functions for program analysis. The key problem with using this framework to
define analyses is that there are many monotone functions that produce complete
nonsese; their output is, at best, unrelated to the program they're supposed
to analyze. We don't want to write such functions, since having incorrect
information about the programs in question is unhelpful.
What does it mean for a function to produce correct information, though?
In the context of sign analysis, it would mean that if we say a variable `x` is `+`,
then evaluating the program will leave us in a state in which `x` is posive.
The semantics we defined in this post give us the "evaluating the program piece".
They establish what the programs _actually_ do, and we can use this ground
truth when checking that our analyses are correct. In subsequent posts, I will
prove the exact property I informally stated above: __for the program analyses
we define, things they "claim" about our program will match what actually happens
when executing the program using our semantics__.
A piece of the puzzle still remains: how are we going to use the monotone
functions we've been talking so much about? We need to figure out what to feed
to our analyses before we can prove their correctness.
I have an answer to that question: we will be using _control flow graphs_ (CFGs).
These are another program representation, one that's more commonly found in
compilers. I will show what they look like in the next post. I hope to see you
there!