---
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
series: "Static Program Analysis in Agda"
description: "In this post, I define the language that will serve as the object of our various analyses"
date: 2024-11-03T17:50:27-08:00
tags: ["Agda", "Programming Languages"]
custom_js: ["parser.js"]
bergamot:
  render_presets:
    default: "bergamot/rendering/imp.bergamot"
  input_modes:
    - name: "Expression"
      fn: "window.parseExpr"
    - name: "Basic Statement"
      fn: "window.parseBasicStmt"
    - name: "Statement"
      fn: "window.parseStmt"
---
In the previous several posts, I've formalized the notion of lattices, which
are an essential ingredient to formalizing the analyses in Anders Møller's
lecture notes. However, there can be no program analysis without a program
to analyze! In this post, I will define the (very simple) language that we
will be analyzing. An essential aspect of the language is its
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)), which,
simply put, explains what each feature of the language does. At the end
of the previous article, I gave the following _inference rule_ which defined
(partially) how the `if`-`else` statement in the language works.
{{< latex >}}
\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
{{< /latex >}}
Like I mentioned then, this rule reads as follows:
> If the condition of an `if`-`else` statement evaluates to a nonzero value,
> then to evaluate the statement, you evaluate its `then` branch.

Another similar --- but crucially, not equivalent --- rule is the following:
{{< latex >}}
\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
{{< /latex >}}
This time, the English interpretation of the rule is as follows:
> If the condition of an `if`-`else` statement evaluates to one,
> then to evaluate the statement, you evaluate its `then` branch.

These rules are certainly not equivalent. For instance, the former allows
the "then" branch to be executed when the condition is `2`; however, in
the latter, the value of the conditional must be `1`. If our analysis were
intelligent (our first few will not be), then this difference would change
its output when determining the signs of the following program:
```
x = 2
if x {
    y = - 1
} else {
    y = 1
}
```
Using the first, more "relaxed" rule, the condition would be considered "true",
and the sign of `y` would be `-`. On the other hand, using the second,
"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
I am showing a flow-sensitive analysis (one that can understand control flow
and make more specific predictions); for our simplest analyses, we will not
be aiming for flow-sensitivity. There is plenty of work to do even then.
The point of showing these two distinct rules is that we need to be very precise
about how the language will behave, because our analyses depend on that behavior.
Let's not get ahead of ourselves, though. I've motivated the need for semantics,
but there is much groundwork to be laid before we delve into the precise rules
of our language. After all, to define the language's semantics, we need to
have a language.
### The Syntax of Our Simple Language
I've shown a couple of examples of our language now, and there won't be that
much more to it. We can start with _expressions_: things that evaluate to
something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
specific language, the precise set of possible expressions can be given
by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
{{< latex >}}
\begin{array}{rcll}
e & ::= & x & \text{(variables)} \\
& | & z & \text{(integer literals)} \\
& | & e + e & \text{(addition)} \\
& | & e - e & \text{(subtraction)}
\end{array}
{{< /latex >}}
The above can be read as follows:
> An expression \(e\) is one of the following things:
> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
> which could be `x` or `y` in our program code; specifically, \(x\) is
> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
> 3. The addition of two other expressions [which could themselves be additions etc.].
> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].

Since expressions can be nested within other expressions --- which is necessary
to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
is one of the elements of the grammar above (variable, addition, etc.). If
a node contains sub-expressions (like addition and subtraction do), then
these sub-expressions form sub-trees of the given node. This data structure
is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
Notably, though `2-(x+y)` has parentheses, our grammar above does not
include them as a case. The reason for this is that the structure of an
abstract syntax tree is sufficient to encode the order in which the operations
should be evaluated. Since I lack a nice way of drawing ASTs, I will use
an ASCII drawing to show an example.
```
Expression: 2 - (x+y)

    (-)
   /   \
  2    (+)
       /  \
      x    y

Expression: (2-x) + y

      (+)
     /   \
   (-)    y
   /  \
  2    x
```
Above, in the first AST, `(+)` is a child of the `(-)` node, which means
that it's a sub-expression. As a result, that subexpression is evaluated first,
before evaluating `(-)`, and so, the AST represents `2-(x+y)`. In the other
example, `(-)` is a child of `(+)`, and is therefore evaluated first. The resulting
association encoded by that AST is `(2-x)+y`.
To an Agda programmer, the one-of-four-things definition above should read
quite similarly to the definition of an algebraic data type. Indeed, this
is how we can encode the abstract syntax tree of expressions:
{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
The only departure from the grammar above is that I had to invent constructors
for the variable and integer cases, since Agda doesn't support implicit coercions.
This adds a little bit of extra overhead, requiring, for example, that we write
numbers as `# 42` instead of `42`.
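
To make the correspondence between the grammar, the ASTs drawn earlier, and
Agda terms concrete, here is a small self-contained sketch in the same spirit.
To be clear, this is not the code from `Language/Base.agda`: the variable
constructor `` `_ `` and the use of `String` and `ℤ` here are illustrative
assumptions, with only `#_` taken from the note above.

```Agda
-- A standalone sketch, not the project's actual definition.
open import Data.String using (String)
open import Data.Integer using (ℤ; +_)

data Expr : Set where
  `_  : String → Expr       -- variables (constructor name assumed here)
  #_  : ℤ → Expr            -- integer literals
  _+_ : Expr → Expr → Expr  -- addition
  _-_ : Expr → Expr → Expr  -- subtraction

-- 2 - (x+y): the (+) node is a child of the (-) node.
ex₁ : Expr
ex₁ = (# (+ 2)) - ((` "x") + (` "y"))

-- (2-x) + y: the (-) node is a child of the (+) node.
ex₂ : Expr
ex₂ = ((# (+ 2)) - (` "x")) + (` "y")
```
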
Having defined expressions, the next thing on the menu is _statements_. Unlike
expressions, which just produce values, statements "do something"; an example
of a statement might be the following Python line:
```Python
print("Hello, world!")
```
The `print` function doesn't produce any value, but it does perform an action;
it prints its argument to the console!
For the formalization, it turns out to be convenient to separate "simple"
statements from "complex" ones. Pragmatically speaking, the difference is that
between the "simple" and the "complex" is control flow; simple statements
will be guaranteed to always execute without any decisions or jumps.
The reason for this will become clearer in subsequent posts; I will foreshadow
a bit by saying that consecutive simple statements can be placed into a single
[basic block](https://en.wikipedia.org/wiki/Basic_block).
The following is a group of three simple statements:
```
x = 1
y = x + 2
noop
```
These will always be executed in the same order, exactly once. Here, `noop`
is a convenient type of statement that simply does nothing.
On the other hand, the following statement is not simple:
```
while x {
    x = x - 1
}
```
It's not simple because it makes decisions about how the code should be executed;
if `x` is nonzero, it will try executing the statement in the body of the loop
(`x = x - 1`). Otherwise, it will skip evaluating that statement, and carry on
with subsequent code.
I first define simple statements using the `BasicStmt` type:
{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
Complex statements are just called `Stmt`; they include loops, conditionals and
sequences ---
{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
The standard notation for sequencing in imperative languages is
\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
and I couldn't find any passable symbolic alternatives.
{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\).
Complex statements subsume simple statements, which I model using the constructor
`⟨_⟩`.
{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
For an example of using this encoding, take the following simple program:
```
var = 1
if var {
    x = 1
}
```
The Agda version is:
{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
Notice how we used `noop` to express the fact that the `else` branch of the
conditional does nothing.
### The Semantics of Our Language
We now have all the language constructs that I'll be showing off --- because
those are all the concepts that I've formalized. What's left is to define
how they behave. We will do this using a logical tool called
[_inference rules_](https://en.wikipedia.org/wiki/Rule_of_inference). I've
written about them a number of times; they're ubiquitous, particularly in the
sorts of things I like to explore on this site. The [section on inference rules]({{< relref "01_aoc_coq#inference-rules" >}})
from my Advent of Code series is pretty relevant, and [the notation section from
a post in my compiler series]({{< relref "03_compiler_typechecking#some-notation" >}}) says
much the same thing; I won't be re-describing them here.
There are three pieces which demand semantics: expressions, simple statements,
and non-simple statements. The semantics of each of the three requires the
semantics of the items that precede it. We will therefore start with expressions.
#### Expressions
The trickiest thing about expressions is that the value of an expression depends
on the "context": `x+1` can evaluate to `43` if `x` is `42`, or it can evaluate
to `0` if `x` is `-1`. To evaluate an expression, we will therefore need to
assign values to all of the variables in that expression. A mapping that
assigns values to variables is typically called an _environment_. We will write
\(\varnothing\) for "empty environment", and \(\{\texttt{x} \mapsto 42, \texttt{y} \mapsto -1 \}\) for
an environment that maps the variable \(\texttt{x}\) to 42, and the variable \(\texttt{y}\) to -1.
Now, a bit of notation. We will use the letter \(\rho\) to represent environments
(and if several environments are involved, we will occasionally number them
as \(\rho_1\), \(\rho_2\), etc.) We will use the letter \(e\) to stand for
expressions, and the letter \(v\) to stand for values. Finally, we'll write
\(\rho, e \Downarrow v\) to say that "in an environment \(\rho\), expression \(e\)
evaluates to value \(v\)". Our two previous examples of evaluating `x+1` can
thus be written as follows:
{{< latex >}}
\{ \texttt{x} \mapsto 42 \}, \texttt{x}+1 \Downarrow 43 \\
\{ \texttt{x} \mapsto -1 \}, \texttt{x}+1 \Downarrow 0 \\
{{< /latex >}}
Now, on to the actual rules for how to evaluate expressions. Most simply,
integer literals like `1` just evaluate to themselves.
{{< latex >}}
\frac{n \in \text{Int}}{\rho, n \Downarrow n}
{{< /latex >}}
Note that the letter \(\rho\) is completely unused in the above rule. That's
because no matter what values _variables_ have, a number still evaluates to
the same value. As we've already established, the same is not true for a
variable like \(\texttt{x}\). To evaluate such a variable, we need to retrieve
the value it's mapped to in the current environment, which we will write as
\(\rho(\texttt{x})\). This gives the following inference rule:
{{< latex >}}
\frac{\rho(x) = v}{\rho, x \Downarrow v}
{{< /latex >}}
All that's left is to define addition and subtraction. For an expression in the
form \(e_1+e_2\), we first need to evaluate the two subexpressions \(e_1\)
and \(e_2\), and then add the two resulting numbers. As a result, the addition
rule includes two additional premises, one for evaluating each summand.
{{< latex >}}
\frac
{\rho, e_1 \Downarrow v_1 \quad \rho, e_2 \Downarrow v_2 \quad v_1 + v_2 = v}
{\rho, e_1+e_2 \Downarrow v}
{{< /latex >}}
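
To see how these three rules fit together, here is one complete derivation of
the earlier claim that `x+1` evaluates to `43`, writing \(\rho\) for
\(\{ \texttt{x} \mapsto 42 \}\). Each premise of the addition rule becomes its
own sub-derivation: the variable rule handles \(\texttt{x}\), the literal rule
handles \(1\), and ordinary arithmetic supplies \(42 + 1 = 43\).
{{< latex >}}
\frac
{\dfrac{\rho(\texttt{x}) = 42}{\rho, \texttt{x} \Downarrow 42}
 \quad
 \dfrac{1 \in \text{Int}}{\rho, 1 \Downarrow 1}
 \quad
 42 + 1 = 43}
{\rho, \texttt{x}+1 \Downarrow 43}
{{< /latex >}}
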
The subtraction rule is similar. Below, I've configured an instance of
[Bergamot]({{< relref "bergamot" >}}) to interpret these exact rules. Try
typing various expressions like `1`, `1+1`, etc. into the input box below
to see them evaluate. If you click the "Full Proof Tree" button, you can also view
the exact rules that were used in computing a particular value. The variables
`x`, `y`, and `z` are pre-defined for your convenience.
{{< bergamot_widget id="expr-widget" query="" prompt="eval(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?v)" modes="Expression:Expression" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
section "" {
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
The Agda equivalent of this looks very similar to the rules themselves. I use
`⇒ᵉ` instead of \(\Downarrow\), and there's a little bit of tedium with
wrapping integers into a new `Value` type. I also used a (partial) relation
`(x, v) ∈ ρ` instead of explicitly defining a lookup function on environments, since
it is conceivable for a program to access a variable that has not
been assigned to. Aside from these notational changes, the structure of each
of the constructors of the evaluation data type matches the inference rules
I showed above.
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 27 35 >}}
#### Simple Statements
The main difference between formalizing expressions and formalizing (simple and
"normal") statements is that statements modify the environment. If `x` has one
value, writing `x = x + 1` will
certainly change that value. On the other hand, statements don't produce values.
So, we will be writing claims like \(\rho_1 , \textit{bs} \Rightarrow \rho_2\)
to say that the basic statement \(\textit{bs}\), when starting in environment
\(\rho_1\), will produce environment \(\rho_2\). Here's an example:
{{< latex >}}
\{ \texttt{x} \mapsto 42, \texttt{y} \mapsto 17 \}, \ \texttt{x = x - \text{1}} \Rightarrow \{ \texttt{x} \mapsto 41, \texttt{y} \mapsto 17 \}
{{< /latex >}}
Here, we subtracted one from a variable with value `42`, leaving it with a new
value of `41`.
There are two basic statements, and one of them quite literally does nothing.
The inference rule for `noop` is very simple:
{{< latex >}}
\rho,\ \texttt{noop} \Rightarrow \rho
{{< /latex >}}
For the assignment rule, we need to know how to evaluate the expression on the
right side of the equal sign. This is why we needed to define the semantics
of expressions first. Given those, the evaluation rule for assignment is as
follows, with \(\rho[x \mapsto v]\) meaning "the environment \(\rho\) but
mapping the variable \(x\) to value \(v\)".
{{< latex >}}
\frac
{\rho, e \Downarrow v}
{\rho, x = e \Rightarrow \rho[x \mapsto v]}
{{< /latex >}}
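
As a small worked instance of this rule, take the `x = x - 1` example from the
beginning of this section, with \(\rho = \{ \texttt{x} \mapsto 42, \texttt{y} \mapsto 17 \}\).
The expression rules give \(\rho, \texttt{x} - 1 \Downarrow 41\), so the
assignment rule concludes:
{{< latex >}}
\frac
{\rho, \texttt{x} - 1 \Downarrow 41}
{\rho, \texttt{x = x - 1} \Rightarrow \rho[\texttt{x} \mapsto 41]}
{{< /latex >}}
Since \(\rho[\texttt{x} \mapsto 41] = \{ \texttt{x} \mapsto 41, \texttt{y} \mapsto 17 \}\),
this is exactly the claim shown earlier.
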
Those are actually all the rules we need, and below, I am once again configuring
a Bergamot instance, this time with simple statements. Try out `noop` or some
sort of variable assignment, like `x = x + 1`.
{{< bergamot_widget id="basic-stmt-widget" query="" prompt="stepbasic(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Basic Statement:Basic Statement" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
hidden section "" {
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
}
section "" {
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
The Agda implementation is once again just a data type with constructors-for-rules.
This time they also look quite similar to the rules I've shown up until now,
though I continue to explicitly quantify over variables like `ρ`.
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 37 40 >}}
#### Statements
Let's work on non-simple statements next. The easiest rule to define is probably
sequencing. When we use `then` (or `;`) to combine two statements, what we
actually want is to execute the first statement, which may change variables,
and then execute the second statement while keeping the changes from the first.
This means there are three environments: \(\rho_1\) for the initial state before
either statement is executed, \(\rho_2\) for the state between executing the
first and second statement, and \(\rho_3\) for the final state after both
are done executing. This leads to the following rule:
{{< latex >}}
\frac
{ \rho_1, s_1 \Rightarrow \rho_2 \quad \rho_2, s_2 \Rightarrow \rho_3 }
{ \rho_1, s_1; s_2 \Rightarrow \rho_3 }
{{< /latex >}}
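
For instance, starting from the empty environment, sequencing the two
assignments `x = 1` and `y = x + 2` threads the environment produced by the
first assignment into the second:
{{< latex >}}
\frac
{ \varnothing, \texttt{x = 1} \Rightarrow \{ \texttt{x} \mapsto 1 \} \quad \{ \texttt{x} \mapsto 1 \}, \texttt{y = x + 2} \Rightarrow \{ \texttt{x} \mapsto 1, \texttt{y} \mapsto 3 \} }
{ \varnothing, \texttt{x = 1}; \texttt{y = x + 2} \Rightarrow \{ \texttt{x} \mapsto 1, \texttt{y} \mapsto 3 \} }
{{< /latex >}}
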
We will actually need two rules to evaluate the conditional statement: one
for when the condition evaluates to "true", and one for when the condition
evaluates to "false". Only, I never specified booleans as being part of
the language, which means that we will need to come up with what "false" and "true"
are. I will take my cue from C++ and use zero as "false", and any other number
as "true".
If the condition of an `if`-`else` statement is "true" (nonzero), then the
effect of executing the `if`-`else` should be the same as executing the "then"
part of the statement, while completely ignoring the "else" part.
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1, s_1 \Rightarrow \rho_2}
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
{{< /latex >}}
Notice that in the above rule, we used the evaluation judgement \(\rho_1, e \Downarrow v\)
to evaluate the _expression_ that serves as the condition. We then had an
additional premise that requires the truthiness of the resulting value \(v\).
The rule for evaluating a conditional whose condition is "false" is very similar.
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 \quad \rho_1, s_2 \Rightarrow \rho_2}
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
{{< /latex >}}
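
These two rules are where the "nonzero means true" choice from the start of
this post shows up. In the introductory program, the environment after the
assignment `x = 2` is \(\{ \texttt{x} \mapsto 2 \}\), so only the first rule
applies, and it is the "then" branch that runs:
{{< latex >}}
\frac
{ \{ \texttt{x} \mapsto 2 \}, \texttt{x} \Downarrow 2 \quad 2 \neq 0 \quad \{ \texttt{x} \mapsto 2 \}, \texttt{y = -1} \Rightarrow \{ \texttt{x} \mapsto 2, \texttt{y} \mapsto -1 \} }
{ \{ \texttt{x} \mapsto 2 \}, \textbf{if}\ \texttt{x}\ \{\ \texttt{y = -1}\ \}\ \textbf{else}\ \{\ \texttt{y = 1}\ \}\ \Rightarrow \{ \texttt{x} \mapsto 2, \texttt{y} \mapsto -1 \} }
{{< /latex >}}
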
Now that we have rules for conditional statements, it will be surprisingly easy
to define the rules for `while` loops. A `while` loop will also have two rules,
one for when its condition is truthy and one for when it's falsey. However,
unlike the "false" case, a while loop will do nothing, leaving the environment
unchanged:
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v = 0 }
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_1 }
{{< /latex >}}
The trickiest rule is for when the condition of a `while` loop is true.
We evaluate the body once, starting in environment \(\rho_1\) and finishing
in \(\rho_2\), but then we're not done. In fact, we have to go back to the top,
and check the condition again, starting over. As a result, we include another
premise, which tells us that evaluating the loop starting in \(\rho_2\)
eventually ends in state \(\rho_3\). This encodes the "rest of the iterations"
in addition to the one we just performed. The environment \(\rho_3\) is our
final state, so that's what we use in the rule's conclusion.
{{< latex >}}
\frac
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1 , s \Rightarrow \rho_2 \quad \rho_2 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
{{< /latex >}}
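
To see the recursion at work, consider \(\textbf{while}\ \texttt{x}\ \{\ \texttt{x = x - 1}\ \}\)
starting in \(\{ \texttt{x} \mapsto 1 \}\). The condition is nonzero, so the
"true" rule runs the body once, reaching \(\{ \texttt{x} \mapsto 0 \}\); its
final premise then re-examines the loop in that new environment, where the
"false" rule finishes the derivation:
{{< latex >}}
\frac
{ \{ \texttt{x} \mapsto 1 \}, \texttt{x} \Downarrow 1 \quad 1 \neq 0 \quad \{ \texttt{x} \mapsto 1 \}, \texttt{x = x - 1} \Rightarrow \{ \texttt{x} \mapsto 0 \} \quad \{ \texttt{x} \mapsto 0 \}, \textbf{while}\ \texttt{x}\ \{\ \texttt{x = x - 1}\ \}\ \Rightarrow \{ \texttt{x} \mapsto 0 \} }
{ \{ \texttt{x} \mapsto 1 \}, \textbf{while}\ \texttt{x}\ \{\ \texttt{x = x - 1}\ \}\ \Rightarrow \{ \texttt{x} \mapsto 0 \} }
{{< /latex >}}
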
And that's it! We have now seen every rule that defines the little object language
I've been using for my Agda work. Below is a Bergamot widget that implements
these rules. Try the following program, which computes the `x`th power of two,
and stores it in `y`:
```
x = 5; y = 1; while (x) { y = y + y; x = x - 1 }
```
{{< bergamot_widget id="stmt-widget" query="" prompt="step(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Statement:Statement" >}}
hidden section "" {
Eq @ eq(?x, ?x) <-;
}
hidden section "" {
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
}
hidden section "" {
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
}
hidden section "" {
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
}
hidden section "" {
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
}
hidden section "" {
StepLiftBasic @ step(?rho_1, ?s, ?rho_2) <- stepbasic(?rho_1, ?s, ?rho_2);
}
section "" {
StepIfTrue @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s_1, ?rho_2);
StepIfFalse @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), eq(?v, 0), step(?rho_1, ?s_2, ?rho_2);
StepWhileTrue @ step(?rho_1, while(?e, ?s), ?rho_3) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s, ?rho_2), step(?rho_2, while(?e, ?s), ?rho_3);
StepWhileFalse @ step(?rho_1, while(?e, ?s), ?rho_1) <- eval(?rho_1, ?e, ?v), eq(?v, 0);
StepSeq @ step(?rho_1, seq(?s_1, ?s_2), ?rho_3) <- step(?rho_1, ?s_1, ?rho_2), step(?rho_2, ?s_2, ?rho_3);
}
hidden section "" {
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
}
{{< /bergamot_widget >}}
As with all the other rules we've seen, the mathematical notation above can
be directly translated into Agda:
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 47 64 >}}
### Semantics as Ground Truth
Prior to this post, we had been talking about using lattices and monotone
functions for program analysis. The key problem with using this framework to
define analyses is that there are many monotone functions that produce complete
nonsense; their output is, at best, unrelated to the program they're supposed
to analyze. We don't want to write such functions, since having incorrect
information about the programs in question is unhelpful.
What does it mean for a function to produce correct information, though?
In the context of sign analysis, it would mean that if we say a variable `x` is `+`,
then evaluating the program will leave us in a state in which `x` is positive.
The semantics we defined in this post give us the "evaluating the program" piece.
They establish what the programs _actually_ do, and we can use this ground
truth when checking that our analyses are correct. In subsequent posts, I will
prove the exact property I informally stated above: __for the program analyses
we define, things they "claim" about our program will match what actually happens
when executing the program using our semantics__.
A piece of the puzzle still remains: how are we going to use the monotone
functions we've been talking so much about? We need to figure out what to feed
to our analyses before we can prove their correctness.
I have an answer to that question: we will be using _control flow graphs_ (CFGs).
These are another program representation, one that's more commonly found in
compilers. I will show what they look like in the next post. I hope to see you
there!