2024-03-14 22:25:17 -07:00
|
|
|
|
---
|
2024-05-12 13:49:21 -07:00
|
|
|
|
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 0: Intro"
|
2024-03-14 22:25:17 -07:00
|
|
|
|
series: "Static Program Analysis in Agda"
|
2024-05-12 18:28:44 -07:00
|
|
|
|
description: "In this post, I give a top-level overview of my work on formally verified static analyses"
|
2024-07-06 17:39:03 -07:00
|
|
|
|
date: 2024-07-06T17:37:42-07:00
|
2024-05-20 21:46:14 -07:00
|
|
|
|
tags: ["Agda", "Programming Languages"]
|
2024-03-14 22:25:17 -07:00
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
Some years ago, when the Programming Languages research group at Oregon State
|
|
|
|
|
University was discussing what to read, the [_Static Program Analysis_](https://cs.au.dk/~amoeller/spa/)
|
2024-05-12 13:49:21 -07:00
|
|
|
|
lecture notes came up. The group didn't end up reading the lecture notes,
|
2024-03-14 22:25:17 -07:00
|
|
|
|
but I did. As I was going through them, I noticed that they were quite rigorous:
|
|
|
|
|
the first several chapters cover a little bit of [lattice theory](https://en.wikipedia.org/wiki/Lattice_(order)),
|
|
|
|
|
and the subsequent analyses -- and the descriptions thereof -- are quite precise.
|
|
|
|
|
When I went to implement the algorithms in the textbook, I realized that just
|
|
|
|
|
writing them down would not be enough. After all, the textbook also proves
|
|
|
|
|
several properties of the lattice-based analyses, which would be lost in
|
|
|
|
|
translation if I were to just write C++ or Haskell.
|
|
|
|
|
|
|
|
|
|
At the same time, I noticed that lots of recent papers in programming language
|
|
|
|
|
theory were formalizing their results in
|
|
|
|
|
[Agda](https://agda.readthedocs.io/en/latest/getting-started/what-is-agda.html).
|
|
|
|
|
Having [played]({{< relref "meaningfully-typechecking-a-language-in-idris" >}})
|
|
|
|
|
[with]({{< relref "advent-of-code-in-coq" >}}) [dependent]({{< relref "coq_dawn_eval" >}})
|
|
|
|
|
[types]({{< relref "coq_palindrome" >}}) before, I was excited to try it out.
|
2024-05-12 13:49:21 -07:00
|
|
|
|
Thus began my journey to formalize (the first few chapters of) _Static Program Analysis_
|
2024-03-14 22:25:17 -07:00
|
|
|
|
in Agda.
|
|
|
|
|
|
2024-05-12 13:49:21 -07:00
|
|
|
|
In all, I built a framework for static analyses, based on a tool
|
|
|
|
|
called _motone functions_. This framework can be used to implement and
|
|
|
|
|
reason about many different analyses (currently only a certain class called
|
|
|
|
|
_forward analyses_, but that's not hard limitation). Recently, I've proven
|
|
|
|
|
the _correctness_ of the algorithms my framework produces. Having reached
|
|
|
|
|
this milestone, I'd like to pause and talk about what I've done.
|
|
|
|
|
|
|
|
|
|
In subsequent posts in this series, will describe what I have so far.
|
|
|
|
|
It's not perfect, and some work is yet to be done; however, getting to
|
|
|
|
|
this point was no joke, and I think it's worth discussing. In all,
|
2024-05-30 20:02:49 -07:00
|
|
|
|
I'd like to cover the following major topics, spending a couple of posts on each:
|
2024-05-12 13:49:21 -07:00
|
|
|
|
|
|
|
|
|
* __Lattices__: the analyses I'm reasoning about use an algebraic structure
|
|
|
|
|
called a _lattice_. This structure has certain properties that make it
|
|
|
|
|
amenable to describing degrees of "knowledge" about a program. In
|
|
|
|
|
lattice-based static program analysis, the various elements of the
|
|
|
|
|
lattice represent different facts or properties that we know about the
|
|
|
|
|
program in question; operations on the lattice help us combine these facts
|
2024-07-06 17:39:03 -07:00
|
|
|
|
and reason about them. I write about this in {{< draftlink "Part 1: Lattices" "01_spa_agda_lattices" >}}.
|
2024-05-12 13:49:21 -07:00
|
|
|
|
|
|
|
|
|
Interestingly, lattices can be made by combining other lattices in certain
|
|
|
|
|
ways. We can therefore use simpler lattices as building blocks to create
|
|
|
|
|
more complex ones, all while preserving the algebraic structure that
|
2024-05-30 20:02:49 -07:00
|
|
|
|
we need for program analysis. I write about this in
|
2024-07-06 17:39:03 -07:00
|
|
|
|
{{< draftlink "Part 2: Combining Lattices" "02_spa_agda_combining_lattices" >}}.
|
2024-05-12 13:49:21 -07:00
|
|
|
|
|
|
|
|
|
* __The Fixed-Point Algorithm__: to analyze a program, we use information
|
|
|
|
|
that we already know to compute additional information. For instance,
|
|
|
|
|
we might use the fact that `1` is positive to compute the fact that
|
|
|
|
|
`1+1` is positive as well. Using that information, we can determine the
|
|
|
|
|
sign of `(1+1)+1`, and so on. In practice, this is often done by calling
|
|
|
|
|
some kind of "analyze" function over and over, each time getting closer to an
|
|
|
|
|
accurate characterization of the program's behavior. When the output of "analyze"
|
|
|
|
|
stops changing, we know we've found as much as we can find, and stop.
|
|
|
|
|
|
|
|
|
|
What does it mean for the output to stop changing? Roughly, that's when
|
|
|
|
|
the following equation holds: `knownInfo = analyze(knownInfo)`. In mathematics,
|
|
|
|
|
this is known as a [fixed point](https://mathworld.wolfram.com/FixedPoint.html).
|
2024-07-06 17:39:03 -07:00
|
|
|
|
To enable computing fixed points, we focus on a specific kind of lattices:
|
|
|
|
|
those with a _fixed height_. I talk about what this means in
|
|
|
|
|
{{< draftlink "Part 3: Lattices of Finite Height" "03_spa_agda_fixed_height" >}}.
|
|
|
|
|
|
|
|
|
|
Even if we restrict our attention to lattices of fixed height,
|
|
|
|
|
not all functions have fixed points; however, certain types of functions on
|
|
|
|
|
lattices always do. The _fixed-point algorithm_ is a way to compute these
|
|
|
|
|
points, and we will use this to drive our analyses. I talk
|
|
|
|
|
about this in {{< draftlink "Part 4: The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}.
|
2024-05-12 13:49:21 -07:00
|
|
|
|
|
|
|
|
|
* __Correctness__: putting together the work on lattices and the fixed-point
|
|
|
|
|
algorithm, we can implement a static program analyzer in Agda. However,
|
|
|
|
|
it's not hard to write an "analyze" function that has a fixed point but
|
|
|
|
|
produces an incorrect result. Thus, the next step is to prove that the results
|
|
|
|
|
of our analyzer accurately describe the program in question.
|
|
|
|
|
|
|
|
|
|
The interesting aspect of this step is that our program analyzer works
|
|
|
|
|
on [control-flow graphs](https://en.wikipedia.org/wiki/Control-flow_graph) (CFGs),
|
|
|
|
|
which are a relatively compiler-centric representation of programs. On the
|
|
|
|
|
other hand, what the language _actually does_ is defined by its
|
|
|
|
|
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)),
|
|
|
|
|
which is not at all compiler-centric. We need to connect these two, showing
|
|
|
|
|
that the CFGs we produce "make sense" for our language, and that given
|
|
|
|
|
CFGs that make sense, our analysis produces results that match the language's
|
|
|
|
|
execution.
|
|
|
|
|
|
2024-05-30 20:52:33 -07:00
|
|
|
|
|
|
|
|
|
### Navigation
|
|
|
|
|
Here are the posts that I’ve written so far for this series:
|
|
|
|
|
|
2024-07-06 17:39:03 -07:00
|
|
|
|
* {{< draftlink "Lattices" "01_spa_agda_lattices" >}}
|
|
|
|
|
* {{< draftlink "Combining Lattices" "02_spa_agda_combining_lattices" >}}
|
|
|
|
|
* {{< draftlink "Lattices of Finite Height" "03_spa_agda_fixed_height" >}}
|