109 lines
6.6 KiB
Markdown
109 lines
6.6 KiB
Markdown
---
|
||
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 0: Intro"
|
||
series: "Static Program Analysis in Agda"
|
||
description: "In this post, I give a top-level overview of my work on formally verified static analyses"
|
||
date: 2024-07-06T17:37:42-07:00
|
||
tags: ["Agda", "Programming Languages"]
|
||
---
|
||
|
||
Some years ago, when the Programming Languages research group at Oregon State
|
||
University was discussing what to read, the [_Static Program Analysis_](https://cs.au.dk/~amoeller/spa/)
|
||
lecture notes came up. The group didn't end up reading the lecture notes,
|
||
but I did. As I was going through them, I noticed that they were quite rigorous:
|
||
the first several chapters cover a little bit of [lattice theory](https://en.wikipedia.org/wiki/Lattice_(order)),
|
||
and the subsequent analyses -- and the descriptions thereof -- are quite precise.
|
||
When I went to implement the algorithms in the textbook, I realized that just
|
||
writing them down would not be enough. After all, the textbook also proves
|
||
several properties of the lattice-based analyses, which would be lost in
|
||
translation if I were to just write C++ or Haskell.
|
||
|
||
At the same time, I noticed that lots of recent papers in programming language
|
||
theory were formalizing their results in
|
||
[Agda](https://agda.readthedocs.io/en/latest/getting-started/what-is-agda.html).
|
||
Having [played]({{< relref "meaningfully-typechecking-a-language-in-idris" >}})
|
||
[with]({{< relref "advent-of-code-in-coq" >}}) [dependent]({{< relref "coq_dawn_eval" >}})
|
||
[types]({{< relref "coq_palindrome" >}}) before, I was excited to try it out.
|
||
Thus began my journey to formalize (the first few chapters of) _Static Program Analysis_
|
||
in Agda.
|
||
|
||
In all, I built a framework for static analyses, based on a tool
|
||
called _motone functions_. This framework can be used to implement and
|
||
reason about many different analyses (currently only a certain class called
|
||
_forward analyses_, but that's not hard limitation). Recently, I've proven
|
||
the _correctness_ of the algorithms my framework produces. Having reached
|
||
this milestone, I'd like to pause and talk about what I've done.
|
||
|
||
In subsequent posts in this series, will describe what I have so far.
|
||
It's not perfect, and some work is yet to be done; however, getting to
|
||
this point was no joke, and I think it's worth discussing. In all,
|
||
I'd like to cover the following major topics, spending a couple of posts on each:
|
||
|
||
* __Lattices__: the analyses I'm reasoning about use an algebraic structure
|
||
called a _lattice_. This structure has certain properties that make it
|
||
amenable to describing degrees of "knowledge" about a program. In
|
||
lattice-based static program analysis, the various elements of the
|
||
lattice represent different facts or properties that we know about the
|
||
program in question; operations on the lattice help us combine these facts
|
||
and reason about them. I write about this in {{< draftlink "Part 1: Lattices" "01_spa_agda_lattices" >}}.
|
||
|
||
Interestingly, lattices can be made by combining other lattices in certain
|
||
ways. We can therefore use simpler lattices as building blocks to create
|
||
more complex ones, all while preserving the algebraic structure that
|
||
we need for program analysis. I write about this in
|
||
{{< draftlink "Part 2: Combining Lattices" "02_spa_agda_combining_lattices" >}}.
|
||
|
||
* __The Fixed-Point Algorithm__: to analyze a program, we use information
|
||
that we already know to compute additional information. For instance,
|
||
we might use the fact that `1` is positive to compute the fact that
|
||
`1+1` is positive as well. Using that information, we can determine the
|
||
sign of `(1+1)+1`, and so on. In practice, this is often done by calling
|
||
some kind of "analyze" function over and over, each time getting closer to an
|
||
accurate characterization of the program's behavior. When the output of "analyze"
|
||
stops changing, we know we've found as much as we can find, and stop.
|
||
|
||
What does it mean for the output to stop changing? Roughly, that's when
|
||
the following equation holds: `knownInfo = analyze(knownInfo)`. In mathematics,
|
||
this is known as a [fixed point](https://mathworld.wolfram.com/FixedPoint.html).
|
||
To enable computing fixed points, we focus on a specific kind of lattices:
|
||
those with a _fixed height_. I talk about what this means in
|
||
{{< draftlink "Part 3: Lattices of Finite Height" "03_spa_agda_fixed_height" >}}.
|
||
|
||
Even if we restrict our attention to lattices of fixed height,
|
||
not all functions have fixed points; however, certain types of functions on
|
||
lattices always do. The _fixed-point algorithm_ is a way to compute these
|
||
points, and we will use this to drive our analyses. I talk
|
||
about this in {{< draftlink "Part 4: The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}.
|
||
|
||
* __Correctness__: putting together the work on lattices and the fixed-point
|
||
algorithm, we can implement a static program analyzer in Agda. However,
|
||
it's not hard to write an "analyze" function that has a fixed point but
|
||
produces an incorrect result. Thus, the next step is to prove that the results
|
||
of our analyzer accurately describe the program in question.
|
||
|
||
The interesting aspect of this step is that our program analyzer works
|
||
on [control-flow graphs](https://en.wikipedia.org/wiki/Control-flow_graph) (CFGs),
|
||
which are a relatively compiler-centric representation of programs. On the
|
||
other hand, what the language _actually does_ is defined by its
|
||
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)),
|
||
which is not at all compiler-centric. We need to connect these two, showing
|
||
that the CFGs we produce "make sense" for our language, and that given
|
||
CFGs that make sense, our analysis produces results that match the language's
|
||
execution. To do so, I write about the language and its semantics
|
||
in {{< draftlink "Part 5: Our Programming Language" "05_spa_agda_semantics" >}},
|
||
then about building control flow graphs for the language in
|
||
{{< draftlink "Part 6: Control Flow Graphs" "06_spa_agda_cfg" >}}.
|
||
I then write about combining these two representations in
|
||
{{< draftlink "Part 7: Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
|
||
|
||
|
||
### Navigation
|
||
Here are the posts that I’ve written so far for this series:
|
||
|
||
* {{< draftlink "Lattices" "01_spa_agda_lattices" >}}
|
||
* {{< draftlink "Combining Lattices" "02_spa_agda_combining_lattices" >}}
|
||
* {{< draftlink "Lattices of Finite Height" "03_spa_agda_fixed_height" >}}
|
||
* {{< draftlink "The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}
|
||
* {{< draftlink "Our Programming Language" "05_spa_agda_semantics" >}}
|
||
* {{< draftlink "Control Flow Graphs" "06_spa_agda_cfg" >}}.
|
||
* {{< draftlink "Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
|