blog-static/content/blog/00_spa_agda_intro.md
2024-05-12 18:28:44 -07:00

| title | series | description | date | draft |
|---|---|---|---|---|
| Implementing and Verifying "Static Program Analysis" in Agda, Part 0: Intro | Static Program Analysis in Agda | In this post, I give a top-level overview of my work on formally verified static analyses | 2024-04-12T14:23:03-07:00 | true |

Some years ago, when the Programming Languages research group at Oregon State University was discussing what to read, the Static Program Analysis lecture notes came up. The group didn't end up reading the lecture notes, but I did. As I was going through them, I noticed that they were quite rigorous: the first several chapters cover a little bit of lattice theory, and the subsequent analyses -- and the descriptions thereof -- are quite precise. When I went to implement the algorithms in the textbook, I realized that just writing them down would not be enough. After all, the textbook also proves several properties of the lattice-based analyses, which would be lost in translation if I were to just write C++ or Haskell.

At the same time, I noticed that lots of recent papers in programming language theory were formalizing their results in Agda. Having [played]({{< relref "meaningfully-typechecking-a-language-in-idris" >}}) [with]({{< relref "advent-of-code-in-coq" >}}) [dependent]({{< relref "coq_dawn_eval" >}}) [types]({{< relref "coq_palindrome" >}}) before, I was excited to try it out. Thus began my journey to formalize (the first few chapters of) Static Program Analysis in Agda.

In all, I built a framework for static analyses, based on a tool called monotone functions. This framework can be used to implement and reason about many different analyses (currently only a certain class called forward analyses, but that's not a hard limitation). Recently, I've proven the correctness of the algorithms my framework produces. Having reached this milestone, I'd like to pause and talk about what I've done.

In subsequent posts in this series, I will describe what I have so far. It's not perfect, and some work is yet to be done; however, getting to this point was no joke, and I think it's worth discussing. In all, I envision three major topics to cover, each of which is likely going to make for a post or two:

  • Lattices: the analyses I'm reasoning about use an algebraic structure called a lattice. This structure has certain properties that make it amenable to describing degrees of "knowledge" about a program. In lattice-based static program analysis, the various elements of the lattice represent different facts or properties that we know about the program in question; operations on the lattice help us combine these facts and reason about them.

    Interestingly, lattices can be made by combining other lattices in certain ways. We can therefore use simpler lattices as building blocks to create more complex ones, all while preserving the algebraic structure that we need for program analysis.

  • The Fixed-Point Algorithm: to analyze a program, we use information that we already know to compute additional information. For instance, we might use the fact that 1 is positive to compute the fact that 1+1 is positive as well. Using that information, we can determine the sign of (1+1)+1, and so on. In practice, this is often done by calling some kind of "analyze" function over and over, each time getting closer to an accurate characterization of the program's behavior. When the output of "analyze" stops changing, we know we've found as much as we can find, and stop.

    What does it mean for the output to stop changing? Roughly, that's when the following equation holds: `knownInfo = analyze(knownInfo)`. In mathematics, this is known as a fixed point. Crucially, not all functions have fixed points; however, certain types of functions on lattices do. The fixed-point algorithm is a way to compute these points, and we will use this to drive our analyses.

  • Correctness: putting together the work on lattices and the fixed-point algorithm, we can implement a static program analyzer in Agda. However, it's not hard to write an "analyze" function that has a fixed point but produces an incorrect result. Thus, the next step is to prove that the results of our analyzer accurately describe the program in question.

    The interesting aspect of this step is that our program analyzer works on control-flow graphs (CFGs), which are a relatively compiler-centric representation of programs. On the other hand, what the language actually does is defined by its semantics, which is not at all compiler-centric. We need to connect these two, showing that the CFGs we produce "make sense" for our language, and that given CFGs that make sense, our analysis produces results that match the language's execution.
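
To make the first two topics a little more concrete, here is a minimal sketch in Python (not Agda, and not the code from my framework). It assumes a five-element sign lattice, a map from variables to signs as the combined lattice, and a toy loop `x = 1; y = 0; loop { x = x + 1; y = y + x }`. All of the names (`join`, `join_env`, `plus`, `analyze`, `fix`) are made up for illustration.

```python
# Sign lattice: BOT (no information yet) sits below PLUS, ZERO and MINUS,
# which all sit below TOP (could be any sign).
BOT, PLUS, ZERO, MINUS, TOP = "bot", "+", "0", "-", "top"

def join(a, b):
    """Least upper bound: combine two facts about the same value."""
    if a == b:
        return a
    if a == BOT:
        return b
    if b == BOT:
        return a
    return TOP

def join_env(e1, e2):
    """Maps from variables to signs form a lattice too: join them pointwise."""
    return {v: join(e1.get(v, BOT), e2.get(v, BOT))
            for v in sorted(e1.keys() | e2.keys())}

def plus(a, b):
    """Abstract addition: the sign of a sum, given the signs of its operands."""
    if BOT in (a, b):
        return BOT
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b:
        return a  # (+)+(+) is +, (-)+(-) is -, top+top is top
    return TOP

def analyze(env):
    """One round of analysis for the toy loop (a hypothetical example program).

    The loop head is reached either from the initialization (x = 1, y = 0)
    or from the loop body (x = x + 1, y = y + x), so we join both sources.
    """
    from_init = {"x": PLUS, "y": ZERO}
    from_body = {"x": plus(env["x"], PLUS), "y": plus(env["y"], env["x"])}
    return join_env(from_init, from_body)

def fix(f, start):
    """Apply f repeatedly until the output stops changing.

    This terminates here because analyze is monotone and the lattice has
    finite height; for an arbitrary f it could loop forever.
    """
    current = start
    while True:
        following = f(current)
        if following == current:
            return current
        current = following

result = fix(analyze, {"x": BOT, "y": BOT})
print(result)  # {'x': '+', 'y': 'top'}
```

Starting from "no information", the iteration settles after a few rounds: `x` is known to be positive, while `y` ends up at `top`, since this lattice has no element meaning "zero or positive". Note that nothing in the sketch checks that `analyze` agrees with what the loop actually does when it runs; that gap is exactly what the correctness proofs are meant to close.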

{{< todo >}} Once the posts are ready, link them here to add some kind of navigation. {{< /todo >}}