From 5873c1ca96f8ee5b10d153f7124938d88b4e21b0 Mon Sep 17 00:00:00 2001 From: Danila Fedorin Date: Sun, 12 May 2024 13:49:21 -0700 Subject: [PATCH] Write a high-level introduction for the SPA series Signed-off-by: Danila Fedorin --- content/blog/00_spa_agda.md | 73 +++++++++++++++++++++++++++++++++---- 1 file changed, 65 insertions(+), 8 deletions(-) diff --git a/content/blog/00_spa_agda.md b/content/blog/00_spa_agda.md index bbfc27d..b1205d5 100644 --- a/content/blog/00_spa_agda.md +++ b/content/blog/00_spa_agda.md @@ -1,13 +1,13 @@ --- -title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 1: Lattices" +title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 0: Intro" series: "Static Program Analysis in Agda" -date: 2024-03-11T14:23:03-07:00 +date: 2024-04-12T14:23:03-07:00 draft: true --- Some years ago, when the Programming Languages research group at Oregon State University was discussing what to read, the [_Static Program Analysis_](https://cs.au.dk/~amoeller/spa/) -lecture notes came up. The group didn't end up reading the lecture notes, +lecture notes came up. The group didn't end up reading the lecture notes, but I did. As I was going through them, I noticed that they were quite rigorous: the first several chapters cover a little bit of [lattice theory](https://en.wikipedia.org/wiki/Lattice_(order)), and the subsequent analyses -- and the descriptions thereof -- are quite precise. @@ -22,13 +22,70 @@ theory were formalizing their results in Having [played]({{< relref "meaningfully-typechecking-a-language-in-idris" >}}) [with]({{< relref "advent-of-code-in-coq" >}}) [dependent]({{< relref "coq_dawn_eval" >}}) [types]({{< relref "coq_palindrome" >}}) before, I was excited to try it out. -Thus began my journey to formalize the (first few chapters of) _Static Program Analysis_ +Thus began my journey to formalize (the first few chapters of) _Static Program Analysis_ in Agda. -In this post (and subsequent follow-ups), I hope to describe what I have so far. -Although not everything works, formalizing fixed-point algorithms (I'll get back -to this) and getting even the most straightforward analysis working was no -joke, and I think is worthy of discussion. So, here goes. +In all, I built a framework for static analyses, based on a tool +called _motone functions_. This framework can be used to implement and +reason about many different analyses (currently only a certain class called +_forward analyses_, but that's not hard limitation). Recently, I've proven +the _correctness_ of the algorithms my framework produces. Having reached +this milestone, I'd like to pause and talk about what I've done. + +In subsequent posts in this series, will describe what I have so far. +It's not perfect, and some work is yet to be done; however, getting to +this point was no joke, and I think it's worth discussing. In all, +I envision three major topics to cover, each of which is likely going to make +for a post or two: + +* __Lattices__: the analyses I'm reasoning about use an algebraic structure + called a _lattice_. This structure has certain properties that make it + amenable to describing degrees of "knowledge" about a program. In + lattice-based static program analysis, the various elements of the + lattice represent different facts or properties that we know about the + program in question; operations on the lattice help us combine these facts + and reason about them. + + Interestingly, lattices can be made by combining other lattices in certain + ways. We can therefore use simpler lattices as building blocks to create + more complex ones, all while preserving the algebraic structure that + we need for program analysis. + +* __The Fixed-Point Algorithm__: to analyze a program, we use information + that we already know to compute additional information. For instance, + we might use the fact that `1` is positive to compute the fact that + `1+1` is positive as well. Using that information, we can determine the + sign of `(1+1)+1`, and so on. In practice, this is often done by calling + some kind of "analyze" function over and over, each time getting closer to an + accurate characterization of the program's behavior. When the output of "analyze" + stops changing, we know we've found as much as we can find, and stop. + + What does it mean for the output to stop changing? Roughly, that's when + the following equation holds: `knownInfo = analyze(knownInfo)`. In mathematics, + this is known as a [fixed point](https://mathworld.wolfram.com/FixedPoint.html). + Crucially, not all functions have fixed points; however, certain types of + functions on lattices do. The fixed-point algorithm is a way to compute these + points, and we will use this to drive our analyses. + +* __Correctness__: putting together the work on lattices and the fixed-point + algorithm, we can implement a static program analyzer in Agda. However, + it's not hard to write an "analyze" function that has a fixed point but + produces an incorrect result. Thus, the next step is to prove that the results + of our analyzer accurately describe the program in question. + + The interesting aspect of this step is that our program analyzer works + on [control-flow graphs](https://en.wikipedia.org/wiki/Control-flow_graph) (CFGs), + which are a relatively compiler-centric representation of programs. On the + other hand, what the language _actually does_ is defined by its + [semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)), + which is not at all compiler-centric. We need to connect these two, showing + that the CFGs we produce "make sense" for our language, and that given + CFGs that make sense, our analysis produces results that match the language's + execution. + +{{< todo >}} +Once the posts are ready, link them here to add some kind of navigation. +{{< /todo >}} ### Monotone Frameworks