109 lines
		
	
	
		
			6.6 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			109 lines
		
	
	
		
			6.6 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ---
 | ||
| title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 0: Intro"
 | ||
| series: "Static Program Analysis in Agda"
 | ||
| description: "In this post, I give a top-level overview of my work on formally verified static analyses"
 | ||
| date: 2024-07-06T17:37:42-07:00
 | ||
| tags: ["Agda", "Programming Languages"]
 | ||
| ---
 | ||
| 
 | ||
| Some years ago, when the Programming Languages research group at Oregon State
 | ||
| University was discussing what to read, the [_Static Program Analysis_](https://cs.au.dk/~amoeller/spa/)
 | ||
| lecture notes came up. The group didn't end up reading the lecture notes,
 | ||
| but I did. As I was going through them, I noticed that they were quite rigorous:
 | ||
| the first several chapters cover a little bit of [lattice theory](https://en.wikipedia.org/wiki/Lattice_(order)),
 | ||
| and the subsequent analyses -- and the descriptions thereof -- are quite precise.
 | ||
| When I went to implement the algorithms in the textbook, I realized that just
 | ||
| writing them down would not be enough. After all, the textbook also proves
 | ||
| several properties of the lattice-based analyses, which would be lost in
 | ||
| translation if I were to just write C++ or Haskell.
 | ||
| 
 | ||
| At the same time, I noticed that lots of recent papers in programming language
 | ||
| theory were formalizing their results in
 | ||
| [Agda](https://agda.readthedocs.io/en/latest/getting-started/what-is-agda.html).
 | ||
| Having [played]({{< relref "meaningfully-typechecking-a-language-in-idris" >}})
 | ||
| [with]({{< relref "advent-of-code-in-coq" >}}) [dependent]({{< relref "coq_dawn_eval" >}})
 | ||
| [types]({{< relref "coq_palindrome" >}}) before, I was excited to try it out.
 | ||
| Thus began my journey to formalize (the first few chapters of) _Static Program Analysis_
 | ||
| in Agda.
 | ||
| 
 | ||
| In all, I built a framework for static analyses, based on a tool
 | ||
| called _motone functions_. This framework can be used to implement and
 | ||
| reason about many different analyses (currently only a certain class called
 | ||
| _forward analyses_, but that's not hard limitation). Recently, I've proven
 | ||
| the _correctness_ of the algorithms my framework produces. Having reached
 | ||
| this milestone, I'd like to pause and talk about what I've done.
 | ||
| 
 | ||
| In subsequent posts in this series, will describe what I have so far.
 | ||
| It's not perfect, and some work is yet to be done; however, getting to
 | ||
| this point was no joke, and I think it's worth discussing. In all,
 | ||
| I'd like to cover the following major topics, spending a couple of posts on each:
 | ||
| 
 | ||
| * __Lattices__: the analyses I'm reasoning about use an algebraic structure
 | ||
|   called a _lattice_. This structure has certain properties that make it
 | ||
|   amenable to describing degrees of "knowledge" about a program. In
 | ||
|   lattice-based static program analysis, the various elements of the
 | ||
|   lattice represent different facts or properties that we know about the
 | ||
|   program in question; operations on the lattice help us combine these facts
 | ||
|   and reason about them. I write about this in {{< draftlink "Part 1: Lattices" "01_spa_agda_lattices" >}}.
 | ||
| 
 | ||
|   Interestingly, lattices can be made by combining other lattices in certain
 | ||
|   ways. We can therefore use simpler lattices as building blocks to create
 | ||
|   more complex ones, all while preserving the algebraic structure that
 | ||
|   we need for program analysis. I write about this in
 | ||
|   {{< draftlink "Part 2: Combining Lattices" "02_spa_agda_combining_lattices" >}}.
 | ||
| 
 | ||
| * __The Fixed-Point Algorithm__: to analyze a program, we use information
 | ||
|   that we already know to compute additional information. For instance,
 | ||
|   we might use the fact that `1` is positive to compute the fact that
 | ||
|   `1+1` is positive as well. Using that information, we can determine the
 | ||
|   sign of `(1+1)+1`, and so on. In practice, this is often done by calling
 | ||
|   some kind of "analyze" function over and over, each time getting closer to an
 | ||
|   accurate characterization of the program's behavior. When the output of "analyze"
 | ||
|   stops changing, we know we've found as much as we can find, and stop.
 | ||
| 
 | ||
|   What does it mean for the output to stop changing? Roughly, that's when
 | ||
|   the following equation holds: `knownInfo = analyze(knownInfo)`. In mathematics,
 | ||
|   this is known as a [fixed point](https://mathworld.wolfram.com/FixedPoint.html).
 | ||
|   To enable computing fixed points, we focus on a specific kind of lattices:
 | ||
|   those with a _fixed height_. I talk about what this means in
 | ||
|   {{< draftlink "Part 3: Lattices of Finite Height" "03_spa_agda_fixed_height" >}}.
 | ||
| 
 | ||
|   Even if we restrict our attention to lattices of fixed height,
 | ||
|   not all functions have fixed points; however, certain types of functions on
 | ||
|   lattices always do. The _fixed-point algorithm_ is a way to compute these
 | ||
|   points, and we will use this to drive our analyses. I talk
 | ||
|   about this in {{< draftlink "Part 4: The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}.
 | ||
| 
 | ||
| * __Correctness__: putting together the work on lattices and the fixed-point
 | ||
|   algorithm, we can implement a static program analyzer in Agda. However,
 | ||
|   it's not hard to write an "analyze" function that has a fixed point but
 | ||
|   produces an incorrect result. Thus, the next step is to prove that the results
 | ||
|   of our analyzer accurately describe the program in question.
 | ||
| 
 | ||
|   The interesting aspect of this step is that our program analyzer works
 | ||
|   on [control-flow graphs](https://en.wikipedia.org/wiki/Control-flow_graph) (CFGs),
 | ||
|   which are a relatively compiler-centric representation of programs. On the
 | ||
|   other hand, what the language _actually does_ is defined by its
 | ||
|   [semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)),
 | ||
|   which is not at all compiler-centric. We need to connect these two, showing
 | ||
|   that the CFGs we produce "make sense" for our language, and that given
 | ||
|   CFGs that make sense, our analysis produces results that match the language's
 | ||
|   execution. To do so, I write about the language and its semantics
 | ||
|   in {{< draftlink "Part 5: Our Programming Language" "05_spa_agda_semantics" >}},
 | ||
|   then about building control flow graphs for the language in
 | ||
|   {{< draftlink "Part 6: Control Flow Graphs" "06_spa_agda_cfg" >}}.
 | ||
|   I then write about combining these two representations in
 | ||
|   {{< draftlink "Part 7: Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
 | ||
| 
 | ||
| 
 | ||
| ### Navigation
 | ||
| Here are the posts that I’ve written so far for this series:
 | ||
| 
 | ||
| * {{< draftlink "Lattices" "01_spa_agda_lattices" >}}
 | ||
| * {{< draftlink "Combining Lattices" "02_spa_agda_combining_lattices" >}}
 | ||
| * {{< draftlink "Lattices of Finite Height" "03_spa_agda_fixed_height" >}}
 | ||
| * {{< draftlink "The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}
 | ||
| * {{< draftlink "Our Programming Language" "05_spa_agda_semantics" >}}
 | ||
| * {{< draftlink "Control Flow Graphs" "06_spa_agda_cfg" >}}.
 | ||
| * {{< draftlink "Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
 |