444 lines
20 KiB
Markdown
444 lines
20 KiB
Markdown
---
|
|
title: "Proving My Compiler Code Incorrect With Alloy"
|
|
date: 2023-05-02T22:48:52-07:00
|
|
tags: ["Compilers", "Alloy"]
|
|
draft: true
|
|
---
|
|
|
|
{{< todo >}}
|
|
Intro section and disclaimer
|
|
{{< /todo >}}
|
|
|
|
I work as a compiler developer on the [Chapel](https://chapel-lang.org) team.
|
|
|
|
### The Problem at Hand
|
|
One of the things that a language like Chapel has to do is called _resolution_,
|
|
which is the process of figuring out what each identifier, like `x`, refers to,
|
|
and what its type is. Even the first part of that is pretty complicated, what
|
|
with public and private variables, methods (which can be decalred outside
|
|
of their receiver type in Chapel), and more...
|
|
|
|
Scope resolution in Chapel is further complicated by the fact that the same
|
|
scope might need to be searched multiple times, in different contexts. Let
|
|
me start with a few examples to illustrate what I mean. Here's the first
|
|
program:
|
|
|
|
```Chapel {linenos=true}
|
|
module M {
|
|
class C {}
|
|
|
|
// A regular procedure (not a method)
|
|
proc foo() {}
|
|
|
|
// A method on C.
|
|
proc C.foo() {}
|
|
|
|
// Another method on C.
|
|
proc C.doSomething() {
|
|
foo();
|
|
}
|
|
}
|
|
```
|
|
|
|
If you don't know Chapel (and you probably don't!) this program already merits a fair
|
|
bit of explanation. A _module_ in Chapel (declared via a `module` keyword)
|
|
is just a collection of definitions. Such definitions could include variables,
|
|
methods, classes and more. Putting them in a module helps group them.
|
|
|
|
{{< todo >}}
|
|
Write the rest of this explanation.
|
|
{{< /todo >}}
|
|
|
|
The interesting part of the snippet is the body of the `doSomething` method.
|
|
It has a call to `foo`: but which `foo` is it referring to? There are two:
|
|
the regular procedure (non-method) `foo`, declared on line 5, and the
|
|
method `C.foo` declared on line 8. In Chapel, the rules dictate that when
|
|
such a situation arises, and a fitting method is found, the method is
|
|
preferred to the non-method. In the rewritten version of the Chapel compiler,
|
|
titled Dyno, this disambiguation is achieved by first searching the scopes
|
|
visible from the class `C` for methods only. In this particular example,
|
|
the two scopes searched will be:
|
|
|
|
1. The inside of class `C`. The class itself doesn't define any methods, so
|
|
nothing is found.
|
|
2. The module in which `C` is defined (`M` in this case). This module does
|
|
have a method, the one on line 8, so that one is returned.
|
|
|
|
Only if methods are not found are non-methods considered. In this situation,
|
|
the search order will be as follows:
|
|
|
|
1. The inside of `C.doSomething` will be searched. `doSomething` doesn't declare
|
|
anything, so the search will come up empty.
|
|
2. The module in which `C.doSomething` is defined (`M` again) will be searched. This time,
|
|
both methods and non-methods will be considered. Since we're considering
|
|
a hypothetical situation in which the method `C.foo` isn't there (otherwise
|
|
it would've been found earlier), the only thing that will be found will
|
|
be the non-method `foo`.
|
|
|
|
Notice that we've already had to search the module `M` twice, looking for
|
|
different things each time. First, we were looking for only method, but
|
|
later, we were looking for anything. However, this isn't as complicated as
|
|
things can get. The simplifying aspect of this program is that both `doSomething`
|
|
and `C` are defined inside the class `C`, and therefore have access to its
|
|
private methods and procedures. If we extracted `C.doSomething` into its
|
|
own separate module, the program would look like this.
|
|
|
|
```Chapel {linenos=true}
|
|
module M1 {
|
|
class C {}
|
|
|
|
// A regular procedure (not a method)
|
|
proc foo() {}
|
|
|
|
// A method on C.
|
|
proc C.foo() {}
|
|
}
|
|
module M2 {
|
|
use super.M1;
|
|
|
|
// Another method on C.
|
|
proc C.doSomething() {
|
|
foo();
|
|
}
|
|
}
|
|
```
|
|
|
|
Since `doSomething` is now in another module, it can't just access the `foo`s from
|
|
`M1` willy-nilly. There are a few ways to get the things that were declared in another module out
|
|
and make use of them. I opted for a `use` statement, which, in its simplest form,
|
|
just brings all the declarations inside the `use`d module into the current scope. Thus,
|
|
the `use` statement on line 11 would bring all things declared in `M1` into
|
|
the scope inside `M2`. There's a catch, though: since `M2` is not declared
|
|
inside `M1`, a `use` statement will not be able to bring in _private_ symbols
|
|
from `M1` (they're private for a reason!). So, this time, when searching the scope
|
|
for `M1`, we will have to search only for public symbols. That's another,
|
|
different way of searching `M1`. So far, we've seen three:
|
|
|
|
* Search `M1` for any symbol.
|
|
* Search `M1` for methods only.
|
|
* Search `M1` for public symbols only.
|
|
|
|
In Dyno, there are even more different ways of searching a single scope, and
|
|
some of them are mixes of others (one might consider, for instance, searching
|
|
for only public methods). To represent the various search configurations,
|
|
the Dyno team came up with using a bitfield of _flags_, each of which indicated
|
|
a necessary condition for a symbol to be returned. A bitfield with flags set
|
|
for two properties (like "public" and "method") requires that both such
|
|
properties be found on each symbol that's returned from a scope. This led to
|
|
C++ code along the lines of:
|
|
|
|
```C++
|
|
auto allPublicSymbols = Flags::PUBLIC;
|
|
auto allPublicMethods = Flags::PUBLIC | Flags::METHOD;
|
|
```
|
|
|
|
It also turned out convenient to add negative versions of each flag
|
|
(`NOT_PUBLIC` for private symbols, `NOT_METHOD` for regular old procedures
|
|
and other definitions, and so on. So, some other possible flag combinations
|
|
include:
|
|
|
|
```C++
|
|
auto allNonMethods = Flags::NOT_METHOD;
|
|
auto privateMethods = Flags::NOT_PUBLIC | Flags::METHOD;
|
|
```
|
|
|
|
Given these flags, there are some situations in which checking a scope a
|
|
second time is redundant, in that it is guaranteed to find no additional
|
|
symbols. For instance, if you search a scope for all public symbols, and
|
|
then subsequently search for all public methods, you will only find
|
|
duplicates -- after all, all public methods are public symbols. Most
|
|
generally, this occurs when a second search has all the flags from a
|
|
previous search, and maybe more. In math lingo, if the set of flags checked
|
|
the first time is a subset of the set of flags checked the second time,
|
|
it's guaranteed not to find anything new.
|
|
|
|
In Dyno, we like to avoid additional work when we can. To do so, we track
|
|
which scopes have already been searched, and avoid searching them again.
|
|
Since what comes up from a search depends on the flags, we store the flags
|
|
alongside the scopes we've checked. __If we find that the previously-checked
|
|
bitfield is a subset of the current biset, we just skip the search__.
|
|
|
|
But then, what if it _isn't_ a subset? Another concern here is avoiding
|
|
duplicate results (it's easier to check for duplicate definitions if you
|
|
know a symbol is only returned from a search once). So, another feature of
|
|
Dyno's scope search is an additional bitfield of what to _exclude_, which we
|
|
set to be the previous search's filter. So if the first search looked for
|
|
symbols matching description \\(A\\), and the second search is supposed to
|
|
look for symbols matching description \\(B\\), __then really we do a search
|
|
for \\(A \\land \\lnot B\\) (that is, \\(A\\) and not \\(B\\))__.
|
|
|
|
{{< dialog >}}
|
|
{{< message "question" "reader" >}}
|
|
Hold on, why do you need a whole another bitfield? There are already negated
|
|
versions of each flag available. Can't you just add those to the filter?
|
|
{{< /message >}}
|
|
{{< message "answer" "Daniel" >}}
|
|
Good question. The difference is a little bit tricky. If we just negated
|
|
each flag, we'd turn an expression like \(A \land B\) into \(\lnot A \land
|
|
\lnot B\). However, according to De Morgan's laws, the proper negation of
|
|
\(A \land B\) is \(\lnot A \lor \lnot B\) (notice the use of "or" instead
|
|
of "and"). On the other hand, using an "exclude" bitfield negates the whole
|
|
conjunction, rather than the individual flags, and so gives us the result
|
|
we need.
|
|
{{< /message >}}
|
|
{{< /dialog >}}
|
|
|
|
One last thing: what happens if there were two previous searches? What we
|
|
need is to to somehow combine the two filters into one. Taking a cue from
|
|
a previous example, in which "public" was followed by "public methods", we
|
|
can observe that since the second search has additional flags, it's more
|
|
restrictive, and thus guaranteed to not find anything. __So we try to create
|
|
the least restrictive bitfield possible, by taking an intersection of the
|
|
flags used.__
|
|
|
|
Actually, that last point is not quite correct in every possible case
|
|
(taking the intersection is not always the right thing to do).
|
|
However, running the code through our test suite, we did not notice any
|
|
cases in which it misbehaved. So, noting the potential issue in a comment,
|
|
we moved on to other things.
|
|
|
|
That is, until I decided that it was time to add another possible flag to
|
|
the bitfield. At that point, sitting and trying to reason about the
|
|
possible cases, I realized that it would be much nicer to describe this
|
|
mathematically, and have a model checker generate outlandish scenarios for
|
|
me. Having at some point seen [Hillel Wayne's
|
|
post](https://www.hillelwayne.com/post/alloy6/) about the release of
|
|
[Alloy 6](https://alloytools.org/), I thought I'd give it a go. I'd never
|
|
touched alloy before this, so be warned: this is what I came up with on my
|
|
own attempt.
|
|
|
|
### Modeling Flags and Bitsets in Alloy
|
|
Flags are represented on the C++ side as an `enum` (with custom indexing so
|
|
as to make each flag be exactly one bit). I checked, and it looked like
|
|
Alloy had an `enum` feature, too! I started off by making an enum of the
|
|
flags I wanted to play with.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 1 1 >}}
|
|
|
|
We haven't seen the `MethodOrField` flag, but it's an important one. It
|
|
turns out that it's much more common to look for anything that could be
|
|
part of a class, rather than just its methods. This flag is itself an "or"
|
|
of two properties (something being a method and something being a class
|
|
field). Note that this is not the same as having two flags, `Method` and
|
|
`Field`, and always including them together (because that would be an
|
|
"and", not an "or").
|
|
|
|
Notice also that the list of flags doesn't include the negative versions.
|
|
Since the negative versions are one-for-one with the positive ones, I
|
|
instead chose to represent bitfields as simply two sets: one set of
|
|
"positive" flags, in which the presence of e.g. `Method` indicates that the
|
|
`METHOD` flag was set, and one set of "negative" flags, in which the
|
|
presence of `Method` indicates that `NOT_METHOD` was set. This way, I'm
|
|
guaranteed that there's a positive and negative version of each flag,
|
|
automatically. Here's how I wrote that in Alloy.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 6 9 >}}
|
|
|
|
This definition (a _signature_ in Alloy terms) specifies what a bitfield is
|
|
like, but not any operations on it. My next order of business is to define
|
|
some common functionality on bitfields. Alloy is all about
|
|
[relations](https://en.wikipedia.org/wiki/Relation_(mathematics)) and
|
|
[predicates](https://en.wikipedia.org/wiki/Predicate_(mathematical_logic)),
|
|
so for all of these, I had to effectively write something that _checks_ if
|
|
some condition holds for some arguments. This might seem abstract; as an
|
|
example, here's `bitfieldEmpty`, which checks that a bitfield has no flags
|
|
set.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 27 29 >}}
|
|
|
|
The `#` operator in Alloy is used to check the size of a set. So, to check
|
|
if a bitfield is empty, I simply check if there are neither positive nor
|
|
negative flags. Probably the most unusual aspect of this piece of code is
|
|
that equality is written as `=`, as opposed to `==` like in most common
|
|
languages. This is because, like I said, alloy is all about relations and
|
|
predicates, and not at all about imperative manipulation of data. So,
|
|
there's no need to reserve `=` for assignment.
|
|
|
|
The next step from here is a predicate that accepts two arguments,
|
|
`bitfieldEqual`. As its name suggests, this predicate accepts two
|
|
bitfields, and makes sure they have exactly the same flags set.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 31 33 >}}
|
|
|
|
So far, this has been pretty similar to just writing boolean functions in a
|
|
language like C++. However, the similarity is only superficial. An easy way
|
|
to see that is to try to determine the _intersection_ of two bitfields --
|
|
that's the operation we will be having to model, since the Dyno
|
|
implementation uses `&` to combine filter sets. In a language like C++, you
|
|
might write a function like the following, in which you accept two bitfield
|
|
arguments and return a new bitfield.
|
|
|
|
```C++
|
|
Bitfield intersection(Bitfield b1, Bitfield b2) { /* ... */ }
|
|
```
|
|
|
|
However, in Alloy, you can't create a new bitfield, nor return something
|
|
from a `pred` that isn't a boolean. Instead, you describe how the inputs
|
|
will be related to the output. So, to model a binary function, you end up
|
|
with a three-parameter predicate: two inputs, and one output. But how
|
|
_does_ the output of a bitfield intersection connect to the two operands
|
|
being intersected? Well, its two flag sets will be intersections of the
|
|
flag sets of the inputs!
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 35 38 >}}
|
|
|
|
Next, let's talk about what flags _do_. They are used to include and
|
|
exclude symbols based on certain properties. One property is being a
|
|
method: a `METHOD` flag requires this property, whereas a `NOT_METHOD` flag
|
|
ensures that a symbol does not have it. Another property is being a public
|
|
definition: if a symbol isn't public, it'll be ignored by searches with the
|
|
`PUBLIC` flag set. Just like a bitfield can have multiple flags, a symbol
|
|
can have multiple properties (e.g., a public method). Unlike our bitfields,
|
|
though, we won't be modeling symbols as having both positive and negative
|
|
properties. That is to say, we won't have a "not public" property: the
|
|
absence of the "public" property will be enough to make something private.
|
|
Here's the alloy definition for everything I just said:
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 60 64 >}}
|
|
|
|
Now, we can specify how flags in a bitfield relates to properties on a
|
|
symbol. We can do so by saying which flags match which properties. The
|
|
`Method` flag, for instance, will be satisfied by the `PMethod` property.
|
|
The `MethodOrField` flag is more lenient, and will be satisfied by either
|
|
`PMethod` or `PField`. Here's a predicate `flagMatchesProperty` that
|
|
encodes all the flag-property combinations:
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 66 70 >}}
|
|
|
|
A bitfield matching a symbol is a little bit more complicated. Said
|
|
informally, the condition for a bitfield matching a symbol is twofold:
|
|
|
|
* Every single positive flag, like `METHOD`, must be satisfied by a
|
|
property on the symbol.
|
|
* None of the negative flags, like `NOT_METHOD`, must be satisfied by a
|
|
property on the symbol (that is to say, if `Method` is in the negative
|
|
flags set, then the symbol must not have `PMethod` property). It is more
|
|
conveniently to formulate this -- equivalently -- as follows: for each
|
|
negative flag, there most not be a property that satisfies it.
|
|
|
|
Each of the above two conditions translate quite literally into Alloy:
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 72 75 >}}
|
|
|
|
We can read line 73 as "for each flag in a bitfield's positive flags, there
|
|
must be some property in the symbol that matches it". Similarly, line 74
|
|
can be read out loud as "for each flag in the negative flags, no property
|
|
in the symbol must match it".
|
|
|
|
We've written a fair bit of alloy. If you're anything like me, you might be
|
|
getting a bit twitchy: how do we even check that any of this works? For
|
|
this, we'll need to run our model. We will give Alloy a claim, and ask
|
|
it to find a situation in which that claim holds true. The simplest claim
|
|
is "there exists a bitfield".
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 77 79 >}}
|
|
|
|
Executing this model yields a pretty interesting bitfield: one in which every single flag is set -- both the positive and negative versions.
|
|
|
|
{{< figure src="bitfield_exists.png" caption="Alloy's output satisfying \"a bit field exists\"" >}}
|
|
|
|
That's a little bit ridiculous: this bitfield will never match anything!
|
|
You can't be and not be a method at the same time, for instance. For for a
|
|
more interesting example, let's ask for a bitfield that matches some
|
|
symbol.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 81 83 >}}
|
|
|
|
The output here is pretty interesting too. Alloy finds a symbol and a
|
|
bitfield that matches it, but they're both empty. In effect, it said:
|
|
"if you don't specify any filters, any private definition will match". Fair
|
|
enough, of course, but a curious departure from the previous maximalist
|
|
"put in all the flags!" approach.
|
|
|
|
{{< figure src="matching_bitfield_exists.png" caption="Alloy's output satisfying \"a bit field that matches a symbol exists\"" >}}
|
|
|
|
Let's try nudge it towards a more interesting case. I'm going to ask for a
|
|
filter with one positive and one negative flag, and a symbol with two
|
|
properties.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 85 92 >}}
|
|
|
|
The results are more interesting this time: we get a filter for private
|
|
methods, and a private symbol that was... both a field and a method?
|
|
|
|
{{< figure src="matching_bitfield_exists_2.png" caption="Alloy's spiced up output satisfying \"a bit field that matches a symbol exists\"" >}}
|
|
|
|
We never told alloy that a symbol can't be both a field and a method. It
|
|
had no idea what the flags meant, just that they exist.
|
|
To let Alloy know what we do -- that the two properties are incompatible --
|
|
we can use a _fact_. To me, the most natural way of phrasing this is "there
|
|
is never a symbol that has both the method and field properties". Alas,
|
|
Alloy doesn't have a `never` keyword; it only has `always`. So I opt instead
|
|
for an alternative formulation: "there are always zero symbols that are
|
|
both methods and fields". In alloy, the claim looks like this:
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 94 99 >}}
|
|
|
|
Re-running the example program with this fact, alloy spits out a filter for
|
|
public non-method symbols, and a symbol that's a public field. Public
|
|
fields also aren't a thing in Chapel (all fields in a class are publicly
|
|
readable in the current version of the language). Perhaps it's time for
|
|
another fact.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 100 104 >}}
|
|
|
|
But now, Alloy fails to come up with anything at all. That makes sense: by
|
|
restricting the search to a symbol with two properties, and making `PField`
|
|
incompatible with the other two possible properties, we've guaranteed that
|
|
our symbol would be a public method. But then, we also required a negative
|
|
flag in the filter; however, all the flags in the list match a public
|
|
method, so making any of them negative would guarantee that our symbol
|
|
would not be found. Let's change the example up a bit to only ask for
|
|
positive flags.
|
|
|
|
{{< codelines "Alloy" "dyno-alloy/DynoAlloy.als" 106 112 >}}
|
|
|
|
This time, alloy gives us a symbol that's a public method, and a filter
|
|
that only looks for public methods. Fair enough.
|
|
|
|
{{< figure src="matching_bitfield_exists_3.png" caption="Alloy's spiced up output satisfying \"a bit field that matches a symbol exists\"" >}}
|
|
|
|
{{< todo >}}
|
|
The rest of the article
|
|
{{< /todo >}}
|
|
|
|
### Scratch Work
|
|
{{< todo >}}
|
|
This section is temporary
|
|
{{< /todo >}}
|
|
|
|
a small-ish program to illustrate what I mean.
|
|
|
|
```Chapel
|
|
module M {
|
|
|
|
}
|
|
```
|
|
|
|
|
|
```Chapel {linenos=true}
|
|
module M1 {
|
|
public use super.M2;
|
|
}
|
|
module M2 {
|
|
private var x = 1;
|
|
|
|
module M3 {
|
|
public use super;
|
|
}
|
|
}
|
|
|
|
use M1;
|
|
use M2.M3;
|
|
writeln(x)
|
|
```
|
|
|
|
|
|
Moreover, a `public use` makes these definitions part
|
|
of the module that does the `use` -- that is, `M1` would now contain the
|
|
definitions from `M2`. However, since `M1` is not defined inside of `M2`, it
|
|
isn't able to access its private variables (like `x` on line 5), so this
|
|
particular use statement leaves `M1` just containing (a reference to) `M3`.
|
|
|
|
The `public use` on line
|