Add my initial slides

Signed-off-by: Danila Fedorin <daniel.fedorin@hpe.com>
This commit is contained in:
Danila Fedorin 2025-10-02 17:52:11 -07:00
commit 59fea61d2d
6 changed files with 1721 additions and 0 deletions

22
alloy/abstract.txt Normal file
View File

@ -0,0 +1,22 @@
Formal methods are a set of techniques that are used to validate the correctness
of software. A particular category of these methods, model checking, uses the
mathematical language of temporal logic to construct specifications of softwares
behavior. A solver can then validate the constraints described in the formal
language and ensure that undesirable states do not occur.
This talk will be an experience report of using formal methods, specifically
the Alloy analyzer, to detect a bug in Chapels Dyno compiler front-end library.
The area in which the bug was discovered is currently used in production, as
well as a part of editor tools such as chplcheck and chpl-language-server.
Specifically, Alloy was used to construct a formal specification of a part of
Chapels use/import lookup algorithm. Chapel has a number of complicated scoping
rules and possible edge cases in this area. By running this specification against
a solver, a sequence of steps was discovered that could cause the algorithm to
malfunction and produce incorrect results. A program that causes these steps to
occur was constructed and served as a concrete reproducer for the bug. This
reproducer was used to adjust the logic and fix the bug.
This talk will cover the fundamentals of temporal logic required for formal
specifications, the necessary parts of Chapels use/import lookup algorithm,
and the steps taken to encode and validate the compilers behavior.

BIN
alloy/bug.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 134 KiB

BIN
alloy/instancefound.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 40 KiB

688
alloy/slides.md Normal file
View File

@ -0,0 +1,688 @@
---
theme: gaia
title: Using Formal Methods to Discover a Bug in the Chapel Compiler
backgroundColor: #fff
---
<!-- _class: lead -->
<style>
section {
font-size: 25px;
}
blockquote {
padding: 15px;
background-color: #f0f0ff;
}
div.side-by-side {
display: flex;
justify-content: space-between;
gap: 20px;
}
div.side-by-side > div {
flex: 1;
}
</style>
# <!--fit--> **Using Formal Methods to Discover a Bug in the Chapel Compiler**
Daniel Fedorin, HPE
---
# Terms
* What are **formal methods**?
* Techniques rooted in computer science and mathematics to specify and verify systems
* What part of the **Chapel compiler**?
* The 'Dyno' compiler front-end, particularly its use/import resolution phase.
* This piece is used by the production Chapel compiler.
---
# The Story
I have a story in three parts.
1. I found it very hard to think through a section of compiler code.
- Specifically, code that performed lookups in `use`s/`import`s
2. I used the [Alloy Analyzer](https://alloytools.org/) to model the assumptions and behavior of the code.
- I had little background in Alloy, but some background in (formal) logic
3. This led me to discover a bug in the compiler that I then fixed.
- Re-creating the bug required some gymanstics that were unlikely to occur in practice.
---
<!-- _class: lead -->
# Background: Chapel's Scope Lookups
---
# The Humble `foo`
Imagine you see the following snippet of Chapel code:
```chapel
foo();
```
Where do you look for `foo`? The answer is quite complicated, and depends
strongly on the context of the call.
Moreover, the order of where to look matters: method calls are preferred
over global functions, "nearer" functions are preferred over "farther" ones.
---
# The Humble `foo` (example 1)
```chapel
module M1 {
record R {
proc foo() { writeln("R.foo"); }
}
proc foo() { writeln("M1.foo"); }
foo(); // which 'foo'?
}
```
Here, things are pretty straightforward: we look in scope `M1`. `R.foo` is not
in it, but `M1.foo` is. We return it.
---
# The Humble `foo` (example 2)
```chapel
module M1 {
record R {}
proc R.foo() { writeln("R.foo"); }
proc foo() { writeln("M1.foo"); }
foo(); // which 'foo'?
}
```
Here, things are bit trickier. Since we are not inside a method, we know
that `foo()` could not be a call to a method. Thus, we rule out `R.foo`,
and find `M1.foo` in `M1`.
---
# The Humble `foo` (example 3)
```chapel
module M1 {
record R {}
proc R.foo() { writeln("R.foo"); }
proc foo() { writeln("M1.foo"); }
proc R.someMethod() {
foo(); // which 'foo'?
}
}
```
* Both `R.foo` and `M1.foo` would be valid candidates.
* We give priority to methods over global functions. So, the compiler would:
* Search `R` and its scope (`M1`) for methods named `foo`.
* If that fails, search `M1` for any symbols `foo`.
* We've had to look at `M1` twice! (once for methods, once for non-methods)
---
# The Humble `foo` (example 4)
```chapel
module M1 {
record R {}
proc foo() { writeln("R.foo"); }
}
module M2 {
use M1;
proc R.someMethod() {
foo(); // which 'foo'?
}
}
```
Here, we search the scope of `R` and `M1`, but **only for public symbols**.
---
# How Chapel's Compiler Handles This
We want to:
- Respect the priority order
- Including prefering methods over non-methods
- As a result, we search the scopes multiple times
- Avoid any extra work
- This includes redundant re-searches
- Example redundant search: looked up "all symbols", then later "all public symbols"
```C++
enum { PUBLIC = 1, NOT_PUBLIC = 2, METHOD_FIELD = 4, NOT_METHOD_FIELD = 8, /* ... */ };
```
1. For each scope, save the flags we've already searched with
2. When searching a scope again, exclude the flags we've already searched with
This was handled by two bitfields: `filter` and `excludeFilter`.
---
# Populating `filter` and `excludeFilter`
```C++
if (skipPrivateVisibilities) { // depends on context of 'foo()'
filter |= IdAndFlags::PUBLIC;
}
if (onlyMethodsFields) {
filter |= IdAndFlags::METHOD_FIELD;
} else if (!includeMethods && receiverScopes.empty()) {
filter |= IdAndFlags::NOT_METHOD;
}
```
```C++
excludeFilter = previousFilter;
```
```C++
// scary!
previousFilter = filter & previousFilter;
```
Code notes `previousFilter` is an approximation.
---
# Possible Problems
* `previousFilter` is an approximation.
* For `previousFilter = PUBLIC` and `filter = METHOD_FIELD`, we get
`previousFilter = 0`, indicating no more searches should be donne.
* But we're missing private non-methods!
* However, no case we knew of hit this combination of searches, or any like it.
* All of our language tests passed.
* Code seemed to work.
* If only there was a way I could get a computer to check whether such a combination
could occur...
---
<!-- _class: lead -->
# Formal Methods
---
# Types of Formal Methods
- Model checking involves formally describing the behavior of a system, then having a solver check whether other desired properties hold.
- Alloy is an example of a model checker.
- TLA is another famous example.
- Theorem proving is a heavier weight approach that involves building a formal proof of correctness.
- Coq and Isabelle are examples of theorem provers.
---
<style scoped>
li:nth-child(2) { color: lightgrey; }
</style>
# Types of Formal Methods
- Model checking involves formally describing the behavior of a system, then having a solver check whether other desired properties hold.
- Alloy is an example of a model checker.
- TLA is another famous example.
- Theorem proving is a heavier weight approach that involves building a formal proof of correctness.
- Coq and Isabelle are examples of theorem provers.
**Reason**: I was in the middle of developing compiler code. I wanted to sketch
the assumptions I was making and see if they held up.
---
# A Primer on Logic
Model checkers like Alloy are rooted in temporal logic, which builds on
first-order logic. This includes:
- Variables (e.g. $x$, $y$)
- These represent any objects in the logical system.
- Predicates (e.g. $P(x)$, $Q(x,y)$)
- These represent properties of objects, or relationships between objects.
- Logical connectives (e.g. $\land$, $\lor$, $\neg$, $\Rightarrow$)
- These are used to combine predicates into more complex statements.
- $\land$ is "and", $\lor$ is "or", $\neg$ is "not", $\Rightarrow$ is "implies".
- Quantifiers (e.g. $\forall$, $\exists$)
- $\forall x. P(x)$ (in Alloy: `all x { P(x) }`) means "for all $x$, $P(x)$ is true".
- $\exists x. P(x)$ (in Alloy: `some x { P(x)}`) means "there exists an $x$ such that $P(x)$ is true".
---
# A Primer on Logic (example)
Example statement: "Bob has a son who likes all compilers".
$$
\exists x. (\text{Son}(x, \text{Bob}) \land \forall y. (\text{Compiler}(y) \Rightarrow \text{Likes}(x, y)))
$$
In Alloy:
```alloy
some x { Son[x, Bob] and all y { Compiler[y] implies Likes[x, y] } }
```
---
# A Primer on Temporal Logic
Temporal logic provides additional operators to reason about how properties change over time.
- $\Box p$ (in Alloy: `always p`): A statement that is always true.
- $\Diamond p$ (in Alloy: `eventually p`) : A statement that will be true at some point in the future.
In Alloy specifically, we can mention the next state of a variable using `'`.
```alloy
// pseudocode:
// the next future value of previousFilter will be the intersection of filter
// and the current value
previousFilter' = filter & previousFilter;
```
---
# A Primer on Temporal Logic (example)
Some examples:
$$
\Box(\text{like charges repel})
$$
$$
\Diamond(\text{the sun is in the sky})
$$
In Alloy:
```alloy
// likeChargesRepel and theSunIsInTheSky are predicates defined elsewhere
always likeChargesRepel
// good thing it's not `always`
eventually theSunIsInTheSky
```
---
# Search Configuration in Alloy
Instead of duplcating `METHOD` and `NOT_METHOD`, use two sets of flags (the regular and the "not").
```alloy
enum Flag {Method, MethodOrField, Public}
sig Bitfield {
, positiveFlags: set Flag
, negativeFlags: set Flag
}
sig FilterState { , curFilter: Bitfield }
```
__Takeaway__: We represent the search flags as a `Bitfield`, which encodes `PUBLIC`, `NOT_PUBLIC`, etc.
---
# Constructing Bitfields
Alloy doesn't allow us to define functions that somehow combine bitfields. We might want to write:
```chapel
proc addFlag(b: Bitfield, flag: Flag): Bitfield {
return Bitfield(b.positiveFlags + flag, b.negativeFlags);
}
```
Instead, we can _relate_ two bitfields using a predicate.
> This bitfield is like that bitfield, but with this flag added.
```chapel
pred addBitfieldFlag[b1: Bitfield, b2: Bitfield, flag: Flag] {
b2.positiveFlags = b1.positiveFlags + flag
b2.negativeFlags = b1.negativeFlags
}
```
---
# Constructing Bitfields
Alloy doesn't allow us to define functions that somehow combine bitfields. We might want to write:
```chapel
proc addFlag(b: Bitfield, flag: Flag): Bitfield {
return Bitfield(b.positiveFlags + flag, b.negativeFlags);
}
```
Instead, we can _relate_ two bitfields using a predicate.
> This bitfield is exactly like that bitfield.
```chapel
pred bitfieldEqual[b1: Bitfield, b2: Bitfield] {
b1.positiveFlags = b2.positiveFlags and b1.negativeFlags = b2.negativeFlags
}
```
---
# Modeling Possible Searches
Alloy isn't an imperative language. We can't mutate variables like we do in C++. Instead, we model how each statement changes the state, by relating the "current" state to the "next" state.
<div class="side-by-side">
<div>
```C++
filter |= IdAndFlags::PUBLIC;
```
</div>
<div>
```alloy
addBitfieldFlag[filterNow, filterNext, Public]
```
</div>
</div>
This might remind you of [Hoare Logic](https://en.wikipedia.org/wiki/Hoare_logic), where statements like:
$$
\{ P \} \; s \; \{ Q \}
$$
Read as:
> If $P$ is true before executing $s$, then $Q$ will be true after executing $s$.
$$
\{ \text{filter} = \text{filterNow} \} \; \texttt{filter |= PUBLIC} \; \{ \text{filter} = \text{filterNext} \}
$$
---
# Modeling Possible Searches
To combine several statements, we make it so that the "next" state of one statement is the "current" state of the next statement.
<div class="side-by-side">
<div>
```C++
curFilter |= IdAndFlags::PUBLIC;
curFilter |= IdAndFlags::METHOD_FIELD;
```
</div>
<div>
```alloy
addBitfieldFlag[filterNow, filterNext1, Public]
addBitfieldFlag[filterNext1, filterNext2, Method]
```
</div>
</div>
This is reminiscent of sequencing hoare triples:
$$
\{ P \} \; s_1 \; \{ Q \} \; s_2 \; \{ R \}
$$
---
# Modeling Possible Searches
Finally, if C++ code has conditionals, we need to allow for the possibility of either branch being taken. We do this by using "or" on descriptions of the next state.
<div class="side-by-side">
<div>
```C++
if (skipPrivateVisibilities) {
curFilter |= IdAndFlags::PUBLIC;
}
if (onlyMethodsFields) {
curFilter |= IdAndFlags::METHOD_FIELD;
} else if (!includeMethods && receiverScopes.empty()) {
curFilter |= IdAndFlags::NOT_METHOD;
}
```
</div>
<div>
```alloy
addBitfieldFlag[initialState.curFilter, bitfieldMiddle, Public] or
bitfieldEqual[initialState.curFilter, bitfieldMiddle]
// If it's a method receiver, add method or field restriction
addBitfieldFlag[bitfieldMiddle, filterState.curFilter, MethodOrField] or
// if it's not a receiver, filter to non-methods (could be overridden)
addBitfieldFlagNeg[bitfieldMiddle, filterState.curFilter, Method] or
// Maybe methods are not being curFilterd but it's not a receiver, so no change.
bitfieldEqual[bitfieldMiddle, filterState.curFilter]
```
</div>
</div>
Putting this into a predicate, `possibleState`, we encode what searches the compiler can undertake.
**Takeaway**: We encoded the logic that configures possible searches in Alloy. This instructs the analyzer about possible cases to consider.
---
# Modeling `previousFilter`
So far, all we've done is encoded what queries the compiler might make about a scope.
We still need to encode how we save the flags we've already searched with.
Model the search state with a "global" (really, unique) variable:
```alloy
/* Initially, no search has happeneed for a scope, so its 'previousFilter' is not set to anything. */
one sig NotSet {}
one sig SearchState {
, var previousFilter: Bitfield + NotSet
}
```
Above, `+` is used for union. `previousFilter` can either be a `Bitfield` or `NotSet`.
---
# Modeling `previousFilter`
If no previous search has happened, we set `previousFilter` to the current `filter`.
Otherwise, we set `previousFilter` to the intersection of `filter` and `previousFilter`, as mentioned before.
<div class="side-by-side">
<div>
```C++
if (hasPrevious) {
previousFilter = filter & previousFilter;
} else {
previousFilter = filter;
}
```
</div>
<div>
```alloy
pred update[toSet: Bitfield + NotSet, setTo: FilterState] {
toSet' != NotSet and bitfieldIntersection[toSet, setTo.include, toSet']
}
pred updateOrSet[toSet: Bitfield + NotSet, setTo: FilterState] {
(toSet = NotSet and toSet' = setTo.include) or
(toSet != NotSet and update[toSet, setTo])
}
```
</div>
</div>
---
# Putting it Together
We now have a model of what our C++ program is doing: it computes some set of filter flags, then runs a search, excluding the previous flags. It then updates the previous flags with the current search. We can encode
this as follows:
```
fact step {
always {
// Model that a new doLookupInScope could've occurred, with any combination of flags.
all searchState: SearchState {
some fs: FilterState {
// This is a possible combination of lookup flags
possibleState[fs]
// If a search has been performed before, take the intersection; otherwise,
// just insert the current filter flags.
updateOrSet[searchState.previousFilter, fs]
}
}
}
}
```
---
# Are there any bugs?
Model checkers ensure that all properties we want to hold, do hold. To find a counter example, we ask it to prove the negation of what we want.
```C
wontFindNeeded: run {
all searchState: SearchState {
eventually some props: Props, fs: FilterState, fsBroken: FilterState {
// Some search (fs) will cause a transition / modification of the search state...
configureState[fs]
updateOrSet[searchState.previousFilter, fs]
// Such that a later, valid search... (fsBroken)
configureState[fsBroken]
// Will allow for a set of properties...
// ... that are left out of the original search...
not bitfieldMatchesProperties[searchState.previousFilter, props]
// ... and out of the current search
not (bitfieldMatchesProperties[fs.include, props] and not bitfieldMatchesProperties[searchState.previousFilter, props])
// But would be matched by the broken search...
bitfieldMatchesProperties[fsBroken.include, props]
// ... to not be matched by a search with the new state:
not (bitfieldMatchesProperties[fsBroken.include, props] and not bitfieldOrNotSetMatchesProperties[searchState.previousFilter', props])
}
}
}
```
---
# Uh-Oh!
![](./instancefound.png)
---
# The Bug
![bg left width:600px](./bug.png)
We need some gymnastics to figure out what varibles make this model possible.
Alloy has a nice visualizer, but it has a lot of information.
In the interest of time, I found:
* If the compiler searches a scope firt for `PUBLIC` symbols, ...
* ...then for `METHOD_OR_FIELD`, ...
* ...then for any symbols, they will miss things!
---
<style scoped>
section {
font-size: 20px;
}
</style>
# The Reproducer
To trigger this sequence of searches, we needed a lot more gymnastics.
<div class="side-by-side">
<div>
```chapel
module TopLevel {
module XContainerUser {
public use TopLevel.XContainer;
}
module XContainer {
private var x: int;
record R {}
module MethodHaver {
use TopLevel.XContainerUser;
use TopLevel.XContainer;
proc R.foo() {
var y = x;
}
}
}
}
```
</div>
<div>
* the scope of `R` is searched with for methods
* The scope of `R`s parent (`XContainer`) is searched for methods
* The scope of `XContainerUser` is searched for public symbols (via the `use`)
* The scope of `XContainer` is searched with public symbols (via the `public use`)
* The scope of `XContainer` searched for with no filters via the second use; but the stored filter is bad, so the lookup returns early, not finding `x`.
</div>
</div>

32
type-level/abstract.txt Normal file
View File

@ -0,0 +1,32 @@
Chapels type system can be surprisingly powerful. In addition to “usual”
features such as generics and polymorphism, Chapel provides the ability to
manipulate types using functions; this involves both taking types as arguments
to functions and returning them from these functions. This can enable powerful
programming techniques that are typically confined to the domain of
metaprogramming.
For example, although Chapels notion of compile-time values — params — is
limited to primitive types such as integers, booleans, and strings, one can
encode compile-time lists of these values as types. Such encodings can be used
to create compile-time specializations of functions that would be otherwise
tedious to write by hand. One use case for such specializations is the
implementation of a family of functions for approximating differential
equations, the Adams-Bashforth methods. Some variants of these methods can be
encoded as lists of coefficients. Thus, it becomes possible to define a single
function that accepts a type-level list of coefficients and produces a
“stamped out” implementation of the corresponding method. This reduces the
need to implement each method explicitly by hand. Another use case of function
specialization is a type-safe printf function that validates that users
format specifiers match the type of arguments to the function.
More generally, Chapels types can be used to encode algebraic sums (disjoint
unions) and products (Cartesian) of types. This, in turn, makes it possible to
build arbitrary data structures at the type level. The lists-of-values case
above is an instance of this general principle. Functions can be defined on
type-level data structures by relying on overloading and type arguments.
Programming in this manner starts to resemble programming in a purely
functional language such as Haskell.
Though this style of programming has not seen much use thus far, it can be a
powerful technique for controlling types of arguments or constructing highly
customized functions with no runtime overhead.

979
type-level/slides.md Normal file
View File

@ -0,0 +1,979 @@
---
theme: gaia
title: Type-Level Programming in Chapel for Compile-Time Specialization
backgroundColor: #fff
---
<!-- _class: lead -->
<style>
section {
font-size: 25px;
}
blockquote {
padding: 15px;
background-color: #f0f0ff;
}
div.side-by-side {
display: flex;
justify-content: space-between;
gap: 20px;
}
div.side-by-side > div {
flex: 1;
}
</style>
# **Type-Level Programming in Chapel for Compile-Time Specialization**
Daniel Fedorin, HPE
---
# Compile-Time Programming in Chapel
* **Type variables**, as their name suggests, store types instead of values.
```Chapel
type myArgs = (int, real);
```
* **Procedures with `type` return intent** can construct new types.
```Chapel
proc toNilableIfClassType(type arg) type do
if isNonNilableClassType(arg) then return arg?; else return arg;
```
* **`param` variables** store values that are known at compile-time.
```Chapel
param numberOfElements = 3;
var threeInts: numberOfElements * int;
```
* **Compile-time conditionals** are inlined at compile-time.
```Chapel
if false then somethingThatWontCompile();
```
---
# Restrictions on Compile-Time Programming
* Compile-time operations do not have mutable state.
- Cannot change values of `param` or `type` variables.
* Chapel's compile-time programming does not support loops.
- `param` loops are kind of an aception, but are simply unrolled.
- Without mutability, this unrolling doesn't give us much.
* Without state, our `type` and `param` functions are pure.
---
# Did someone say "pure"?
I can think of another language that has pure functions...
* :white_check_mark: Haskell doesn't have mutable state by default.
* :white_check_mark: Haskell doesn't have imperative loops.
* :white_check_mark: Haskell functions are pure.
---
# Programming in Haskell
Without mutability and loops, Haskell programmers use pattern-matching and recursion to express their algorithms.
* Data structures are defined by enumerating their possible cases. A list is either empty, or a head element followed by a tail list.
```Haskell
data ListOfInts = Nil | Cons Int ListOfInts
-- [] = Nil
-- [1] = Cons 1 Nil
-- [1,2,3] = Cons 1 (Cons 2 (Cons 3 Nil))
```
* Pattern-matching is used to examine the cases of a data structure and act accordingly.
```Haskell
sum :: ListOfInts -> Int
sum Nil = 0
sum (Cons i tail) = i + sum tail
```
---
# Evaluating Haskell
Haskell simplifies calls to functions by pickign the case based on the arguments.
```Haskell
sum (Cons 1 (Cons 2 (Cons 3 Nil)))
-- case: sum (Cons i tail) = i + sum tail
= 1 + sum (Cons 2 (Cons 3 Nil))
-- case: sum (Cons i tail) = i + sum tail
= 1 + (2 + sum (Cons 3 Nil))
-- case: sum (Cons i tail) = i + sum tail
= 1 + (2 + (3 + sum Nil))
-- case: sum Nil = 0
= 1 + (2 + (3 + 0))
= 6
```
---
# A Familiar Pattern
Picking a case based on the arguments is very similar to Chapel's function overloading.
* **A very familiar example**:
```Chapel
proc foo(x: int) { writeln("int"); }
proc foo(x: real) { writeln("real"); }
foo(1); // prints "int"
```
* **A slightly less familiar example**:
```Chapel
proc foo(type x: int) { compilerWarning("int"); }
proc foo(type x: real) { compilerWarning("real"); }
foo(int); // compiler prints "int"
```
---
# A Type-Level List
Hypothesis: we can use Chapel's function overloading and types to write functional-ish programs.
<div class="side-by-side">
<div>
```Chapel
record Nil {}
record Cons { param head: int; type tail; }
type myList = Cons(1, Cons(2, Cons(3, Nil)));
proc sum(type x: Nil) param do return 0;
proc sum(type x: Cons(?i, ?tail)) param do return i + sum(tail);
compilerWarning(sum(myList) : string); // compiler prints 6
```
</div>
<div>
```Haskell
data ListOfInts = Nil
| Cons Int ListOfInts
myList = Cons 1 (Cons 2 (Cons 3 Nil))
sum :: ListOfInts -> Int
sum Nil = 0
sum (Cons i tail) = i + sum tail
```
</div>
</div>
---
# Type-Level Programming at Compile-Time
After resolution, our original program:
```Chapel
record Nil {}
record Cons { param head: int; type tail; }
type myList = Cons(1, Cons(2, Cons(3, Nil)));
proc sum(type x: Nil) param do return 0;
proc sum(type x: Cons(?i, ?tail)) param do return i + sum(tail);
writeln(sum(myList) : string); // compiler prints 6
```
Becomes:
```Chapel
writeln("6");
```
There is no runtime overhead!
---
# Type-Level Programming at Compile-Time
> Why would I want to do this?!
>
> \- You, probably
* Do you want to write parameterized code, without paying runtime overhead for the runtime parameters?
- **Worked example**: linear multi-step method approximator
* Do you want to have powerful compile-time checks and constraints on your function types?
- **Worked example**: type-safe `printf` function
---
<!-- _class: lead -->
# Linear Multi-Step Method Approximator
---
# Euler's Method
A first-order differential equation can be written in the following form:
$$
y' = f(t, y)
$$
In other words, the derivative of of $y$ depends on $t$ and $y$ itself. There is no solution to this equation in general; we have to approximate.
If we know an initial point $(t_0, y_0)$, we can approximate other points. To get the point at $t_1 = t_0 + h$, we can use the formula:
$$
\begin{align*}
y'(t_0) & = f(t_0, y_0) \\
y(t_0+h) & \approx y_0 + h \times y'(t_0) \\
& \approx y_0 + h \times f(t_0, y_0)
\end{align*}
$$
We can name the the first approximated $y$-value $y_1$, and set it:
$$
y_1 = y_0 + h \times f(t_0, y_0)
$$
---
# Euler's Method
On the previous slide, we got a new point $(t_1, y_1)$. We can repeat the process to get $y_2$:
$$
\begin{array}{c}
y_2 = y_1 + h \times f(t_1, y_1) \\
y_3 = y_2 + h \times f(t_2, y_2) \\
y_4 = y_3 + h \times f(t_3, y_3) \\
\cdots \\
y_{n+1} = y_n + h \times f(t_n, y_n) \\
\end{array}
$$
---
# Euler's Method in Chapel
This can be captured in a simple Chapel procedure:
```Chapel
proc runEulerMethod(step: real, count: int, t0: real, y0: real) {
var y = y0;
var t = t0;
for i in 1..count {
y += step*f(t,y);
t += step;
}
return y;
}
```
---
# Other Methods
* In Euler's method, we look at the slope of a function at a particular point, and use it to extrapolate the next point.
* Once we've computed a few points, we have more information we can incorporate.
- When computing $y_2$, we can use both $y_0$ and $y_1$.
- To get a good approximation, we have to weight the points differently.
$$
y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right)
$$
- More points means better accuracy, but more computation.
* There are other methods that use more points and different weights.
- Another method is as follows:
$$
y_{n+3} = y_{n+2} + h \left(\frac{23}{12}f(t_{n+2}, y_{n+2}) - \frac{16}{12}f(t_{n+1}, y_{n+1}) + \frac{5}{12}f(t_{n}, y_{n})\right)
$$
---
# Generalizing Multi-Step Methods
Explicit Adams-Bashforth methods in general can be encoded as the coefficients used to weight the previous points.
| Method | Equation | Coefficient List
|---------------------------|--------------------------------------|-----------------
| Euler's method | $y_{n+1} = y_n + h \times f(t_n, y_n)$ | $1$
| Two-step A.B. | $y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right)$ | $\frac{3}{2},-\frac{1}{2}$
---
# Generalizing Multi-Step Methods
Explicit Adams-Bashforth methods in general can be encoded as the coefficients used to weight the previous points.
| Method | Equation | Chapel Type Expression
|---------------------------|--------------------------------------|-----------------
| Euler's method | $y_{n+1} = y_n + h \times f(t_n, y_n)$ | `Cons(1,Nil)`
| Two-step A.B. | $y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right)$ | `Cons(3/2,Cons(-1/2, Nil))`
---
# Supporting Functions for Coefficient Lists
```Chapel
proc length(type x: Cons(?w, ?t)) param do return 1 + length(t);
proc length(type x: Nil) param do return 0;
proc coeff(param x: int, type lst: Cons(?w, ?t)) param where x == 0 do return w;
proc coeff(param x: int, type lst: Cons(?w, ?t)) param where x > 0 do return coeff(x-1, t);
```
---
# A General Solver
```Chapel
proc runMethod(type method, h: real, count: int, start: real,
in ys: real ... length(method)): real {
```
* `type method` accepts a type-level list of coefficients.
* `h` encodes the step size.
* `start` is $t_0$, the initial time.
* `count` is the number of steps to take.
* `in ys` makes the function accept as many `real` values (for $y_0, y_1, \ldots$) as there are weights
---
# A General Solver
```Chapel
param coeffCount = length(method);
// Repeat the methods as many times as requested
for i in 1..count {
// We're computing by adding h*b_j*f(...) to y_n.
// Set total to y_n.
var total = ys(coeffCount - 1);
// 'for param' loops are unrolled at compile-time -- this is just
// like writing out each iteration by hand.
for param j in 1..coeffCount do
// For each coefficient b_j given by coeff(j, method),
// increment the total by h*bj*f(...)
total += step * coeff(j, method) *
f(start + step*(i-1+coeffCount-j), ys(coeffCount-j));
// Shift each y_i over by one, and set y_{n+s} to the
// newly computed total.
for param j in 0..< coeffCount - 1 do
ys(j) = ys(j+1);
ys(coeffCount - 1) = total;
}
// return final y_{n+s}
return ys(coeffCount - 1);
```
---
# Using the General Solver
```Chapel
type euler = cons(1.0, empty);
type adamsBashforth = cons(3.0/2.0, cons(-0.5, empty));
type someThirdMethod = cons(23.0/12.0, cons(-16.0/12.0, cons(5.0/12.0, empty)));
```
Take a simple differential equation $y' = y$. For this, define `f` as follows:
```Chapel
proc f(t: real, y: real) do return y;
```
Now, we can run Euler's method like so:
```Chapel
writeln(runMethod(euler, step=0.5, count=4, start=0, 1)); // 5.0625
```
To run the 2-step Adams-Bashforth method, we need two initial values:
```Chapel
var y0 = 1.0;
var y1 = runMethod(euler, step=0.5, count=1, start=0, 1);
writeln(runMethod(adamsBashforth, step=0.5, count=3, start=0.5, y0, y1)); // 6.02344
```
---
# The General Solver
We can now construct solvers for any explicit Adams-Bashforth method, without writing any new code.
---
<!-- _class: lead -->
# Type-Safe `printf`
---
# The `printf` Function
The `printf` function accepts a format string, followed by a variable number of arguments that should match:
```C
// totally fine:
printf("Hello, %s! Your ChapelCon submission is #%d\n", "Daniel", 18);
// not good:
printf("Hello, %s! Your ChapelCon submission is #%d\n", 18, "Daniel");
```
Can we define a `printf` function in Chapel that is type-safe?
---
# Yet Another Type-Level List
- The general idea for type-safe `printf`: take the format string, and extract a list of the expected argument types.
- To make for nicer error messages, include a human-readable description of each type in the list.
- I've found it more convenient to re-define lists for various problems when needed, rather than having a single canonical list definition.
```chapel
record _nil {
proc type length param do return 0;
}
record _cons {
type expectedType; // type of the argument to printf
param name: string; // human-readable name of the type
type rest;
proc type length param do return 1 + rest.length();
}
```
---
# Extracting Types from Format Strings
```Chapel
proc specifiers(param s: string, param i: int) type {
if i >= s.size then return _nil;
if s[i] == "%" {
if i + 1 >= s.size then
compilerError("Invalid format string: unterminted %");
select s[i + 1] {
when "%" do return specifiers(s, i + 2);
when "s" do return _cons(string, "a string", specifiers(s, i + 2));
when "i" do return _cons(int, "a signed integer", specifiers(s, i + 2));
when "u" do return _cons(uint, "an unsigned integer", specifiers(s, i + 2));
when "n" do return _cons(numeric, "a numeric value", specifiers(s, i + 2));
otherwise do compilerError("Invalid format string: unknown format type");
}
} else {
return specifiers(s, i + 1);
}
}
```
---
# Extracting Types from Format Strings
Let's give it a quick try:
```Chapel
writeln(specifiers("Hello, %s! Your ChapelCon submission is #%i\n", 0) : string);
```
The above prints:
```Chapel
_cons(string,"a string",_cons(int(64),"a signed integer",_nil))
```
---
# Validating Argument Types
* The Chapel standard library has a nice `isSubtype` function that we can use to check if an argument matches the expected type.
* Suppose the `.length` of our type specifiers matches the number of arguments to `printf`
* Chapel doesn't currently support empty tuples, so if the lengths match, we know that `specifiers` is non-empty.
* Then, we can validate the types as follows:
```Chapel
proc validate(type specifiers: _cons(?t, ?s, ?rest), type argTup, param idx) {
if !isSubtype(argTup[idx], t) then
compilerError("Argument " + (idx + 1) : string + " should be " + s + " but got " + argTup[idx]:string, idx+2);
if idx + 1 < argTup.size then
validate(rest, argTup, idx + 1);
}
```
* The `idx+2` argument to `compilerError` avoids printing the recursive `validate` calls in the error message.
---
# The `fprintln` overloads
* I named it `fprintln` for "formatted print line".
* To support the empty-specifier case (Chapel varargs don't allow zero arguments):
```Chapel
proc fprintln(param format: string) where specifiers(format, 0).length == 0 {
writeln(format);
}
```
* If we do have type specifiers, to ensure our earlier assumption of `size` matching:
```Chapel
proc fprintln(param format: string, args...)
where specifiers(format, 0).length != args.size {
compilerError("'fprintln' with this format string expects " +
specifiers(format, 0).length : string +
" argument(s) but got " + args.size : string);
}
```
---
# The `fprintln` overloads
* All that's left is the main `fprintln` implementation:
```Chapel
proc fprintln(param format: string, args...) {
validate(specifiers(format, 0), args.type, 0);
writef(format + "\n", (...args));
}
```
---
# Using `fprintln`
```Chapel
fprintln("Hello, world!"); // fine, prints "Hello, world!"
fprintln("The answer is %i", 42); // fine, prints "The answer is 42"
// compiler error: Argument 3 should be a string but got int(64)
fprintln("The answer is %i %i %s", 1, 2, 3);
```
More work could be done to support more format specifiers, escapes, etc., but the basic idea is there.
---
<!-- _class: lead -->
# Beyond Lists
---
# Beyond Lists
* I made grand claims earlier
- "Write functional-ish program at the type level!"
* So far, we've just used lists and some recursion.
* Is that all there is?
---
# Algebraic Data Types
* The kinds of data types that Haskell supports are called *algebraic data types*.
* At a fundamental level, they can be built up from two operations: _cartesian product_ and _disjoint union_.
* There are other concepts to build recursive data types, but we won't need them in Chapel.
- To prove to you I know what I'm talking about, some jargon:
_initial algebras_, _the fixedpoint functor_, _catamorphisms_...
- Check out _Bananas, Lenses, Envelopes and Barbed Wire_ by Meijer et al. for more.
* __This matters because, if Chapel has these operations, we can build any data type that Haskell can.__
---
<style scoped>
li:nth-child(3) { color: lightgrey; }
</style>
# Algebraic Data Types
- The kinds of data types that Haskell supports are called *algebraic data types*.
- At a fundamental level, they can be built up from two operations: _cartesian product_ and _disjoint union_.
- There are other concepts to build recursive data types, but we won't need them in Chapel.
- To prove to you I know what I'm talking about, some jargon:
_initial algebras_, _the fixedpoint functor_, _catamorphisms_...
- Check out _Bananas, Lenses, Envelopes and Barbed Wire_ by Meijer et al. for more.
- __This matters because, if Chapel has these operations, we can build any data type that Haskell can.__
---
# Cartesian Product
For any two types, the _cartesian product_ of these two types defines all pairs of values from these types.
- This is like a two-element tuple _at the value level_ in Chapel.
- We write this as $A \times B$ for two types $A$ and $B$.
- In (type-level) Chapel and Haskell:
<div class=side-by-side>
<div>
```Chapel
record Pair {
type fst;
type snd;
}
type myPair = Pair(myVal1, myVal2);
```
</div>
<div>
```Haskell
data Pair = MkPair
{ fst :: A
, snd :: B
}
myPair = MkPair myVal1 myVal2
```
</div>
</div>
---
# Disjoint Union
For any two types, the _disjoint union_ of these two types defines values that are either from one type or the other.
- This is _almost_ like a `union` in Chapel or C...
- But there's extra information to tell us which of the two types the value is from.
- We write this as $A + B$ for two types $A$ and $B$.
- In Chapel and Haskell:
<div class=side-by-side>
<div>
```Chapel
record InL { type value; }
record InR { type value; }
type myFirstCase = InL(myVal1);
type mySecondCase = InR(myVal2);
```
</div>
<div>
```Haskell
data Sum
= InL A
| InR B
myFirstCase = InL myVal1
mySecondCase = InR myVal2
```
</div>
</div>
---
# Algebraic Data Types
* We can build up more compelx types by combining these two operations.
* Need a triple of types $A$, $B$, and $C$? Use $A \times (B \times C)$.
* Similarly, "any one of three types" can be expressed as $A + (B + C)$.
* A `Result<T>` type (in Rust, or `optional<T>` in C++) is $T + \text{Unit}$.
* `Unit` is a type with a single value (there's only one `None` / `std::nullopt`).
* Notice that in Chapel, we moved up one level
| Thing | Chapel | Haskell |
|-------|------------------|-------------------|
| `Nil` | type | value |
| `Cons`| type constructor | value constructor |
| List | **???** | type |
---
# Algebraic Data Types
* Since Chapel has no notion of a type-of-types, we can't enforce that our values are _only_ `InL` or `InR` (in the case of `Sum`).
* This is why, in Chapel versions, type annotations like `A` and `B` are missing.
<div class=side-by-side>
<div>
```Chapel
record Pair {
type fst; /* : A */
type snd; /* : B */
}
```
</div>
<div>
```Haskell
data Pair = MkPair
{ fst :: A
, snd :: B
}
```
</div>
</div>
* So, we can't enforce that the user doesn't pass `int` to our `length` function defined on lists.
* We also can't enforce that `InL` is instantiated with the right type.
* So, we lose some safety compare to Haskell...
* ...but we're getting the compiler to do arbitrary computations for us at compile-time.
---
# Worked Example: Binary Search Tree
In Haskell, binary search trees can be defined as follows:
```Haskell
data BSTree = Empty
| Node Int BSTree BSTree
balancedOneTwoThree = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty)
```
Written using Algebraic Data Types, this is:
$$
\text{BSTree} = \text{Unit} + (\text{Int} \times \text{BSTree} \times \text{BSTree})
$$
In Haskell (using sums and products):
```Haskell
type BSTree' = Unit `Sum` (Int `Pair` (BSTree' `Pair` BSTree'))
balancedOneTwoThree' = InR (2 `MkPair` (InR (1 `MkPair` (InL MkUnit `MkPair` InL Unit)) `MkPair`
InR (3 `MkPair` (InL MkUnit `MkPair` InL Unit))))
```
---
# Worked Example: Binary Search Tree
* Recalling the Haskell version:
```Haskell
type BSTree' = Unit `Sum` (Int `Pair` (BSTree' `Pair` BSTree'))
balancedOneTwoThree' = InR (2 `MkPair` (InR (1 `MkPair` (InL MkUnit `MkPair` InL Unit)) `MkPair`
InR (3 `MkPair` (InL MkUnit `MkPair` InL Unit))))
```
* We can't define `BSTree'` in Chapel (no type-of-types), but we can define `balancedOneTwoThree'`:
```Chapel
type balancedOneTwoThree =
InR(Pair(2, Pair(InR(Pair(1, Pair(InL(), InL()))),
InR(Pair(3, Pair(InL(), InL()))))));
```
* :white_check_mark: We can use algebraic data types to build arbitrarily complex data structures ◼.
---
# Returning to Pragmatism
* We could've defined our list type in terms of `InL`, `InR`, and `Pair`.
* However, it was cleaner to make it look more like the non-ADT Haskell version.
* Recall that it looked like this:
<div class="side-by-side">
<div>
```Chapel
record Nil {}
record Cons { param head: int; type tail; }
type myList = Cons(1, Cons(2, Cons(3, Nil)));
```
</div>
<div>
```Haskell
data ListOfInts = Nil
| Cons Int ListOfInts
myList = Cons 1 (Cons 2 (Cons 3 Nil))
```
</div>
</div>
* We can do the same thing for our binary search tree:
<div class="side-by-side">
<div>
```Chapel
record Empty {}
record Node { param value: int; type left; type right; }
type balancedOneTwoThree = Node(2, Node(1, Empty, Empty),
Node(3, Empty, Empty));
```
</div>
<div>
```Haskell
data BSTree = Empty
| Node Int BSTree BSTree
balancedOneTwoThree = Node 2 (Node 1 Empty Empty)
(Node 3 Empty Empty)
```
</div>
</div>
---
# A General Recipe
To translate a Haskell data type definition to Chapel:
* For each constructor, define a `record` with that constructor's name
* The fields of that record are `type` fields for each argument of the constructor
- If the argument is a value (like `Int`), you can make it a `param` field instead
* A visual example, again:
<div class="side-by-side">
<div>
```Chapel
record C1 { type arg1; /* ... */ type argi; }
// ...
record Cn { type arg1; /* ... */ type argj; }
```
</div>
<div>
```Haskell
data T = C1 arg1 ... argi
| ...
| Cn arg1 ... argj
```
</div>
</div>
---
# Inserting and Looking Up in a BST
<div class="side-by-side">
<div>
```Chapel
proc insert(type t: Empty, param x: int) type do return Node(x, Empty, Empty);
proc insert(type t: Node(?v, ?left, ?right), param x: int) type do
select true {
when x < v do return Node(v, insert(left, x), right);
otherwise do return Node(v, left, insert(right, x));
}
type test = insert(insert(insert(Empty, 2), 1), 3);
proc lookup(type t: Empty, param x: int) param do return false;
proc lookup(type t: Node(?v, ?left, ?right), param x: int) param do
select true {
when x == v do return true;
when x < v do return lookup(left, x);
otherwise do return lookup(right, x);
}
```
</div>
<div>
```Haskell
insert :: Int -> BSTree -> BSTree
insert x Empty = Node x Empty Empty
insert x (Node v left right)
| x < v = Node v (insert x left) right
| otherwise = Node v left (insert x right)
test = insert 3 (insert 1 (insert 2 Empty))
lookup :: Int -> BSTree -> Bool
lookup x Empty = False
lookup x (Node v left right)
| x == v = True
| x < v = lookup x left
| otherwise = lookup x right
```
</div>
</div>
It really works!
```Chapel
writeln(test : string);
// prints Node(2,Node(1,Empty,Empty),Node(3,Empty,Empty))
writeln(lookup(test, 1));
// prints true for this one, but false for '4'
```
---
# A Key-Value Map
```Chapel
record Empty {}
record Node { param key: int; param value; type left; type right; }
proc insert(type t: Empty, param k: int, param v) type do return Node(k, v, Empty, Empty);
proc insert(type t: Node(?k, ?v, ?left, ?right), param nk: int, param nv) type do
select true {
when nk < k do return Node(k, v, insert(left, nk, nv), right);
otherwise do return Node(k, v, left, insert(right, nk, nv));
}
proc lookup(type t: Empty, param k: int) param do return "not found";
proc lookup(type t: Node(?k, ?v, ?left, ?right), param x: int) param do
select true {
when x == k do return v;
when x < k do return lookup(left, x);
otherwise do return lookup(right, x);
}
type test = insert(insert(insert(Empty, 2, "two"), 1, "one"), 3, "three");
writeln(lookup(test, 1)); // prints "one"
writeln(lookup(test, 3)); // prints "three"
writeln(lookup(test, 4)); // prints "not found"
```
---
# Conclusion
* Chapel's type-level programming is surprisingly powerful.
* We can write compile-time programs that are very similar to Haskell programs.
* This allows us to write highly parameterized code without paying runtime overhead.
* This also allows us to devise powerful compile-time checks and constraints on our code.
* This approach allows for general-purpose programming, which can be applied to `your use-case`