--- theme: gaia title: Type-Level Programming in Chapel for Compile-Time Specialization backgroundColor: #fff --- # **Type-Level Programming in Chapel for Compile-Time Specialization** Daniel Fedorin, HPE --- # Compile-Time Programming in Chapel * **Type variables**, as their name suggests, store types instead of values. ```Chapel type myArgs = (int, real); ``` * **Procedures with `type` return intent** can construct new types. ```Chapel proc toNilableIfClassType(type arg) type do if isNonNilableClassType(arg) then return arg?; else return arg; ``` * **`param` variables** store values that are known at compile-time. ```Chapel param numberOfElements = 3; var threeInts: numberOfElements * int; ``` * **Compile-time conditionals** are inlined at compile-time. ```Chapel if false then somethingThatWontCompile(); ``` --- # Restrictions on Compile-Time Programming * Compile-time operations do not have mutable state. - Cannot change values of `param` or `type` variables. * Chapel's compile-time programming does not support loops. - `param` loops are kind of an exception, but are simply unrolled. - Without mutability, this unrolling doesn't give us much. * Without state, our `type` and `param` functions are pure. --- # Did someone say "pure"? I can think of another language that has pure functions... * :white_check_mark: Haskell doesn't have mutable state by default. * :white_check_mark: Haskell doesn't have imperative loops. * :white_check_mark: Haskell functions are pure. --- # Programming in Haskell Without mutability and loops, (purely) functional programmers use pattern-matching and recursion to express their algorithms. * Data structures are defined by enumerating their possible cases. A list is either empty, or a head element followed by a tail list. ```Haskell data ListOfInts = Nil | Cons Int ListOfInts -- [] = Nil -- [1] = Cons 1 Nil -- [1,2,3] = Cons 1 (Cons 2 (Cons 3 Nil)) ``` * Pattern-matching is used to examine the cases of a data structure and act accordingly. ```Haskell sum :: ListOfInts -> Int sum Nil = 0 sum (Cons i tail) = i + sum tail ``` --- # Evaluating Haskell Haskell simplifies calls to functions by picking the case based on the arguments. ```Haskell sum (Cons 1 (Cons 2 (Cons 3 Nil))) -- case: sum (Cons i tail) = i + sum tail = 1 + sum (Cons 2 (Cons 3 Nil)) -- case: sum (Cons i tail) = i + sum tail = 1 + (2 + sum (Cons 3 Nil)) -- case: sum (Cons i tail) = i + sum tail = 1 + (2 + (3 + sum Nil)) -- case: sum Nil = 0 = 1 + (2 + (3 + 0)) = 6 ``` --- # A Familiar Pattern Picking a case based on the arguments is very similar to Chapel's function overloading. * **A very familiar example**: ```Chapel proc foo(x: int) { writeln("int"); } proc foo(x: real) { writeln("real"); } foo(1); // prints "int" ``` * **A slightly less familiar example**: ```Chapel proc foo(type x: int) { compilerWarning("int"); } proc foo(type x: real) { compilerWarning("real"); } foo(int); // compiler prints "int" ``` --- # A Type-Level List Hypothesis: we can use Chapel's function overloading and types to write functional-ish programs.

```Chapel record Nil {} record Cons { param head: int; type tail; } type myList = Cons(1, Cons(2, Cons(3, Nil))); proc sum(type x: Nil) param do return 0; proc sum(type x: Cons(?i, ?tail)) param do return i + sum(tail); compilerWarning(sum(myList) : string); // compiler prints 6 ```

```Haskell data ListOfInts = Nil | Cons Int ListOfInts myList = Cons 1 (Cons 2 (Cons 3 Nil)) sum :: ListOfInts -> Int sum Nil = 0 sum (Cons i tail) = i + sum tail ```

--- # Type-Level Programming at Compile-Time After resolution, our original program: ```Chapel record Nil {} record Cons { param head: int; type tail; } type myList = Cons(1, Cons(2, Cons(3, Nil))); proc sum(type x: Nil) param do return 0; proc sum(type x: Cons(?i, ?tail)) param do return i + sum(tail); writeln(sum(myList) : string); // compiler prints 6 ``` Becomes: ```Chapel writeln("6"); ``` There is no runtime overhead! --- # Type-Level Programming at Compile-Time ![width:400px](./why.png) --- # Type-Level Programming at Compile-Time > Why would I want to do this?! > > \- You, probably * Do you want to write parameterized code, without paying runtime overhead for the runtime parameters? - **Worked example**: linear multi-step method approximator * Do you want to have powerful compile-time checks and constraints on your function types? - **Worked example**: type-safe `printf` function --- # Linear Multi-Step Method Approximator --- # Euler's Method A first-order differential equation can be written in the following form: $$ y' = f(t, y) $$ In other words, the derivative of of $y$ depends on $t$ and $y$ itself. There is no solution to this equation in general; we have to approximate. If we know an initial point $(t_0, y_0)$, we can approximate other points. To get the point at $t_1 = t_0 + h$, we can use the formula: $$ \begin{align*} y'(t_0) & = f(t_0, y_0) \\ y(t_0+h) & \approx y_0 + h \times y'(t_0) \\ & \approx y_0 + h \times f(t_0, y_0) \end{align*} $$ We can name the first approximated $y$-value $y_1$, and set it: $$ y_1 = y_0 + h \times f(t_0, y_0) $$ --- # Euler's Method On the previous slide, we got a new point $(t_1, y_1)$. We can repeat the process to get $y_2$: $$ \begin{array}{c} y_2 = y_1 + h \times f(t_1, y_1) \\ y_3 = y_2 + h \times f(t_2, y_2) \\ y_4 = y_3 + h \times f(t_3, y_3) \\ \cdots \\ y_{n+1} = y_n + h \times f(t_n, y_n) \\ \end{array} $$ --- # Euler's Method in Chapel This can be captured in a simple Chapel procedure: ```Chapel proc runEulerMethod(step: real, count: int, t0: real, y0: real) { var y = y0; var t = t0; for i in 1..count { y += step*f(t,y); t += step; } return y; } ``` --- # Other Methods * In Euler's method, we look at the slope of a function at a particular point, and use it to extrapolate the next point. * Once we've computed a few points, we have more information we can incorporate. - When computing $y_2$, we can use both $y_0$ and $y_1$. - To get a good approximation, we have to weight the points differently. $$ y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right) $$ - More points means better accuracy, but more computation. * There are other methods that use more points and different weights. - Another method is as follows: $$ y_{n+3} = y_{n+2} + h \left(\frac{23}{12}f(t_{n+2}, y_{n+2}) - \frac{16}{12}f(t_{n+1}, y_{n+1}) + \frac{5}{12}f(t_{n}, y_{n})\right) $$ --- # Generalizing Multi-Step Methods Explicit Adams-Bashforth methods in general can be encoded as the coefficients used to weight the previous points. | Method | Equation | Coefficient List |---------------------------|--------------------------------------|----------------- | Euler's method | $y_{n+1} = y_n + h \times f(t_n, y_n)$ | $1$ | Two-step A.B. | $y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right)$ | $\frac{3}{2},-\frac{1}{2}$ --- # Generalizing Multi-Step Methods Explicit Adams-Bashforth methods in general can be encoded as the coefficients used to weight the previous points. | Method | Equation | Chapel Type Expression |---------------------------|--------------------------------------|----------------- | Euler's method | $y_{n+1} = y_n + h \times f(t_n, y_n)$ | `Cons(1,Nil)` | Two-step A.B. | $y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right)$ | `Cons(3/2,Cons(-1/2, Nil))` --- # Supporting Functions for Coefficient Lists ```Chapel proc length(type x: Cons(?w, ?t)) param do return 1 + length(t); proc length(type x: Nil) param do return 0; proc coeff(param x: int, type lst: Cons(?w, ?t)) param where x == 0 do return w; proc coeff(param x: int, type lst: Cons(?w, ?t)) param where x > 0 do return coeff(x-1, t); ``` --- # A General Solver ```Chapel proc runMethod(type method, h: real, count: int, start: real, in ys: real ... length(method)): real { ``` * `type method` accepts a type-level list of coefficients. * `h` encodes the step size. * `start` is $t_0$, the initial time. * `count` is the number of steps to take. * `in ys` makes the function accept as many `real` values (for $y_0, y_1, \ldots$) as there are weights --- # A General Solver ```Chapel param coeffCount = length(method); // Repeat the methods as many times as requested for i in 1..count { // We're computing by adding h*b_j*f(...) to y_n. // Set total to y_n. var total = ys(coeffCount - 1); // 'for param' loops are unrolled at compile-time -- this is just // like writing out each iteration by hand. for param j in 1..coeffCount do // For each coefficient b_j given by coeff(j, method), // increment the total by h*bj*f(...) total += step * coeff(j, method) * f(start + step*(i-1+coeffCount-j), ys(coeffCount-j)); // Shift each y_i over by one, and set y_{n+s} to the // newly computed total. for param j in 0..< coeffCount - 1 do ys(j) = ys(j+1); ys(coeffCount - 1) = total; } // return final y_{n+s} return ys(coeffCount - 1); ``` --- # Using the General Solver ```Chapel type euler = cons(1.0, empty); type adamsBashforth = cons(3.0/2.0, cons(-0.5, empty)); type someThirdMethod = cons(23.0/12.0, cons(-16.0/12.0, cons(5.0/12.0, empty))); ``` Take a simple differential equation $y' = y$. For this, define `f` as follows: ```Chapel proc f(t: real, y: real) do return y; ``` Now, we can run Euler's method like so: ```Chapel writeln(runMethod(euler, step=0.5, count=4, start=0, 1)); // 5.0625 ``` To run the 2-step Adams-Bashforth method, we need two initial values: ```Chapel var y0 = 1.0; var y1 = runMethod(euler, step=0.5, count=1, start=0, 1); writeln(runMethod(adamsBashforth, step=0.5, count=3, start=0.5, y0, y1)); // 6.02344 ``` --- # The General Solver We can now construct solvers for any explicit Adams-Bashforth method, without writing any new code. --- # Type-Safe `printf` --- # The `printf` Function The `printf` function accepts a format string, followed by a variable number of arguments that should match: ```C // totally fine: printf("Hello, %s! Your ChapelCon submission is #%d\n", "Daniel", 18); // not good: printf("Hello, %s! Your ChapelCon submission is #%d\n", 18, "Daniel"); ``` Can we define a `printf` function in Chapel that is type-safe? --- # Yet Another Type-Level List - The general idea for type-safe `printf`: take the format string, and extract a list of the expected argument types. - To make for nicer error messages, include a human-readable description of each type in the list. - I've found it more convenient to re-define lists for various problems when needed, rather than having a single canonical list definition. ```chapel record _nil { proc type length param do return 0; } record _cons { type expectedType; // type of the argument to printf param name: string; // human-readable name of the type type rest; proc type length param do return 1 + rest.length(); } ``` --- # Extracting Types from Format Strings ```Chapel proc specifiers(param s: string, param i: int = 0) type { if i >= s.size then return _nil; if s[i] == "%" { if i + 1 >= s.size then compilerError("Invalid format string: unterminted %"); select s[i + 1] { when "%" do return specifiers(s, i + 2); when "s" do return _cons(string, "a string", specifiers(s, i + 2)); when "i" do return _cons(int, "a signed integer", specifiers(s, i + 2)); when "u" do return _cons(uint, "an unsigned integer", specifiers(s, i + 2)); when "n" do return _cons(numeric, "a numeric value", specifiers(s, i + 2)); otherwise do compilerError("Invalid format string: unknown format type"); } } else { return specifiers(s, i + 1); } } ``` --- # Extracting Types from Format Strings Let's give it a quick try: ```Chapel writeln(specifiers("Hello, %s! Your ChapelCon submission is #%i\n") : string); ``` The above prints: ```Chapel _cons(string,"a string",_cons(int(64),"a signed integer",_nil)) ``` --- # Validating Argument Types * The Chapel standard library has a nice `isSubtype` function that we can use to check if an argument matches the expected type. * Suppose the `.length` of our type specifiers matches the number of arguments to `printf` * Chapel doesn't currently support empty tuples, so if the lengths match, we know that `specifiers` is non-empty. * Then, we can validate the types as follows: ```Chapel proc validate(type specifiers: _cons(?t, ?s, ?rest), type argTup, param idx) { if !isSubtype(argTup[idx], t) then compilerError("Argument " + (idx + 1) : string + " should be " + s + " but got " + argTup[idx]:string, idx+2); if idx + 1 < argTup.size then validate(rest, argTup, idx + 1); } ``` * The `idx+2` argument to `compilerError` avoids printing the recursive `validate` calls in the error message. --- # The `fprintln` overloads * I named it `fprintln` for "formatted print line". * To support the empty-specifier case (Chapel varargs don't allow zero arguments): ```Chapel proc fprintln(param format: string) where specifiers(format).length == 0 { writeln(format); } ``` * If we do have type specifiers, to ensure our earlier assumption of `size` matching: ```Chapel proc fprintln(param format: string, args...) where specifiers(format).length != args.size { compilerError("'fprintln' with this format string expects " + specifiers(format).length : string + " argument(s) but got " + args.size : string); } ``` --- # The `fprintln` overloads * All that's left is the main `fprintln` implementation: ```Chapel proc fprintln(param format: string, args...) { validate(specifiers(format), args.type, 0); writef(format + "\n", (...args)); } ``` --- # Using `fprintln` ```Chapel fprintln("Hello, world!"); // fine, prints "Hello, world!" fprintln("The answer is %i", 42); // fine, prints "The answer is 42" // compiler error: Argument 3 should be a string but got int(64) fprintln("The answer is %i %i %s", 1, 2, 3); ``` More work could be done to support more format specifiers, escapes, etc., but the basic idea is there. --- # Beyond Lists --- # Beyond Lists * I made grand claims earlier - "Write functional-ish program at the type level!" * So far, we've just used lists and some recursion. * Is that all there is? --- # Algebraic Data Types * The kinds of data types that Haskell supports are called *algebraic data types*. * At a fundamental level, they can be built up from two operations: _Cartesian product_ and _disjoint union_. * There are other concepts to build recursive data types, but we won't need them in Chapel. - To prove to you I know what I'm talking about, some jargon: _initial algebras_, _the fixedpoint functor_, _catamorphisms_... - Check out _Bananas, Lenses, Envelopes and Barbed Wire_ by Meijer et al. for more. * __This matters because, if Chapel has these operations, we can build any data type that Haskell can.__ --- # Algebraic Data Types - The kinds of data types that Haskell supports are called *algebraic data types*. - At a fundamental level, they can be built up from two operations: _Cartesian product_ and _disjoint union_. - There are other concepts to build recursive data types, but we won't need them in Chapel. - To prove to you I know what I'm talking about, some jargon: _initial algebras_, _the fixedpoint functor_, _catamorphisms_... - Check out _Bananas, Lenses, Envelopes and Barbed Wire_ by Meijer et al. for more. - __This matters because, if Chapel has these operations, we can build any data type that Haskell can.__ --- # Cartesian Product For any two types, the _Cartesian product_ of these two types defines all pairs of values from these types. - This is like a two-element tuple _at the value level_ in Chapel. - We write this as $A \times B$ for two types $A$ and $B$. - In (type-level) Chapel and Haskell:

```Chapel record Pair { type fst; type snd; } type myPair = Pair(myVal1, myVal2); ```

```Haskell data Pair = MkPair { fst :: A , snd :: B } myPair = MkPair myVal1 myVal2 ```

--- # Disjoint Union For any two types, the _disjoint union_ of these two types defines values that are either from one type or the other. - This is _almost_ like a `union` in Chapel or C... - But there's extra information to tell us which of the two types the value is from. - We write this as $A + B$ for two types $A$ and $B$. - In Chapel and Haskell:

```Chapel record InL { type value; } record InR { type value; } type myFirstCase = InL(myVal1); type mySecondCase = InR(myVal2); ```

```Haskell data Sum = InL A | InR B myFirstCase = InL myVal1 mySecondCase = InR myVal2 ```

--- # Algebraic Data Types * We can build up more complex types by combining these two operations. * Need a triple of types $A$, $B$, and $C$? Use $A \times (B \times C)$. * Similarly, "any one of three types" can be expressed as $A + (B + C)$. * A `Option` type (in Rust, or `optional` in C++) is $T + \text{Unit}$. * `Unit` is a type with a single value (there's only one `None` / `std::nullopt`). * Notice that in Chapel, we moved up one level | Thing | Chapel | Haskell | |-------|------------------|-------------------| | `Nil` | type | value | | `Cons`| type constructor | value constructor | | List | **???** | type | --- # Algebraic Data Types * Since Chapel has no notion of a type-of-types, we can't enforce that our values are _only_ `InL` or `InR` (in the case of `Sum`). * This is why, in Chapel versions, type annotations like `A` and `B` are missing.

```Chapel record Pair { type fst; /* : A */ type snd; /* : B */ } ```

```Haskell data Pair = MkPair { fst :: A , snd :: B } ```

* So, we can't enforce that the user doesn't pass `int` to our `length` function defined on lists. * We also can't enforce that `InL` is instantiated with the right type. * So, we lose some safety compared to Haskell... * ...but we're getting the compiler to do arbitrary computations for us at compile-time. --- # Worked Example: Binary Search Tree In Haskell, binary search trees can be defined as follows: ```Haskell data BSTree = Empty | Node Int BSTree BSTree balancedOneTwoThree = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty) ``` Written using Algebraic Data Types, this is: $$ \text{BSTree} = \text{Unit} + (\text{Int} \times \text{BSTree} \times \text{BSTree}) $$ In Haskell (using sums and products): ```Haskell type BSTree' = Unit `Sum` (Int `Pair` (BSTree' `Pair` BSTree')) balancedOneTwoThree' = InR (2 `MkPair` (InR (1 `MkPair` (InL MkUnit `MkPair` InL Unit)) `MkPair` InR (3 `MkPair` (InL MkUnit `MkPair` InL Unit)))) ``` --- # Worked Example: Binary Search Tree * Recalling the Haskell version: ```Haskell type BSTree' = Unit `Sum` (Int `Pair` (BSTree' `Pair` BSTree')) balancedOneTwoThree' = InR (2 `MkPair` (InR (1 `MkPair` (InL MkUnit `MkPair` InL Unit)) `MkPair` InR (3 `MkPair` (InL MkUnit `MkPair` InL Unit)))) ``` * We can't define `BSTree'` in Chapel (no type-of-types), but we can define `balancedOneTwoThree'`: ```Chapel type balancedOneTwoThree = InR(Pair(2, Pair(InR(Pair(1, Pair(InL(), InL()))), InR(Pair(3, Pair(InL(), InL())))))); ``` * :white_check_mark: We can use algebraic data types to build arbitrarily complex data structures ◼. --- # Returning to Pragmatism * We could've defined our list type in terms of `InL`, `InR`, and `Pair`. * However, it was cleaner to make it look more like the non-ADT Haskell version. * Recall that it looked like this:

```Chapel record Nil {} record Cons { param head: int; type tail; } type myList = Cons(1, Cons(2, Cons(3, Nil))); ```

```Haskell data ListOfInts = Nil | Cons Int ListOfInts myList = Cons 1 (Cons 2 (Cons 3 Nil)) ```

* We can do the same thing for our binary search tree:

```Chapel record Empty {} record Node { param value: int; type left; type right; } type balancedOneTwoThree = Node(2, Node(1, Empty, Empty), Node(3, Empty, Empty)); ```

```Haskell data BSTree = Empty | Node Int BSTree BSTree balancedOneTwoThree = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty) ```

--- # A General Recipe To translate a Haskell data type definition to Chapel: * For each constructor, define a `record` with that constructor's name * The fields of that record are `type` fields for each argument of the constructor - If the argument is a value (like `Int`), you can make it a `param` field instead * A visual example, again:

```Chapel record C1 { type arg1; /* ... */ type argi; } // ... record Cn { type arg1; /* ... */ type argj; } ```

```Haskell data T = C1 arg1 ... argi | ... | Cn arg1 ... argj ```

--- # Inserting and Looking Up in a BST

```Chapel proc insert(type t: Empty, param x: int) type do return Node(x, Empty, Empty); proc insert(type t: Node(?v, ?left, ?right), param x: int) type do select true { when x < v do return Node(v, insert(left, x), right); otherwise do return Node(v, left, insert(right, x)); } type test = insert(insert(insert(Empty, 2), 1), 3); proc lookup(type t: Empty, param x: int) param do return false; proc lookup(type t: Node(?v, ?left, ?right), param x: int) param do select true { when x == v do return true; when x < v do return lookup(left, x); otherwise do return lookup(right, x); } ```

```Haskell insert :: Int -> BSTree -> BSTree insert x Empty = Node x Empty Empty insert x (Node v left right) | x < v = Node v (insert x left) right | otherwise = Node v left (insert x right) test = insert 3 (insert 1 (insert 2 Empty)) lookup :: Int -> BSTree -> Bool lookup x Empty = False lookup x (Node v left right) | x == v = True | x < v = lookup x left | otherwise = lookup x right ```

It really works! ```Chapel writeln(test : string); // prints Node(2,Node(1,Empty,Empty),Node(3,Empty,Empty)) writeln(lookup(test, 1)); // prints true for this one, but false for '4' ``` --- # A Key-Value Map ```Chapel record Empty {} record Node { param key: int; param value; type left; type right; } proc insert(type t: Empty, param k: int, param v) type do return Node(k, v, Empty, Empty); proc insert(type t: Node(?k, ?v, ?left, ?right), param nk: int, param nv) type do select true { when nk < k do return Node(k, v, insert(left, nk, nv), right); otherwise do return Node(k, v, left, insert(right, nk, nv)); } proc lookup(type t: Empty, param k: int) param do return "not found"; proc lookup(type t: Node(?k, ?v, ?left, ?right), param x: int) param do select true { when x == k do return v; when x < k do return lookup(left, x); otherwise do return lookup(right, x); } type test = insert(insert(insert(Empty, 2, "two"), 1, "one"), 3, "three"); writeln(lookup(test, 1)); // prints "one" writeln(lookup(test, 3)); // prints "three" writeln(lookup(test, 4)); // prints "not found" ``` --- # Conclusion * Chapel's type-level programming is surprisingly powerful. * We can write compile-time programs that are very similar to Haskell programs. * This allows us to write highly parameterized code without paying runtime overhead. * This also allows us to devise powerful compile-time checks and constraints on our code. * This approach allows for general-purpose programming, which can be applied to `your use-case`