ChapelCon2025-Slides/type-level/slides.md
Danila Fedorin 86f1ba686a Apply Jade's feedback
Signed-off-by: Danila Fedorin <daniel.fedorin@hpe.com>
2025-10-09 17:34:46 -07:00

25 KiB

theme title backgroundColor
gaia Type-Level Programming in Chapel for Compile-Time Specialization

Type-Level Programming in Chapel for Compile-Time Specialization

Daniel Fedorin, HPE


Compile-Time Programming in Chapel

  • Type variables, as their name suggests, store types instead of values.

    type myArgs = (int, real);
    
  • Procedures with type return intent can construct new types.

    proc toNilableIfClassType(type arg) type do
      if isNonNilableClassType(arg) then return arg?; else return arg;
    
  • param variables store values that are known at compile-time.

    param numberOfElements = 3;
    var threeInts: numberOfElements * int;
    
  • Compile-time conditionals are inlined at compile-time.

    if false then somethingThatWontCompile();
    

Restrictions on Compile-Time Programming

  • Compile-time operations do not have mutable state.
    • Cannot change values of param or type variables.
  • Chapel's compile-time programming does not support loops.
    • param loops are kind of an exception, but are simply unrolled.
    • Without mutability, this unrolling doesn't give us much.
  • Without state, our type and param functions are pure.

Did someone say "pure"?

I can think of another language that has pure functions...

  • Haskell doesn't have mutable state by default.
  • Haskell doesn't have imperative loops.
  • Haskell functions are pure.

Programming in Haskell

Without mutability and loops, (purely) functional programmers use pattern-matching and recursion to express their algorithms.

  • Data structures are defined by enumerating their possible cases. A list is either empty, or a head element followed by a tail list.

    data ListOfInts = Nil | Cons Int ListOfInts
    
    -- [] = Nil
    -- [1] = Cons 1 Nil
    -- [1,2,3] = Cons 1 (Cons 2 (Cons 3 Nil))
    
  • Pattern-matching is used to examine the cases of a data structure and act accordingly.

    sum :: ListOfInts -> Int
    sum Nil = 0
    sum (Cons i tail) = i + sum tail
    

Evaluating Haskell

Haskell simplifies calls to functions by picking the case based on the arguments.

sum (Cons 1 (Cons 2 (Cons 3 Nil)))

-- case: sum (Cons i tail) = i + sum tail
= 1 + sum (Cons 2 (Cons 3 Nil))

-- case: sum (Cons i tail) = i + sum tail
= 1 + (2 + sum (Cons 3 Nil))

-- case: sum (Cons i tail) = i + sum tail
= 1 + (2 + (3 + sum Nil))

-- case: sum Nil = 0
= 1 + (2 + (3 + 0))

= 6

A Familiar Pattern

Picking a case based on the arguments is very similar to Chapel's function overloading.

  • A very familiar example:
    proc foo(x: int) { writeln("int"); }
    proc foo(x: real) { writeln("real"); }
    foo(1);   // prints "int"
    
  • A slightly less familiar example:
    proc foo(type x: int) { compilerWarning("int"); }
    proc foo(type x: real) { compilerWarning("real"); }
    foo(int);   // compiler prints "int"
    

A Type-Level List

Hypothesis: we can use Chapel's function overloading and types to write functional-ish programs.

record Nil {}
record Cons { param head: int; type tail; }


type myList = Cons(1, Cons(2, Cons(3, Nil)));



proc sum(type x: Nil) param do return 0;
proc sum(type x: Cons(?i, ?tail)) param do return i + sum(tail);

compilerWarning(sum(myList) : string); // compiler prints 6
data ListOfInts = Nil
                | Cons Int ListOfInts


myList = Cons 1 (Cons 2 (Cons 3 Nil))


sum :: ListOfInts -> Int
sum Nil = 0
sum (Cons i tail) = i + sum tail

                                                                

Type-Level Programming at Compile-Time

After resolution, our original program:

record Nil {}
record Cons { param head: int; type tail; }

type myList = Cons(1, Cons(2, Cons(3, Nil)));

proc sum(type x: Nil) param do return 0;
proc sum(type x: Cons(?i, ?tail)) param do return i + sum(tail);

writeln(sum(myList) : string); // compiler prints 6

Becomes:

writeln("6");

There is no runtime overhead!


Type-Level Programming at Compile-Time

width:400px


Type-Level Programming at Compile-Time

Why would I want to do this?!

- You, probably

  • Do you want to write parameterized code, without paying runtime overhead for the runtime parameters?
    • Worked example: linear multi-step method approximator
  • Do you want to have powerful compile-time checks and constraints on your function types?
    • Worked example: type-safe printf function

Linear Multi-Step Method Approximator


Euler's Method

A first-order differential equation can be written in the following form:


y' = f(t, y)

In other words, the derivative of of y depends on t and y itself. There is no solution to this equation in general; we have to approximate.

If we know an initial point (t_0, y_0), we can approximate other points. To get the point at t_1 = t_0 + h, we can use the formula:


\begin{align*}
y'(t_0) & = f(t_0, y_0) \\
y(t_0+h) & \approx y_0 + h \times y'(t_0) \\
         & \approx y_0 + h \times f(t_0, y_0)
\end{align*}

We can name the first approximated $y$-value y_1, and set it:


y_1 = y_0 + h \times f(t_0, y_0)

Euler's Method

On the previous slide, we got a new point (t_1, y_1). We can repeat the process to get y_2:


\begin{array}{c}
y_2 = y_1 + h \times f(t_1, y_1) \\
y_3 = y_2 + h \times f(t_2, y_2) \\
y_4 = y_3 + h \times f(t_3, y_3) \\
\cdots \\
y_{n+1} = y_n + h \times f(t_n, y_n) \\
\end{array}

Euler's Method in Chapel

This can be captured in a simple Chapel procedure:

proc runEulerMethod(step: real, count: int, t0: real, y0: real) {
    var y = y0;
    var t = t0;
    for i in 1..count {
        y += step*f(t,y);
        t += step;
    }
    return y;
}

Other Methods

  • In Euler's method, we look at the slope of a function at a particular point, and use it to extrapolate the next point.

  • Once we've computed a few points, we have more information we can incorporate.

    • When computing y_2, we can use both y_0 and y_1.
    • To get a good approximation, we have to weight the points differently.
      
      y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right)
      
    • More points means better accuracy, but more computation.
  • There are other methods that use more points and different weights.

    • Another method is as follows:
      
      y_{n+3} = y_{n+2} + h \left(\frac{23}{12}f(t_{n+2}, y_{n+2}) - \frac{16}{12}f(t_{n+1}, y_{n+1}) + \frac{5}{12}f(t_{n}, y_{n})\right)
      

Generalizing Multi-Step Methods

Explicit Adams-Bashforth methods in general can be encoded as the coefficients used to weight the previous points.

Method Equation Coefficient List
Euler's method y_{n+1} = y_n + h \times f(t_n, y_n) 1
Two-step A.B. y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right) \frac{3}{2},-\frac{1}{2}

Generalizing Multi-Step Methods

Explicit Adams-Bashforth methods in general can be encoded as the coefficients used to weight the previous points.

Method Equation Chapel Type Expression
Euler's method y_{n+1} = y_n + h \times f(t_n, y_n) Cons(1,Nil)
Two-step A.B. y_{n+2} = y_{n+1} + h \left(\frac{3}{2}f(t_{n+1}, y_{n+1}) - \frac{1}{2}f(t_{n}, y_{n})\right) Cons(3/2,Cons(-1/2, Nil))

Supporting Functions for Coefficient Lists

proc length(type x: Cons(?w, ?t)) param do return 1 + length(t);
proc length(type x: Nil)          param do return 0;

proc coeff(param x: int, type lst: Cons(?w, ?t)) param where x == 0 do return w;
proc coeff(param x: int, type lst: Cons(?w, ?t)) param where x  > 0 do return coeff(x-1, t);

A General Solver

proc runMethod(type method, h: real, count: int, start: real,
               in ys: real ... length(method)): real {
  • type method accepts a type-level list of coefficients.
  • h encodes the step size.
  • start is t_0, the initial time.
  • count is the number of steps to take.
  • in ys makes the function accept as many real values (for y_0, y_1, \ldots) as there are weights

A General Solver

param coeffCount = length(method);
// Repeat the methods as many times as requested
for i in 1..count {
    // We're computing by adding h*b_j*f(...) to y_n.
    // Set total to y_n.
    var total = ys(coeffCount - 1);
    // 'for param' loops are unrolled at compile-time -- this is just
    // like writing out each iteration by hand.
    for param j in 1..coeffCount do
        // For each coefficient b_j given by coeff(j, method),
        // increment the total by h*bj*f(...)
        total += step * coeff(j, method) *
            f(start + step*(i-1+coeffCount-j), ys(coeffCount-j));
    // Shift each y_i over by one, and set y_{n+s} to the
    // newly computed total.
    for param j in 0..< coeffCount - 1 do
        ys(j) = ys(j+1);
    ys(coeffCount - 1) = total;
}
// return final y_{n+s}
return ys(coeffCount - 1);

Using the General Solver

type euler = cons(1.0, empty);
type adamsBashforth = cons(3.0/2.0, cons(-0.5, empty));
type someThirdMethod = cons(23.0/12.0, cons(-16.0/12.0, cons(5.0/12.0, empty)));

Take a simple differential equation y' = y. For this, define f as follows:

proc f(t: real, y: real) do return y;

Now, we can run Euler's method like so:

writeln(runMethod(euler, step=0.5, count=4, start=0, 1)); // 5.0625

To run the 2-step Adams-Bashforth method, we need two initial values:

var y0 = 1.0;
var y1 = runMethod(euler, step=0.5, count=1, start=0, 1);
writeln(runMethod(adamsBashforth, step=0.5, count=3, start=0.5, y0, y1)); // 6.02344

The General Solver

We can now construct solvers for any explicit Adams-Bashforth method, without writing any new code.


Type-Safe printf


The printf Function

The printf function accepts a format string, followed by a variable number of arguments that should match:

// totally fine:
printf("Hello, %s! Your ChapelCon submission is #%d\n", "Daniel", 18);

// not good:
printf("Hello, %s! Your ChapelCon submission is #%d\n", 18, "Daniel");

Can we define a printf function in Chapel that is type-safe?


Yet Another Type-Level List

  • The general idea for type-safe printf: take the format string, and extract a list of the expected argument types.

  • To make for nicer error messages, include a human-readable description of each type in the list.

  • I've found it more convenient to re-define lists for various problems when needed, rather than having a single canonical list definition.

record _nil {
  proc type length param do return 0;
}
record _cons {
  type expectedType;  // type of the argument to printf
  param name: string; // human-readable name of the type
  type rest;

  proc type length param do return 1 + rest.length();
}

Extracting Types from Format Strings

proc specifiers(param s: string, param i: int = 0) type {
  if i >= s.size then return _nil;

  if s[i] == "%" {
    if i + 1 >= s.size then
        compilerError("Invalid format string: unterminted %");

    select s[i + 1] {
      when "%" do return specifiers(s, i + 2);
      when "s" do return _cons(string, "a string", specifiers(s, i + 2));
      when "i" do return _cons(int, "a signed integer", specifiers(s, i + 2));
      when "u" do return _cons(uint, "an unsigned integer", specifiers(s, i + 2));
      when "n" do return _cons(numeric, "a numeric value", specifiers(s, i + 2));
      otherwise do compilerError("Invalid format string: unknown format type");
    }
  } else {
    return specifiers(s, i + 1);
  }
}

Extracting Types from Format Strings

Let's give it a quick try:

writeln(specifiers("Hello, %s! Your ChapelCon submission is #%i\n") : string);

The above prints:

_cons(string,"a string",_cons(int(64),"a signed integer",_nil))

Validating Argument Types

  • The Chapel standard library has a nice isSubtype function that we can use to check if an argument matches the expected type.

  • Suppose the .length of our type specifiers matches the number of arguments to printf

  • Chapel doesn't currently support empty tuples, so if the lengths match, we know that specifiers is non-empty.

  • Then, we can validate the types as follows:

    proc validate(type specifiers: _cons(?t, ?s, ?rest), type argTup, param idx) {
      if !isSubtype(argTup[idx], t) then
        compilerError("Argument " + (idx + 1) : string + " should be " + s + " but got " + argTup[idx]:string, idx+2);
    
      if idx + 1 < argTup.size then
        validate(rest, argTup, idx + 1);
    }
    
  • The idx+2 argument to compilerError avoids printing the recursive validate calls in the error message.


The fprintln overloads

  • I named it fprintln for "formatted print line".

  • To support the empty-specifier case (Chapel varargs don't allow zero arguments):

    proc fprintln(param format: string) where specifiers(format).length == 0 {
      writeln(format);
    }
    
  • If we do have type specifiers, to ensure our earlier assumption of size matching:

    proc fprintln(param format: string, args...)
      where specifiers(format).length != args.size {
      compilerError("'fprintln' with this format string expects " +
                    specifiers(format).length : string +
                    " argument(s) but got " + args.size : string);
    }
    

The fprintln overloads

  • All that's left is the main fprintln implementation:

    proc fprintln(param format: string, args...) {
      validate(specifiers(format), args.type, 0);
    
      writef(format + "\n", (...args));
    }
    

Using fprintln

fprintln("Hello, world!");        // fine, prints "Hello, world!"
fprintln("The answer is %i", 42); // fine, prints "The answer is 42"

// compiler error: Argument 3 should be a string but got int(64)
fprintln("The answer is %i %i %s", 1, 2, 3);

More work could be done to support more format specifiers, escapes, etc., but the basic idea is there.


Beyond Lists


Beyond Lists

  • I made grand claims earlier
    • "Write functional-ish program at the type level!"
  • So far, we've just used lists and some recursion.
  • Is that all there is?

Algebraic Data Types

  • The kinds of data types that Haskell supports are called algebraic data types.
  • At a fundamental level, they can be built up from two operations: Cartesian product and disjoint union.
  • There are other concepts to build recursive data types, but we won't need them in Chapel.
    • To prove to you I know what I'm talking about, some jargon: initial algebras, the fixedpoint functor, catamorphisms...
    • Check out Bananas, Lenses, Envelopes and Barbed Wire by Meijer et al. for more.
  • This matters because, if Chapel has these operations, we can build any data type that Haskell can.

Algebraic Data Types

  • The kinds of data types that Haskell supports are called algebraic data types.
  • At a fundamental level, they can be built up from two operations: Cartesian product and disjoint union.
  • There are other concepts to build recursive data types, but we won't need them in Chapel.
    • To prove to you I know what I'm talking about, some jargon: initial algebras, the fixedpoint functor, catamorphisms...
    • Check out Bananas, Lenses, Envelopes and Barbed Wire by Meijer et al. for more.
  • This matters because, if Chapel has these operations, we can build any data type that Haskell can.

Cartesian Product

For any two types, the Cartesian product of these two types defines all pairs of values from these types.

  • This is like a two-element tuple at the value level in Chapel.

  • We write this as A \times B for two types A and B.

  • In (type-level) Chapel and Haskell:

    record Pair {
        type fst;
        type snd;
    }
    
    type myPair = Pair(myVal1, myVal2);
    
    data Pair = MkPair
        { fst :: A
        , snd :: B
        }
    
    myPair = MkPair myVal1 myVal2
    

Disjoint Union

For any two types, the disjoint union of these two types defines values that are either from one type or the other.

  • This is almost like a union in Chapel or C...

  • But there's extra information to tell us which of the two types the value is from.

  • We write this as A + B for two types A and B.

  • In Chapel and Haskell:

    
    record InL { type value; }
    record InR { type value; }
    
    type myFirstCase = InL(myVal1);
    type mySecondCase = InR(myVal2);
    
    data Sum
      = InL A
      | InR B
    
    myFirstCase = InL myVal1
    mySecondCase = InR myVal2
    

Algebraic Data Types

  • We can build up more complex types by combining these two operations.

    • Need a triple of types A, B, and C? Use A \times (B \times C).
    • Similarly, "any one of three types" can be expressed as A + (B + C).
    • A Option<T> type (in Rust, or optional<T> in C++) is T + \text{Unit}.
      • Unit is a type with a single value (there's only one None / std::nullopt).
  • Notice that in Chapel, we moved up one level

    Thing Chapel Haskell
    Nil type value
    Cons type constructor value constructor
    List ??? type

Algebraic Data Types

  • Since Chapel has no notion of a type-of-types, we can't enforce that our values are only InL or InR (in the case of Sum).

  • This is why, in Chapel versions, type annotations like A and B are missing.

    record Pair {
        type fst; /* : A */
        type snd; /* : B */
    }
    
    data Pair = MkPair
        { fst :: A
        , snd :: B
        }
    
  • So, we can't enforce that the user doesn't pass int to our length function defined on lists.

  • We also can't enforce that InL is instantiated with the right type.

  • So, we lose some safety compared to Haskell...

  • ...but we're getting the compiler to do arbitrary computations for us at compile-time.


Worked Example: Binary Search Tree

In Haskell, binary search trees can be defined as follows:

data BSTree = Empty
            | Node Int BSTree BSTree

balancedOneTwoThree = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty)

Written using Algebraic Data Types, this is:


\text{BSTree} = \text{Unit} + (\text{Int} \times \text{BSTree} \times \text{BSTree})

In Haskell (using sums and products):

type BSTree' = Unit `Sum` (Int `Pair` (BSTree' `Pair` BSTree'))

balancedOneTwoThree' = InR (2 `MkPair` (InR (1 `MkPair` (InL MkUnit `MkPair` InL Unit)) `MkPair`
                                        InR (3 `MkPair` (InL MkUnit `MkPair` InL Unit))))

Worked Example: Binary Search Tree

  • Recalling the Haskell version:

    type BSTree' = Unit `Sum` (Int `Pair` (BSTree' `Pair` BSTree'))
    
    balancedOneTwoThree' = InR (2 `MkPair` (InR (1 `MkPair` (InL MkUnit `MkPair` InL Unit)) `MkPair`
                                            InR (3 `MkPair` (InL MkUnit `MkPair` InL Unit))))
    
  • We can't define BSTree' in Chapel (no type-of-types), but we can define balancedOneTwoThree':

    type balancedOneTwoThree =
      InR(Pair(2, Pair(InR(Pair(1, Pair(InL(), InL()))),
                       InR(Pair(3, Pair(InL(), InL()))))));
    
  • We can use algebraic data types to build arbitrarily complex data structures ◼.


Returning to Pragmatism

  • We could've defined our list type in terms of InL, InR, and Pair.

  • However, it was cleaner to make it look more like the non-ADT Haskell version.

  • Recall that it looked like this:

    record Nil {}
    record Cons { param head: int; type tail; }
    
    type myList = Cons(1, Cons(2, Cons(3, Nil)));
    
    data ListOfInts = Nil
                    | Cons Int ListOfInts
    
    myList = Cons 1 (Cons 2 (Cons 3 Nil))
    
  • We can do the same thing for our binary search tree:

    record Empty {}
    record Node { param value: int; type left; type right; }
    
    type balancedOneTwoThree = Node(2, Node(1, Empty, Empty),
                                       Node(3, Empty, Empty));
    
    data BSTree = Empty
                | Node Int BSTree BSTree
    
    balancedOneTwoThree = Node 2 (Node 1 Empty Empty)
                                 (Node 3 Empty Empty)
    

A General Recipe

To translate a Haskell data type definition to Chapel:

  • For each constructor, define a record with that constructor's name

  • The fields of that record are type fields for each argument of the constructor

    • If the argument is a value (like Int), you can make it a param field instead
  • A visual example, again:

    record C1 { type arg1; /* ... */ type argi; }
    // ...
    record Cn { type arg1; /* ... */ type argj; }
    
    data T = C1 arg1 ... argi
           | ...
           | Cn arg1 ... argj
    

Inserting and Looking Up in a BST


proc insert(type t: Empty, param x: int) type do return Node(x, Empty, Empty);
proc insert(type t: Node(?v, ?left, ?right), param x: int) type do
    select true {
      when x < v do return Node(v, insert(left, x), right);
      otherwise do return Node(v, left, insert(right, x));
    }

type test = insert(insert(insert(Empty, 2), 1), 3);



proc lookup(type t: Empty, param x: int) param do return false;
proc lookup(type t: Node(?v, ?left, ?right), param x: int) param do
    select true {
      when x == v do return true;
      when x < v do return lookup(left, x);
      otherwise do return lookup(right, x);
    }
insert :: Int -> BSTree -> BSTree
insert x Empty = Node x Empty Empty
insert x (Node v left right)

    | x < v     = Node v (insert x left) right
    | otherwise = Node v left (insert x right)
                                                                              

test = insert 3 (insert 1 (insert 2 Empty))


lookup :: Int -> BSTree -> Bool
lookup x Empty = False
lookup x (Node v left right)

    | x == v    = True
    | x < v     = lookup x left
    | otherwise = lookup x right

It really works!

writeln(test : string);
// prints Node(2,Node(1,Empty,Empty),Node(3,Empty,Empty))

writeln(lookup(test, 1));
// prints true for this one, but false for '4'

A Key-Value Map

record Empty {}
record Node { param key: int; param value; type left; type right; }

proc insert(type t: Empty, param k: int, param v) type do return Node(k, v, Empty, Empty);
proc insert(type t: Node(?k, ?v, ?left, ?right), param nk: int, param nv) type do
    select true {
      when nk < k do return Node(k, v, insert(left, nk, nv), right);
      otherwise do return Node(k, v, left, insert(right, nk, nv));
    }

proc lookup(type t: Empty, param k: int) param do return "not found";
proc lookup(type t: Node(?k, ?v, ?left, ?right), param x: int) param do
    select true {
      when x == k do return v;
      when x < k do return lookup(left, x);
      otherwise do return lookup(right, x);
    }

type test = insert(insert(insert(Empty, 2, "two"), 1, "one"), 3, "three");
writeln(lookup(test, 1)); // prints "one"
writeln(lookup(test, 3)); // prints "three"
writeln(lookup(test, 4)); // prints "not found"

Conclusion

  • Chapel's type-level programming is surprisingly powerful.
  • We can write compile-time programs that are very similar to Haskell programs.
  • This allows us to write highly parameterized code without paying runtime overhead.
  • This also allows us to devise powerful compile-time checks and constraints on our code.
  • This approach allows for general-purpose programming, which can be applied to your use-case