Add a post on Chapel's runtime types
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
parent
40ea9ec637
commit
62c338e382
591
content/blog/chapel_runtime_types.md
Normal file
591
content/blog/chapel_runtime_types.md
Normal file
@ -0,0 +1,591 @@
|
||||
---
|
||||
title: "Chapel's Runtime Types as an Interesting Alternative to Dependent Types"
|
||||
date: 2025-03-02T22:52:01-08:00
|
||||
tags: ["Chapel", "C++", "Idris", "Programming Languages"]
|
||||
description: "In this post, I discuss Chapel's runtime types as a limited alternative to dependent types."
|
||||
---
|
||||
|
||||
One day, when I was in graduate school, the Programming Languages research
|
||||
group was in a pub for a little gathering. Amidst beers, fries, and overpriced
|
||||
sandwiches, the professor and I were talking about [dependent types](https://en.wikipedia.org/wiki/Dependent_type). Speaking
|
||||
loosely and imprecisely, these are types that are somehow constructed from
|
||||
_values_ in a language, like numbers.
|
||||
|
||||
For example, in C++, [`std::array`](https://en.cppreference.com/w/cpp/container/array)
|
||||
is a dependent type. An instantiation of the _type_ `array`, like `array<string, 3>`
|
||||
is constructed from the type of its elements (here, `string`) and a value
|
||||
representing the number of elements (here, `3`). This is in contrast with types
|
||||
like `std::vector`, which only depends on a type (e.g., `vector<string>` would
|
||||
be a dynamically-sized collection of strings).
|
||||
|
||||
I was extolling the virtues of general dependent types, like you might find
|
||||
in [Idris](https://www.idris-lang.org/) or [Agda](https://agda.readthedocs.io/en/latest/getting-started/what-is-agda.html):
|
||||
more precise function signatures! The
|
||||
{{< sidenote "right" "curry-howard-note" "Curry-Howard isomorphism!" >}}
|
||||
The Curry-Howard isomorphism is a common theme on this blog. I've
|
||||
<a href="{{< relref "typesafe_interpreter_revisited#curry-howard-correspondence" >}}">
|
||||
written about it myself</a>, but you can also take a look at the
|
||||
<a href="https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence">
|
||||
Wikipedia page</a>.
|
||||
{{< /sidenote >}} The professor was skeptical. He had been excited about
|
||||
dependent types in the past, but nowadays he felt over them. They were cool, he
|
||||
said, but there are few practical uses. In fact, he posed a challenge:
|
||||
|
||||
> Give me one good reason to use dependent types in practice that doesn't
|
||||
> involve keeping track of bounds for lists and matrices!
|
||||
{#bounds-quote}
|
||||
|
||||
This challenge alludes to fixed-length lists -- [vectors](https://agda.github.io/agda-stdlib/master/Data.Vec.html)
|
||||
-- which are one of the first dependently-typed data structures one learns about.
|
||||
Matrices are effectively vectors-of-vectors. In fact, even in giving my introductory
|
||||
example above, I demonstrated the C++ equivalent of a fixed-length list, retroactively
|
||||
supporting the professor's point.
|
||||
|
||||
It's not particularly important to write down how I addressed the challenge;
|
||||
suffice it to say that the notion resonated with some of the other
|
||||
students present in the pub. In the midst of practical development, how much
|
||||
of dependent types' power can you leverage, and how much power do you pay
|
||||
for but never use?
|
||||
|
||||
A second round of beers arrived. The argument was left largely unresolved,
|
||||
and conversation flowed to other topics. Eventually, I graduated, and started
|
||||
working on the [Chapel language](https://chapel-lang.org/) team (I also
|
||||
[write on the team's blog](https://chapel-lang.org/blog/authors/daniel-fedorin/)).
|
||||
|
||||
When I started looking at Chapel programs, I could not believe my eyes...
|
||||
|
||||
### A Taste of Chapel's Array Types
|
||||
|
||||
Here's a simple Chapel program that creates an array of 10 integers.
|
||||
|
||||
```Chapel
|
||||
var A: [0..9] int;
|
||||
```
|
||||
|
||||
Do you see the similarity to the `std::array` example above? Of course, the
|
||||
syntax is quite different, but in _essence_ I think the resemblance is
|
||||
uncanny. Let's mangle the type a bit --- producing invalid Chapel programs ---
|
||||
just for the sake of demonstration.
|
||||
|
||||
```Chapel
|
||||
var B: array(0..9, int); // first, strip the syntax sugar
|
||||
var C: array(int, 0..9); // swap the order of the arguments to match C++
|
||||
```
|
||||
|
||||
Only one difference remains: in C++, arrays are always indexed from zero. Thus,
|
||||
writing `array<int, 10>` would implicitly create an array whose indices start
|
||||
with `0` and end in `9`. In Chapel, array indices can start at values other
|
||||
than zero (it happens to be useful for elegantly writing numerical programs),
|
||||
so the type explicitly specifies a lower and a higher bound. Other than that,
|
||||
though, the two types look very similar.
|
||||
|
||||
In general, Chapel arrays have a _domain_, typically stored in variables like `D`.
|
||||
The domain of `A` above is `{0..9}`. This domain is part of the array's type.
|
||||
|
||||
Before I move on, I'd like to pause and state a premise that is crucial
|
||||
for the rest of this post: __I think knowing the size of a data structure,
|
||||
like `std::array` or Chapel's `[0..9] int`, is valuable__. If this premise
|
||||
were not true, there'd be no reason to prefer `std::array` to `std::vector`, or
|
||||
care that Chapel has indexed arrays. However, having this information
|
||||
can help in numerous ways, such as:
|
||||
|
||||
* __Enforcing compatible array shapes.__ For instance, the following Chapel
|
||||
code would require two arrays passed to function `foo` to have the same size.
|
||||
|
||||
```Chapel
|
||||
proc doSomething(people: [?D] person, data: [D] personInfo) {}
|
||||
```
|
||||
|
||||
Similarly, we can enforce the fact that an input to a function has the same shape
|
||||
as the output:
|
||||
|
||||
```Chapel
|
||||
proc transform(input: [?D] int): [D] string;
|
||||
```
|
||||
* __Consistency in generics__. Suppose you have a generic function that declares
|
||||
a new variable of a given type, and just returns it:
|
||||
|
||||
```Chapel
|
||||
proc defaultValue(type argType) {
|
||||
var x: argType;
|
||||
return x;
|
||||
}
|
||||
```
|
||||
|
||||
Code like this exists in "real" Chapel software, by the way --- the example
|
||||
is not contrived. By including the bounds etc. into the array type, we can
|
||||
ensure that `x` is appropriately allocated. Then, `defaultValue([1,2,3].type)`
|
||||
would return an array of three default-initialized integers.
|
||||
* __Eliding boundary checking__. Boundary checking is useful for safety,
|
||||
since it ensures that programs don't read or write past the end of allocated
|
||||
memory. However, bounds checking is also slow. Consider the following function that
|
||||
sums two arrays:
|
||||
|
||||
```Chapel
|
||||
proc sumElementwise(A: [?D] int, B: [D] int) {
|
||||
var C: [D] int;
|
||||
for idx in D do
|
||||
C[idx] = A[idx] + B[idx];
|
||||
}
|
||||
```
|
||||
|
||||
Since arrays `A`, `B`, and `C` have the same domain `D`, we don't need
|
||||
to do bound checking when accessing any of their elements. I don't believe
|
||||
this is currently an optimisation in Chapel, but it's certainly on the
|
||||
table.
|
||||
* __Documentation__. Including the size of the array as part of type
|
||||
signature clarifies the intent of the code being written. For instance,
|
||||
in the following function:
|
||||
|
||||
```Chapel
|
||||
proc sendEmails(numEmails: int, destinationAddrs: [1..numEmails] address) { /* ... */ }
|
||||
```
|
||||
|
||||
It's clear from the type of the `destinationAddrs`s that there ought to
|
||||
be exactly as many `destinationAddrs` as the number of emails that should
|
||||
be sent.
|
||||
|
||||
Okay, recap: C++ has `std::array`, which is a dependently-typed container
|
||||
that represents an array with a fixed number of elements. Chapel has something
|
||||
similar. I think these types are valuable.
|
||||
|
||||
At this point, it sort of looks like I'm impressed with Chapel for copying a C++
|
||||
feature from 2011. Not so! As I played with Chapel programs more and more,
|
||||
arrays miraculously supported patterns that I knew I couldn't write in C++.
|
||||
The underlying foundation of Chapel's array types is quite unlike any other.
|
||||
Before we get to that, though, let's take a look at how dependent types
|
||||
are normally used (by us mere mortal software engineers).
|
||||
|
||||
### Difficulties with Dependent Types
|
||||
|
||||
Let's start by looking at a simple operation on fixed-length lists: reversing them.
|
||||
One might write a reverse function for "regular" lists, ignoring details
|
||||
like ownership, copying, that looks like this:
|
||||
|
||||
```C++
|
||||
std::vector<int> reverse(std::vector<int>);
|
||||
```
|
||||
|
||||
This function is not general: it won't help us reverse lists of
|
||||
strings, for instance. The "easy fix" is to replace `int` with some kind
|
||||
of placeholder that can be replaced with any type.
|
||||
|
||||
```C++
|
||||
std::vector<T> reverse(std::vector<T>);
|
||||
```
|
||||
|
||||
You can try compiling this code, but you will immediately run into an error.
|
||||
What the heck is `T`? Normally,
|
||||
when we name a variable, function, or type (e.g., by writing `vector`, `reverse`),
|
||||
we are referring to its declaration somewhere else. At this time, `T` is not
|
||||
declared anywhere. It just "appears" in our function's type. To fix this,
|
||||
we add a declaration for `T` by turning `reverse` into a template:
|
||||
|
||||
```C++
|
||||
template <typename T>
|
||||
std::vector<T> reverse(std::vector<T>);
|
||||
```
|
||||
|
||||
The new `reverse` above takes two arguments: a type and a list of values of
|
||||
that type. So, to _really_ call this `reverse`, we need to feed the type
|
||||
of our list's elements into it. This is normally done automatically
|
||||
(in C++ and otherwise) but under the hood, invocations might look like this:
|
||||
|
||||
```C++
|
||||
reverse<int>({1,2,3}); // produces 3, 2, 1
|
||||
reverse<string>({"world", "hello"}) // produces "hello", "world"
|
||||
```
|
||||
|
||||
This is basically what we have to do to write `reverse` on `std::array`, which,
|
||||
includes an additional parameter that encodes its length. We might start with
|
||||
the following (using `n` as a placeholder for length, and observing that
|
||||
reversing an array doesn't change its length):
|
||||
|
||||
```C++
|
||||
std::array<T, n> reverse(std::array<T, n>);
|
||||
```
|
||||
|
||||
Once again, to make this compile, we need to add template parameters for `T` and `n`.
|
||||
|
||||
```C++
|
||||
template <typename T, size_t n>
|
||||
std::array<T, n> reverse(std::array<T, n>);
|
||||
```
|
||||
|
||||
Now, you might be asking...
|
||||
|
||||
{{< dialog >}}
|
||||
{{< message "question" "reader" >}}
|
||||
This section is titled "Difficulties with Dependent Types". What's the difficulty?
|
||||
{{< /message >}}
|
||||
{{< /dialog >}}
|
||||
|
||||
Well, here's the kicker. C++ templates are a __compile-time mechanism__. As
|
||||
a result, arguments to `template` (like `T` and `n`) must be known when the
|
||||
program is being compiled. This, in turn, means
|
||||
{{< sidenote "right" "deptype-note" "the following program doesn't work:" >}}
|
||||
The observant reader might have noticed that one of the Chapel programs we
|
||||
saw above, <code>sendEmails</code>, does something similar. The
|
||||
<code>numEmails</code> argument is used in the type of the
|
||||
<code>destinationAddrs</code> parameter. That program is valid Chapel.
|
||||
{{< /sidenote >}}
|
||||
|
||||
```C++
|
||||
void buildArray(size_t len) {
|
||||
std::array<int, len> myArray;
|
||||
// do something with myArray
|
||||
}
|
||||
```
|
||||
|
||||
You can't use these known-length types like `std::array` with any length
|
||||
that is not known at compile-time. But that's a lot of things! If you're reading
|
||||
from an input file, chances are, you don't know how big that file is. If you're
|
||||
writing a web server, you likely don't know the length the HTTP requests.
|
||||
With every setting a user can tweak when running your code, you sacrifice the
|
||||
ability to use templated types.
|
||||
|
||||
Also, how do you _return_ a `std::array`? If the size of the returned array is
|
||||
known in advance, you just list that size:
|
||||
|
||||
```C++
|
||||
std::array<int, 10> createArray();
|
||||
```
|
||||
|
||||
If the size is not known at compile-time, you might want to do something like
|
||||
the following --- using an argument `n` in the type of the returned array ---
|
||||
but it would not compile:
|
||||
|
||||
```C++
|
||||
auto computeNNumbers(size_t n) -> std::array<int, n>; // not valid C++
|
||||
```
|
||||
|
||||
Moreover, you actually can't use `createArray` to figure out the required
|
||||
array size, and _then_ return an array that big, even if in the end you
|
||||
only used compile-time-only computations in the body of `createArray`.
|
||||
What you would need is to provide a "bundle" of a value and a type that is somehow
|
||||
built from that value.
|
||||
|
||||
```C++
|
||||
// magic_pair is invented syntax, will not even remotely work
|
||||
auto createArray() -> magic_pair<size_t size, std::array<int, size>>;
|
||||
```
|
||||
|
||||
This pair contains a `size` (suppose it's known at compilation time for
|
||||
the purposes of appeasing C++) as well as an array that uses that `size`
|
||||
as its template argument. This is not real C++ -- not even close -- but
|
||||
such pairs are a well-known concept. They are known as
|
||||
[dependent pairs](https://unimath.github.io/agda-unimath/foundation.dependent-pair-types.html),
|
||||
or, if you're trying to impress people, \(\Sigma\)-types. In Idris, you
|
||||
could write `createArray` like this:
|
||||
|
||||
```Idris
|
||||
createArray : () -> (n : Nat ** Vec n Int)
|
||||
```
|
||||
|
||||
There are languages out there -- that are not C++, alas -- that support
|
||||
dependent pairs, and as a result make it more convenient to use types that
|
||||
depend on values. Not only that, but a lot of these languages do not force
|
||||
dependent types to be determined at compile-time. You could write that
|
||||
coveted `readArrayFromFile` function:
|
||||
|
||||
```Idris
|
||||
readArrayFromFile : String -> IO (n : Nat ** Vec n String)
|
||||
```
|
||||
|
||||
Don't mind `IO`; in pure languages like Idris, this type is a necessity when
|
||||
interacting when reading data in and sending it out. The key is that
|
||||
`readArrayFromFile` produces, at runtime, a pair of `n`, which is the size
|
||||
of the resulting array, and a `Vec` of that many `String`s (e.g., one string
|
||||
per line of the file).
|
||||
|
||||
Dependent pairs are cool and very general. However, the end result of
|
||||
types with bounds which are not determined at compile-time is that you're
|
||||
_required_ to use dependent pairs. Thus, you must always carry the array's length
|
||||
together with the array itself.
|
||||
|
||||
The bottom line is this:
|
||||
|
||||
* In true dependently typed languages, a type that depends on a value (like `Vec`
|
||||
in Idris) lists that value in its type. When this value is listed by
|
||||
referring to an identifier --- like `n` in `Vec n String` above --- this
|
||||
identifier has to be defined somewhere, too. This necessitates dependent pairs,
|
||||
in which the first element is used syntactically as the "definition point"
|
||||
of a type-level value. For example, in the following piece of code:
|
||||
|
||||
```Idris
|
||||
(n : Nat ** Vec n String)
|
||||
```
|
||||
|
||||
The `n : Nat` part of the pair serves both to say that the first element
|
||||
is a natural number, and to introduce a variable `n` that refers to
|
||||
this number so that the second type (`Vec n String`) can refer to it.
|
||||
|
||||
A lot of the time, you end up carrying this extra value (bound to `n` above)
|
||||
with your type.
|
||||
* In more mainstream languages, things are even more restricted: dependently
|
||||
typed values are a compile-time property, and thus, cannot be used with
|
||||
runtime values like data read from a file, arguments passed in to a function,
|
||||
etc..
|
||||
|
||||
### Hiding Runtime Values from the Type
|
||||
|
||||
Let's try to think of ways to make things more convenient. First of all, as
|
||||
we saw, in Idris, it's possible to use runtime values in types. Not only that,
|
||||
but Idris is a compiled language, so presumably we can compile dependently typed programs
|
||||
with runtime-enabled dependent types. The trick is to forget some information:
|
||||
turn a vector `Vec n String` into two values (the size of the vector and the
|
||||
vector itself), and forget -- for the purposes of generating code -- that they're
|
||||
related. Whenever you pass in a `Vec n String`, you can compile that similarly
|
||||
to how you'd compile passing in a `Nat` and `List String`. Since the program has
|
||||
already been type checked, you can be assured that you don't encounter cases
|
||||
when the size and the actual vector are mismatched, or anything else of that
|
||||
nature.
|
||||
|
||||
Additionally, you don't always need the length of the vector at all. In a
|
||||
good chunk of Idris code, the size arguments are only used to ensure type
|
||||
correctness and rule out impossible cases; they are never accessed at runtime.
|
||||
As a result, you can _erase_ the size of the vector altogether. In fact,
|
||||
[Idris 2](https://github.com/idris-lang/Idris2/) leans on [Quantitative Type Theory](https://bentnib.org/quantitative-type-theory.html)
|
||||
to make erasure easier.
|
||||
|
||||
At this point, one way or another, we've "entangled" the vector with a value
|
||||
representing its size:
|
||||
|
||||
* When a vector of some (unknown, but fixed) length needs to be produced from
|
||||
a function, we use dependent pairs.
|
||||
* Even in other cases, when compiling, we end up treating a vector as a
|
||||
length value and the vector itself.
|
||||
|
||||
Generally speaking, a good language design practice is to hide extraneous
|
||||
complexity, and to remove as much boilerplate as necessary. If the size
|
||||
value of a vector is always joined at the hip with the vector, can we
|
||||
avoid having to explicitly write it?
|
||||
|
||||
This is pretty much exactly what Chapel does. It _allows_ explicitly writing
|
||||
the domain of an array as part of its type, but doesn't _require_ it. When
|
||||
you do write it (re-using my original snippet above):
|
||||
|
||||
```Chapel
|
||||
var A: [0..9] int;
|
||||
```
|
||||
|
||||
What you are really doing is creating a value (the [range](https://chapel-lang.org/docs/primers/ranges.html) `0..9`),
|
||||
and entangling it with the type of `A`. This is very similar to what a language
|
||||
like Idris would do under the hood to compile a `Vec`, though it's not quite
|
||||
the same.
|
||||
|
||||
At the same time, you can write code that omits the bounds altogether:
|
||||
|
||||
```Chapel
|
||||
proc processArray(A: [] int): int;
|
||||
proc createArray(): [] int;
|
||||
```
|
||||
|
||||
In all of these examples, there is an implicit runtime value (the bounds)
|
||||
that is associated with the array's type. However, we are never forced to
|
||||
explicitly thread through or include a size. Where reasoning about them is not
|
||||
necessary, Chapel's domains are hidden away. Chapel refers to the implicitly
|
||||
present value associated with an array type as its _runtime type_.
|
||||
|
||||
I hinted earlier that things are not quite the same in this representation
|
||||
as they are in my simplified model of Idris. In Idris, as I mentioned earlier,
|
||||
the values corresponding to vectors' indices can be erased if they are not used.
|
||||
In Chapel, this is not the case --- a domain always exists at runtime. At the
|
||||
surface level, this means that you may pay for more than what you use. However,
|
||||
domains enable a number of interesting patterns of array code. We'll get
|
||||
to that in a moment; first, I want to address a question that may be on
|
||||
your mind:
|
||||
|
||||
{{< dialog >}}
|
||||
{{< message "question" "reader" >}}
|
||||
At this point, this looks just like keeping a <code>.length</code> field as
|
||||
part of the array value. Most languages do this. What's the difference
|
||||
between this and Chapel's approach?
|
||||
{{< /message >}}
|
||||
{{< /dialog >}}
|
||||
|
||||
This is a fair question. The key difference is that the length exists even if an array
|
||||
does not. The following is valid Chapel code (re-using the `defaultValue`
|
||||
snippet above):
|
||||
|
||||
```Chapel
|
||||
proc defaultValue(type argType) {
|
||||
var x: argType;
|
||||
return x;
|
||||
}
|
||||
|
||||
proc doSomething() {
|
||||
type MyArray = [1..10] int;
|
||||
var A = defaultValue(MyArray);
|
||||
}
|
||||
```
|
||||
|
||||
Here, we created an array `A` with the right size (10 integer elements)
|
||||
without having another existing array as a reference. This might seem like
|
||||
a contrived example (I could've just as well written `var A: [1..10] int`),
|
||||
but the distinction is incredibly helpful for generic programming. Here's
|
||||
a piece of code from the Chapel standard library, which implements
|
||||
a part of Chapel's [reduction](https://chapel-lang.org/docs/primers/reductions.html) support:
|
||||
|
||||
{{< githubsnippet "chapel-lang/chapel" "e8ff8ee9a67950408cc6d4c3220ac647817ddae3" "modules/internal/ChapelReduce.chpl" "Chapel" 146 >}}
|
||||
inline proc identity {
|
||||
var x: chpl__sumType(eltType); return x;
|
||||
}
|
||||
{{< /githubsnippet >}}
|
||||
|
||||
Identity elements are important when performing operations like sums and products,
|
||||
for many reasons. For one, they tell you what the sum (e.g.) should be when there
|
||||
are no elements at all. For another, they can be used as an initial value for
|
||||
an accumulator. In Chapel, when you are performing a reduction, there is a
|
||||
good chance you will need several accumulators --- one for each thread performing
|
||||
a part of the reduction.
|
||||
|
||||
That `identity` function looks almost like `defaultValue`! Since it builds the
|
||||
identity element from the type, and since the type includes the array's dimensions,
|
||||
summing an array-of-arrays, even if it's empty, will produce the correct output.
|
||||
|
||||
```Chapel
|
||||
type Coordinate = [1..3] real;
|
||||
|
||||
var Empty: [0..<0] Coordinate;
|
||||
writeln(+ reduce Empty); // sum up an empty list of coordinates
|
||||
```
|
||||
|
||||
As I mentioned before, having the domain be part of the type can also enable
|
||||
indexing optimizations --- without any need for [interprocedural analysis](https://en.wikipedia.org/wiki/Interprocedural_optimization) ---
|
||||
in functions like `sumElementwise`:
|
||||
|
||||
```Chapel
|
||||
proc sumElementwise(A: [?D] int, B: [D] int) {
|
||||
var C: [D] int;
|
||||
for idx in D do
|
||||
C[idx] = A[idx] + B[idx];
|
||||
}
|
||||
```
|
||||
|
||||
The C++ equivalent of this function -- using `vectors` to enable arbitrary-size
|
||||
lists of numbers read from user input, and `.at` to enable bounds checks ---
|
||||
does not include enough information for this optimization to be possible.
|
||||
|
||||
```C++
|
||||
void sumElementwise(std::vector<int> A, std::vector<int> B) {
|
||||
std::vector<int> C(A.size());
|
||||
|
||||
for (size_t i = 0; i < A.size(); i++) {
|
||||
C.at(i) = A.at(i) + B.at(i);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
All in all, this makes for a very interesting mix of features:
|
||||
|
||||
* __Chapel arrays have their bounds as part of types__, like `std::array` in C++
|
||||
and `Vec` in Idris. This enables all the benefits I've described above.
|
||||
* __The bounds don't have to be known at compile-time__, like all dependent
|
||||
types in Idris. This means you can read arrays from files (e.g.) and still
|
||||
reason about their bounds as part of the type system.
|
||||
* __Domain information can be hidden when it's not used__, and does not require
|
||||
explicit additional work like template parameters or dependent pairs.
|
||||
|
||||
Most curiously, runtime types only extend to arrays and domains. In that sense,
|
||||
they are not a general purpose replacement for dependent types. Rather,
|
||||
they make arrays and domains special, and single out the exact case my
|
||||
professor was [talking about in the introduction](#bounds-quote). Although
|
||||
at times I've [twisted Chapel's type system in unconventional ways](https://chapel-lang.org/blog/posts/linear-multistep/)
|
||||
to simulate dependent types, rarely have I felt a need for them while
|
||||
programming in Chapel. In that sense --- and in the "practical software engineering"
|
||||
domain --- I may have been proven wrong.
|
||||
|
||||
### Pitfalls of Runtime Types
|
||||
|
||||
Should all languages do things the way Chapel does? I don't think so. Like
|
||||
most features, runtime types like that in Chapel are a language design
|
||||
tradeoff. Though I've covered their motivation and semantics, perhaps
|
||||
I should mention the downsides.
|
||||
|
||||
The greatest downside is that, generally speaking, _types are not always a
|
||||
compile-time property_. We saw this earlier with `MyArray`:
|
||||
|
||||
```Chapel
|
||||
type MyArray = [1..10] int;
|
||||
```
|
||||
|
||||
Here, the domain of `MyArray` (one-dimensional with bounds `1..10`) is a runtime
|
||||
value. It has an
|
||||
{{< sidenote "right" "dce-note" "execution-time cost." >}}
|
||||
The execution-time cost is, of course, modulo <a href="https://en.wikipedia.org/wiki/Dead-code_elimination">dead code elimination</a> etc.. If
|
||||
my snippet made up the entire program being compiled, the end result would
|
||||
likely do nothing, since <code>MyArray</code> isn't used anywhere.
|
||||
{{< /sidenote >}}
|
||||
Moreover, types that serve as arguments to functions (like `argType` for
|
||||
`defaultValue`), or as their return values (like the result of `chpl__sumType`)
|
||||
also have an execution-time backing. This is quite different from most
|
||||
compiled languages. For instance, in C++, templates are "stamped out" when
|
||||
the program is compiled. A function with a `typename T` template parameter
|
||||
called with type `int`, in terms of generated code, is always the same as
|
||||
a function where you search-and-replaced `T` with `int`. This is called
|
||||
[monomorphization](https://en.wikipedia.org/wiki/Monomorphization), by the
|
||||
way. In Chapel, however, if the function is instantiated with an array type,
|
||||
it will have an additional parameter, which represents the runtime component
|
||||
of the array's type.
|
||||
|
||||
The fact that types are runtime entities means that compile-time type checking
|
||||
is insufficient. Take, for instance, the above `sendEmails` function:
|
||||
|
||||
```Chapel
|
||||
proc sendEmails(numEmails: int, destinationAddrs: [1..numEmails] address) { /* ... */ }
|
||||
```
|
||||
|
||||
Since `numEmails` is a runtime value (it's a regular argument!), we can't ensure
|
||||
at compile-time that a value of some array matches the `[1..numEmails] address`
|
||||
type. As a result, Chapel defers bounds checking to when the `sendEmails`
|
||||
function is invoked.
|
||||
|
||||
This leads to some interesting performance considerations. Take two Chapel records
|
||||
(similar to `struct`s in C++) that simply wrap a value. In one of them,
|
||||
we provide an explicit type for the field, and in the other, we leave the field
|
||||
type generic.
|
||||
|
||||
```Chapel
|
||||
record R1 { var field: [1..10] int; }
|
||||
record R2 { var field; }
|
||||
|
||||
var A = [1,2,3,4,5,6,7,8,9,10];
|
||||
var r1 = new R1(A);
|
||||
var r2 = new R2(A);
|
||||
```
|
||||
|
||||
In a conversation with a coworker, I learned that these are not the same.
|
||||
That's because the record `R1` explicitly specifies a type
|
||||
for `field`. Since the type has a runtime component, the constructor
|
||||
of `R1` will actually perform a runtime check to ensure that the argument
|
||||
has 10 elements. `R2` will not do this, since there isn't any other type
|
||||
to check against.
|
||||
|
||||
Of course, the mere existence of an additional runtime component is a performance
|
||||
consideration. To ensure that Chapel programs perform as well as possible,
|
||||
the Chapel standard library attempts to avoid using runtime components
|
||||
wherever possible. This leads to a distinction between a "static type"
|
||||
(known at compile-time) and a "dynamic type" (requiring a runtime value).
|
||||
The `chpl__sumType` function we saw mentioned above uses static components of
|
||||
types, because we don't want each call to `+ reduce` to attempt to run a number
|
||||
of extraneous runtime queries.
|
||||
|
||||
### Conclusion
|
||||
|
||||
Though runtime types are not a silver bullet, I find them to be an elegant
|
||||
middle-ground solution to the problem of tracking array bounds. They enable
|
||||
optimizations, generic programming, and more, without the complexity of
|
||||
a fully dependently-typed language. They are also quite unlike anything I've
|
||||
seen in any other language.
|
||||
|
||||
What's more, this post only scratches the surface of what's possible using
|
||||
arrays and domains. Besides encoding array bounds, domains include information
|
||||
about how an array is distributed across several nodes (see the
|
||||
[distributions primer](https://chapel-lang.org/docs/primers/distributions.html)),
|
||||
and how it's stored in memory (see the [sparse computations](https://chapel-lang.org/blog/posts/announcing-chapel-2.3/#sparse-computations)
|
||||
section of the recent 2.3 release announcement). In general, they are a very
|
||||
flavorful component to Chapel's "special sauce" as a language for parallel
|
||||
computing.
|
||||
|
||||
You can read more about arrays and domains in the [corresponding primer](https://chapel-lang.org/docs/primers/arrays.html).
|
Loading…
Reference in New Issue
Block a user