2022-12-08 22:33:10 -08:00

370 lines
14 KiB

// Advent of Code 2022, Day 1: Counting Calories, Daniel's Take
// tags: ["Advent of Code 2022", "Parallel Programming", "Debugging"]
// summary: "Daniel's take on day 1 of advent of code, featuring reduce expressions, iterators, and custom reductions"
// authors: ["Daniel Fedorin"]
// date: 2022-12-01
{{< whole_file_min >}}
And so Advent of Code begins! Today's challenge is, as usual for the first
day, a fairly easy one. Brad has [already written]({{< relref "aoc2022-day01-calories" >}}) a wonderful introduction for
this challenge, and provided his own solution to the first part. In that
article, Brad is careful to not use too many complicated or unstable features,
and makes sure that they are well explained. I, on the other hand, am
quite excited about a few fancier features of Chapel, and already have a few
in mind for this day's programming challenge. Let's give them a go!
First things first, though -- we need to be able to read our puzzle input.
To this end, let's `use` the `IO` module.
use IO;
### Iterators and Injecting an Extra Line
The numbers come to us in blank-line-separated groups. The easiest way to
process all of these groups is to keep an intermediate accumulator
that represents the total number within a group, and report that accumulator
each time we hit an empty line.
On the other hand, the last group is not terminated by an empty line, so we
can't _just_ look at the accumulator whenever we see an empty line. If
we did, we'd forget the last elf! We could add another condition checking
for the end-of-file (which is what Brad does), but what if we just
added an empty line at the end? That would solve our problem, too.
The [`channel`](
data type in Chapel's `IO` module (of which `stdin`, the input stream,
is one example) provides a method called [`lines`](
This method creates an _iterator_. Simply put, an iterator gives you data
(like `string`s representing the lines of a file!) one at a time. It can
be used in combination with a `for` loop like this:
for item in theIterator do writeln(item)
The above loop will print each of the items that the iterator will give to
it. In our particular case, the above could be specialized to:
for line in stdin.lines() do writeln(line)
This would simply print the input stream back out to the console. Alas,
there's no way to add to the end of an iterator, which is what we seem
to want to do with that "last empty line" idea. What we can do, though,
is make a new iterator. In Chapel, we can create custom iterators using
the `iter` keyword, followed by the name of our new iterator. Just
like a Chapel [procedure](,
this iterator can accept arguments. Since we're _making_ the iterator, it
is our responsibility now to "give" items -- we do this using the `yield`
keyword. For instance, we could make a simple iterator that gives
the numbers `1`, then `2`, then `3`:
iter giveOneTwoThree() {
yield 1;
yield 2;
yield 3;
// will print 1, 2, 3, each on a new line.
for i in giveOneTwoThree() do writeln(i);
So to make our new iterator that gives all the lines in the file, and
then one more blank one, we can first use a `for` loop and forward
all the lines from the `stdin.lines()` iterator, and the just yield
once more, giving that last empty line.
iter linesWithEnding() {
for line in stdin.lines() do yield line;
yield "";
That was a lot of background, but as you can see, the actual implementation
is only 4 lines long.
### Computing Calories per Elf
On to the actual intermediate accumulator logic described above. We'll have
a `current` variable that will keep the running total of the calories in
the current elf's snacks. Whenever we hit an empty line, we know we've
finished processing a group, so we report the value of `current`. Once again
we'll make this logic an iterator; each time it finishes up with a group, it
will yield the group's sum.
iter elves() {
var current = 0;
for line in linesWithEnding() {
const trimmedLine = line.strip();
if trimmedLine == "" {
yield current;
current = 0;
} else {
current += trimmedLine : int;
### Reductions
If we printed each item from this iterator, it would give us the total
calories for each of the elves, one at a time.
Another cool feature of Chapel is [reductions](
A reduction can combine all of the items in an iterator or array using some kind
of operation. For example, `+ reduce [1,2,3,4]` will sum the numbers one
through four, giving 10. Another example is `* reduce (1..n)`, which computes
the factorial of `n` (where the factorial of a number $n$, aka
$n!$, is defined as $n! = 1\times 2\times ... \times n$). Another operation
that Chapel reductions support is `max`, or computing the maximum.
At this point, part 1 can be solved simply as:
writeln(max reduce elves());
We could stop here, if we wanted. However, so far, none of this has _really_
showcased the "special" features of Chapel. Iterators are cool, but also
a thing in Python (and many other languages). Lots of languages have some
form of reduction,
{{< sidenote "right" "reduce-note" "though perhaps not as convenient." >}}
For instance, in Haskell, one might write <code>foldr max 0 array</code>.
In JavaScript, you could do something very similar, using <code>reduce</code>.
In <a href="">J</a>,
you could just write <code>>./array</code> and get its maximum value.
{{< /sidenote >}}
What makes Chapel cool, though, is its natural support for parallelism.
Its one-sentence summary is, after all,
> Chapel is a programming language designed for productive parallel computing at scale.
Well, it so happens that reductions can be parallelized, automatically.
Chapel can spread the computation across multiple threads, and combine
the results, all without our intervention. You might be wondering,
is it happening now? Did we write our first parallel solution to an Advent
of Code puzzle?
### Debugging Parallel Execution
Confirming that we're running in parallel is a little bit tricky. There are a
couple of compile-time flags we can enable to print out parallelism statistics,
and as far as I can tell, they are not documented in many places. Here they
-sdebugDataPar=true -sdebugDefaultDist=true
So, what do we get? The output is in the (hidden-by-default) block below.
{{< details summary="(program output with just our iterator...)" >}}
*** DR alloc locale 0
*** DR calling postalloc locale 0
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** DR alloc locale 1
*** DR calling postalloc locale 1
<puzzle answer>
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** DR calling dealloc locale
{{< /details >}}
It's a lot of output, but there's not much there about parallelism. The
only mention of "iterator" in here is preceded by the word "serial", which
is the opposite of "parallel". The only real output seems to be the
allocation (and subsequent deallocation) of locales, which are Chapel's
generalization of "places where computation can occur".
What might be causing this? We don't have to dig too deep; the
[documentation for `channel.lines`](,
which I also linked earlier, notes:
> Only serial iteration is supported.
Since our other iterators build on top of `lines()` by transforming the things
it yields, our iterators become serial, too. There's no way to distribute
a serial iterator - it can _only_ be read one item at a time, without
the ability to look ahead (and thus give other threads something to work on).
Not all is lost, though. Plain old [arrays](
support parallel iteration. We can easily read an iterator into an array, just
by assigning it to a variable.
var elfArray = elves();
writeln(max reduce elfArray);
This time, I get a lot more output:
{{< details summary="(program output using an intermediate array...)" >}}
*** DR alloc locale 0
*** DR calling postalloc locale 0
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** In defRectArr simple-dd standalone iterator
*** In domain standalone code:
numTasks=10 (false), minIndicesPerTask=1
numChunks=10 parDim=0 ranges(0).size=143999
### numTasksPerLoc = 10
### ignoreRunning = false
### minIndicesPerTask = 1
### numChunks = 10 (parDim = 0)
### nranges = (0..143998)
*** DI: ranges = (0..143998)
*** DI[0]: block = (0..14399)
*** DI[1]: block = (14400..28799)
*** DI[5]: block = (72000..86399)
*** DI[6]: block = (86400..100799)
*** DI[8]: block = (115200..129599)
*** DI[2]: block = (28800..43199)
*** DI[3]: block = (43200..57599)
*** DI[4]: block = (57600..71999)
*** DI[7]: block = (100800..115199)
*** DI[9]: block = (129600..143998)
<puzzle answer>
*** DR calling dealloc int(64)
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** DR calling dealloc locale
{{< /details >}}
Even more output! There are a few signs of parallelism in there. For instance,
the following line indicates that our workload is being split into chunks.
### numChunks = 10 (parDim = 0)
The reason to split data into chunks is to that each independent task can
have its own piece of the workload. I'm further reassured by the actual
number of chunks. It so happens that my computer has ten logical cores.
A Python script can be used to check:
import multiprocessing as mp;
On my machine, this prints `10`. So Chapel is automatically distributing
the work across all my cores! We did have to tweak the code a little
bit (specifically, we needed to make sure that what we're giving to the
reduction can be traversed in parallel). However, it's still very simple.
/* {{< skip >}} */
For part 2, I'm going to do something a bit more unusual. Chapel has support
for reduction expressions, which can even be run in parallel over many
threads. I'll implement picking the top `k` elements as a
custom reduction. If I implement all the methods on this reduction
class, I'll be able to automatically make my code run on multuple threads!
class MaxK : ReduceScanOp {
param k: int;
/* Reductions have an element type, the thing-that's-being-processed.
This element type is left generic to support reductions over different
types of things. */
type eltType;
/* The value our reduction is building up is a top-`k` list of the largest
numbers. This top-`k` list is represented by a `k`-element tuple
of `eltType`, written as `k*eltType`. */
var value: k*eltType;
/* Reductions need an identity element. This is an element that doesn't
do anything when processed. For instance, for summing, the identity
element is zero (adding zero to a sum doesn't change the sum). For
finding a product, the identity element is one (multiplying by one
leaves the product intact). When finding the _largest_ `k` numbers
in a list, the identity element is `k` [infinums](
of that list. We'll assume that the default value of the `eltType`
is its infinum, which means default-initializing a tuple of `k`
`eltTypes` will give us such a `k`-infinum tuple.
proc identity {
var val: value.type;
return val;
Next are accumulation functions. These describe how to combine partial
results from substs of the list of numbers, or how to update the top
`k` given a new number. We only need to _really_ implement one version of
these functions - one that combines two k-tuples. The rest can be defined
in terms of that function.
proc accumulate(x: eltType) { accumulateOntoState(value, x); }
proc accumulateOntoState(ref state: k*eltType, x: eltType) { accumulateOntoState(state, (0, 0, x)); }
proc accumulate(x: k*eltType) { accumulateOntoState(value, x); }
/* The accumulation function uses a standard algorithm for merging two sorted
lists. */
proc accumulateOntoState(ref state: k*eltType, x: k*eltType) {
var result: state.type;
var ptr1, ptr2: int = k-1;
for param idx in (0..<k by -1) {
if x[ptr1] > state[ptr2] {
result[idx] = x[ptr1];
ptr1 -= 1;
} else {
result[idx] = state[ptr2];
ptr2 -= 1;
state = result;
proc combine(other: MaxK(k, eltType)) {
/* The Chapel reduction feature requires a couple of other methods,
which we implement below. */
proc clone() return new unmanaged MaxK(k=k, eltType=eltType);
proc generate() return value;
Let's make it possible to select which part we want to solve from the
command line. This can be easily achieved via a `config const`. A
variable like this can be set when running the program from the command
line as follows:
./my-program --part=1
config const part = 1;
/* Here's how we use our solution. */
var elfArray = elves();
if part == 1 {
/* For part 1, the code remains the same, since we're still just finding
the one maximum number. */
writeln(max reduce elfArray);
} else if part == 2 {
var reducer = new unmanaged MaxK(k=3, eltType=int);
var topThree = (0,0,0);
forall elf in elfArray with (reducer reduce topThree) {
topThree reduce= elf;
writeln(+ reduce topThree);
/* {{< /skip >}} */