// Advent of Code 2022, Day 1: Counting Calories, Daniel's Take
// tags: ["Advent of Code 2022", "Parallel Programming", "Debugging"]
// summary: "Daniel's take on day 1 of advent of code, featuring reduce expressions, iterators, and custom reductions"
// authors: ["Daniel Fedorin"]
// date: 2022-12-01

/*

{{< whole_file_min >}}

And so Advent of Code begins! Today's challenge is, as usual for the first
day, a fairly easy one. Brad has [already written]({{< relref "aoc2022-day01-calories" >}}) a wonderful introduction for
this challenge, and provided his own solution to the first part. In that
article, Brad is careful to not use too many complicated or unstable features,
and makes sure that they are well explained. I, on the other hand, am
quite excited about a few fancier features of Chapel, and already have a few
in mind for this day's programming challenge. Let's give them a go!

First things first, though -- we need to be able to read our puzzle input.
To this end, let's `use` the `IO` module.
*/
use IO;

/*

### Iterators and Injecting an Extra Line

The numbers come to us in blank-line-separated groups. The easiest way to
process all of these groups is to keep an intermediate accumulator
that represents the total of the numbers within a group, and report that
accumulator each time we hit an empty line.

On the other hand, the last group is not terminated by an empty line, so we
can't _just_ look at the accumulator whenever we see an empty line. If
we did, we'd forget the last elf! We could add another condition checking
for the end-of-file (which is what Brad does), but what if we just
added an empty line at the end? That would solve our problem, too.

The [`channel`](https://chapel-lang.org/docs/modules/standard/IO.html#IO.channel)
data type in Chapel's `IO` module (of which `stdin`, the input stream,
is one example) provides a method called [`lines`](https://chapel-lang.org/docs/modules/standard/IO.html#IO.channel.lines).
This method creates an _iterator_. Simply put, an iterator gives you data
(like `string`s representing the lines of a file!) one at a time. It can
be used in combination with a `for` loop like this:

```Chapel
for item in theIterator do writeln(item);
```

The above loop will print each of the items that the iterator will give to
it. In our particular case, the above could be specialized to:

```Chapel
for line in stdin.lines() do writeln(line);
```

This would simply print the input stream back out to the console. Alas,
there's no way to add to the end of an iterator, which is what we seem
to want to do with that "last empty line" idea. What we can do, though,
is make a new iterator. In Chapel, we can create custom iterators using
the `iter` keyword, followed by the name of our new iterator. Just
like a Chapel [procedure](https://chapel-lang.org/docs/language/spec/procedures.html),
this iterator can accept arguments. Since we're _making_ the iterator, it
is our responsibility now to "give" items -- we do this using the `yield`
keyword. For instance, we could make a simple iterator that gives
the numbers `1`, then `2`, then `3`:

```Chapel
iter giveOneTwoThree() {
  yield 1;
  yield 2;
  yield 3;
}

// will print 1, 2, 3, each on a new line.
for i in giveOneTwoThree() do writeln(i);
```

So to make our new iterator that gives all the lines in the file, and
then one more blank one, we can first use a `for` loop and forward
all the lines from the `stdin.lines()` iterator, and then just yield
once more, giving that last empty line.
*/
iter linesWithEnding() {
  for line in stdin.lines() do yield line;
  yield "";
}

/*
That was a lot of background, but as you can see, the actual implementation
is only 4 lines long.

### Computing Calories per Elf

On to the actual intermediate accumulator logic described above. We'll have
a `current` variable that will keep the running total of the calories in
the current elf's snacks. Whenever we hit an empty line, we know we've
finished processing a group, so we report the value of `current`. Once again
we'll make this logic an iterator; each time it finishes up with a group, it
will yield the group's sum.
*/
iter elves() {
  var current = 0;
  for line in linesWithEnding() {
    const trimmedLine = line.strip();
    if trimmedLine == "" {
      // blank line: report the finished group's total and reset for the next elf
      yield current;
      current = 0;
    } else {
      // otherwise, parse the line as an integer and add it to the running total
      current += trimmedLine : int;
    }
  }
}

/*

### Reductions

If we printed each item from this iterator, it would give us the total
calories for each of the elves, one at a time.
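
For example, a quick sanity check could be a loop like the following (just an
illustration; it isn't part of the final program):

```Chapel
// print each elf's calorie total, one per line
for total in elves() do writeln(total);
```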

Another cool feature of Chapel is [reductions](https://chapel-lang.org/docs/primers/reductions.html).
A reduction can combine all of the items in an iterator or array using some kind
of operation. For example, `+ reduce [1,2,3,4]` will sum the numbers one
through four, giving 10. Another example is `* reduce (1..n)`, which computes
the factorial of `n` (where the factorial of a number $n$, aka
$n!$, is defined as $n! = 1\times 2\times ... \times n$). Another operation
that Chapel reductions support is `max`, or computing the maximum.
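
To get a feel for the syntax, here are a few standalone reductions (a small
sketch, separate from our solution):

```Chapel
// sum, product, and maximum via reduce expressions
writeln(+ reduce [1, 2, 3, 4]);    // 10
const n = 5;
writeln(* reduce (1..n));          // 120, i.e. 5!
writeln(max reduce [3, 1, 4, 1]);  // 4
```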

At this point, part 1 can be solved simply as:

```Chapel
writeln(max reduce elves());
```

We could stop here, if we wanted. However, so far, none of this has _really_
showcased the "special" features of Chapel. Iterators are cool, but also
a thing in Python (and many other languages). Lots of languages have some
form of reduction,
{{< sidenote "right" "reduce-note" "though perhaps not as convenient." >}}
For instance, in Haskell, one might write <code>foldr max 0 array</code>.
In JavaScript, you could do something very similar, using <code>reduce</code>.
In <a href="https://www.jsoftware.com/#/">J</a>,
you could just write <code>>./array</code> and get its maximum value.
{{< /sidenote >}}
What makes Chapel cool, though, is its natural support for parallelism.
Its one-sentence summary is, after all,

> Chapel is a programming language designed for productive parallel computing at scale.

Well, it so happens that reductions can be parallelized, automatically.
Chapel can spread the computation across multiple threads, and combine
the results, all without our intervention. You might be wondering:
is it happening now? Did we write our first parallel solution to an Advent
of Code puzzle?

### Debugging Parallel Execution

Confirming that we're running in parallel is a little bit tricky. There are a
couple of compile-time flags we can enable to print out parallelism statistics,
and as far as I can tell, they are not documented in many places. Here they
are:

```bash
-sdebugDataPar=true -sdebugDefaultDist=true
```

So, what do we get? The output is in the (hidden-by-default) block below.
{{< details summary="(program output with just our iterator...)" >}}
```
*** DR alloc locale 0
*** DR calling postalloc locale 0
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** DR alloc locale 1
*** DR calling postalloc locale 1
<puzzle answer>
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** DR calling dealloc locale
```
{{< /details >}}

It's a lot of output, but there's not much there about parallelism. The
only mention of "iterator" in here is preceded by the word "serial", which
is the opposite of "parallel". The only real output seems to be the
allocation (and subsequent deallocation) of locales, which are Chapel's
generalization of "places where computation can occur".

What might be causing this? We don't have to dig too deep; the
[documentation for `channel.lines`](https://chapel-lang.org/docs/modules/standard/IO.html#IO.channel.lines),
which I also linked earlier, notes:

> Only serial iteration is supported.

Since our other iterators build on top of `lines()` by transforming the things
it yields, our iterators become serial, too. There's no way to distribute
a serial iterator -- it can _only_ be read one item at a time, without
the ability to look ahead (and thus give other threads something to work on).

Not all is lost, though. Plain old [arrays](https://chapel-lang.org/docs/language/spec/arrays.html)
support parallel iteration. We can easily read an iterator into an array, just
by assigning it to a variable.

```Chapel
var elfArray = elves();
writeln(max reduce elfArray);
```

This time, I get a lot more output:

{{< details summary="(program output using an intermediate array...)" >}}
```
*** DR alloc locale 0
*** DR calling postalloc locale 0
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** In defRectArr simple-dd standalone iterator
*** In domain standalone code:
numTasks=10 (false), minIndicesPerTask=1
numChunks=10 parDim=0 ranges(0).size=143999
### numTasksPerLoc = 10
### ignoreRunning = false
### minIndicesPerTask = 1
### numChunks = 10 (parDim = 0)
### nranges = (0..143998)
*** DI: ranges = (0..143998)
*** DI[0]: block = (0..14399)
*** DI[1]: block = (14400..28799)
*** DI[5]: block = (72000..86399)
*** DI[6]: block = (86400..100799)
*** DI[8]: block = (115200..129599)
*** DI[2]: block = (28800..43199)
*** DI[3]: block = (43200..57599)
*** DI[4]: block = (57600..71999)
*** DI[7]: block = (100800..115199)
*** DI[9]: block = (129600..143998)
<puzzle answer>
*** DR calling dealloc int(64)
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** DR calling dealloc locale
```
{{< /details >}}

Even more output! There are a few signs of parallelism in there. For instance,
the following line indicates that our workload is being split into chunks.
```
### numChunks = 10 (parDim = 0)
```
The reason to split data into chunks is so that each independent task can
have its own piece of the workload. I'm further reassured by the actual
number of chunks. It so happens that my computer has ten logical cores.
A Python script can be used to check:

```Python
import multiprocessing as mp
print(mp.cpu_count())
```

On my machine, this prints `10`. So Chapel is automatically distributing
the work across all my cores! We did have to tweak the code a little
bit (specifically, we needed to make sure that what we're giving to the
reduction can be traversed in parallel). However, it's still very simple.
*/

/* {{< skip >}} */

/*
For part 2, I'm going to do something a bit more unusual. Chapel has support
for reduction expressions, which can even be run in parallel over many
threads. I'll implement picking the top `k` elements as a
custom reduction. If I implement all the methods on this reduction
class, I'll be able to automatically make my code run on multiple threads!
*/
class MaxK : ReduceScanOp {
  param k: int;

  /* Reductions have an element type, the thing-that's-being-processed.
     This element type is left generic to support reductions over different
     types of things. */
  type eltType;

  /* The value our reduction is building up is a top-`k` list of the largest
     numbers. This top-`k` list is represented by a `k`-element tuple
     of `eltType`, written as `k*eltType`. */
  var value: k*eltType;

  /* Reductions need an identity element. This is an element that doesn't
     do anything when processed. For instance, for summing, the identity
     element is zero (adding zero to a sum doesn't change the sum). For
     finding a product, the identity element is one (multiplying by one
     leaves the product intact). When finding the _largest_ `k` numbers
     in a list, the identity element is `k` [infimums](https://en.wikipedia.org/wiki/Infimum_and_supremum)
     of that list. We'll assume that the default value of the `eltType`
     is its infimum (for `int`, the default value is `0`, which works here
     because calorie counts are never negative), which means
     default-initializing a tuple of `k` `eltType`s will give us such a
     `k`-infimum tuple.
  */
  proc identity {
    var val: value.type;
    return val;
  }

  /*
  Next are accumulation functions. These describe how to combine partial
  results from subsets of the list of numbers, or how to update the top
  `k` given a new number. We only need to _really_ implement one version of
  these functions -- one that combines two k-tuples. The rest can be defined
  in terms of that function.
  */
  proc accumulate(x: eltType) { accumulateOntoState(value, x); }
  // note: `(0, 0, x)` builds a 3-tuple, matching the `k = 3` we use for this puzzle
  proc accumulateOntoState(ref state: k*eltType, x: eltType) { accumulateOntoState(state, (0, 0, x)); }
  proc accumulate(x: k*eltType) { accumulateOntoState(value, x); }

  /* The accumulation function uses a standard algorithm for merging two sorted
     lists: walking both tuples from their largest elements down, it repeatedly
     keeps the larger of the two candidates, retaining only the top `k` results.
     For example, merging `(5, 10, 20)` and `(1, 15, 30)` yields `(15, 20, 30)`. */
  proc accumulateOntoState(ref state: k*eltType, x: k*eltType) {
    var result: state.type;
    var ptr1, ptr2: int = k-1;
    for param idx in (0..<k by -1) {
      // fill `result` from largest to smallest, taking the bigger candidate each time
      if x[ptr1] > state[ptr2] {
        result[idx] = x[ptr1];
        ptr1 -= 1;
      } else {
        result[idx] = state[ptr2];
        ptr2 -= 1;
      }
    }
    state = result;
  }

  proc combine(other: MaxK(k, eltType)) {
    accumulate(other.value);
  }

  /* The Chapel reduction feature requires a couple of other methods,
     which we implement below. */
  proc clone() return new unmanaged MaxK(k=k, eltType=eltType);
  proc generate() return value;
}

/*
Let's make it possible to select which part we want to solve from the
command line. This can be easily achieved via a `config const`. A
variable like this can be set when running the program from the command
line as follows:

```bash
./my-program --part=1
```
*/
config const part = 1;

/* Here's how we use our solution. */
var elfArray = elves();

if part == 1 {
  /* For part 1, the code remains the same, since we're still just finding
     the one maximum number. */
  writeln(max reduce elfArray);
} else if part == 2 {
  // `reducer reduce topThree` is a reduce intent: each task in the `forall`
  // accumulates into its own copy of `topThree` via `reduce=`, and the
  // per-task results are combined using our `MaxK` operation at the end.
  var reducer = new unmanaged MaxK(k=3, eltType=int);
  var topThree = (0,0,0);
  forall elf in elfArray with (reducer reduce topThree) {
    topThree reduce= elf;
  }

  writeln(+ reduce topThree);
}

/* {{< /skip >}} */