Make the day one chapel file into a full-fledged blog

Danila Fedorin 2022-12-01 20:12:26 -08:00
parent 277d198d17
commit 5c25497318

304
day1.chpl

@ -1,14 +1,83 @@
use IO; // Advent of Code 2022, Day 1: Counting Calories, Daniel's Take
use List; // tags: ["Advent of Code 2022", "Parallel Programming", "Debugging"]
// summary: "Daniel's take on day 1 of advent of code, featuring reduce expressions, iterators, and custom reductions"
// authors: ["Daniel Fedorin"]
// date: 2022-12-01
// draft: true
/* /*
And so Advent of Code begins! Today's challenge is, as usual for the first
day, a fairly easy one. Brad has [already written]({{< relref "aoc2022-day01-calories" >}}) a wonderful introduction for
this challenge, and provided his own solution to the first part. In that
article, Brad is careful to not use too many complicated or unstable features,
and makes sure that they are well explained. I, on the other hand, am
quite excited about a few fancier features of Chapel, and already have a few
in mind for this day's programming challenge. Let's give them a go!
First things first, though -- we need to be able to read our puzzle input.
To this end, let's `use` the `IO` module.
*/
use IO;
/*
### Iterators and Injecting an Extra Line
The numbers come to us in blank-line-separated groups. The easiest way to The numbers come to us in blank-line-separated groups. The easiest way to
process all of these groups is to keep an intermediate accumulator process all of these groups is to keep an intermediate accumulator
that represents the total number within a group, record that accumulator that represents the total number within a group, and report that accumulator
each time we hit an empty line. On the other hand, the last group is each time we hit an empty line.
not terminated by an empty line, so we'd need special logic to handle
that case. Unless, of course, we just pretended there's an empty line On the other hand, the last group is not terminated by an empty line, so we
at the end, too. We can do this with a custom iterator, `linesWithEnding`. can't _just_ look at the accumulator whenever we see an empty line. If
we did, we'd forget the last elf! We could add another condition checking
for the end-of-file (which is what Brad does), but what if we just
added an empty line at the end? That would solve our problem, too.
The [`channel`](https://chapel-lang.org/docs/modules/standard/IO.html#IO.channel)
data type in Chapel's `IO` module (of which `stdin`, the input stream,
is one example) provides a method called [`lines`](https://chapel-lang.org/docs/modules/standard/IO.html#IO.channel.lines).
This method creates an _iterator_. Simply put, an iterator gives you data
(like `string`s representing the lines of a file!) one at a time. It can
be used in combination with a `for` loop like this:
```Chapel
for item in theIterator do writeln(item);
```
The above loop will print each of the items that the iterator will give to
it. In our particular case, the above could be specialized to:
```Chapel
for line in stdin.lines() do writeln(line);
```
This would simply print the input stream back out to the console. Alas,
there's no way to add to the end of an iterator, which is what we seem
to want to do with that "last empty line" idea. What we can do, though,
is make a new iterator. In Chapel, we can create custom iterators using
the `iter` keyword, followed by the name of our new iterator. Just
like a Chapel [procedure](https://chapel-lang.org/docs/language/spec/procedures.html),
this iterator can accept arguments. Since we're _making_ the iterator, it
is our responsibility now to "give" items -- we do this using the `yield`
keyword. For instance, we could make a simple iterator that gives
the numbers `1`, then `2`, then `3`:
```Chapel
iter giveOneTwoThree() {
yield 1;
yield 2;
yield 3;
}
// will print 1, 2, 3, each on a new line.
for i in giveOneTwoThree() do writeln(i);
```
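Since iterators accept arguments just like procedures do, here is a minimal
sketch of one that takes an argument. The name `countUpTo` and its argument `n`
are made up for this illustration; they aren't part of the solution.
```Chapel
// Hypothetical iterator (not used in the solution): yields 1, 2, ..., n.
iter countUpTo(n: int) {
  for i in 1..n do yield i;
}

// will print 1, 2, 3, each on a new line.
for i in countUpTo(3) do writeln(i);
```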
So to make our new iterator that gives all the lines in the file, and
then one more blank one, we can first use a `for` loop and forward
all the lines from the `stdin.lines()` iterator, and then just yield
once more, giving that last empty line.
*/ */
iter linesWithEnding() { iter linesWithEnding() {
for line in stdin.lines() do yield line; for line in stdin.lines() do yield line;
@ -16,12 +85,17 @@ iter linesWithEnding() {
} }
/* /*
On to the actual intermediate accumulator logic described above. The That was a lot of background, but as you can see, the actual implementation
`current` variable will keep the "up-to-this-point" total within a group. is only 4 lines long.
Whenever we hit an empty line, we know we've finished processing a group,
so we report the value of `current`. Once again we'll make this logic ### Computing Calories per Elf
an iterator; each time it finishes up with a group, it will yield the
group's sum. On to the actual intermediate accumulator logic described above. We'll have
a `current` variable that will keep the running total of the calories in
the current elf's snacks. Whenever we hit an empty line, we know we've
finished processing a group, so we report the value of `current`. Once again
we'll make this logic an iterator; each time it finishes up with a group, it
will yield the group's sum.
*/ */
iter elves() { iter elves() {
var current = 0; var current = 0;
@ -37,39 +111,191 @@ iter elves() {
} }
/* /*
### Reductions
If we printed each item from this iterator, it would give us the total
calories for each of the elves, one at a time.
Another cool feature of Chapel is [reductions](https://chapel-lang.org/docs/primers/reductions.html).
A reduction can combine all of the items in an iterator or array using some kind
of operation. For example, `+ reduce [1,2,3,4]` will sum the numbers one
through four, giving 10. Another example is `* reduce (1..n)`, which computes
the factorial of `n` (where the factorial of a number $n$, aka
$n!$, is defined as $n! = 1\times 2\times ... \times n$). Another operation
that Chapel reductions support is `max`, or computing the maximum.
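To make these concrete, here's a small sketch of the three reductions
mentioned above. The value of `n` and the sample array handed to `max` are
made up for illustration.
```Chapel
const n = 5;                          // illustrative value, not from the puzzle
writeln(+ reduce [1, 2, 3, 4]);       // sums one through four: 10
writeln(* reduce (1..n));             // 5! = 120
writeln(max reduce [3, 1, 4, 1, 5]);  // largest element: 5
```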
At this point, part 1 can be solved simply as: At this point, part 1 can be solved simply as:
```Chapel ```Chapel
writeln(max reduce elves()); writeln(max reduce elves());
``` ```
We could stop here, if we wanted. However, so far, none of this has _really_
showcased the "special" features of Chapel. Iterators are cool, but also
a thing in Python (and many other languages). Lots of languages have some
form of reduction,
{{< sidenote "right" "reduce-note" "though perhaps not as convenient." >}}
For instance, in Haskell, one might write <code>foldr max 0 array</code>.
In JavaScript, you could do something very similar, using <code>reduce</code>.
In <a href="https://www.jsoftware.com/#/">J</a>,
you could just write <code>>./array</code> and get its maximum value.
{{< /sidenote >}}
What makes Chapel cool, though, is its natural support for parallelism.
Its one-sentence summary is, after all,
> Chapel is a programming language designed for productive parallel computing at scale.
Well, it so happens that reductions can be parallelized, automatically.
Chapel can spread the computation across multiple threads, and combine
the results, all without our intervention. You might be wondering,
is it happening now? Did we write our first parallel solution to an Advent
of Code puzzle?
### Debugging Parallel Execution
Confirming that we're running in parallel is a little bit tricky. There are a
couple of compile-time flags we can enable to print out parallelism statistics,
and as far as I can tell, they are not documented in many places. Here they
are:
```bash
-sdebugDataPar=true -sdebugDefaultDist=true
```
So, what do we get? The output is in the (hidden-by-default) block below.
{{< details summary="(program output with just our iterator...)" >}}
```
*** DR alloc locale 0
*** DR calling postalloc locale 0
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** DR alloc locale 1
*** DR calling postalloc locale 1
<puzzle answer>
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** DR calling dealloc locale
```
{{< /details >}}
It's a lot of output, but there's not much there about parallelism. The
only mention of "iterator" in here is preceded by the word "serial", which
is the opposite of "parallel". The only real output seems to be the
allocation (and subsequent deallocation) of locales, which are Chapel's
generalization of "places where computation can occur".
What might be causing this? We don't have to dig too deep; the
[documentation for `channel.lines`](https://chapel-lang.org/docs/modules/standard/IO.html#IO.channel.lines),
which I also linked earlier, notes:
> Only serial iteration is supported.
Since our other iterators build on top of `lines()` by transforming the things
it yields, our iterators become serial, too. There's no way to distribute
a serial iterator - it can _only_ be read one item at a time, without
the ability to look ahead (and thus give other threads something to work on).
Not all is lost, though. Plain old [arrays](https://chapel-lang.org/docs/language/spec/arrays.html)
support parallel iteration. We can easily read an iterator into an array, just
by assigning it to a variable.
```Chapel
var elfArray = elves();
writeln(max reduce elfArray);
```
This time, I get a lot more output:
{{< details summary="(program output using an intermediate array...)" >}}
```
*** DR alloc locale 0
*** DR calling postalloc locale 0
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** DR alloc locale 1
*** DR calling postalloc locale 1
*** In defRectArr simple-dd standalone iterator
*** In domain standalone code:
numTasks=10 (false), minIndicesPerTask=1
numChunks=10 parDim=0 ranges(0).size=143999
### numTasksPerLoc = 10
### ignoreRunning = false
### minIndicesPerTask = 1
### numChunks = 10 (parDim = 0)
### nranges = (0..143998)
*** DI: ranges = (0..143998)
*** DI[0]: block = (0..14399)
*** DI[1]: block = (14400..28799)
*** DI[5]: block = (72000..86399)
*** DI[6]: block = (86400..100799)
*** DI[8]: block = (115200..129599)
*** DI[2]: block = (28800..43199)
*** DI[3]: block = (43200..57599)
*** DI[4]: block = (57600..71999)
*** DI[7]: block = (100800..115199)
*** DI[9]: block = (129600..143998)
<puzzle answer>
*** DR calling dealloc int(64)
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** In defRectArr simple-dd serial iterator
*** DR calling dealloc locale
*** DR calling dealloc locale
```
{{< /details >}}
Even more output! There are a few signs of parallelism in there. For instance,
the following line indicates that our workload is being split into chunks.
```
### numChunks = 10 (parDim = 0)
```
The reason to split data into chunks is so that each independent task can
have its own piece of the workload. I'm further reassured by the actual
number of chunks. It so happens that my computer has ten logical cores.
A Python script can be used to check:
```Python
import multiprocessing as mp
print(mp.cpu_count())
```
On my machine, this prints `10`. So Chapel is automatically distributing
the work across all my cores! We did have to tweak the code a little
bit (specifically, we needed to make sure that what we're giving to the
reduction can be traversed in parallel). However, it's still very simple.
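The same sanity check can be sketched in Chapel itself. As far as I know,
`here.maxTaskPar` reports how many tasks the current locale can run in
parallel; that it matches the `numTasks` value in the debug output is my
assumption, not something the output above confirms.
```Chapel
// Assumption: maxTaskPar is the task-parallelism limit of the current locale.
writeln(here.maxTaskPar);
```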
*/ */
/* {{< skip >}} */
/* /*
For part 2, I'm going to do something a bit more unusual. Chapel has support For part 2, I'm going to do something a bit more unusual. Chapel has support
for reduction expressions, which can even be run in parallel over many for reduction expressions, which can even be run in parallel over many
threads. I'll implement picking the top three elements as a threads. I'll implement picking the top `k` elements as a
custom reduction. If I implement all the methods on this reduction custom reduction. If I implement all the methods on this reduction
class, I'll be able to automatically make my code run on multiple threads! class, I'll be able to automatically make my code run on multiple threads!
*/ */
class MaxThree : ReduceScanOp { class MaxK : ReduceScanOp {
param k: int;
/* Reductions have an element type, the thing-that's-being-processed. /* Reductions have an element type, the thing-that's-being-processed.
This element type is left generic to support reductions over different This element type is left generic to support reductions over different
types of things. */ types of things. */
type eltType; type eltType;
/* The value our reduction is building up is a top-three list of the largest /* The value our reduction is building up is a top-`k` list of the largest
numbers. This top-three list is represented by a three-element tuple numbers. This top-`k` list is represented by a `k`-element tuple
of `eltType`, written as `3*eltType`. */ of `eltType`, written as `k*eltType`. */
var value: 3*eltType; var value: k*eltType;
/* Reductions need an identity element. This is an element that doesn't /* Reductions need an identity element. This is an element that doesn't
do anything when processed. For instance, for summing, the identity do anything when processed. For instance, for summing, the identity
element is zero (adding zero to a sum doesn't change the sum). For element is zero (adding zero to a sum doesn't change the sum). For
finding a product, the identity element is one (multiplying by one finding a product, the identity element is one (multiplying by one
leaves the product intact). When finding the _largest_ three numbers leaves the product intact). When finding the _largest_ `k` numbers
in a list, the identity element is three [infima](https://en.wikipedia.org/wiki/Infimum_and_supremum) in a list, the identity element is `k` [infima](https://en.wikipedia.org/wiki/Infimum_and_supremum)
of that list. We'll assume that the default value of the `eltType` of that list. We'll assume that the default value of the `eltType`
is its infimum, which means default-initializing a tuple of three is its infimum, which means default-initializing a tuple of `k`
`eltTypes` will give us such a three-infimum tuple. `eltTypes` will give us such a `k`-infimum tuple.
*/ */
proc identity { proc identity {
var val: value.type; var val: value.type;
@ -78,20 +304,20 @@ class MaxThree : ReduceScanOp {
/* /*
Next are accumulation functions. These describe how to combine partial Next are accumulation functions. These describe how to combine partial
results from subsets of the list of numbers, or how to update the top results from subsets of the list of numbers, or how to update the top
three given a new number. We only need to _really_ implement one version of `k` given a new number. We only need to _really_ implement one version of
these functions - one that combines two 3-tuples. The rest can be defined these functions - one that combines two k-tuples. The rest can be defined
in terms of that function. in terms of that function.
*/ */
proc accumulate(x: eltType) { accumulateOntoState(value, x); } proc accumulate(x: eltType) { accumulateOntoState(value, x); }
proc accumulateOntoState(ref state: 3*eltType, x: eltType) { accumulateOntoState(state, (0, 0, x)); } proc accumulateOntoState(ref state: k*eltType, x: eltType) { accumulateOntoState(state, (0, 0, x)); }
proc accumulate(x: 3*eltType) { accumulateOntoState(value, x); } proc accumulate(x: k*eltType) { accumulateOntoState(value, x); }
/* The accumulation function uses a standard algorithm for merging two sorted /* The accumulation function uses a standard algorithm for merging two sorted
lists. */ lists. */
proc accumulateOntoState(ref state: 3*eltType, x: 3*eltType) { proc accumulateOntoState(ref state: k*eltType, x: k*eltType) {
var result: state.type; var result: state.type;
var ptr1, ptr2: int = 3-1; var ptr1, ptr2: int = k-1;
for param idx in (0..<3 by -1) { for param idx in (0..<k by -1) {
if x[ptr1] > state[ptr2] { if x[ptr1] > state[ptr2] {
result[idx] = x[ptr1]; result[idx] = x[ptr1];
ptr1 -= 1; ptr1 -= 1;
@ -102,13 +328,13 @@ class MaxThree : ReduceScanOp {
} }
state = result; state = result;
} }
proc combine(other: MaxThree(eltType)) { proc combine(other: MaxK(k, eltType)) {
accumulate(other.value); accumulate(other.value);
} }
/* The Chapel reduction feature requires a couple of other methods, /* The Chapel reduction feature requires a couple of other methods,
which we implement below. */ which we implement below. */
proc clone() return new unmanaged MaxThree(eltType=eltType); proc clone() return new unmanaged MaxK(k=k, eltType=eltType);
proc generate() return value; proc generate() return value;
} }
@ -125,13 +351,19 @@ class MaxThree : ReduceScanOp {
config const part = 1; config const part = 1;
/* Here's how we use our solution. */ /* Here's how we use our solution. */
var elfArray = elves();
if part == 1 { if part == 1 {
/* For part 1, the code remains the same, since we're still just finding /* For part 1, the code remains the same, since we're still just finding
the one maximum number. */ the one maximum number. */
writeln(max reduce elves()); writeln(max reduce elfArray);
} else if part == 2 { } else if part == 2 {
// Need to read all the numbers into memory to make sure we can distribute var reducer = new unmanaged MaxK(k=3, eltType=int);
var elfList = elves(); var topThree = (0,0,0);
writeln(+ reduce (MaxThree reduce elfList)); forall elf in elfArray with (reducer reduce topThree) {
topThree reduce= elf;
} }
writeln(+ reduce topThree);
}
/* {{< /skip >}} */