2022-12-07 00:02:53 -08:00
|
|
|
|
// Advent of Code 2022, Day 7: Traversing Directories
|
|
|
|
|
// authors: ["Daniel Fedorin"]
|
|
|
|
|
// summary: "A solution to day seven of AoC 2022, introducing classes and memory management."
|
|
|
|
|
// tags: ["Advent of Code 2022"]
|
|
|
|
|
// date: 2022-12-07
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
Welcome to day 7 of Chapel's Advent of Code 2022 series. We're over halway
|
|
|
|
|
through the twelve days of Chapel AoC! In case you haven't been following
|
|
|
|
|
the series, check out the introductory [Advent of Code 2022: Twelve
|
|
|
|
|
Days of Chapel](../aoc2022-day00-intro/) article for more context.
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
### The Task at Hand and My Approach
|
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
In [today's puzzle](https://adventofcode.com/2022/day/7), we are given a list of terminal-like commands (
|
2022-12-07 00:02:53 -08:00
|
|
|
|
[`ls`](https://man7.org/linux/man-pages/man1/ls.1.html) and [`cd`](https://man7.org/linux/man-pages/man1/cd.1p.html)
|
|
|
|
|
), as well as output corresponding to running these commands. The commands
|
|
|
|
|
explore a fictional file system, which can have files (objects with size)
|
|
|
|
|
as well as directories that group files and other (sub-)directories. The
|
|
|
|
|
problem then asks to compute the sizes of each folder, and to total up the
|
|
|
|
|
sizes of all folders that are smaller than a particular threshold.
|
|
|
|
|
|
|
|
|
|
The tree-like nature of the file system does not make it amenable to
|
2022-12-08 22:33:30 -08:00
|
|
|
|
representations based on arrays, lists, or maps alone. The trouble with
|
|
|
|
|
these data types is that they're flat. Our input could have arbitrary
|
2022-12-07 00:02:53 -08:00
|
|
|
|
levels of nested directories. However, arrays, lists, and maps cannot have
|
2022-12-08 22:33:30 -08:00
|
|
|
|
such arbitrary nesting --- we'd need something like a list of lists of lists of...
|
2022-12-07 00:02:53 -08:00
|
|
|
|
We could, of course, use the `map` and `list` data types to represent the
|
|
|
|
|
file system with some sort of [adjacency list](https://en.wikipedia.org/wiki/Adjacency_list).
|
|
|
|
|
However, such an implementation would be somewhat clunky and hard to use.
|
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
Instead, in this article I use a different tool from the repertoire of Chapel language
|
|
|
|
|
features, one we haven't seen so far: classes. Specifically, I use a class, `Dir`, to represent
|
|
|
|
|
directories in the file system, and build up a tree of these directories
|
|
|
|
|
while reading the input. I then create an iterator over this tree that
|
|
|
|
|
computes and yields the sizes of the folders. From there, it's easy to
|
|
|
|
|
pick out all directory sizes smaller than the threshold and sum them up.
|
2022-12-07 00:02:53 -08:00
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
**If you skip right to your favorite parts of a movie, here's a full solution for the day:**
|
|
|
|
|
{{< whole_file_min >}}
|
|
|
|
|
|
|
|
|
|
And now, on to the explanation train. Before the train departs, let's import
|
|
|
|
|
a few of the modules we'll use today. `IO` is a permanent fixture in our
|
|
|
|
|
solutions (we always need to read input!), and `List` is a familiar face.
|
|
|
|
|
The only newcomer here is `Map`, which helps us associate keys with values,
|
|
|
|
|
much like a dictionary in Python, a hash in Ruby, or a map in C++.
|
|
|
|
|
We'll use maps and lists for storing the various files and directories
|
|
|
|
|
on the file system.
|
2022-12-07 00:02:53 -08:00
|
|
|
|
*/
|
|
|
|
|
|
2022-12-06 23:02:10 -08:00
|
|
|
|
use IO, Map, List;
|
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
/*
|
|
|
|
|
With that, our train's first stop: classes!
|
|
|
|
|
|
|
|
|
|
### Classes in Chapel
|
|
|
|
|
|
|
|
|
|
Like in most languages, classes in Chapel are a way to group together related
|
|
|
|
|
pieces of data. Up until now, we've used tuples for this purpose. Tuples,
|
|
|
|
|
however, have a couple of limitations when it comes to solving today's
|
|
|
|
|
Advent of Code problem:
|
|
|
|
|
|
|
|
|
|
* We can't name a tuple's elements. Whenever you make and use a tuple,
|
|
|
|
|
it is up to _you_ to remember the order of the elements within it, and
|
|
|
|
|
what each element represents.
|
|
|
|
|
* Tuples can be nested, but their precise element types, including
|
|
|
|
|
nesting depth, must be known at compile-time. As a result, tuples aren’t
|
|
|
|
|
flexible enough to support the arbitrary levels of nesting that would be
|
|
|
|
|
required by a program that didn’t know the directory structure _a priori_
|
|
|
|
|
(e.g. one that was reading it from disk). We simply don’t have the
|
|
|
|
|
information at compile-time to describe the tuple’s types and “shape”.
|
|
|
|
|
|
|
|
|
|
Classes have neither of these limitations. They do, however, need to be
|
|
|
|
|
explicitly created within Chapel code. For example, one might create a
|
|
|
|
|
class to store information about a person:
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
class person {
|
|
|
|
|
var firstName, lastName: string;
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
We've seen plenty of `var` statements used to create variables; when used
|
|
|
|
|
within a class, `var` declares a _member variable_ (also known as a _field_)
|
|
|
|
|
for the class. Our `person` contains two pieces of data in its fields: the
|
|
|
|
|
person's first name (`firstName`) and last name (`lastName`).
|
|
|
|
|
|
|
|
|
|
With that class definition in hand, we can create instances of the `person` class
|
|
|
|
|
using the `new` keyword.
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
var biggestCandyFan = new person("Daniel", "Fedorin");
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
As usual, we can rely on type inference to only write the type `person` once;
|
|
|
|
|
Chapel figures out that `biggestCandyFan` is a `person`. Now, it's easy to get
|
|
|
|
|
the various fields back out of a class:
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
writeln("The biggest fan of candy is ", biggestCandyFan.firstName);
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Believe it or not, we've already seen enough of classes to see how to represent
|
|
|
|
|
nested data structures. The key observation is that classes have names, which
|
|
|
|
|
means that we can create fields that refer back to instances of the same class. Here's
|
|
|
|
|
an example of what I mean, in the form of a modified `person` class:
|
|
|
|
|
|
|
|
|
|
```Chapel {hl_lines=3}
|
|
|
|
|
class person {
|
|
|
|
|
var firstName, lastName: string;
|
|
|
|
|
var children: list(owned person);
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The highlighted line is new. We've added a list of children to our person.
|
|
|
|
|
These children are themselves instances of `person`, which means they too
|
|
|
|
|
can have children of their own. _Et voilà_ - we've got a nested data structure!
|
|
|
|
|
|
|
|
|
|
#### Memory Management Strategies
|
|
|
|
|
You probably noticed that `children`'s type is `list(owned person)` ---
|
|
|
|
|
note the `owned`. This keyword is an indication of the way that memory is
|
|
|
|
|
allocated and maintained for classes: their _memory management_. To create
|
|
|
|
|
a class, a Chapel program asks for some memory from the computer (_allocates_ it).
|
|
|
|
|
This memory is kept by the program until the instance of a class is no longer
|
|
|
|
|
needed, at which point it's _deallocated_/_freed_. The challenge is knowing when
|
|
|
|
|
a class is no longer needed! This is where _memory management strategies_,
|
|
|
|
|
like `owned`, come in.
|
|
|
|
|
|
|
|
|
|
We don't need to get too deep into the various memory management strategies
|
|
|
|
|
in today's post.
|
|
|
|
|
|
|
|
|
|
{{< details summary="**(If you're curious, here's a brief description of each strategy...)**" >}}
|
|
|
|
|
* When using the `owned` strategy, a class instance has one "owner" variable.
|
|
|
|
|
The instance is only around as long as this owner exists.
|
|
|
|
|
As soon as the owner disappears, the class instance is deallocated.
|
|
|
|
|
In some cases --- though we won't be covering them today --- ownership can
|
|
|
|
|
be transferred from one variable to another, but no two values can
|
|
|
|
|
own the same class instance at the same time.
|
|
|
|
|
|
|
|
|
|
Other variables can still refer to an `owned` class instance, but they must _borrow_ it,
|
|
|
|
|
creating, for example, a `borrowed person`. Borrows do not affect the
|
|
|
|
|
lifetime of class or when it is deallocated.
|
|
|
|
|
* When using the `shared` strategy, Chapel keeps track of how many places
|
|
|
|
|
still have variables that refer to a particular instance of a class. This
|
|
|
|
|
is typically called a _reference count_. Each time a variable is created
|
|
|
|
|
or changed to refer to a class instance, the instance's reference count
|
|
|
|
|
increases. When that variable goes out of scope and disappears, the
|
|
|
|
|
reference count decreases. Finally, when the reference count reaches
|
|
|
|
|
zero (no more variables refer to the class instance), there's no point
|
|
|
|
|
in keeping it around anymore, and its memory is deallocated.
|
|
|
|
|
|
|
|
|
|
As is the case with `owned`, other variables can borrow `shared` class instances.
|
|
|
|
|
Such borrows do not affect the reference count at all, and therefore don't
|
|
|
|
|
influence when the instance is freed.
|
|
|
|
|
* When using the `unmanaged` strategy, you're promising to manually free
|
|
|
|
|
the memory later, using the `delete` keyword. This is very similar to
|
|
|
|
|
how `new`/`delete` work in classic C++.
|
|
|
|
|
{{< /details >}}
|
|
|
|
|
|
|
|
|
|
So, the `owned` keyword in our `children` list means we've opted for the
|
|
|
|
|
`owned` memory management strategy. The implication of this is that
|
|
|
|
|
when a "parent" person is deallocated, so are all of its children
|
|
|
|
|
(since the person class, through its `children` list, owns each child).
|
|
|
|
|
If we aren't planning on sharing our data, `owned` is the preferred strategy. This is because
|
|
|
|
|
it precludes the need for some bookkeeping, which
|
|
|
|
|
often makes a difference in terms of performance. The added benefit to using
|
|
|
|
|
`owned`, in my personal view, is that it's easier to figure out when something
|
|
|
|
|
will be deleted --- there's no chance of some other variable, elsewhere in my program,
|
|
|
|
|
preventing a class instance's deallocation.
|
|
|
|
|
|
|
|
|
|
#### Methods
|
|
|
|
|
Remember how I said that classes can be used to group together pieces
|
|
|
|
|
of related data? Well, they can do more than that. They can also group
|
|
|
|
|
together operations on this data, in the form of _methods_. For instance,
|
|
|
|
|
we could add the following definition **inside** the `class` declaration
|
|
|
|
|
for our `person`:
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
class person {
|
|
|
|
|
// ... as before
|
|
|
|
|
|
|
|
|
|
proc getGreeting() {
|
|
|
|
|
return "Hello, " + this.firstName + "!";
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Just like fields can be thought of as `var`s that are associated with a particular
|
|
|
|
|
class instance, methods can be thought of as _procedures_ associated with
|
|
|
|
|
a particular class instance. Thus, methods behave pretty much exactly
|
|
|
|
|
like the `proc`s we've seen so far, with the notable difference of being able to
|
|
|
|
|
access that class instance through the `this` keyword.
|
|
|
|
|
For example, inside the body of a method like `getGreeting` above,
|
|
|
|
|
`this.firstName` gets us the person's first name, and `this.lastName` would
|
|
|
|
|
get us their last name.
|
|
|
|
|
|
|
|
|
|
We can call methods using the dot syntax:
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
// Prints "Hello, Daniel!"
|
|
|
|
|
writeln(biggestCandyFan.getGreeting());
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Methods are a powerful tool for abstraction; rather than writing external code
|
|
|
|
|
that refers to the various fields of a class, we can put that logic
|
|
|
|
|
inside of methods, and avoid exposing it to the rest of the world. A person
|
|
|
|
|
writing `.getGreeting()` will not need to know how a name is represented
|
|
|
|
|
in the `person` class.
|
|
|
|
|
|
|
|
|
|
Another sort of method is a _type method_ (sometimes referred to as
|
|
|
|
|
a _static method_ in other languages). Rather than being called on
|
|
|
|
|
an instance of a person, like `biggestCandyFan` or `daniel`, it's called
|
|
|
|
|
on the class itself. For instance:
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
class person {
|
|
|
|
|
// ... as before
|
|
|
|
|
|
|
|
|
|
proc type createBiggestCandyFan() {
|
|
|
|
|
return new person("Daniel", "Fedorin");
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var biggestCandyFan = person.createBiggestCandyFan();
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Methods like this have the benefit of being associated with a particular class.
|
|
|
|
|
This means that another class can have its own `createBiggestCandyFan()`
|
|
|
|
|
method, and there won't be any confusion or problems arising from trying
|
|
|
|
|
to figure out which is which. Perhaps dogs (represented by a hypothetical
|
|
|
|
|
`dog` class) have a biggest candy fan, too!
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
var biggestCandyFan = person.createBiggestCandyFan();
|
|
|
|
|
var biggestCandyFanDog = dog.createBiggestCandyFan();
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
### A `Dir` Class to Represent Directories
|
|
|
|
|
Back to the solution. The class I use for tracking directories is actually not too different
|
|
|
|
|
from our modified `person` class above. Each directory
|
|
|
|
|
{{< sidenote right "dir-firstname-note" "will have a name" >}}
|
|
|
|
|
Despite the recent media noise about ChatGPT, directories have not yet
|
|
|
|
|
been granted personhood, and do not have both first and last names.
|
|
|
|
|
{{< /sidenote >}}
|
|
|
|
|
as well as a collection of files and directories it contains.
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
class Dir {
|
2022-12-06 23:02:10 -08:00
|
|
|
|
var name: string;
|
|
|
|
|
|
|
|
|
|
var files = new map(string, int);
|
2022-12-08 22:33:30 -08:00
|
|
|
|
var dirs = new list(owned Dir);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
Since files have no
|
|
|
|
|
additional information to them besides their size, I decided to represent
|
|
|
|
|
them as a map --- a directory's `files` field associates each file's name
|
|
|
|
|
with that file's size. The subdirectories are represented just like
|
|
|
|
|
the `children` field from our `person` record, as a list of owned `Dir`s.
|
|
|
|
|
|
|
|
|
|
There are a few more things I want to add to `Dir`;
|
|
|
|
|
the first is a way to read our directory from our puzzle input.
|
|
|
|
|
|
|
|
|
|
#### Reading the File System with the `fromInput` Type Method
|
|
|
|
|
For reasons of abstraction and avoiding conflicts, I put
|
|
|
|
|
the code for creating a directory from user input into a type method on `Dir`. Within
|
|
|
|
|
this method, I include the now-familiar code for reading from the
|
|
|
|
|
input using `readLine`, until we run out of lines.
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
proc type fromInput(name: string): owned Dir {
|
|
|
|
|
var line: string;
|
|
|
|
|
var newDir = new Dir(name);
|
|
|
|
|
|
|
|
|
|
while readLine(line, stripNewline = true) {
|
|
|
|
|
/*
|
|
|
|
|
Notice that I'm accepting the name for the
|
|
|
|
|
directory as a string formal and initializing a new variable `newDir` with that name.
|
|
|
|
|
Notice also that I don't need to provide the `files` and `dirs`
|
|
|
|
|
as arguments to `new Dir` --- they have default values in the
|
|
|
|
|
class definition. By default, `new` uses the `owned` memory management
|
|
|
|
|
strategy. For the time being, the `newDir` variable owns our
|
|
|
|
|
directory-under-construction.
|
2022-12-06 23:02:10 -08:00
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
We're reading lines now; all that's left is to figure out what to do
|
|
|
|
|
with them. The first case is that of `$ cd ..`. When we see that line,
|
|
|
|
|
it means that we're done looking at the current directory; none
|
|
|
|
|
of the subsequent `ls` lines will be meant for us. Thus, we break
|
|
|
|
|
out of the input `while`-loop.
|
|
|
|
|
*/
|
|
|
|
|
if line == "$ cd .." {
|
|
|
|
|
break;
|
|
|
|
|
/*
|
|
|
|
|
If the `cd` command is used, but its argument isn't `..`, we're being
|
|
|
|
|
asked to descend into a sub-directory of our current `newDir`.
|
|
|
|
|
In this case, we call the `fromInput` method again, recursively,
|
|
|
|
|
to create a subdirectory of the current one. This
|
|
|
|
|
call will keep consuming lines from the input until the sub-directory
|
|
|
|
|
has been processed, at which point it will return it to us. We'll
|
|
|
|
|
immediately append this sub-directory to the `newDir.dirs` list,
|
|
|
|
|
which becomes the sub-directory's new owner.
|
|
|
|
|
|
|
|
|
|
Recall that we need to give `fromInput` the name of the new
|
|
|
|
|
sub-directory. We can figure out the name by slicing the string
|
|
|
|
|
starting after the `$ cd` prefix. Since I want to get the rest of the
|
|
|
|
|
characters after the prefix, I leave the end of my range unbounded, which
|
|
|
|
|
makes the slice go until the characters run out at the end of the string.
|
|
|
|
|
If you're feeling shaky on lists and `append`, check out our [day 5 article]({{< relref "aoc2022-day05-cratestacks" >}}#moving-crates-within-an-array-of-lists).
|
|
|
|
|
If you want a little refresher on slicing, we first covered it on [day 3]({{< relref "aoc2022-day03-rucksacks" >}}/ranges-and-slicing).
|
|
|
|
|
|
|
|
|
|
*/
|
|
|
|
|
} else if line.startsWith("$ cd ") {
|
|
|
|
|
param cdPrefix = "$ cd ";
|
|
|
|
|
const dirName = line[cdPrefix.size..];
|
|
|
|
|
newDir.dirs.append(Dir.fromInput(dirName));
|
|
|
|
|
/*
|
|
|
|
|
As it turns out, all that's left is to handle files. We already get
|
|
|
|
|
directory names from `cd`, so there's no reason to worry about
|
|
|
|
|
lines starting with `dir`. The `ls` command itself always precedes
|
|
|
|
|
the list of files and directories; by itself, it provides us no
|
|
|
|
|
additional information. Thus, our last case is a line that's neither
|
|
|
|
|
`dir` nor `ls`. Such a line is a file, so its format will be a number
|
|
|
|
|
followed by the file's name.
|
|
|
|
|
|
|
|
|
|
I use the `partition` method on the line to split it into three
|
|
|
|
|
pieces: the part before the space, the space itself, and the part
|
|
|
|
|
after the space. After that, I can just update the `newDir` map,
|
|
|
|
|
associating the file called `name` with its size. I use an integer cast
|
|
|
|
|
to convert `size` (a string) to a number.
|
|
|
|
|
*/
|
|
|
|
|
} else if !line.startsWith("$ ls") && !line.startsWith("dir") {
|
|
|
|
|
const (size, _, name) = line.partition(" ");
|
|
|
|
|
newDir.files[name] = size : int;
|
|
|
|
|
/*
|
|
|
|
|
That's it for the loop! Once the loop stops running, we know we're done
|
|
|
|
|
processing the directory. All that remains is to return it. Returning
|
|
|
|
|
an `owned` value from a function or method transfers ownership to whatever
|
|
|
|
|
code calls the function or method.
|
|
|
|
|
*/
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return newDir;
|
2022-12-06 23:02:10 -08:00
|
|
|
|
}
|
2022-12-08 22:33:30 -08:00
|
|
|
|
/*
|
|
|
|
|
One more thing: I have explicitly annotated the
|
|
|
|
|
return type of `fromInput` to be `owned Dir` to let Chapel know
|
|
|
|
|
that I'm using the `owned` memory management strategy. This might just
|
|
|
|
|
be the first return type annotation we've written so far. Up until now,
|
|
|
|
|
Chapel has been able to deduce the return types of our procedures
|
|
|
|
|
and iterators automatically. However, here, because we are using
|
|
|
|
|
recursion, it needs just a little bit of help: determining the types
|
|
|
|
|
in the body of `fromInput` requires knowing the type of `fromInput`!
|
|
|
|
|
The manual type annotation helps break that loop.
|
|
|
|
|
*/
|
2022-12-06 23:02:10 -08:00
|
|
|
|
|
|
|
|
|
/*
|
2022-12-08 22:33:30 -08:00
|
|
|
|
#### An Iterator Method for Listing Directory Sizes
|
|
|
|
|
Let's recap. What we have now is a data structure, `Dir`, which represents
|
|
|
|
|
the directory tree. We also have a type method, `Dir.fromInput` that
|
|
|
|
|
converts our puzzle input into this data structure. What's left?
|
|
|
|
|
|
|
|
|
|
The way I see it, the problem is composed of three pieces:
|
|
|
|
|
|
|
|
|
|
1. Go through all of the directory sizes...
|
|
|
|
|
2. ... ignoring those that are above a certain threshold ...
|
|
|
|
|
3. ... and sum them.
|
|
|
|
|
|
|
|
|
|
Over the past week, we've gotten really good at summing things! In
|
|
|
|
|
Chapel, we can just use `+reduce` to compute the sum of something
|
|
|
|
|
iterable, so there's point number three. For point two, it turns out that
|
|
|
|
|
those [loop expressions]({{< relref "aoc2022-day06-packets" >}}#parallel-loop-expressions)
|
|
|
|
|
from yesterday can be used to filter out elements like so:
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
[for i in iterable] if someCondition then i
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Putting these two pieces together, we might write something like:
|
|
|
|
|
```Chapel
|
|
|
|
|
+ reduce [for size in directorySizes] if size < 1000000 then size
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
That `directorySizes` is the only "fictional" piece of the solution.
|
|
|
|
|
Perhaps we can make our `Dir` tree support an iterator of directory sizes?
|
|
|
|
|
Then, we'd have our answer.
|
|
|
|
|
|
|
|
|
|
In my solution, I do just that. Methods on classes don't have to be procedures ---
|
|
|
|
|
they can also be iterators. There's only one complication. We want our
|
|
|
|
|
iterator method to yield the sizes of _all_ of the various sub-directories
|
|
|
|
|
within a `Dir` including sub-directories of sub-directories. That's because
|
|
|
|
|
we have to sum them all up as per the problem statement. However, when
|
|
|
|
|
_computing_ the size of a directory, we don't want to include sub-sub-directories
|
|
|
|
|
in our counting: the direct sub-directories already include the sizes of
|
|
|
|
|
their own contents. To make this work, I added a `parentSize` formal to
|
|
|
|
|
the iterator method, which represents a reference to the parent directory's
|
|
|
|
|
size. When it's done yielding its own size, as well as the sizes of the
|
|
|
|
|
sub-directories, the iterator method will add its own size to its parent's.
|
2022-12-06 23:02:10 -08:00
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
Here's the implementation of the iterator method; I'll talk about it in
|
|
|
|
|
more detail below.
|
|
|
|
|
*/
|
|
|
|
|
iter dirSizes(ref parentSize = 0): int {
|
|
|
|
|
// Compute sizes from files only.
|
|
|
|
|
var size = + reduce files.values();
|
|
|
|
|
for subDir in dirs {
|
|
|
|
|
// Yield directory sizes from the dir.
|
|
|
|
|
for subSize in subDir.dirSizes(size) do yield subSize;
|
|
|
|
|
}
|
|
|
|
|
yield size;
|
|
|
|
|
parentSize += size;
|
|
|
|
|
}
|
|
|
|
|
/*
|
|
|
|
|
The first thing this method does is create a new variable, `size`,
|
|
|
|
|
representing the current directory's size. It's initialized to the sum
|
|
|
|
|
of all the file sizes. However, at this point, that's not the whole size ---
|
|
|
|
|
we also need to figure out how much data is stored in the subdirectories.
|
|
|
|
|
|
|
|
|
|
I use a `for` loop over the `dirs` list to examine each sub-directory
|
|
|
|
|
of the current folder in turn. Each of these sub-directories is its
|
|
|
|
|
own full-fledged `Dir`, so we can call its `dirSizes`
|
|
|
|
|
method. This gives us an iterator of all directory sizes from `subDir`.
|
|
|
|
|
I simply yield them from the parent iterator, making it yield
|
|
|
|
|
the sizes of all directories, including nested ones. Notice that I also
|
|
|
|
|
provide `size` as the argument to the recursive call to `dirSizes`:
|
|
|
|
|
the inner for-loop serves the double purpose of yielding directory sizes
|
|
|
|
|
and finishing computing the current folder's size.
|
|
|
|
|
|
|
|
|
|
Once all of the sub-directory sizes have been yielded, the `size` variable
|
|
|
|
|
includes all the files in the folder, including nested ones. Thus, I use it to yield
|
|
|
|
|
the size of the current folder. I also add `size` to `parentSize`.
|
|
|
|
|
|
|
|
|
|
That concludes our `Dir` class!
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
|
|
|
|
|
{{< skip >}}
|
2022-12-06 23:02:10 -08:00
|
|
|
|
```Chapel
|
|
|
|
|
iter these(param tag: iterKind): (string, int) where tag == iterKind.standalone {
|
|
|
|
|
var size = + reduce files.values();
|
|
|
|
|
coforall dir in dirs with (+ reduce size) {
|
|
|
|
|
// Yield directory sizes from the dir.
|
|
|
|
|
forall subSize in dir do yield subSize;
|
|
|
|
|
// Count its size for our size.
|
|
|
|
|
size += dir.size;
|
|
|
|
|
}
|
|
|
|
|
yield (name, size);
|
|
|
|
|
this.size = size;
|
|
|
|
|
}
|
|
|
|
|
```
|
2022-12-08 22:33:30 -08:00
|
|
|
|
{{< /skip >}}
|
2022-12-06 23:02:10 -08:00
|
|
|
|
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
/*
|
|
|
|
|
### Putting It All Together
|
|
|
|
|
With our `Dir` class complete, we can finally make use of it in our code.
|
|
|
|
|
The first thing we need to do is read our file system from the input;
|
|
|
|
|
this is accomplished using the `fromInput` method.
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
var rootFolder = Dir.fromInput("/");
|
2022-12-07 00:02:53 -08:00
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
/*
|
|
|
|
|
Next up, we can use that `+reduce` expression I described above. I use
|
|
|
|
|
a new variable, `rootSize`, to represent the size of the top-level directory.
|
|
|
|
|
After the call to `dirSizes` completes, it will be set to the total size of
|
|
|
|
|
the root directory, i.e., the total disk usage. */
|
2022-12-07 00:02:53 -08:00
|
|
|
|
var rootSize = 0;
|
2022-12-08 22:33:30 -08:00
|
|
|
|
writeln(+ reduce [size in rootFolder.dirSizes(rootSize)] if size < 100000 then size);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
I could've omitted the argument to `dirSizes` --- notice from the method's
|
|
|
|
|
signature that I provide a default value for `parentSize`.
|
|
|
|
|
|
|
|
|
|
```Chapel
|
|
|
|
|
iter dirSizes(ref parentSize = 0): int {
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
However, knowing `rootSize` lets us easily compute the amount of space we need
|
|
|
|
|
to free up (for part 2 of today's problem).
|
|
|
|
|
*/
|
|
|
|
|
const toDelete = rootSize - 40000000; // = 30000000 - (70000000 - rootSize)
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
We can now re-use our `dirSizes` stream to check every directory size again,
|
|
|
|
|
this time looking for the smallest folder that meets a certain threshold.
|
|
|
|
|
A `min` reduction takes care of this:
|
|
|
|
|
*/
|
|
|
|
|
writeln(min reduce [size in rootFolder.dirSizes()] if size >= toDelete then size);
|
|
|
|
|
|
|
|
|
|
/* And there's the solution to part 2, as well! */
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
### Summary
|
|
|
|
|
This concludes today's description of my solution. This time, I introduced
|
|
|
|
|
Chapel's classes --- defining them, creating fields and adding methods. We got
|
|
|
|
|
a little taste of memory management strategies and ownership, though I deliberately
|
|
|
|
|
kept it light to avoid introducing too many new concepts.
|
|
|
|
|
|
|
|
|
|
Admittedly, today's solution is (for the most part) serial. Although the
|
|
|
|
|
`+reduce` expression that computes the initial `size` of a directory from
|
|
|
|
|
its `files` is eligible for parallelization, the `dirSizes` iterator is not. The main
|
|
|
|
|
reason for this is that the interaction between recursive parallel iterators and
|
|
|
|
|
reductions is, at the time of writing, unimplemented.
|
|
|
|
|
Nevertheless, I think that using even a serial iterator has _yielded_ an elegant
|
|
|
|
|
solution (pun intended).
|
|
|
|
|
|
|
|
|
|
If you wanted to write a parallel version, I'd advise creating a new,
|
|
|
|
|
non-iterator method on `Dir` that solves just part 1 of today's puzzle.
|
|
|
|
|
This method could return a tuple of two elements, perhaps `sumSmallSizes`
|
|
|
|
|
and `dirSize`; then, a simple `forall` loop over `dirs` (and judicious use of reduce intents,
|
|
|
|
|
which are described in our [day 4 article]({{< relref "aoc2022-day04-ranges" >}}third-solution-parallel-approach))
|
|
|
|
|
will let you compute the answer in parallel.
|
2022-12-06 23:02:10 -08:00
|
|
|
|
|
2022-12-08 22:33:30 -08:00
|
|
|
|
Thanks for reading! Please feel free
|
|
|
|
|
to ask any questions or post any comments you have in the new [Blog
|
|
|
|
|
Category](https://chapel.discourse.group/c/blog/21) of Chapel's
|
|
|
|
|
Discourse Page. */
|