Compare commits

...

3 Commits

Author SHA1 Message Date
a441280812 Add article about Crystal and Nix with OpenSSL.
Some checks failed
continuous-integration/drone/push Build is failing
2020-04-26 19:56:15 -07:00
eda9bbb191 Add more to part 12 of compiler series 2020-04-25 18:07:32 -07:00
2d9da2899f Switch to no line breaks (for Ghostwriter support) 2020-04-25 15:45:15 -07:00
2 changed files with 157 additions and 41 deletions


@ -5,30 +5,17 @@ tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
---
Now that our language's type system is more fleshed out and pleasant to use,
it's time to shift our focus to the ergonomics of the language itself. I've been
mentioning `let/in` expressions and __lambda expressions__ for a while now.
The former will let us create names for expressions that are limited to
a certain scope (without having to create global variable bindings), while
the latter will allow us to create functions without giving them any name at
all.
Now that our language's type system is more fleshed out and pleasant to use, it's time to shift our focus to the ergonomics of the language itself. I've been mentioning `let/in` expressions and __lambda expressions__ for a while now. The former will let us create names for expressions that are limited to a certain scope (without having to create global variable bindings), while the latter will allow us to create functions without giving them any name at all.
Let's take a look at `let/in` expressions first, to make sure we're all on
the same page about what it is we're trying to implement. Let's
start with some rather basic examples, and then move on to more
complex examples. The most basic use of a `let/in` expression is, in Haskell:
Let's take a look at `let/in` expressions first, to make sure we're all on the same page about what it is we're trying to implement. Let's start with some rather basic examples, and then move on to more complex examples. The most basic use of a `let/in` expression is, in Haskell:
```Haskell
let x = 5 in x + x
```
In the above example, we bind the variable `x` to the value `5`, and then
refer to `x` twice in the expression after the `in`. The whole snippet is one
expression, evaluating to what the `in` part evaluates to. Additionally,
the variable `x` does not escape the expression -
In the above example, we bind the variable `x` to the value `5`, and then refer to `x` twice in the expression after the `in`. The whole snippet is one expression, evaluating to what the `in` part evaluates to. Additionally, the variable `x` does not escape the expression -
{{< sidenote "right" "used-note" "it cannot be used anywhere else." >}}
Unless, of course, you bind it elsewhere; naturally, using <code>x</code>
here does not forbid you from re-using the variable.
Unless, of course, you bind it elsewhere; naturally, using <code>x</code> here does not forbid you from re-using the variable.
{{< /sidenote >}}
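As a small sketch of the scoping rule described above (the names `doubled` and `shadowed` are made up for illustration and are not part of the series' code), the binding of `x` is only visible inside its own `let/in`, and an inner `let/in` may rebind the same name without affecting the outer one:

```Haskell
-- The binding of x is visible only inside the let/in expression.
doubled :: Int
doubled = let x = 5 in x + x

-- The inner let/in rebinds x to 10 for its own body only;
-- the trailing "+ x" still refers to the outer x, which is 1.
shadowed :: Int
shadowed = let x = 1 in (let x = 10 in x + x) + x
```

Here `doubled` evaluates to `10`, and `shadowed` evaluates to `21`.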
Now, consider a slightly more complicated example:
@ -41,31 +28,17 @@ Here, we're defining a _function_ `sum`,
{{< sidenote "right" "eta-note" "which takes a single argument:" >}}
Those who favor the
<a href="https://en.wikipedia.org/wiki/Tacit_programming#Functional_programming">point-free</a>
programming style may be slightly twitching right now, the words
<em>eta reduction</em> swirling in their mind. What do you know,
<code>fold</code>-based <code>sum</code> is even one of the examples
on the Wikipedia page! I assure you, I left the code as you see it
deliberately, to demonstrate a principle.
{{< /sidenote >}} the list to be summed. We will want this to be valid
in our language, as well. We will soon see how this particular feature
is related to lambda functions, and why I'm covering these two features
in the same post.
programming style may be slightly twitching right now, the words <em>eta reduction</em> swirling in their mind. What do you know, <code>fold</code>-based <code>sum</code> is even one of the examples on the Wikipedia page! I assure you, I left the code as you see it deliberately, to demonstrate a principle.
{{< /sidenote >}} the list to be summed. We will want this to be valid in our language, as well. We will soon see how this particular feature is related to lambda functions, and why I'm covering these two features in the same post.
Let's step up the difficulty a bit more, with an example that,
{{< sidenote "left" "translate-note" "though it does not immediately translate to our language," >}}
The part that doesn't translate well is the whole deal with patterns in
function arguments, as well as the notion of having more than one equation
for a single function, as is the case with <code>safeTail</code>.
The part that doesn't translate well is the whole deal with patterns in function arguments, as well as the notion of having more than one equation for a single function, as is the case with <code>safeTail</code>.
<br><br>
It's not that these things are <em>impossible</em> to translate; it's just
that translating them may be worthy of a post in and of itself, and would only
serve to bloat and complicate this part. What can be implemented with
pattern arguments can just as well be implemented using regular case expressions;
I dare say most "big" functional languages actually just convert from the
former to the latter as part of the compilation process.
It's not that these things are <em>impossible</em> to translate; it's just that translating them may be worthy of a post in and of itself, and would only serve to bloat and complicate this part. What can be implemented with pattern arguments can just as well be implemented using regular case expressions; I dare say most "big" functional languages actually just convert from the former to the latter as part of the compilation process.
{{< /sidenote >}} illustrates another important principle:
```Haskell
```Haskell {linenos=table}
let
safeTail [] = Nothing
safeTail [x] = Just x
@ -75,11 +48,9 @@ in
myTail
```
The principle here is that definitions in `let/in` can be __recursive and
polymorphic__. Remember the note in
The principle here is that definitions in `let/in` can be __recursive and polymorphic__. Remember the note in
[part 10]({{< relref "10_compiler_polymorphism.md" >}}) about
[let-polymorphism](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system#Let-polymorphism)? This is it: we're allowing polymorphic variable bindings,
but only when they're bound in a `let/in` expression (or at the top level).
[let-polymorphism](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system#Let-polymorphism)? This is it: we're allowing polymorphic variable bindings, but only when they're bound in a `let/in` expression (or at the top level).
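To make both properties concrete, here is a minimal Haskell sketch (the names `pairLet`, `identity`, `lengths`, and `len` are hypothetical, chosen only for this illustration). A `let`-bound definition is generalized, so it can be used at several types in the body, and it may also refer to itself:

```Haskell
-- identity is let-bound, so it is generalized to a -> a and can be
-- applied to both an Int and a Bool within the same body.
pairLet :: (Int, Bool)
pairLet = let identity x = x in (identity 3, identity True)

-- A lambda-bound variable is not generalized; the typechecker rejects
-- the line below, because f cannot be used at both Int and Bool.
-- pairLambda = (\f -> (f 3, f True)) (\x -> x)

-- len is both recursive (it calls itself) and polymorphic
-- (it is applied to a list of Ints and to a String).
lengths :: (Int, Int)
lengths =
  let len []       = 0
      len (_ : xs) = 1 + len xs
  in (len [1, 2, 3 :: Int], len "ab")
```

Here `pairLet` is `(3, True)` and `lengths` is `(3, 2)`.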
The principles demonstrated by the last two snippets mean that compiling `let/in` expressions, at least with the power we want to give them, will require the same kind of dependency analysis we had to go through when we implemented polymorphically typed functions. That is, we will need to analyze which functions call which other functions, and typecheck the callees before the callers. We will continue to represent callee-caller relationships using a dependency graph, in which nodes represent functions, and an edge from one function node to another means that the former function calls the latter. Below is an image of one such graph:
@ -99,3 +70,36 @@ Things are more complicated now that `let/in` expressions are able to introduce
In the above image, some of the original nodes in our graph now contain other, smaller graphs. Those subgraphs are the graphs created by function declarations in `let/in` expressions. Just like our top-level nodes, the nodes of these smaller graphs can depend on other nodes, and even form cycles. Within each subgraph, we will have to perform the same kind of cycle detection, resulting in something like this:
{{< figure src="fig_subgraphs_colored_all.png" caption="Augmented dependency graph with mutually recursive groups highlighted." >}}
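The grouping step can be sketched in Haskell with `Data.Graph` from the `containers` package. Note that this is a stand-in for illustration: it uses the library's strongly connected components routine, not the Warshall-based approach this series uses, and the call graph below (`main`, `isEven`, `isOdd`, `double`) is a made-up example:

```Haskell
import Data.Graph (flattenSCC, stronglyConnComp)

-- Hypothetical call graph: (function, its key, the functions it calls).
callGraph :: [(String, String, [String])]
callGraph =
  [ ("main",   "main",   ["isEven", "double"])
  , ("isEven", "isEven", ["isOdd"])
  , ("isOdd",  "isOdd",  ["isEven"])
  , ("double", "double", [])
  ]

-- Each inner list is one group of mutually recursive functions.
-- stronglyConnComp returns groups in reverse topological order,
-- so a callee's group always comes before its callers' groups.
groups :: [[String]]
groups = map flattenSCC (stronglyConnComp callGraph)
```

In this example, `isEven` and `isOdd` end up in the same group, and `main` comes last because it depends on everything else. The same procedure would be repeated for the subgraph inside each `let/in` expression.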
When typechecking a function, we must be ready to perform dependency analysis at any point. What's more is that the free variable analysis we used to perform must now be extended to differentiate between free variables that refer to "nearby" definitions (i.e. within the same `let/in` expression), and "far away" definitions (i.e. outside of the `let/in` expression). And speaking of free variables...
What do we do about variables that are captured by a local definition? Consider the following snippet:
```Haskell {linenos=table}
addToAll n xs = map addSingle xs
    where
        addSingle x = n + x
```
In the code above, the variable `n`, bound on line 1, is used by `addSingle` on line 3. When a function refers to variables bound outside of itself (as `addSingle` does), it is said to be _capturing_ these variables, and the function is called a _closure_. Why does this matter? On the machine level, functions are represented as sequences of instructions, and there's a finite number of them (as there is finite space on the machine). But there is an infinite number of `addSingle` functions! When we write `addToAll 5 [1,2,3]`, `addSingle` becomes `5+x`. When, on the other hand, we write `addToAll 6 [1,2,3]`, `addSingle` becomes `6+x`. There are certain ways to work around this - we could, for instance, dynamically create machine code in memory, and then execute it (this is called [just-in-time compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation)). This would end up with a collection of runtime-defined functions that can be represented as follows:
```Haskell {linenos=table}
-- Version of addSingle when n = 5
addSingle5 x = 5 + x
-- Version of addSingle when n = 6
addSingle6 x = 6 + x
-- ... and so on ...
```
But now, we end up creating several functions with almost identical bodies, with the exception of the free variables themselves. Wouldn't it be better to perform the well-known strategy of reducing code duplication by factoring out parameters, and leaving only one instance of the repeated code? We would end up with:
```Haskell {linenos=table}
addToAll n xs = map (addSingle n) xs
addSingle n x = n + x
```
Observe that we no longer have the "infinite" number of functions - the infinitude of possible behaviors is created via currying. Also note that `addSingle`
{{< sidenote "right" "global-note" "is now declared at the global scope," >}}
Wait a moment, didn't we just talk about nested polymorphic definitions, and how they change our typechecking model? If we transform our program into a bunch of global definitions, we don't need to make adjustments to our typechecking. <br><br>
This is true, but why should we perform transformations on a malformed program? Typechecking before pulling functions to the global scope will help us save the work, and breaking down one dependency-searching problem (which is \(O(n^3)\) thanks to Warshall's) into smaller, independent problems may even lead to better performance. Furthermore, typechecking before program transformations will help us come up with more helpful error messages.
{{< /sidenote >}} and can be transformed into a sequence of instructions just like any other global function. It has been pulled from its `where` (which, by the way, is pretty much equivalent to a `let/in`) to the top level.
This technique of replacing captured variables with arguments, and pulling closures into the global scope to aid compilation, is called [Lambda Lifting](https://en.wikipedia.org/wiki/Lambda_lifting). Its name is no coincidence - lambda functions need to undergo the same kind of transformation as our nested definitions (unlike nested definitions, though, lambda functions need to be named). This is why they are included in this post together with `let/in`!
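The same transformation applied to a lambda can be sketched as follows. This is only an illustration of the idea, not the compiler's actual output; the generated name `lambda1` and the function names `addToAllLambda` and `addToAllLifted` are hypothetical:

```Haskell
-- Before lifting: the anonymous function captures n from its surroundings.
addToAllLambda :: Int -> [Int] -> [Int]
addToAllLambda n xs = map (\x -> n + x) xs

-- After lifting: the lambda receives a generated name, and the captured
-- variable n becomes an explicit first parameter.
lambda1 :: Int -> Int -> Int
lambda1 n x = n + x

-- The original use site partially applies the lifted function to n.
addToAllLifted :: Int -> [Int] -> [Int]
addToAllLifted n xs = map (lambda1 n) xs
```

Both versions compute the same result; for instance, applying either one to `5` and `[1,2,3]` gives `[6,7,8]`.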


@ -0,0 +1,112 @@
---
title: Building a Crystal Project with Nix, Revisited
date: 2020-04-26T18:37:22-07:00
tags: ["Crystal", "Nix"]
---
As I've described in my [previous post]({{< relref "crystal_nix.md" >}}), the process for compiling a Crystal project with Nix is a fairly straightforward one. As is standard within the Nix ecosystem, the project's dependencies, as specified by the source language's build system (shards, in Crystal's case), are converted into a Nix expression (`shards.nix`). These dependencies are then used in a derivation, which, in Crystal's case, can take advantage of `buildCrystalPackage` to reduce boilerplate build scripts. All is well.
Things start to fall apart a little bit when the Crystal project being built is more complex. The predefined infrastructure (like `buildCrystalPackage`)
{{< sidenote "right" "versatility-note" "is not written with versatility in mind," >}}
This is not a bad thing at all; it's much better to get something working for the practical case, rather than concoct an overcomplicated solution that covers all theoretically possible cases.
{{< /sidenote >}} though it seems to work exceptionally well in the common case. Additionally, I discovered that the compiler itself has some quirks, and spent a few hours trying to figure out some unexpected behaviors.
This post will cover the extra, more obscure steps I had to take to build an HTTPS-enabled Crystal project.
### First Problem: Git-Based Dependencies
A lot of my projects use Crystal libraries that are not hosted on GitHub at all; I use a private Git server, and most of my non-public code resides on it. The Crystal people within Nix don't seem to like this: let's look at the code for the `crystal2nix.cr` file in the [nixpkgs repository](https://github.com/NixOS/nixpkgs/blob/1ffdf01777360f548cc7c10ef5b168cbe78fd183/pkgs/development/compilers/crystal/crystal2nix.cr). In particular, consider lines 18 and 19:
```Crystal {linenos=table,linenostart=18}
yaml.shards.each do |key, value|
owner, repo = value["github"].split("/")
```
Ouch! If you so much as mention a non-GitHub repository in your `shard.lock` file, you will experience a good old uncaught exception. Things don't end there, either. Nix provides a convenient `fetchFromGitHub` function, which only requires a repository name and its enclosing namespace (user or group). `crystal2nix` uses this, by generating a file with that information:
```Crystal {linenos=table,linenostart=34}
file.puts %( #{key} = {)
file.puts %( owner = "#{owner}";)
file.puts %( repo = "#{repo}";)
file.puts %( rev = "#{rev}";)
file.puts %( sha256 = "#{sha256}";)
file.puts %( };)
```
And, of course, `build-package.nix` (of which [this is the version at the time of writing](https://github.com/NixOS/nixpkgs/blob/912eb6b120eba15237ff053eafc4b5d90577685b/pkgs/development/compilers/crystal/build-package.nix)) uses this to declare dependencies:
```Nix {linenos=table,linenostart=26}
crystalLib = linkFarm "crystal-lib" (lib.mapAttrsToList (name: value: {
inherit name;
path = fetchFromGitHub value;
}) (import shardsFile));
```
This effectively creates a folder of dependencies cloned from GitHub, which is then placed into `lib` as if `shards` was run:
```Nix {linenos=table,linenostart=37}
configurePhase = args.configurePhase or lib.concatStringsSep "\n" ([
"runHook preConfigure"
] ++ lib.optional (lockFile != null) "ln -s ${lockFile} ./shard.lock"
++ lib.optional (shardsFile != null) "ln -s ${crystalLib} lib"
++ [ "runHook postConfigure "]);
```
Sleek, except that there's no place in this flow for dependencies based _only_ on Git! `crystalLib` is declared locally in a `let/in` expression, and we don't have access to it; neither can we call `linkFarm` again, since this results in a derivation, which, with different inputs, will be created at a different path. To work around this, I made my own Nix package, called `customCrystal`, and had it pass several modifications to `buildCrystalPackage`:
```Nix
{ stdenv, lib, linkFarm, fetchgit, fetchFromGitHub }:
{ crystal,
  gitShardsFile ? null,
  lockFile ? null,
  shardsFile ? null, ...}@args:
let
  buildArgs = builtins.removeAttrs args [ "crystal" ];
  githubLinks = lib.mapAttrsToList (name: value: {
    inherit name;
    path = fetchFromGitHub value;
  }) (import shardsFile);
  gitLinks = lib.mapAttrsToList (name: value: {
    inherit name;
    path = fetchgit { inherit (value) url rev sha256; };
  }) (import gitShardsFile);
  crystalLib = linkFarm "crystal-lib" (githubLinks ++ gitLinks);
  configurePhase = args.configurePhase or lib.concatStringsSep "\n" ([
    "runHook preConfigure"
  ] ++ lib.optional (lockFile != null) "ln -s ${lockFile} ./shard.lock"
    ++ lib.optional (shardsFile != null) "ln -s ${crystalLib} lib"
    ++ [ "runHook postConfigure "]);
in
  crystal.buildCrystalPackage (buildArgs // { inherit configurePhase; })
```
This does pretty much the equivalent of what `buildCrystalPackage` does (indeed, it does the heavy lifting). However, this snippet also retrieves Git repositories from the `gitShardsFile`, and creates the `lib` folder using both Git and GitHub dependencies. I didn't bother writing a `crystal2nix` equivalent for this, since I only had a couple of dependencies. I invoked my new function like `buildCrystalPackage`, with the addition of passing in the Crystal package, and that problem was solved.
### Second Problem: OpenSSL
The package I was trying to build used Crystal's built-in HTTP client, which, in turn, required OpenSSL. This, I thought, would be rather straightforward: add `openssl` to my package's `buildInputs`, and be done with it. It was not as simple, though, and I was greeted with a wall of errors like this one:
```
/nix/store/sq2b0dqlq243mqn4ql5h36xmpplyy20k-binutils-2.31.1/bin/ld: _main.o: in function `__crystal_main':
main_module:(.text+0x6f0): undefined reference to `SSL_library_init'
/nix/store/sq2b0dqlq243mqn4ql5h36xmpplyy20k-binutils-2.31.1/bin/ld: main_module:(.text+0x6f5): undefined reference to `SSL_load_error_strings'
/nix/store/sq2b0dqlq243mqn4ql5h36xmpplyy20k-binutils-2.31.1/bin/ld: main_module:(.text+0x6fa): undefined reference to `OPENSSL_add_all_algorithms_noconf'
/nix/store/sq2b0dqlq243mqn4ql5h36xmpplyy20k-binutils-2.31.1/bin/ld: main_module:(.text+0x6ff): undefined reference to `ERR_load_crypto_strings'
/nix/store/sq2b0dqlq243mqn4ql5h36xmpplyy20k-binutils-2.31.1/bin/ld: _main.o: in function `*HTTP::Client::new<String, (Int32 | Nil), Bool>:HTTP::Client':
```
Some snooping led me to discover that these symbols were part of OpenSSL 1.0.2, support for which ended in 2019. OpenSSL 1.1.0 deprecates these symbols, and from what I can tell, they might be missing from the `.so` file altogether. I tried changing the package to specifically accept OpenSSL 1.0.2, but that didn't work, either: for some reason, the Crystal compiler kept running the `gcc` command with `-L...openssl-1.1.0`. It also seemed like the compiler itself was built against the most recent version of OpenSSL, so what's the issue? I discovered this is a problem in the compiler itself. Consider the following line from Crystal's `openssl/lib_ssl.cr` [source file](https://github.com/crystal-lang/crystal/blob/0.34.0/src/openssl/lib_ssl.cr):
```Crystal {linenos=table,linenostart=8}
{% ssl_version = `hash pkg-config 2> /dev/null && pkg-config --silence-errors --modversion libssl || printf %s 0.0.0`.split.last.gsub(/[^0-9.]/, "") %}
```
Excuse me? If `pkg-config` is not found (which, in Nix, it won't be by default), Crystal assumes that it's using the _least_ up-to-date version of OpenSSL,
{{< sidenote "right" "version-note" "indicated by version code 0.0.0." >}}
The Crystal compiler compares version numbers based on semantic versioning, it seems, and 0.0.0 will always compare to be less than any other version of OpenSSL. Thus, code 0.0.0 indicates that Crystal should assume it's dealing with an extremely old version of OpenSSL.
{{< /sidenote >}} This matters, because later on in the file, we get this beauty:
```Crystal {linenos=table,linenostart=215}
{% if compare_versions(OPENSSL_VERSION, "1.1.0") >= 0 %}
fun tls_method = TLS_method : SSLMethod
{% else %}
fun ssl_library_init = SSL_library_init
fun ssl_load_error_strings = SSL_load_error_strings
fun sslv23_method = SSLv23_method : SSLMethod
{% end %}
```
That would be where the linker errors are coming from. Adding `pkg-config` to `buildInputs` along with `openssl` fixes the issue, and my package builds without problems.
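The decision the macro makes can be modeled with a short Haskell sketch. This is a simplified model under stated assumptions, not Crystal's actual implementation: versions are compared as lists of numeric components rather than with Crystal's `compare_versions`, and the names `parseVersion`, `detectedVersion`, and `usesModernBindings` are invented for this illustration:

```Haskell
import Data.Char (isDigit)

-- Turn a version string such as "1.1.1d" into [1,1,1], dropping any
-- character that is not a digit or a dot (like the gsub in the macro).
parseVersion :: String -> [Int]
parseVersion = map read . filter (not . null) . splitOnDots . filter keep
  where
    keep c = isDigit c || c == '.'
    splitOnDots s = case break (== '.') s of
      (part, [])       -> [part]
      (part, _ : rest) -> part : splitOnDots rest

-- Nothing models pkg-config being unavailable, in which case the
-- macro falls back to the version string "0.0.0".
detectedVersion :: Maybe String -> [Int]
detectedVersion = parseVersion . maybe "0.0.0" id

-- True when the OpenSSL 1.1.0+ branch (TLS_method) would be chosen,
-- False when the legacy branch (SSL_library_init and friends) is used.
usesModernBindings :: Maybe String -> Bool
usesModernBindings v = detectedVersion v >= [1, 1, 0]
```

With this model, `usesModernBindings Nothing` is `False`, which corresponds to the legacy symbols being requested even though a newer OpenSSL is what actually gets linked; `usesModernBindings (Just "1.1.1d")` is `True`.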
### Conclusion
Crystal is a rather obscure language, and Nix is a rather obscure build system. I'm grateful that the infrastructure I'm using exists, and that using it is as streamlined as it is. There is, however, always room for improvement. If I have time, I will be opening pull requests for the `crystal2nix` tool on GitHub (to allow Git-based repositories), and perhaps on the Crystal compiler as well (to try to figure out what to do about `pkg-config`). If someone else wants to do it themselves, I'd be happy to hear how it goes! Otherwise, I hope you found this post useful.