270 lines
13 KiB
Markdown
270 lines
13 KiB
Markdown
---
|
|
title: Meaningfully Typechecking a Language in Idris
|
|
date: 2020-02-27T21:58:55-08:00
|
|
tags: ["Haskell", "Idris", "Programming Languages"]
|
|
---
|
|
|
|
This term, I'm a TA for Oregon State University's Programming Languages course.
|
|
The students in the course are tasked with using Haskell to implement a programming
|
|
language of their own design. One of the things they can do to gain points for the
|
|
project is implement type checking, rejecting
|
|
{{< sidenote "right" "ill-typed-note" "ill-typed programs or expressions" >}}
|
|
Whether or not the below example is ill-typed actually depends on your language.
|
|
Many languages (even those with a static type system, like C++ or Crystal)
|
|
have a notion of "truthy" and "falsy" values. These values can be used
|
|
in the condition of an if-expression, and will be equivalent to "true" or "false",
|
|
respectively. However, for simplicity, I will avoid including
|
|
truthy and falsy values into the languages in this post. For the same reason, I will avoid
|
|
reasoning about
|
|
<a href="https://developer.mozilla.org/en-US/docs/Glossary/Type_coercion">type coercions</a>,
|
|
which make expressions like <code>"Hello"+3</code> valid.
|
|
{{< /sidenote >}} such as:
|
|
|
|
```Haskell
|
|
if "Hello" then 0 else 1
|
|
```
|
|
|
|
For instance, a student may have a function `typecheck`, with the following
|
|
signature (in Haskell):
|
|
|
|
```Haskell
|
|
typecheck :: Expr -> Either TypeError ExprType
|
|
```
|
|
|
|
The function will return an error if something goes wrong, or, if everything
|
|
goes well, the type of the given expression. So far, so good.
|
|
|
|
A student asked, however:
|
|
|
|
> Now that I ran type checking on my program, surely I don't need to include errors
|
|
in my {{< sidenote "right" "valuation-function-note" "valuation function!" >}}
|
|
I'm using "valuation function" here in the context of
|
|
<a href="https://en.wikibooks.org/wiki/Haskell/Denotational_semantics">denotational semantics</a>.
|
|
In short, a
|
|
<a href="http://www.inf.ed.ac.uk/teaching/courses/inf2a/readings/semantics-note.pdf">valuation function</a>
|
|
takes an expression and assigns to it some
|
|
representation of its meaning. For a language of arithmetic expression, the
|
|
"meaning" of an expression is just a number (the result of simplifying the expression).
|
|
For a language of booleans, <code>and</code>, and <code>or</code>, the "meaning" is a boolean
|
|
for the same reason. Since an expression in the language can be ill-formed (like
|
|
<code>list(5)</code> in Python), the "meaning" (<em>semantic domain</em>) of a
|
|
complicated language tends to include the possibility of errors.
|
|
{{< /sidenote >}} I should be able to make my function be of type `Expr -> Val`, and not
|
|
`Expr -> Maybe Val`!
|
|
|
|
Unfortunately, this is not quite true. It is true that if the student's type checking
|
|
function is correct, then there will be no way for a type error to occur during
|
|
the evaluation of an expression "validated" by said function. The issue is, though,
|
|
that __the type system does not know about the expression's type-correctness__. Haskell
|
|
doesn't know that an expression has been type checked; worse, since the function's type
|
|
indicates that it accepts `Expr`, it must handle invalid expressions to avoid being [partial](https://wiki.haskell.org/Partial_functions). In short, even if we __know__ that the
|
|
expressions we give to a function are type safe, we have no way of enforcing this.
|
|
|
|
A potential solution offered in class was to separate the expressions into several
|
|
data types, `BoolExpr`, `ArithExpr`, and finally, a more general `Expr'` that can
|
|
be constructed from the first two. Operations such as `and` and `or`
|
|
will then only be applicable to boolean expressions:
|
|
|
|
```Haskell
|
|
data BoolExpr = BoolLit Bool | And BoolExpr BoolExpr | Or BoolExpr BoolExpr
|
|
```
|
|
|
|
It will be a type error to represent an expression such as `true or 5`. Then,
|
|
`Expr'` may have a constructor such as `IfElse` that only accepts a boolean
|
|
expression as the first argument:
|
|
|
|
```Haskell
|
|
data Expr' = IfElse BoolExpr Expr' Expr' | ...
|
|
```
|
|
|
|
All seems well. Now, it's impossible to have a non-boolean condition, and thus,
|
|
this error has been eliminated from the evaluator. Maybe we can even have
|
|
our type checking function translate an unsafe, potentially incorrect `Expr` into
|
|
a more safe `Expr'`:
|
|
|
|
```Haskell
|
|
typecheck :: Expr -> Either TypeError (Expr', ExprType)
|
|
```
|
|
|
|
However, we typically also want the branches of an if-expression to both have the same
|
|
type - `if x then 3 else False` may work sometimes, but not always, depending of the
|
|
value of `x`. How do we encode this? Do we have two constructors, `IfElseBool` and
|
|
`IfElseInt`, with one in `BoolExpr` and the other in `ArithExpr`? What if we add strings?
|
|
We'll be copying functionality back and forth, and our code will suffer. Wouldn't it be
|
|
nice if we could somehow tag our expressions with the type they produce? Instead of
|
|
`BoolExpr` and `ArithExpr`, we would be able to have `Expr BoolType` and `Expr IntType`,
|
|
which would share the `IfElse` constructor...
|
|
|
|
It's not easy to do this in canonical Haskell, but it can be done in Idris!
|
|
|
|
### Enter Dependent Types
|
|
Idris is a language with support for [dependent types](https://en.wikipedia.org/wiki/Dependent_type). Wikipedia gives the following definition for "dependent type":
|
|
|
|
> In computer science and logic, a dependent type is a type whose definition depends on a value.
|
|
|
|
This is exactly what we want. In Idris, we can define the possible set of types in our
|
|
language:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 1 4>}}
|
|
|
|
Then, we can define a `SafeExpr` type family, which is indexed by `ExprType`.
|
|
Here's the
|
|
{{< sidenote "right" "gadt-note" "code," >}}
|
|
I should probably note that the definition of <code>SafeExpr</code> is that of
|
|
a
|
|
<a href="https://en.wikipedia.org/wiki/Generalized_algebraic_data_type">Generalized Algebraic Data Type</a>,
|
|
or GADT for short. This is what allows each of our constructors to produce
|
|
values of a different type: <code>IntLiteral</code> builds <code>SafeExpr IntType</code>,
|
|
while <code>BoolLiteral</code> builds <code>SafeExpr BoolType</code>.
|
|
{{</ sidenote >}} which we will discuss below:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 23 27 >}}
|
|
|
|
The first line of the above snippet says, "`SafeExpr` is a type constructor
|
|
that requires a value of type `ExprType`". For example, we can have
|
|
`SafeExpr IntType`, or `SafeExpr BoolType`. Next, we have to define constructors
|
|
for `SafeExpr`. One such constructor is `IntLiteral`, which takes a value of
|
|
type `Int` (which represents the value of the integer literal), and builds
|
|
a value of `SafeExpr IntType`, that is, an expression that __we know evaluates
|
|
to an integer__.
|
|
|
|
The same is the case for `BoolLiteral` and `StringLiteral`, only they build
|
|
values of type `SafeExpr BoolType` and `SafeExpr StringType`, respectively.
|
|
|
|
The more complicated case is that of `BinOperation`. Put simply, it takes
|
|
a binary function of type `a->b->c` (kind of), two `SafeExpr`s producing `a` and `b`,
|
|
and combines the values of those expressions using the function to generate
|
|
a value of type `c`. Since the whole expression returns `c`, `BinOperation`
|
|
builds a value of type `SafeExpr c`.
|
|
|
|
That's almost it. Except, what's up with `repr`? We need it because `SafeExpr`
|
|
is parameterized by a __value__ of type `ExprType`. Thus, `a`, `b`, and `c` are
|
|
all values in the definition of `BinOperation`. However, in a function
|
|
`input->output`, both `input` and `output` have to be __types__, not values.
|
|
Thus, we define a function `repr` which converts values such as `IntType` into
|
|
the actual type that `eval` would yield when running our expression:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 6 9 >}}
|
|
|
|
The power of dependent types allows us to run `repr` inside the type
|
|
of `BinOp` to compute the type of the function it must accept.
|
|
|
|
Now, we have a way to represent expressions that are guaranteed to be type safe.
|
|
With this, we can make our `typeheck` function convert an `Expr` to a `SafeExpr`.
|
|
Wait a minute, though! We can't _just_ return `SafeExpr`: it's a type constructor!
|
|
We need to somehow return `SafeExpr a`, where `a` is a value of type `ExprType`. But
|
|
it doesn't make sense for the return type to have a new type variable that didn't
|
|
occur in the rest of the type signature. It would be ideal if we could return both
|
|
the type of the expression, and a `SafeExpr` of that type.
|
|
|
|
In fact, we can!
|
|
|
|
Idris has something called _dependent pairs_, which are like normal pairs, but
|
|
in which the type of the second element depends on the value of the first element.
|
|
The canonical example of this is a pair of (list length, list of that many elements).
|
|
For instance, in Idris, we can write:
|
|
|
|
```Idris
|
|
listPair : (n : Nat ** Vec n Int)
|
|
```
|
|
|
|
In the above snippet, we declare the type for a pair of a natural number (`n : Nat`) and a list of
|
|
integers (`Vect n Int`), where the number of elements in the list is equal to the natural number. Let's
|
|
try applying this to our problem. We want to return an `ExprType`, and a `SafeExpr` which
|
|
depends on that `ExprType`. How about this:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 36 36 >}}
|
|
|
|
Given an expression, we return either an error (`String`) or a dependent pair, which
|
|
contains some `ExprType` `n` and a `SafeExpr` that evaluates to a value of type `n`.
|
|
We can even start implementing this function, starting with literals:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 37 39 >}}
|
|
|
|
Note the use of `_`. Since `IntLiteral` always produces `SafeExpr IntType`, we allow
|
|
Idris to infer that `IntType` must be the first element of the tuple. This is easy
|
|
enough, because a boolean, integer, or string literal can never be type-incorrect.
|
|
|
|
The interesting case is that of a binary operation. Is `"hello" * 3` invalid? It might
|
|
be, but some languages evaluate the multiplication of a string by a number as repeating
|
|
the string that many times: `"hellohellohello"`. It is up to us, the language designers,
|
|
to specify the set of valid operations. Furthermore, observe that `BinOperation`
|
|
takes a _function_ as its first argument, not an `Op`. To guarantee that we can,
|
|
in fact, evaluate a `BinOperation` to the promised type, we require that the means
|
|
of performing the evaluation is included in the expression. Thus, when we convert `Expr`
|
|
to `SafeExpr`, we need to convert an `Op` to a corresponding function. As we can see
|
|
with `"hello"*3` and `163*2`, an `Op` can correspond to a different function
|
|
depending on the types of its inputs. To deal with this, we define a new function `typecheckOp`,
|
|
which takes an `Op` and two expression types, and returns either an error (if the `Op` can't
|
|
be applied to those types) or a dependent pair containing the output type of the operation
|
|
and a function that performs the required computation. That's a mouthful; let's look at the code:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 29 34 >}}
|
|
|
|
When `(+)` is applied to two integers, this is not an error, and the result is also
|
|
an integer. To perform addition, we use Idris' built-in function `(+)`. The
|
|
same is true for all other arithmetic operations in this example. In all other
|
|
cases, we simply return an error. We can now use `typecheckOp` in our `typecheck`
|
|
function:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 40 44 >}}
|
|
|
|
Here, we use do-notation to first type check first the left, then the right subexpression.
|
|
Since the result of type checking the subexpressions gives us their output types,
|
|
we can feed these types, together with `o`, to `typecheckOp` to determine the output
|
|
type and the applicable evaluation function. Finally, we assemble the new `SafeExpr`
|
|
from the function and the two translated subexpressions.
|
|
|
|
Alright, we've done all this work. Is it worth it? Let's try implementing `eval`:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 46 50 >}}
|
|
|
|
That's it! No `Maybe`, no error cases. `eval` is completely total, but doesn't require
|
|
error handling because it __knows that the expression it is evaluating is type-correct__!
|
|
|
|
Let's run all of this. We'll need some code to print the result of evaluating an expression.
|
|
Here's all that:
|
|
|
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 52 64 >}}
|
|
|
|
And the output is:
|
|
|
|
```
|
|
>>> idris TypesafeIntr.idr -o typesafe
|
|
>>> ./typesafe
|
|
326
|
|
```
|
|
|
|
That's right! What about a type-incorrect example?
|
|
|
|
```Idris
|
|
BinOp Add (IntLit 6) (BinOp Multiply (IntLit 160) (StringLit "hi"))
|
|
```
|
|
|
|
The program reports:
|
|
|
|
```
|
|
Type error: Invalid binary operator application
|
|
```
|
|
|
|
Awesome!
|
|
|
|
### Wrapping Up
|
|
In this post, we learned that type checking can be used to translate an expression into
|
|
a more strongly-typed data type, which can be (more) safe to evaluate. To help
|
|
strengthen the types of
|
|
{{< sidenote "right" "if-else-note" "our expression language," >}}
|
|
You may be thinking, "but where did the if-expressions go?". It turns out
|
|
that making sure that the branches of an if-expression are of the same
|
|
type is actually a fairly difficult task; the best way I found was
|
|
enumerating all the possible "valid" combinations of types in a case-expression.
|
|
Since this is obviously not the right solution, I decided to publish what I have,
|
|
and look for an alternative. If I find a better solution, I will write a follow-up
|
|
post.
|
|
{{< /sidenote >}} we used the Idris language and
|
|
its support for dependent types and Generalized Algebraic Data Types (GADTs).
|
|
I hope this was interesting!
|
|
|
|
As usual, you can find the code for this post in this website's Git repository. The
|
|
source file we went through today is found [here](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/typesafe-interpreter/TypesafeIntr.idr).
|