blog-static/content/blog/typesafe_interpreter_revisi...

16 KiB

title date tags series favorite
Meaningfully Typechecking a Language in Idris, Revisited 2020-07-22T14:37:35-07:00
Idris
Programming Languages
Meaningfully Typechecking a Language in Idris true

Some time ago, I wrote a post titled [Meaningfully Typechecking a Language in Idris]({{< relref "typesafe_interpreter.md" >}}). The gist of the post was as follows:

  • Programming Language Fundamentals students were surprised that, despite having run their expression through (object language) typechecking, they still had to have a Maybe type in their evaluation functions. This was due to the fact that the (meta language) type system was not certain that (object language) typechecking worked.
  • A potential solution was to write separate expression types such as ArithExpr and BoolExpr, which are known to produce booleans or integers. However, this required the re-implementation of most of the logic for IfElse, for which the branches could have integers, booleans, or strings.
  • An alternative solution was to use dependent types, and index the Expr type with the type it evaluates to. We defined a data type data ExprType = IntType | StringType | BoolType, and then were able to write types like SafeExpr IntType that we knew would evaluate to an integer, or SafeExpr BoolType, which we also knew would evaluate to a boolean. We then made our typecheck function return a pair of (type, SafeExpr of that type).

Unfortunately, I think that post is rather incomplete. I noted at the end of it that I was not certain on how to implement if-expressions, which were my primary motivation for not just sticking with ArithExpr and BoolExpr. It didn't seem too severe then, but now I just feel like a charlatan. Today, I decided to try again, and managed to figure it out with the excellent help from people in the #idris channel on Freenode. It required a more advanced use of dependent types: in particular, I ended up using Idris' theorem proving facilities to get my code to pass typechecking. In this post, I will continue from where we left off in the previous post, adding support for if-expressions.

Let's start with the new Expr and SafeExpr types. Here they are:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 37 49 >}}

For Expr, the IfElse constructor is very straightforward. It takes three expressions: the condition, the 'then' branch, and the 'else' branch. With SafeExpr and IfThenElse, things are more rigid. The condition of the expression has to be of a boolean type, so we make the first argument SafeExpr BoolType. Also, the two branches of the if-expression have to be of the same type. We encode this by making both of the input expressions be of type SafeExpr t. Since the result of the if-expression will be the output of one of the branches, the whole if-expression is also of type SafeExpr t.

What Stumped Me: Equality

Typechecking if-expressions is where things get interesting. First, we want to require that the condition of the expression evaluates to a boolean. For this, we can write a function requireBool, that takes a dependent pair produced by typecheck. This function does one of two things:

  • If the dependent pair contains a BoolType, and therefore also an expression of type SafeExpr BoolType, requireBool succeeds, and returns the expression.
  • If the dependent pair contains any type other than BoolType, requireBool fails with an error message. Since we're using Either for error handling, this amounts to using the Left constructor.

Such a function is quite easy to write:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 58 60 >}}

We can then write all of the recursive calls to typecheck as follows:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 71 75 >}}

Alright, so we have the types of the t and e branches. All we have to do now is use (==). We could implement (==) as follows:

implementation Eq ExprType where
  IntType == IntType = True
  BoolType == BoolType = True
  StringType == StringType = True
  _ == _ = False

Now we're golden, right? We can just write the following:

76 77 78 if tt == et then pure (_ ** IfThenElse ce te ee) else Left "Incompatible branch types."

No, this is not quire right. Idris complains:

Type mismatch between et and tt

Huh? But we just saw that et == tt! What's the problem? The problem is, in fact, that (==) is meaningless as far as the Idris typechecker is concerned. We could have just as well written,

implementation Eq ExprType where
  _ == _ = True

This would tell us that IntType == BoolType. But of course, SafeExpr IntType is not the same as SafeExpr BoolType; I would be very worried if the typechecker allowed me to assert otherwise. There is, however, a kind of equality that we can use to convince the Idris typechecker that two types are the same. This equality, too, is a type.

Curry-Howard Correspondence

Spend enough time learning about Programming Language Theory, and you will hear the term Curry Howard Correspondence. If you're the paper kind of person, I suggest reading Philip Wadler's Propositions as Types paper. Alternatively, you can take a look at Logical Foundations' Proof Objects chapter. I will give a very brief explanation here, too, for the sake of completeness. The general gist is as follows: propositions (the logical kind) correspond to program types, and proofs of the propositions correspond to values of the types.

To get settled into this idea, let's look at a few 'well-known' examples:

  • (A,B), the tuple of two types A and B is equivalent to the proposition \(A \land B\), which means \(A\) and \(B\). Intuitively, to provide a proof of \(A \land B\), we have to provide the proofs of \(A\) and \(B\).
  • Either A B, which contains one of A or B, is equivalent to the proposition \(A \lor B\), which means \(A\) or \(B\). Intuitively, to provide a proof that either \(A\) or \(B\) is true, we need to provide one of them.
  • A -> B, the type of a function from A to B, is equivalent to the proposition \(A \rightarrow B\), which reads \(A\) implies \(B\). We can think of a function A -> B as creating a proof of B given a proof of A.

Now, consider Idris' unit type ():

data () = ()

This type takes no arguments, and there's only one way to construct it. We can create a value of type () at any time, by just writing (). This type is equivalent to \(\text{true}\): only one proof of it exists, and it requires no premises. It just is.

Consider also the type Void, which too is present in Idris:

-- Note: this is probably not valid code.
data Void = -- Nothing

The type Void has no constructors: it's impossible to create a value of this type, and therefore, it's impossible to create a proof of Void. Thus, as you may have guessed, Void is equivalent to \(\text{false}\).

Finally, we get to a more complicated example:

data (=) : a -> b -> Type where
   Refl : x = x

This defines a = b as a type, equivalent to the proposition that a is equal to b. The only way to construct such a type is to give it a single value x, creating the proof that x = x. This makes sense: equality is reflexive.

This definition isn't some loosey-goosey boolean-based equality! If we can construct a value of type a = b, we can prove to Idris' typechecker that a and b are equivalent. In fact, Idris' standard library gives us the following function:

replace : {a:_} -> {x:_} -> {y:_} -> {P : a -> Type} -> x = y -> P x -> P y

This reads, given a type a, and values x and y of type a, if we know that x = y, then we can rewrite any proposition in terms of x into another, also valid proposition in terms of y. Let's make this concrete. Suppose a is Int, and P (the type of which is now Int -> Type), is Even, a proposition that claims that its argument is even. {{< sidenote "right" "specialize-note" "Then, we have:" >}} I'm only writing type signatures for replace' to avoid overloading. There's no need to define a new function; replace' is just a specialization of replace, so we can use the former anywhere we can use the latter. {{< /sidenote >}}

replace' : {x : Int} -> {y : Int} -> x = y -> Even x -> Even y

That is, if we know that x is equal to y, and we know that x is even, it follows that y is even too. After all, they're one and the same! We can take this further. Recall:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 44 44 >}}

We can therefore write:

replace'' : {x : ExprType} -> {y : ExprType} -> x = y -> SafeExpr x -> SafeExpr y

This is exactly what we want! Given a proof that one ExprType, x, is equal to another ExprType, y, we can safely convert SafeExpr x to SafeExpr y. We will use this to convince the Idris typechecker to accept our program.

First Attempt: Eq implies Equality

It's pretty trivial to see that we did define (==) correctly (IntType is equal to IntType, StringType is equal to StringType, and so on). Thus, if we know that x == y is True, it should follow that x = y. We can thus define the following proposition:

eqCorrect : {a : ExprType} -> {b : ExprType} -> (a == b = True) -> a = b

We will see shortly why this is not the best solution, and thus, I won't bother creating a proof / implementation for this proposition / function. It reads:

If we have a proof that (==) returned true for some ExprTypes a and b, it must be that a is the same as b.

We can then define a function to cast a SafeExpr a to SafeExpr b, given that (==) returned True for some a and b:

safeCast : {a : ExprType} -> {b : ExprType} -> (a == b = True) -> SafeExpr a -> SafeExpr b
safeCast h e = replace (eqCorrect h) e

Awesome! All that's left now is to call safeCast from our typecheck function:

76 77 78 if tt == et then pure (_ ** IfThenElse ce te (safeCast ?uhOh ee)) else Left "Incompatible branch types."

No, this doesn't work after all. What do we put for ?uhOh? We need to have a value of type tt == et = True, but we don't have one. Idris' own if-then-else expressions do not provide us with such proofs about their conditions. The awesome people at #idris pointed out that the with clause can provide such a proof. We could therefore write:

createIfThenElse ce (tt ** et) (et ** ee) with (et == tt) proof p
  | True = pure (tt ** IfThenElse ce te (safeCast p ee))
  | False = Left "Incompatible branch types."

Here, the with clause effectively adds another argument equal to (et == tt) to createIfThenElse, and tries to pattern match on its value. When we combine this with the proof keyword, Idris will give us a handle to a proof, named p, that asserts the new argument evaluates to the value in the pattern match. In our case, this is exactly the proof we need to give to safeCast.

However, this is ugly. Idris' with clause only works at the top level of a function, so we have to define a function just to use it. It also shows that we're losing information when we call (==), and we have to reconstruct or recapture it using some other means.

Second Attempt: Decidable Propositions

More awesome folks over at #idris pointed out that the whole deal with (==) is inelegant; they suggested I use decidable propositions, using the Dec type. The type is defined as follows:

data Dec : Type -> Type where
  Yes : (prf : prop) -> Dec prop
  No : (contra : prop -> Void) -> Dec prop

There are two ways to construct a value of type Dec prop:

  • We use the Yes constructor, which means that the proposition prop is true. To use this constructor, we have to give it a proof of prop, called prf in the constructor.
  • We use the No constructor, which means that the proposition prop is false. We need a proof of type prop -> Void to represent this: if we have a proof of prop, we arrive at a contradiction.

This combines the nice True and False of Bool, with the 'real' proofs of the truthfulness or falsity. At the moment that we would have been creating a boolean, we also create a proof of that boolean's value. Thus, we don't lose information. Here's how we can go about this:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 20 29 >}}

We pattern match on the input expression types. If the types are the same, we return Yes, and couple it with Refl (since we've pattern matched on the types in the left-hand side of the function definition, the typechecker has enough information to create that Refl). On the other hand, if the expression types do not match, we have to provide a proof that their equality would be absurd. For this we use helper functions / theorems like intBoolImpossible:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 11 12 >}}

I'm not sure if there's a better way of doing this than using impossible. This does the job, though: Idris understands that there's no way we can get an input of type IntType = BoolType, and allows us to skip writing a right-hand side.

We can finally use this new decEq function in our type checker:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 76 78 >}}

Idris is happy with this! We should also add IfThenElse to our eval function. This part is very easy:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 80 85 >}}

Since the c part of the IfThenElse is indexed with BoolType, we know that evaluating it will give us a boolean. Thus, we can use that directly in the Idris if-then-else expression. Let's try this with a few expressions:

BinOp Add (IfElse (BoolLit True) (IntLit 6) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2))

This evaluates 326, as it should. What if we make the condition non-boolean?

BinOp Add (IfElse (IntLit 1) (IntLit 6) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2))

Our typechecker catches this, and we end up with the following output:

Type error: Not a boolean.

Alright, let's make one of the branches of the if-expression be a boolean, while the other remains an integer.

BinOp Add (IfElse (BoolLit True) (BoolLit True) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2))

Our typechecker catches this, too:

Type error: Incompatible branch types.

Conclusion

I think this is a good approach. Should we want to add more types to our language, such as tuples, lists, and so on, we will be able to extend our decEq approach to construct more complex equality proofs, and keep the typecheck method the same. Had we not used this approach, and instead decided to pattern match on types inside of typecheck, we would've quickly found that this only works for languages with finitely many types. When we add polymorphic tuples and lists, we start being able to construct an arbitrary number of types: [a]. [[a]], and so on. Then, we cease to be able to enumerate all possible pairs of types, and require a recursive solution. I think that this leads us back to decEq.

I also hope that I've now redeemed myself as far as logical arguments go. We used dependent types and made our typechecking function save us from error-checking during evaluation. We did this without having to manually create different types of expressions like ArithExpr and BoolExpr, and without having to duplicate any code.

That's all I have for today, thank you for reading! As always, you can check out the full source code for the typechecker and interpreter on my Git server.