Add the second part of the typechecking post.
This commit is contained in:
parent
a3c299b057
commit
1f734a613c
|
@ -46,7 +46,7 @@ data SafeExpr : ExprType -> Type where
|
||||||
BoolLiteral : Bool -> SafeExpr BoolType
|
BoolLiteral : Bool -> SafeExpr BoolType
|
||||||
StringLiteral : String -> SafeExpr StringType
|
StringLiteral : String -> SafeExpr StringType
|
||||||
BinOperation : (repr a -> repr b -> repr c) -> SafeExpr a -> SafeExpr b -> SafeExpr c
|
BinOperation : (repr a -> repr b -> repr c) -> SafeExpr a -> SafeExpr b -> SafeExpr c
|
||||||
IfThenElse : { t : ExprType } -> SafeExpr BoolType -> SafeExpr t -> SafeExpr t -> SafeExpr t
|
IfThenElse : SafeExpr BoolType -> SafeExpr t -> SafeExpr t -> SafeExpr t
|
||||||
|
|
||||||
typecheckOp : Op -> (a : ExprType) -> (b : ExprType) -> Either String (c : ExprType ** repr a -> repr b -> repr c)
|
typecheckOp : Op -> (a : ExprType) -> (b : ExprType) -> Either String (c : ExprType ** repr a -> repr b -> repr c)
|
||||||
typecheckOp Add IntType IntType = Right (IntType ** (+))
|
typecheckOp Add IntType IntType = Right (IntType ** (+))
|
||||||
|
|
373
content/blog/typesafe_interpreter_revisited.md
Normal file
373
content/blog/typesafe_interpreter_revisited.md
Normal file
|
@ -0,0 +1,373 @@
|
||||||
|
---
|
||||||
|
title: Meaningfully Typechecking a Language in Idris, Revisited
|
||||||
|
date: 2020-07-19T17:19:02-07:00
|
||||||
|
draft: true
|
||||||
|
tags: ["Idris"]
|
||||||
|
---
|
||||||
|
|
||||||
|
Some time ago, I wrote a post titled [Meaningfully Typechecking a Language in Idris]({{< relref "typesafe_interpreter.md" >}}). The gist of the post was as follows:
|
||||||
|
|
||||||
|
* _Programming Language Fundamentals_ students were surprised that, despite
|
||||||
|
having run their expression through typechecking, they still had to
|
||||||
|
have a `Maybe` type in their evaluation functions. This was due to
|
||||||
|
the fact that the type system was not certain that typechecking worked.
|
||||||
|
* A potential solution was to write separate expression types such
|
||||||
|
as `ArithExpr` and `BoolExpr`, which are known to produce booleans
|
||||||
|
or integers. However, this required the re-implementation of most
|
||||||
|
of the logic for `IfElse`, for which the branches could have integers,
|
||||||
|
booleans, or strings.
|
||||||
|
* An alternative solution was to use dependent types, and index
|
||||||
|
the `Expr` type with the type it evaluates to. We defined a data type
|
||||||
|
`data ExprType = IntType | StringType | BoolType`, and then were able
|
||||||
|
to write types like `SafeExpr IntType` that we _knew_ would evaluate
|
||||||
|
to an integer, or `SafeExpr BoolType`, which we also _knew_ would
|
||||||
|
evaluate to a boolean. We then made our `typecheck` function
|
||||||
|
return a pair of `(type, SafeExpr of that type)`.
|
||||||
|
|
||||||
|
Unfortunately, I think that post is rather incomplete. I noted
|
||||||
|
at the end of it that I was not certain on how to implement
|
||||||
|
if-expressions, which were my primary motivation for not just
|
||||||
|
sticking with `ArithExpr` and `BoolExpr`. It didn't seem too severe
|
||||||
|
then, but now I just feel like a charlatan. Today, I decided to try
|
||||||
|
again, and managed to figure it out with the excellent help from
|
||||||
|
people in the `#idris` channel on Freenode. It required a more
|
||||||
|
advanced use of dependent types: in particular, I ended up using
|
||||||
|
Idris' theorem proving facilities to get my code to pass typechecking.
|
||||||
|
In this post, I will continue from where we left off in the previous
|
||||||
|
post, adding support for if-expressions.
|
||||||
|
|
||||||
|
Let's start with the new `Expr` and `SafeExpr` types. Here they are:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 37 49 >}}
|
||||||
|
|
||||||
|
For `Expr`, the `IfElse` constructor is very straightforward. It takes
|
||||||
|
three expressions: the condition, the 'then' branch, and the 'else' branch.
|
||||||
|
With `SafeExpr` and `IfThneElse`, things are more rigid. The condition
|
||||||
|
of the expression has to be of a boolean type, so we make the first argument
|
||||||
|
`SafeExpr BoolType`. Also, the two branches of the if-expression have to
|
||||||
|
be of the same type. We encode this by making both of the input expressions
|
||||||
|
be of type `SafeExpr t`. Since the result of the if-expression will be
|
||||||
|
the output of one of the branches, the whole if-expression is also
|
||||||
|
of type `SafeExpr t`.
|
||||||
|
|
||||||
|
### What Stumped Me: Equality
|
||||||
|
Typechecking if-expressions is where things get interesting. First,
|
||||||
|
we want to require that the condition of the expression evaluates
|
||||||
|
to a boolean. For this, we can write a function `requireBool`,
|
||||||
|
that takes a dependent pair produced by `typecheck`. This
|
||||||
|
function does one of two things:
|
||||||
|
|
||||||
|
* If the dependent pair contains a `BoolType`, and therefore also an expression
|
||||||
|
of type `SafeExpr BoolType`, `requireBool` succeeds, and returns the expression.
|
||||||
|
* If the dependent pair contains any type other than `BoolType`, `requireBool`
|
||||||
|
fails with an error message. Since we're using `Either` for error handling,
|
||||||
|
this amounts to using the `Left` constructor.
|
||||||
|
|
||||||
|
Such a function is quite easy to write:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 58 60 >}}
|
||||||
|
|
||||||
|
We can then write all of the recursive calls to `typecheck` as follows:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 71 75 >}}
|
||||||
|
|
||||||
|
Alright, so we have the types of the `t` and `e` branches. All we have to
|
||||||
|
do now is use `(==)`. We could implement `(==)` as follows:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
implementation Eq ExprType where
|
||||||
|
IntType == IntType = True
|
||||||
|
BoolType == BoolType = True
|
||||||
|
StringType == StringType = True
|
||||||
|
_ == _ = False
|
||||||
|
```
|
||||||
|
|
||||||
|
Now we're golden, right? We can just write the following:
|
||||||
|
|
||||||
|
```Idris {linenos=table, linenostart=76}
|
||||||
|
if tt == et
|
||||||
|
then pure (_ ** IfThenElse ce te ee)
|
||||||
|
else Left "Incompatible branch types."
|
||||||
|
```
|
||||||
|
|
||||||
|
No, this is not quire right. Idris complains:
|
||||||
|
|
||||||
|
```
|
||||||
|
Type mismatch between et and tt
|
||||||
|
```
|
||||||
|
|
||||||
|
Huh? But we just saw that `et == tt`! What's the problem?
|
||||||
|
The problem is, in fact, that `(==)` is meaningless as far
|
||||||
|
as the Idris typechecker is concerned. We could have just
|
||||||
|
as well written,
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
implementation Eq ExprType where
|
||||||
|
_ == _ = True
|
||||||
|
```
|
||||||
|
|
||||||
|
This would tell us that `IntType == BoolType`. But of course,
|
||||||
|
`SafeExpr IntType` is not the same as `SafeExpr BoolType`; I
|
||||||
|
would be very worried if the typechecker allowed me to assert
|
||||||
|
otherwise. There is, however, a kind of equality that we can
|
||||||
|
use to convince the Idris typechecker that two types are the
|
||||||
|
same. This equality, too, is a type.
|
||||||
|
|
||||||
|
### Curry-Howard Correspondence
|
||||||
|
Spend enough time learning about Programming Language Theory, and
|
||||||
|
you will hear the term _Curry Howard Correspondence_. If you're
|
||||||
|
the paper kind of person, I suggest reading Philip Wadler's
|
||||||
|
_Propositions as Types_ paper. Alternatively, you can take a look
|
||||||
|
at _Logical Foundations_' [Proof Objects](https://softwarefoundations.cis.upenn.edu/lf-current/ProofObjects.html)
|
||||||
|
chapter. I will give a very brief
|
||||||
|
explanation here, too, for the sake of completeness. The general
|
||||||
|
gist is as follows: __propositions (the logical kind) correspond
|
||||||
|
to program types__, and proofs of the propositions correspond
|
||||||
|
to values of the types.
|
||||||
|
|
||||||
|
To get settled into this idea, let's look at a few 'well-known' examples:
|
||||||
|
|
||||||
|
* `(A,B)`, the tuple of two types `A` and `B` is equivalent to the
|
||||||
|
proposition \\(A \land B\\), which means \\(A\\) and \\(B\\). Intuitively,
|
||||||
|
to provide a proof of \\(A \land B\\), we have to provide the proofs of
|
||||||
|
\\(A\\) and \\(B\\).
|
||||||
|
* `Either A B`, which contains one of `A` or `B`, is equivalent
|
||||||
|
to the proposition \\(A \lor B\\), which means \\(A\\) or \\(B\\).
|
||||||
|
Intuitively, to provide a proof that either \\(A\\) or \\(B\\)
|
||||||
|
is true, we need to provide one of them.
|
||||||
|
* `A -> B`, the type of a function from `A` to `B`, is equivalent
|
||||||
|
to the proposition \\(A \rightarrow B\\), which reads \\(A\\)
|
||||||
|
implies \\(B\\). We can think of a function `A -> B` as creating
|
||||||
|
a proof of `B` given a proof of `A`.
|
||||||
|
|
||||||
|
Now, consider Idris' unit type `()`:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
data () = ()
|
||||||
|
```
|
||||||
|
|
||||||
|
This type takes no arguments, and there's only one way to construct
|
||||||
|
it. We can create a value of type `()` at any time, by just writing `()`.
|
||||||
|
This type is equivalent to \\(\\text{true}\\): only one proof of it exists,
|
||||||
|
and it requires no premises. It just is.
|
||||||
|
|
||||||
|
Consider also the type `Void`, which too is present in Idris:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
-- Note: this is probably not valid code.
|
||||||
|
data Void = -- Nothing
|
||||||
|
```
|
||||||
|
|
||||||
|
The type `Void` has no constructors: it's impossible
|
||||||
|
to create a value of this type, and therefore, it's
|
||||||
|
impossible to create a proof of `Void`. Thus, as you may have guessed, `Void`
|
||||||
|
is equivalent to \\(\\text{false}\\).
|
||||||
|
|
||||||
|
Finally, we get to a more complicated example:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
data (=) : a -> b -> Type where
|
||||||
|
Refl : x = x
|
||||||
|
```
|
||||||
|
|
||||||
|
This defines `a = b` as a type, equivalent to the proposition
|
||||||
|
that `a` is equal to `b`. The only way to construct such a type
|
||||||
|
is to give it a single value `x`, creating the proof that `x = x`.
|
||||||
|
This makes sense: equality is reflexive.
|
||||||
|
|
||||||
|
This definition isn't some loosey-goosey boolean-based equality! If we can construct a value of
|
||||||
|
type `a = b`, we can prove to Idris' typechecker that `a` and `b` are equivalent. In
|
||||||
|
fact, Idris' standard library gives us the following function:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
replace : {a:_} -> {x:_} -> {y:_} -> {P : a -> Type} -> x = y -> P x -> P y
|
||||||
|
```
|
||||||
|
|
||||||
|
This reads, given a type `a`, and values `x` and `y` of type `a`, if we know
|
||||||
|
that `x = y`, then we can rewrite any proposition in terms of `x` into
|
||||||
|
another, also valid proposition in terms of `y`. Let's make this concrete.
|
||||||
|
Suppose `a` is `Int`, and `P` (the type of which is now `Int -> Type`),
|
||||||
|
is `Even`, a proposition that claims that its argument is even.
|
||||||
|
{{< sidenote "right" "specialize-note" "Then, we have:" >}}
|
||||||
|
I'm only writing type signatures for <code>replace'</code>
|
||||||
|
to avoid overloading. There's no need to define a new function;
|
||||||
|
<code>replace'</code> is just a specialization of <code>replace</code>,
|
||||||
|
so we can use the former anywhere we can use the latter.
|
||||||
|
{{< /sidenote >}}
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
replace' : {x : Int} -> {y : Int} -> x = y -> Even x -> Even y
|
||||||
|
```
|
||||||
|
|
||||||
|
That is, if we know that `x` is equal to `y`, and we know that `x` is even,
|
||||||
|
it follows that `y` is even too. After all, they're one and the same!
|
||||||
|
We can take this further. Recall:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 44 44 >}}
|
||||||
|
|
||||||
|
We can therefore write:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
replace'' : {x : ExprType} -> {y : ExprType} -> x = y -> SafeExpr x -> SafeExpr y
|
||||||
|
```
|
||||||
|
|
||||||
|
This is exactly what we want! Given a proof that one `ExprType`, `x` is equal to
|
||||||
|
another `ExprType`, `y`, we can safely convert `SafeExpr x` to `SafeExpr y`.
|
||||||
|
We will use this to convince the Idris typechecker to accept our program.
|
||||||
|
|
||||||
|
### First Attempt: `Eq` implies Equality
|
||||||
|
It's pretty trivial to see that we _did_ define `(==)` correctly (`IntType` is equal
|
||||||
|
to `IntType`, `StringType` is equal to `StringType`, and so on). Thus,
|
||||||
|
if we know that `x == y` is `True`, it should follow that `x = y`. We can thus
|
||||||
|
define the following proposition:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
eqCorrect : {a : ExprType} -> {b : ExprType} -> (a == b = True) -> a = b
|
||||||
|
```
|
||||||
|
|
||||||
|
We will see shortly why this is _not_ the best solution, and thus, I won't bother
|
||||||
|
creating a proof / implementation for this proposition / function.
|
||||||
|
It reads:
|
||||||
|
|
||||||
|
> If we have a proof that `(==)` returned true for some `ExprType`s `a` and `b`,
|
||||||
|
it must be that `a` is the same as `b`.
|
||||||
|
|
||||||
|
We can then define a function to cast
|
||||||
|
a `SafeExpr a` to `SafeExpr b`, given that `(==)` returned `True` for some `a` and `b`:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
safeCast : {a : ExprType} -> {b : ExprType} -> (a == b = True) -> SafeExpr a -> SafeExpr b
|
||||||
|
safeCast h e = replace (eqCorrect h) e
|
||||||
|
```
|
||||||
|
|
||||||
|
Awesome! All that's left now is to call `safeCast` from our `typecheck` function:
|
||||||
|
|
||||||
|
```Idris {linenos=table, linenostart=76}
|
||||||
|
if tt == et
|
||||||
|
then pure (_ ** IfThenElse ce te (safeCast ?uhOh ee))
|
||||||
|
else Left "Incompatible branch types."
|
||||||
|
```
|
||||||
|
|
||||||
|
No, this doesn't work after all. What do we put for `?uhOh`? We need to have
|
||||||
|
a value of type `tt == et = True`, but we don't have one. Idris' own if-then-else
|
||||||
|
expressions do not provide us with such proofs about their conditions. The awesome
|
||||||
|
people at `#idris` pointed out that the `with` clause can provide such a proof.
|
||||||
|
We could therefore write:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
createIfThenElse ce (tt ** et) (et ** ee) with (et == tt) proof p
|
||||||
|
| True = pure (tt ** IfThenElse ce te (safeCast p ee))
|
||||||
|
| False = Left "Incompatible branch types."
|
||||||
|
```
|
||||||
|
|
||||||
|
Here, the `with` clause effectively adds another argument equal to `(et == tt)` to `createIfThenElse`,
|
||||||
|
and tries to pattern match on its value. When we combine this with the `proof` keyword,
|
||||||
|
Idris will give us a handle to a proof, named `p`, that asserts the new argument
|
||||||
|
evaluates to the value in the pattern match. In our case, this is exactly
|
||||||
|
the proof we need to give to `safeCast`.
|
||||||
|
|
||||||
|
However, this is ugly. Idris' `with` clause only works at the top level of a function,
|
||||||
|
so we have to define a function just to use it. It also shows that we're losing
|
||||||
|
information when we call `(==)`, and we have to reconstruct or recapture it using
|
||||||
|
some other means.
|
||||||
|
|
||||||
|
|
||||||
|
### Second Attempt: Decidable Propositions
|
||||||
|
More awesome folks over at `#idris` pointed out that the whole deal with `(==)`
|
||||||
|
is inelegant; they suggested I use __decidable propositions__, using the `Dec` type.
|
||||||
|
The type is defined as follows:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
data Dec : Type -> Type where
|
||||||
|
Yes : (prf : prop) -> Dec prop
|
||||||
|
No : (contra : prop -> Void) -> Dec prop
|
||||||
|
```
|
||||||
|
|
||||||
|
There are two ways to construct a value of type `Dec prop`:
|
||||||
|
|
||||||
|
* We use the `Yes` constructor, which means that the proposition `prop`
|
||||||
|
is true. To use this constructor, we have to give it a proof of `prop`,
|
||||||
|
called `prf` in the constructor.
|
||||||
|
* We use the `No` constructor, which means that the proposition `prop`
|
||||||
|
is false. We need a proof of type `prop -> Void` to represent this:
|
||||||
|
if we have a proof of `prop`, we arrive at a contradiction.
|
||||||
|
|
||||||
|
This combines the nice `True` and `False` of `Bool`, with the
|
||||||
|
'real' proofs of the truthfulness or falsity. At the moment
|
||||||
|
that we would have been creating a boolean, we also create
|
||||||
|
a proof of that boolean's value. Thus, we don't lose information.
|
||||||
|
Here's how we can go about this:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 20 29 >}}
|
||||||
|
|
||||||
|
We pattern match on the input expression types. If the types are the same, we return
|
||||||
|
`Yes`, and couple it with `Refl` (since we've pattern matched on the types
|
||||||
|
in the left-hand side of the function definition, the typechecker has enough
|
||||||
|
information to create that `Refl`). On the other hand, if the expression types
|
||||||
|
do not match, we have to provide a proof that their equality would be absurd.
|
||||||
|
For this we use helper functions / theorems like `intBoolImpossible`:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 11 12 >}}
|
||||||
|
|
||||||
|
I'm not sure if there's a better way of doing this than using `impossible`.
|
||||||
|
This does the job, though: Idris understands that there's no way we can get
|
||||||
|
an input of type `IntType = BoolType`, and allows us to skip writing a right-hand side.
|
||||||
|
|
||||||
|
We can finally use this new `decEq` function in our type checker:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 76 78 >}}
|
||||||
|
|
||||||
|
Idris is happy with this! We should also add `IfThenElse` to our `eval` function.
|
||||||
|
This part is very easy:
|
||||||
|
|
||||||
|
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 80 85 >}}
|
||||||
|
|
||||||
|
Since the `c` part of the `IfThenElse` is indexed with `BoolType`, we know
|
||||||
|
that evaluating it will give us a boolean. Thus, we can use that
|
||||||
|
directly in the Idris if-then-else expression. Let's try this with a few
|
||||||
|
expressions:
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
BinOp Add (IfElse (BoolLit True) (IntLit 6) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2))
|
||||||
|
```
|
||||||
|
|
||||||
|
This evaluates `326`, as it should. What if we make the condition non-boolean?
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
BinOp Add (IfElse (IntLit 1) (IntLit 6) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2))
|
||||||
|
```
|
||||||
|
|
||||||
|
Our typechecker catches this, and we end up with the following output:
|
||||||
|
|
||||||
|
```
|
||||||
|
Type error: Not a boolean.
|
||||||
|
```
|
||||||
|
|
||||||
|
Alright, let's make one of the branches of the if-expression be a boolean, while the
|
||||||
|
other remains an integer.
|
||||||
|
|
||||||
|
```Idris
|
||||||
|
BinOp Add (IfElse (BoolLit True) (BoolLit True) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2))
|
||||||
|
```
|
||||||
|
|
||||||
|
Our typechecker catches this, too:
|
||||||
|
|
||||||
|
```
|
||||||
|
Type error: Incompatible branch types.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Conclusion
|
||||||
|
I think this is a good approach. Should we want to add more types to our language, such as tuples,
|
||||||
|
lists, and so on, we will be able to extend our `decEq` approach to construct more complex equality
|
||||||
|
proofs, and keep the `typecheck` method the same. Had we not used this approach,
|
||||||
|
and instead decided to pattern match on types inside of `typecheck`, we would've quickly
|
||||||
|
found that this only works for types with finitely many values. When we add polymorphic tuples
|
||||||
|
and lists, we start being able to construct an arbitrary number of types: `[a]`. `[[a]]`, and
|
||||||
|
so on. Then, we cease to be able to enumerate all possible pairs of types, and require a recursive
|
||||||
|
solution. I think that this leads us back to `decEq`.
|
||||||
|
|
||||||
|
I also hope that I've now redeemed myself as far as logical arguments go. We used dependent types
|
||||||
|
and made our typechecking function save us from error-checking during evaluation. We did this
|
||||||
|
without having to manually create different types of expressions like `ArithExpr` and `BoolExpr`.
|
||||||
|
|
||||||
|
That's all I have for today, thank you for reading!
|
Loading…
Reference in New Issue
Block a user