From 1f734a613cdb680538b27a830f642bcdf1815d16 Mon Sep 17 00:00:00 2001 From: Danila Fedorin Date: Sun, 19 Jul 2020 22:56:44 -0700 Subject: [PATCH] Add the second part of the typechecking post. --- code/typesafe-interpreter/TypesafeIntrV2.idr | 2 +- .../blog/typesafe_interpreter_revisited.md | 373 ++++++++++++++++++ 2 files changed, 374 insertions(+), 1 deletion(-) create mode 100644 content/blog/typesafe_interpreter_revisited.md diff --git a/code/typesafe-interpreter/TypesafeIntrV2.idr b/code/typesafe-interpreter/TypesafeIntrV2.idr index ddaa9d5..5888445 100644 --- a/code/typesafe-interpreter/TypesafeIntrV2.idr +++ b/code/typesafe-interpreter/TypesafeIntrV2.idr @@ -46,7 +46,7 @@ data SafeExpr : ExprType -> Type where BoolLiteral : Bool -> SafeExpr BoolType StringLiteral : String -> SafeExpr StringType BinOperation : (repr a -> repr b -> repr c) -> SafeExpr a -> SafeExpr b -> SafeExpr c - IfThenElse : { t : ExprType } -> SafeExpr BoolType -> SafeExpr t -> SafeExpr t -> SafeExpr t + IfThenElse : SafeExpr BoolType -> SafeExpr t -> SafeExpr t -> SafeExpr t typecheckOp : Op -> (a : ExprType) -> (b : ExprType) -> Either String (c : ExprType ** repr a -> repr b -> repr c) typecheckOp Add IntType IntType = Right (IntType ** (+)) diff --git a/content/blog/typesafe_interpreter_revisited.md b/content/blog/typesafe_interpreter_revisited.md new file mode 100644 index 0000000..fe38bc7 --- /dev/null +++ b/content/blog/typesafe_interpreter_revisited.md @@ -0,0 +1,373 @@ +--- +title: Meaningfully Typechecking a Language in Idris, Revisited +date: 2020-07-19T17:19:02-07:00 +draft: true +tags: ["Idris"] +--- + +Some time ago, I wrote a post titled [Meaningfully Typechecking a Language in Idris]({{< relref "typesafe_interpreter.md" >}}). The gist of the post was as follows: + +* _Programming Language Fundamentals_ students were surprised that, despite +having run their expression through typechecking, they still had to +have a `Maybe` type in their evaluation functions. This was due to +the fact that the type system was not certain that typechecking worked. +* A potential solution was to write separate expression types such +as `ArithExpr` and `BoolExpr`, which are known to produce booleans +or integers. However, this required the re-implementation of most +of the logic for `IfElse`, for which the branches could have integers, +booleans, or strings. +* An alternative solution was to use dependent types, and index +the `Expr` type with the type it evaluates to. We defined a data type +`data ExprType = IntType | StringType | BoolType`, and then were able +to write types like `SafeExpr IntType` that we _knew_ would evaluate +to an integer, or `SafeExpr BoolType`, which we also _knew_ would +evaluate to a boolean. We then made our `typecheck` function +return a pair of `(type, SafeExpr of that type)`. + +Unfortunately, I think that post is rather incomplete. I noted +at the end of it that I was not certain on how to implement +if-expressions, which were my primary motivation for not just +sticking with `ArithExpr` and `BoolExpr`. It didn't seem too severe +then, but now I just feel like a charlatan. Today, I decided to try +again, and managed to figure it out with the excellent help from +people in the `#idris` channel on Freenode. It required a more +advanced use of dependent types: in particular, I ended up using +Idris' theorem proving facilities to get my code to pass typechecking. +In this post, I will continue from where we left off in the previous +post, adding support for if-expressions. + +Let's start with the new `Expr` and `SafeExpr` types. Here they are: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 37 49 >}} + +For `Expr`, the `IfElse` constructor is very straightforward. It takes +three expressions: the condition, the 'then' branch, and the 'else' branch. +With `SafeExpr` and `IfThneElse`, things are more rigid. The condition +of the expression has to be of a boolean type, so we make the first argument +`SafeExpr BoolType`. Also, the two branches of the if-expression have to +be of the same type. We encode this by making both of the input expressions +be of type `SafeExpr t`. Since the result of the if-expression will be +the output of one of the branches, the whole if-expression is also +of type `SafeExpr t`. + +### What Stumped Me: Equality +Typechecking if-expressions is where things get interesting. First, +we want to require that the condition of the expression evaluates +to a boolean. For this, we can write a function `requireBool`, +that takes a dependent pair produced by `typecheck`. This +function does one of two things: + +* If the dependent pair contains a `BoolType`, and therefore also an expression +of type `SafeExpr BoolType`, `requireBool` succeeds, and returns the expression. +* If the dependent pair contains any type other than `BoolType`, `requireBool` +fails with an error message. Since we're using `Either` for error handling, +this amounts to using the `Left` constructor. + +Such a function is quite easy to write: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 58 60 >}} + +We can then write all of the recursive calls to `typecheck` as follows: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 71 75 >}} + +Alright, so we have the types of the `t` and `e` branches. All we have to +do now is use `(==)`. We could implement `(==)` as follows: + +```Idris +implementation Eq ExprType where + IntType == IntType = True + BoolType == BoolType = True + StringType == StringType = True + _ == _ = False +``` + +Now we're golden, right? We can just write the following: + +```Idris {linenos=table, linenostart=76} + if tt == et + then pure (_ ** IfThenElse ce te ee) + else Left "Incompatible branch types." +``` + +No, this is not quire right. Idris complains: + +``` +Type mismatch between et and tt +``` + +Huh? But we just saw that `et == tt`! What's the problem? +The problem is, in fact, that `(==)` is meaningless as far +as the Idris typechecker is concerned. We could have just +as well written, + +```Idris +implementation Eq ExprType where + _ == _ = True +``` + +This would tell us that `IntType == BoolType`. But of course, +`SafeExpr IntType` is not the same as `SafeExpr BoolType`; I +would be very worried if the typechecker allowed me to assert +otherwise. There is, however, a kind of equality that we can +use to convince the Idris typechecker that two types are the +same. This equality, too, is a type. + +### Curry-Howard Correspondence +Spend enough time learning about Programming Language Theory, and +you will hear the term _Curry Howard Correspondence_. If you're +the paper kind of person, I suggest reading Philip Wadler's +_Propositions as Types_ paper. Alternatively, you can take a look +at _Logical Foundations_' [Proof Objects](https://softwarefoundations.cis.upenn.edu/lf-current/ProofObjects.html) +chapter. I will give a very brief +explanation here, too, for the sake of completeness. The general +gist is as follows: __propositions (the logical kind) correspond +to program types__, and proofs of the propositions correspond +to values of the types. + +To get settled into this idea, let's look at a few 'well-known' examples: + +* `(A,B)`, the tuple of two types `A` and `B` is equivalent to the +proposition \\(A \land B\\), which means \\(A\\) and \\(B\\). Intuitively, +to provide a proof of \\(A \land B\\), we have to provide the proofs of +\\(A\\) and \\(B\\). +* `Either A B`, which contains one of `A` or `B`, is equivalent +to the proposition \\(A \lor B\\), which means \\(A\\) or \\(B\\). +Intuitively, to provide a proof that either \\(A\\) or \\(B\\) +is true, we need to provide one of them. +* `A -> B`, the type of a function from `A` to `B`, is equivalent +to the proposition \\(A \rightarrow B\\), which reads \\(A\\) +implies \\(B\\). We can think of a function `A -> B` as creating +a proof of `B` given a proof of `A`. + +Now, consider Idris' unit type `()`: + +```Idris +data () = () +``` + +This type takes no arguments, and there's only one way to construct +it. We can create a value of type `()` at any time, by just writing `()`. +This type is equivalent to \\(\\text{true}\\): only one proof of it exists, +and it requires no premises. It just is. + +Consider also the type `Void`, which too is present in Idris: + +```Idris +-- Note: this is probably not valid code. +data Void = -- Nothing +``` + +The type `Void` has no constructors: it's impossible +to create a value of this type, and therefore, it's +impossible to create a proof of `Void`. Thus, as you may have guessed, `Void` +is equivalent to \\(\\text{false}\\). + +Finally, we get to a more complicated example: + +```Idris +data (=) : a -> b -> Type where + Refl : x = x +``` + +This defines `a = b` as a type, equivalent to the proposition +that `a` is equal to `b`. The only way to construct such a type +is to give it a single value `x`, creating the proof that `x = x`. +This makes sense: equality is reflexive. + +This definition isn't some loosey-goosey boolean-based equality! If we can construct a value of +type `a = b`, we can prove to Idris' typechecker that `a` and `b` are equivalent. In +fact, Idris' standard library gives us the following function: + +```Idris +replace : {a:_} -> {x:_} -> {y:_} -> {P : a -> Type} -> x = y -> P x -> P y +``` + +This reads, given a type `a`, and values `x` and `y` of type `a`, if we know +that `x = y`, then we can rewrite any proposition in terms of `x` into +another, also valid proposition in terms of `y`. Let's make this concrete. +Suppose `a` is `Int`, and `P` (the type of which is now `Int -> Type`), +is `Even`, a proposition that claims that its argument is even. +{{< sidenote "right" "specialize-note" "Then, we have:" >}} +I'm only writing type signatures for replace' +to avoid overloading. There's no need to define a new function; +replace' is just a specialization of replace, +so we can use the former anywhere we can use the latter. +{{< /sidenote >}} + +```Idris +replace' : {x : Int} -> {y : Int} -> x = y -> Even x -> Even y +``` + +That is, if we know that `x` is equal to `y`, and we know that `x` is even, +it follows that `y` is even too. After all, they're one and the same! +We can take this further. Recall: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 44 44 >}} + +We can therefore write: + +```Idris +replace'' : {x : ExprType} -> {y : ExprType} -> x = y -> SafeExpr x -> SafeExpr y +``` + +This is exactly what we want! Given a proof that one `ExprType`, `x` is equal to +another `ExprType`, `y`, we can safely convert `SafeExpr x` to `SafeExpr y`. +We will use this to convince the Idris typechecker to accept our program. + +### First Attempt: `Eq` implies Equality +It's pretty trivial to see that we _did_ define `(==)` correctly (`IntType` is equal +to `IntType`, `StringType` is equal to `StringType`, and so on). Thus, +if we know that `x == y` is `True`, it should follow that `x = y`. We can thus +define the following proposition: + +```Idris +eqCorrect : {a : ExprType} -> {b : ExprType} -> (a == b = True) -> a = b +``` + +We will see shortly why this is _not_ the best solution, and thus, I won't bother +creating a proof / implementation for this proposition / function. +It reads: + +> If we have a proof that `(==)` returned true for some `ExprType`s `a` and `b`, +it must be that `a` is the same as `b`. + +We can then define a function to cast +a `SafeExpr a` to `SafeExpr b`, given that `(==)` returned `True` for some `a` and `b`: + +```Idris +safeCast : {a : ExprType} -> {b : ExprType} -> (a == b = True) -> SafeExpr a -> SafeExpr b +safeCast h e = replace (eqCorrect h) e +``` + +Awesome! All that's left now is to call `safeCast` from our `typecheck` function: + +```Idris {linenos=table, linenostart=76} + if tt == et + then pure (_ ** IfThenElse ce te (safeCast ?uhOh ee)) + else Left "Incompatible branch types." +``` + +No, this doesn't work after all. What do we put for `?uhOh`? We need to have +a value of type `tt == et = True`, but we don't have one. Idris' own if-then-else +expressions do not provide us with such proofs about their conditions. The awesome +people at `#idris` pointed out that the `with` clause can provide such a proof. +We could therefore write: + +```Idris +createIfThenElse ce (tt ** et) (et ** ee) with (et == tt) proof p + | True = pure (tt ** IfThenElse ce te (safeCast p ee)) + | False = Left "Incompatible branch types." +``` + +Here, the `with` clause effectively adds another argument equal to `(et == tt)` to `createIfThenElse`, +and tries to pattern match on its value. When we combine this with the `proof` keyword, +Idris will give us a handle to a proof, named `p`, that asserts the new argument +evaluates to the value in the pattern match. In our case, this is exactly +the proof we need to give to `safeCast`. + +However, this is ugly. Idris' `with` clause only works at the top level of a function, +so we have to define a function just to use it. It also shows that we're losing +information when we call `(==)`, and we have to reconstruct or recapture it using +some other means. + + +### Second Attempt: Decidable Propositions +More awesome folks over at `#idris` pointed out that the whole deal with `(==)` +is inelegant; they suggested I use __decidable propositions__, using the `Dec` type. +The type is defined as follows: + +```Idris +data Dec : Type -> Type where + Yes : (prf : prop) -> Dec prop + No : (contra : prop -> Void) -> Dec prop +``` + +There are two ways to construct a value of type `Dec prop`: + +* We use the `Yes` constructor, which means that the proposition `prop` +is true. To use this constructor, we have to give it a proof of `prop`, +called `prf` in the constructor. +* We use the `No` constructor, which means that the proposition `prop` +is false. We need a proof of type `prop -> Void` to represent this: +if we have a proof of `prop`, we arrive at a contradiction. + +This combines the nice `True` and `False` of `Bool`, with the +'real' proofs of the truthfulness or falsity. At the moment +that we would have been creating a boolean, we also create +a proof of that boolean's value. Thus, we don't lose information. +Here's how we can go about this: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 20 29 >}} + +We pattern match on the input expression types. If the types are the same, we return +`Yes`, and couple it with `Refl` (since we've pattern matched on the types +in the left-hand side of the function definition, the typechecker has enough +information to create that `Refl`). On the other hand, if the expression types +do not match, we have to provide a proof that their equality would be absurd. +For this we use helper functions / theorems like `intBoolImpossible`: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 11 12 >}} + +I'm not sure if there's a better way of doing this than using `impossible`. +This does the job, though: Idris understands that there's no way we can get +an input of type `IntType = BoolType`, and allows us to skip writing a right-hand side. + +We can finally use this new `decEq` function in our type checker: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 76 78 >}} + +Idris is happy with this! We should also add `IfThenElse` to our `eval` function. +This part is very easy: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 80 85 >}} + +Since the `c` part of the `IfThenElse` is indexed with `BoolType`, we know +that evaluating it will give us a boolean. Thus, we can use that +directly in the Idris if-then-else expression. Let's try this with a few +expressions: + +```Idris +BinOp Add (IfElse (BoolLit True) (IntLit 6) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2)) +``` + +This evaluates `326`, as it should. What if we make the condition non-boolean? + +```Idris +BinOp Add (IfElse (IntLit 1) (IntLit 6) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2)) +``` + +Our typechecker catches this, and we end up with the following output: + +``` +Type error: Not a boolean. +``` + +Alright, let's make one of the branches of the if-expression be a boolean, while the +other remains an integer. + +```Idris +BinOp Add (IfElse (BoolLit True) (BoolLit True) (IntLit 7)) (BinOp Multiply (IntLit 160) (IntLit 2)) +``` + +Our typechecker catches this, too: + +``` +Type error: Incompatible branch types. +``` + +### Conclusion +I think this is a good approach. Should we want to add more types to our language, such as tuples, +lists, and so on, we will be able to extend our `decEq` approach to construct more complex equality +proofs, and keep the `typecheck` method the same. Had we not used this approach, +and instead decided to pattern match on types inside of `typecheck`, we would've quickly +found that this only works for types with finitely many values. When we add polymorphic tuples +and lists, we start being able to construct an arbitrary number of types: `[a]`. `[[a]]`, and +so on. Then, we cease to be able to enumerate all possible pairs of types, and require a recursive +solution. I think that this leads us back to `decEq`. + +I also hope that I've now redeemed myself as far as logical arguments go. We used dependent types +and made our typechecking function save us from error-checking during evaluation. We did this +without having to manually create different types of expressions like `ArithExpr` and `BoolExpr`. + +That's all I have for today, thank you for reading!