diff --git a/code/typesafe-interpreter/TypesafeIntr.idr b/code/typesafe-interpreter/TypesafeIntr.idr index f6ec46e..a12d82d 100644 --- a/code/typesafe-interpreter/TypesafeIntr.idr +++ b/code/typesafe-interpreter/TypesafeIntr.idr @@ -43,7 +43,7 @@ typecheck (BinOp o l r) = do (ot ** f) <- typecheckOp o lt rt pure (_ ** BinOperation f le re) -eval : {t : ExprType} -> SafeExpr t -> repr t +eval : SafeExpr t -> repr t eval (IntLiteral i) = i eval (BoolLiteral b) = b eval (StringLiteral s) = s @@ -58,7 +58,7 @@ tryEval : Expr -> String tryEval ex = case typecheck ex of Left err => "Type error: " ++ err - Right (t ** e) => resultStr $ eval {t=t} e + Right (t ** e) => resultStr $ eval {t} e main : IO () main = putStrLn $ tryEval $ BinOp Add (IntLit 6) (BinOp Multiply (IntLit 160) (IntLit 2)) diff --git a/content/blog/typesafe_interpreter.md b/content/blog/typesafe_interpreter.md index a19aa1a..acadd9e 100644 --- a/content/blog/typesafe_interpreter.md +++ b/content/blog/typesafe_interpreter.md @@ -87,10 +87,10 @@ a more safe `Expr'`: typecheck :: Expr -> Either TypeError (Expr', ExprType) ``` -However, we typically also want the branches of an if expression to both have the same +However, we typically also want the branches of an if-expression to both have the same type - `if x then 3 else False` may work sometimes, but not always, depending of the value of `x`. How do we encode this? Do we have two constructors, `IfElseBool` and -`IfElseInt`, with one `BoolExpr` and the other in `ArithExpr`? What if we add strings? +`IfElseInt`, with one in `BoolExpr` and the other in `ArithExpr`? What if we add strings? We'll be copying functionality back and forth, and our code will suffer. Wouldn't it be nice if we could somehow tag our expressions with the type they produce? Instead of `BoolExpr` and `ArithExpr`, we would be able to have `Expr BoolType` and `Expr IntType`, @@ -149,3 +149,113 @@ the actual type that `eval` would yield when running our expression: The power of dependent types allows us to run `repr` inside the type of `BinOp` to compute the type of the function it must accept. + +Now, we have a way to represent expressions that are guaranteed to be type safe. +With this, we can make our `typeheck` function convert an `Expr` to a `SafeExpr`. +Wait a minute, though! We can't _just_ return `SafeExpr`: it's a type constructor! +We need to somehow return `SafeExpr a`, where `a` is a value of type `ExprType`. But +it doesn't make sense for the return type to have a new type variable that didn't +occur in the rest of the type signature. It would be ideal if we could return both +the type of the expression, and a `SafeExpr` of that type. + +In fact, we can! + +Idris has something called _dependent pairs_, which are like normal pairs, but +in which the type of the second element depends on the value of the first element. +The canonical example of this is a pair of (list length, list of that many elements). +For instance, in Idris, we can write: + +```Idris +listPair : (n : Nat ** Vec n Int) +``` + +In the above snippet, we declare the type for a pair of a natural number (`n : Nat`) and a list of +integers (`Vect n Int`), where the number of elements in the list is equal to the natural number. Let's +try applying this to our problem. We want to return an `ExprType`, and a `SafeExpr` which +depends on that `ExprType`. How about this: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 36 36 >}} + +Given an expression, we return either an error (`String`) or a dependent pair, which +contains some `ExprType` `n` and a `SafeExpr` that evaluates to a value of type `n`. +We can even start implementing this function, starting with literals: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 37 39 >}} + +Note the use of `_`. Since `IntLiteral` always produces `SafeExpr IntType`, we allow +Idris to infer that `IntType` must be the first element of the tuple. This is easy +enough, because a boolean, integer, or string literal can never be type-incorrect. + +The interesting case is that of a binary operation. Is `"hello" * 3` invalid? It might +be, but some languages evaluate the multiplication of a string by a number as repeating +the string that many times: `"hellohellohello"`. It is up to us, the language designers, +to specify the set of valid operations. Furthermore, observe that `BinOperation` +takes a _function_ as its first argument, not an `Op`. To guarantee that we can, +in fact, evaluate a `BinOperation` to the promised type, we require that the means +of performing the evaluation is included in the expression. Thus, when we convert `Expr` +to `SafeExpr`, we need to convert an `Op` to a corresponding function. As we can see +with `"hello"*3` and `163*2`, an `Op` can correspond to a different function +depending on the types of its inputs. To deal with this, we define a new function `typecheckOp`, +which takes an `Op` and two expression types, and returns either an error (if the `Op` can't +be applied to those types) or a dependent pair containing the output type of the operation +and a function that performs the required computation. That's a mouthful; let's look at the code: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 29 34 >}} + +When `(+)` is applied to two integers, this is not an error, and the result is also +an integer. To perform addition, we use Idris' built-in function `(+)`. The +same is true for all other arithmetic operations in this example. In all other +cases, we simply return an error. We can now use `typecheckOp` in our `typecheck` +function: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 40 44 >}} + +Here, we use do-notation to first type check first the left, then the right subexpression. +Since the result of type checking the subexpressions gives us their output types, +we can feed these types, together with `o`, to `typecheckOp` to determine the output +type and the applicable evaluation function. Finally, we assemble the new `SafeExpr` +from the function and the two translated subexpressions. + +Alright, we've done all this work. Is it worth it? Let's try implementing `eval`: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 46 50 >}} + +That's it! No `Maybe`, no error cases. `eval` is completely total, but doesn't require +error handling because it __knows that the expression it is evaluating is type-correct__! + +Let's run all of this. We'll need some code to print the result of evaluating an expression. +Here's all that: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 52 64 >}} + +And the output is: + +``` +>>> idris TypesafeIntr.idr -o typesafe +>>> ./typesafe +326 +``` + +That's right! What about a type-incorrect example? + +```Idris +BinOp Add (IntLit 6) (BinOp Multiply (IntLit 160) (StringLit "hi")) +``` + +The program reports: + +``` +Type error: Invalid binary operator application +``` + +Awesome! + +### Wrapping Up +In this post, we learned that type checking can be used to translate an expression into +a more strongly-typed data type, which can be (more) safe to evaluate. To help +strengthen the types of our expression language, we used the Idris language and +its support for dependent types and Generalized Algebraic Data Types (GADTs). +I this was interesting! + +As usual, you can find the code for this post in this website's Git repository. The +source file we went through today is found [here](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/typesafe-interpreter/TypesafeIntr.idr).