--- title: Meaningfully Typechecking a Language in Idris date: 2020-02-27T21:58:55-08:00 tags: ["Haskell", "Idris", "Programming Languages"] series: "Meaningfully Typechecking a Language in Idris" --- This term, I'm a TA for Oregon State University's Programming Languages course. The students in the course are tasked with using Haskell to implement a programming language of their own design. One of the things they can do to gain points for the project is implement type checking, rejecting {{< sidenote "right" "ill-typed-note" "ill-typed programs or expressions" >}} Whether or not the below example is ill-typed actually depends on your language. Many languages (even those with a static type system, like C++ or Crystal) have a notion of "truthy" and "falsy" values. These values can be used in the condition of an if-expression, and will be equivalent to "true" or "false", respectively. However, for simplicity, I will avoid including truthy and falsy values into the languages in this post. For the same reason, I will avoid reasoning about type coercions, which make expressions like "Hello"+3 valid. {{< /sidenote >}} such as: ```Haskell if "Hello" then 0 else 1 ``` For instance, a student may have a function `typecheck`, with the following signature (in Haskell): ```Haskell typecheck :: Expr -> Either TypeError ExprType ``` The function will return an error if something goes wrong, or, if everything goes well, the type of the given expression. So far, so good. A student asked, however: > Now that I ran type checking on my program, surely I don't need to include errors in my {{< sidenote "right" "valuation-function-note" "valuation function!" >}} I'm using "valuation function" here in the context of denotational semantics. In short, a valuation function takes an expression and assigns to it some representation of its meaning. For a language of arithmetic expression, the "meaning" of an expression is just a number (the result of simplifying the expression). For a language of booleans, and, and or, the "meaning" is a boolean for the same reason. Since an expression in the language can be ill-formed (like list(5) in Python), the "meaning" (semantic domain) of a complicated language tends to include the possibility of errors. {{< /sidenote >}} I should be able to make my function be of type `Expr -> Val`, and not `Expr -> Maybe Val`! Unfortunately, this is not quite true. It is true that if the student's type checking function is correct, then there will be no way for a type error to occur during the evaluation of an expression "validated" by said function. The issue is, though, that __the type system does not know about the expression's type-correctness__. Haskell doesn't know that an expression has been type checked; worse, since the function's type indicates that it accepts `Expr`, it must handle invalid expressions to avoid being [partial](https://wiki.haskell.org/Partial_functions). In short, even if we __know__ that the expressions we give to a function are type safe, we have no way of enforcing this. A potential solution offered in class was to separate the expressions into several data types, `BoolExpr`, `ArithExpr`, and finally, a more general `Expr'` that can be constructed from the first two. Operations such as `and` and `or` will then only be applicable to boolean expressions: ```Haskell data BoolExpr = BoolLit Bool | And BoolExpr BoolExpr | Or BoolExpr BoolExpr ``` It will be a type error to represent an expression such as `true or 5`. Then, `Expr'` may have a constructor such as `IfElse` that only accepts a boolean expression as the first argument: ```Haskell data Expr' = IfElse BoolExpr Expr' Expr' | ... ``` All seems well. Now, it's impossible to have a non-boolean condition, and thus, this error has been eliminated from the evaluator. Maybe we can even have our type checking function translate an unsafe, potentially incorrect `Expr` into a more safe `Expr'`: ```Haskell typecheck :: Expr -> Either TypeError (Expr', ExprType) ``` However, we typically also want the branches of an if-expression to both have the same type - `if x then 3 else False` may work sometimes, but not always, depending of the value of `x`. How do we encode this? Do we have two constructors, `IfElseBool` and `IfElseInt`, with one in `BoolExpr` and the other in `ArithExpr`? What if we add strings? We'll be copying functionality back and forth, and our code will suffer. Wouldn't it be nice if we could somehow tag our expressions with the type they produce? Instead of `BoolExpr` and `ArithExpr`, we would be able to have `Expr BoolType` and `Expr IntType`, which would share the `IfElse` constructor... It's not easy to do this in canonical Haskell, but it can be done in Idris! ### Enter Dependent Types Idris is a language with support for [dependent types](https://en.wikipedia.org/wiki/Dependent_type). Wikipedia gives the following definition for "dependent type": > In computer science and logic, a dependent type is a type whose definition depends on a value. This is exactly what we want. In Idris, we can define the possible set of types in our language: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 1 4>}} Then, we can define a `SafeExpr` type family, which is indexed by `ExprType`. Here's the {{< sidenote "right" "gadt-note" "code," >}} I should probably note that the definition of SafeExpr is that of a Generalized Algebraic Data Type, or GADT for short. This is what allows each of our constructors to produce values of a different type: IntLiteral builds SafeExpr IntType, while BoolLiteral builds SafeExpr BoolType. {{}} which we will discuss below: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 23 27 >}} The first line of the above snippet says, "`SafeExpr` is a type constructor that requires a value of type `ExprType`". For example, we can have `SafeExpr IntType`, or `SafeExpr BoolType`. Next, we have to define constructors for `SafeExpr`. One such constructor is `IntLiteral`, which takes a value of type `Int` (which represents the value of the integer literal), and builds a value of `SafeExpr IntType`, that is, an expression that __we know evaluates to an integer__. The same is the case for `BoolLiteral` and `StringLiteral`, only they build values of type `SafeExpr BoolType` and `SafeExpr StringType`, respectively. The more complicated case is that of `BinOperation`. Put simply, it takes a binary function of type `a->b->c` (kind of), two `SafeExpr`s producing `a` and `b`, and combines the values of those expressions using the function to generate a value of type `c`. Since the whole expression returns `c`, `BinOperation` builds a value of type `SafeExpr c`. That's almost it. Except, what's up with `repr`? We need it because `SafeExpr` is parameterized by a __value__ of type `ExprType`. Thus, `a`, `b`, and `c` are all values in the definition of `BinOperation`. However, in a function `input->output`, both `input` and `output` have to be __types__, not values. Thus, we define a function `repr` which converts values such as `IntType` into the actual type that `eval` would yield when running our expression: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 6 9 >}} The power of dependent types allows us to run `repr` inside the type of `BinOp` to compute the type of the function it must accept. Now, we have a way to represent expressions that are guaranteed to be type safe. With this, we can make our `typeheck` function convert an `Expr` to a `SafeExpr`. Wait a minute, though! We can't _just_ return `SafeExpr`: it's a type constructor! We need to somehow return `SafeExpr a`, where `a` is a value of type `ExprType`. But it doesn't make sense for the return type to have a new type variable that didn't occur in the rest of the type signature. It would be ideal if we could return both the type of the expression, and a `SafeExpr` of that type. In fact, we can! Idris has something called _dependent pairs_, which are like normal pairs, but in which the type of the second element depends on the value of the first element. The canonical example of this is a pair of (list length, list of that many elements). For instance, in Idris, we can write: ```Idris listPair : (n : Nat ** Vec n Int) ``` In the above snippet, we declare the type for a pair of a natural number (`n : Nat`) and a list of integers (`Vect n Int`), where the number of elements in the list is equal to the natural number. Let's try applying this to our problem. We want to return an `ExprType`, and a `SafeExpr` which depends on that `ExprType`. How about this: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 36 36 >}} Given an expression, we return either an error (`String`) or a dependent pair, which contains some `ExprType` `n` and a `SafeExpr` that evaluates to a value of type `n`. We can even start implementing this function, starting with literals: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 37 39 >}} Note the use of `_`. Since `IntLiteral` always produces `SafeExpr IntType`, we allow Idris to infer that `IntType` must be the first element of the tuple. This is easy enough, because a boolean, integer, or string literal can never be type-incorrect. The interesting case is that of a binary operation. Is `"hello" * 3` invalid? It might be, but some languages evaluate the multiplication of a string by a number as repeating the string that many times: `"hellohellohello"`. It is up to us, the language designers, to specify the set of valid operations. Furthermore, observe that `BinOperation` takes a _function_ as its first argument, not an `Op`. To guarantee that we can, in fact, evaluate a `BinOperation` to the promised type, we require that the means of performing the evaluation is included in the expression. Thus, when we convert `Expr` to `SafeExpr`, we need to convert an `Op` to a corresponding function. As we can see with `"hello"*3` and `163*2`, an `Op` can correspond to a different function depending on the types of its inputs. To deal with this, we define a new function `typecheckOp`, which takes an `Op` and two expression types, and returns either an error (if the `Op` can't be applied to those types) or a dependent pair containing the output type of the operation and a function that performs the required computation. That's a mouthful; let's look at the code: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 29 34 >}} When `(+)` is applied to two integers, this is not an error, and the result is also an integer. To perform addition, we use Idris' built-in function `(+)`. The same is true for all other arithmetic operations in this example. In all other cases, we simply return an error. We can now use `typecheckOp` in our `typecheck` function: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 40 44 >}} Here, we use do-notation to first type check first the left, then the right subexpression. Since the result of type checking the subexpressions gives us their output types, we can feed these types, together with `o`, to `typecheckOp` to determine the output type and the applicable evaluation function. Finally, we assemble the new `SafeExpr` from the function and the two translated subexpressions. Alright, we've done all this work. Is it worth it? Let's try implementing `eval`: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 46 50 >}} That's it! No `Maybe`, no error cases. `eval` is completely total, but doesn't require error handling because it __knows that the expression it is evaluating is type-correct__! Let's run all of this. We'll need some code to print the result of evaluating an expression. Here's all that: {{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 52 64 >}} And the output is: ``` >>> idris TypesafeIntr.idr -o typesafe >>> ./typesafe 326 ``` That's right! What about a type-incorrect example? ```Idris BinOp Add (IntLit 6) (BinOp Multiply (IntLit 160) (StringLit "hi")) ``` The program reports: ``` Type error: Invalid binary operator application ``` Awesome! ### Wrapping Up In this post, we learned that type checking can be used to translate an expression into a more strongly-typed data type, which can be (more) safe to evaluate. To help strengthen the types of {{< sidenote "right" "if-else-note" "our expression language," >}} You may be thinking, "but where did the if-expressions go?". It turns out that making sure that the branches of an if-expression are of the same type is actually a fairly difficult task; the best way I found was enumerating all the possible "valid" combinations of types in a case-expression. Since this is obviously not the right solution, I decided to publish what I have, and look for an alternative. If I find a better solution, I will write a follow-up post. {{< /sidenote >}} we used the Idris language and its support for dependent types and Generalized Algebraic Data Types (GADTs). I hope this was interesting! As usual, you can find the code for this post in this website's Git repository. The source file we went through today is found [here](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/typesafe-interpreter/TypesafeIntr.idr).