blog-static/content/blog/typesafe_interpreter.md

7.6 KiB

title date draft tags
Meaningfully Typechecking a Language in Idris 2020-02-27T21:58:55-08:00 true
Haskell
Idris

This term, I'm a TA for Oregon State University's Programming Languages course. The students in the course are tasked with using Haskell to implement a programming language of their own design. One of the things they can do to gain points for the project is implement type checking, rejecting {{< sidenote "right" "ill-typed-note" "ill-typed programs or expressions" >}} Whether or not the below example is ill-typed actually depends on your language. Many languages (even those with a static type system, like C++ or Crystal) have a notion of "truthy" and "falsy" values. These values can be used in the condition of an if-expression, and will be equivalent to "true" or "false", respectively. However, for simplicity, I will avoid including truthy and falsy values into the languages in this post. For the same reason, I will avoid reasoning about type coercions, which make expressions like "Hello"+3 valid. {{< /sidenote >}} such as:

if "Hello" then 0 else 1

For instance, a student may have a function typecheck, with the following signature (in Haskell):

typecheck :: Expr -> Either TypeError ExprType

The function will return an error if something goes wrong, or, if everything goes well, the type of the given expression. So far, so good.

A student asked, however:

Now that I ran type checking on my program, surely I don't need to include errors in my {{< sidenote "right" "valuation-function-note" "valuation function!" >}} I'm using "valuation function" here in the context of denotational semantics. In short, a valuation function takes an expression and assigns to it some representation of its meaning. For a language of arithmetic expression, the "meaning" of an expression is just a number (the result of simplifying the expression). For a language of booleans, and, and or, the "meaning" is a boolean for the same reason. Since an expression in the language can be ill-formed (like list(5) in Python), the "meaning" (semantic domain) of a complicated language tends to include the possibility of errors. {{< /sidenote >}} I should be able to make my function be of type Expr -> Val, and not Expr -> Maybe Val!

Unfortunately, this is not quite true. It is true that if the student's type checking function is correct, then there will be no way for a type error to occur during the evaluation of an expression "validated" by said function. The issue is, though, that the type system does not know about the expression's type-correctness. Haskell doesn't know that an expression has been type checked; worse, since the function's type indicates that it accepts Expr, it must handle invalid expressions to avoid being partial. In short, even if we know that the expressions we give to a function are type safe, we have no way of enforcing this.

A potential solution offered in class was to separate the expressions into several data types, BoolExpr, ArithExpr, and finally, a more general Expr' that can be constructed from the first two. Operations such as and and or will then only be applicable to boolean expressions:

data BoolExpr = BoolLit Bool | And BoolExpr BoolExpr | Or BoolExpr BoolExpr

It will be a type error to represent an expression such as true or 5. Then, Expr' may have a constructor such as IfElse that only accepts a boolean expression as the first argument:

data Expr' = IfElse BoolExpr Expr' Expr' | ...

All seems well. Now, it's impossible to have a non-boolean condition, and thus, this error has been eliminated from the evaluator. Maybe we can even have our type checking function translate an unsafe, potentially incorrect Expr into a more safe Expr':

typecheck :: Expr -> Either TypeError (Expr', ExprType)

However, we typically also want the branches of an if expression to both have the same type - if x then 3 else False may work sometimes, but not always, depending of the value of x. How do we encode this? Do we have two constructors, IfElseBool and IfElseInt, with one BoolExpr and the other in ArithExpr? What if we add strings? We'll be copying functionality back and forth, and our code will suffer. Wouldn't it be nice if we could somehow tag our expressions with the type they produce? Instead of BoolExpr and ArithExpr, we would be able to have Expr BoolType and Expr IntType, which would share the IfElse constructor...

It's not easy to do this in canonical Haskell, but it can be done in Idris!

Enter Dependent Types

Idris is a language with support for dependent types. Wikipedia gives the following definition for "dependent type":

In computer science and logic, a dependent type is a type whose definition depends on a value.

This is exactly what we want. In Idris, we can define the possible set of types in our language:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 1 4>}}

Then, we can define a SafeExpr type family, which is indexed by ExprType. Here's the {{< sidenote "right" "gadt-note" "code," >}} I should probably note that the definition of SafeExpr is that of a Generalized Algebraic Data Type, or GADT for short. This is what allows each of our constructors to produce values of a different type: IntLiteral builds SafeExpr IntType, while BoolLiteral builds SafeExpr BoolType. {{</ sidenote >}} which we will discuss below:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 23 27 >}}

The first line of the above snippet says, "SafeExpr is a type constructor that requires a value of type ExprType". For example, we can have SafeExpr IntType, or SafeExpr BoolType. Next, we have to define constructors for SafeExpr. One such constructor is IntLiteral, which takes a value of type Int (which represents the value of the integer literal), and builds a value of SafeExpr IntType, that is, an expression that we know evaluates to an integer.

The same is the case for BoolLiteral and StringLiteral, only they build values of type SafeExpr BoolType and SafeExpr StringType, respectively.

The more complicated case is that of BinOperation. Put simply, it takes a binary function of type a->b->c (kind of), two SafeExprs producing a and b, and combines the values of those expressions using the function to generate a value of type c. Since the whole expression returns c, BinOperation builds a value of type SafeExpr c.

That's almost it. Except, what's up with repr? We need it because SafeExpr is parameterized by a value of type ExprType. Thus, a, b, and c are all values in the definition of BinOperation. However, in a function input->output, both input and output have to be types, not values. Thus, we define a function repr which converts values such as IntType into the actual type that eval would yield when running our expression:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 6 9 >}}

The power of dependent types allows us to run repr inside the type of BinOp to compute the type of the function it must accept.