diff --git a/code/typesafe-interpreter/TypesafeIntr.idr b/code/typesafe-interpreter/TypesafeIntr.idr new file mode 100644 index 0000000..f6ec46e --- /dev/null +++ b/code/typesafe-interpreter/TypesafeIntr.idr @@ -0,0 +1,64 @@ +data ExprType + = IntType + | BoolType + | StringType + +repr : ExprType -> Type +repr IntType = Int +repr BoolType = Bool +repr StringType = String + +data Op + = Add + | Subtract + | Multiply + | Divide + +data Expr + = IntLit Int + | BoolLit Bool + | StringLit String + | BinOp Op Expr Expr + +data SafeExpr : ExprType -> Type where + IntLiteral : Int -> SafeExpr IntType + BoolLiteral : Bool -> SafeExpr BoolType + StringLiteral : String -> SafeExpr StringType + BinOperation : (repr a -> repr b -> repr c) -> SafeExpr a -> SafeExpr b -> SafeExpr c + +typecheckOp : Op -> (a : ExprType) -> (b : ExprType) -> Either String (c : ExprType ** repr a -> repr b -> repr c) +typecheckOp Add IntType IntType = Right (IntType ** (+)) +typecheckOp Subtract IntType IntType = Right (IntType ** (-)) +typecheckOp Multiply IntType IntType = Right (IntType ** (*)) +typecheckOp Divide IntType IntType = Right (IntType ** div) +typecheckOp _ _ _ = Left "Invalid binary operator application" + +typecheck : Expr -> Either String (n : ExprType ** SafeExpr n) +typecheck (IntLit i) = Right (_ ** IntLiteral i) +typecheck (BoolLit b) = Right (_ ** BoolLiteral b) +typecheck (StringLit s) = Right (_ ** StringLiteral s) +typecheck (BinOp o l r) = do + (lt ** le) <- typecheck l + (rt ** re) <- typecheck r + (ot ** f) <- typecheckOp o lt rt + pure (_ ** BinOperation f le re) + +eval : {t : ExprType} -> SafeExpr t -> repr t +eval (IntLiteral i) = i +eval (BoolLiteral b) = b +eval (StringLiteral s) = s +eval (BinOperation f l r) = f (eval l) (eval r) + +resultStr : {t : ExprType} -> repr t -> String +resultStr {t=IntType} i = show i +resultStr {t=BoolType} b = show b +resultStr {t=StringType} s = show s + +tryEval : Expr -> String +tryEval ex = + case typecheck ex of + Left err => "Type error: " ++ err + Right (t ** e) => resultStr $ eval {t=t} e + +main : IO () +main = putStrLn $ tryEval $ BinOp Add (IntLit 6) (BinOp Multiply (IntLit 160) (IntLit 2)) diff --git a/content/blog/typesafe_interpreter.md b/content/blog/typesafe_interpreter.md new file mode 100644 index 0000000..a19aa1a --- /dev/null +++ b/content/blog/typesafe_interpreter.md @@ -0,0 +1,151 @@ +--- +title: Meaningfully Typechecking a Language in Idris +date: 2020-02-27T21:58:55-08:00 +draft: true +tags: ["Haskell", "Idris"] +--- + +This term, I'm a TA for Oregon State University's Programming Languages course. +The students in the course are tasked with using Haskell to implement a programming +language of their own design. One of the things they can do to gain points for the +project is implement type checking, rejecting +{{< sidenote "right" "ill-typed-note" "ill-typed programs or expressions" >}} +Whether or not the below example is ill-typed actually depends on your language. +Many languages (even those with a static type system, like C++ or Crystal) +have a notion of "truthy" and "falsy" values. These values can be used +in the condition of an if-expression, and will be equivalent to "true" or "false", +respectively. However, for simplicity, I will avoid including +truthy and falsy values into the languages in this post. For the same reason, I will avoid +reasoning about +type coercions, +which make expressions like "Hello"+3 valid. +{{< /sidenote >}} such as: + +```Haskell +if "Hello" then 0 else 1 +``` + +For instance, a student may have a function `typecheck`, with the following +signature (in Haskell): + +```Haskell +typecheck :: Expr -> Either TypeError ExprType +``` + +The function will return an error if something goes wrong, or, if everything +goes well, the type of the given expression. So far, so good. + +A student asked, however: + +> Now that I ran type checking on my program, surely I don't need to include errors +in my {{< sidenote "right" "valuation-function-note" "valuation function!" >}} +I'm using "valuation function" here in the context of +denotational semantics. +In short, a +valuation function +takes an expression and assigns to it some +representation of its meaning. For a language of arithmetic expression, the +"meaning" of an expression is just a number (the result of simplifying the expression). +For a language of booleans, and, and or, the "meaning" is a boolean +for the same reason. Since an expression in the language can be ill-formed (like +list(5) in Python), the "meaning" (semantic domain) of a +complicated language tends to include the possibility of errors. +{{< /sidenote >}} I should be able to make my function be of type `Expr -> Val`, and not +`Expr -> Maybe Val`! + +Unfortunately, this is not quite true. It is true that if the student's type checking +function is correct, then there will be no way for a type error to occur during +the evaluation of an expression "validated" by said function. The issue is, though, +that __the type system does not know about the expression's type-correctness__. Haskell +doesn't know that an expression has been type checked; worse, since the function's type +indicates that it accepts `Expr`, it must handle invalid expressions to avoid being [partial](https://wiki.haskell.org/Partial_functions). In short, even if we __know__ that the +expressions we give to a function are type safe, we have no way of enforcing this. + +A potential solution offered in class was to separate the expressions into several +data types, `BoolExpr`, `ArithExpr`, and finally, a more general `Expr'` that can +be constructed from the first two. Operations such as `and` and `or` +will then only be applicable to boolean expressions: + +```Haskell +data BoolExpr = BoolLit Bool | And BoolExpr BoolExpr | Or BoolExpr BoolExpr +``` + +It will be a type error to represent an expression such as `true or 5`. Then, +`Expr'` may have a constructor such as `IfElse` that only accepts a boolean +expression as the first argument: + +```Haskell +data Expr' = IfElse BoolExpr Expr' Expr' | ... +``` + +All seems well. Now, it's impossible to have a non-boolean condition, and thus, +this error has been eliminated from the evaluator. Maybe we can even have +our type checking function translate an unsafe, potentially incorrect `Expr` into +a more safe `Expr'`: + +```Haskell +typecheck :: Expr -> Either TypeError (Expr', ExprType) +``` + +However, we typically also want the branches of an if expression to both have the same +type - `if x then 3 else False` may work sometimes, but not always, depending of the +value of `x`. How do we encode this? Do we have two constructors, `IfElseBool` and +`IfElseInt`, with one `BoolExpr` and the other in `ArithExpr`? What if we add strings? +We'll be copying functionality back and forth, and our code will suffer. Wouldn't it be +nice if we could somehow tag our expressions with the type they produce? Instead of +`BoolExpr` and `ArithExpr`, we would be able to have `Expr BoolType` and `Expr IntType`, +which would share the `IfElse` constructor... + +It's not easy to do this in canonical Haskell, but it can be done in Idris! + +### Enter Dependent Types +Idris is a language with support for [dependent types](https://en.wikipedia.org/wiki/Dependent_type). Wikipedia gives the following definition for "dependent type": + +> In computer science and logic, a dependent type is a type whose definition depends on a value. + +This is exactly what we want. In Idris, we can define the possible set of types in our +language: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 1 4>}} + +Then, we can define a `SafeExpr` type family, which is indexed by `ExprType`. +Here's the +{{< sidenote "right" "gadt-note" "code," >}} +I should probably note that the definition of SafeExpr is that of +a +Generalized Algebraic Data Type, +or GADT for short. This is what allows each of our constructors to produce +values of a different type: IntLiteral builds SafeExpr IntType, +while BoolLiteral builds SafeExpr BoolType. +{{}} which we will discuss below: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 23 27 >}} + +The first line of the above snippet says, "`SafeExpr` is a type constructor +that requires a value of type `ExprType`". For example, we can have +`SafeExpr IntType`, or `SafeExpr BoolType`. Next, we have to define constructors +for `SafeExpr`. One such constructor is `IntLiteral`, which takes a value of +type `Int` (which represents the value of the integer literal), and builds +a value of `SafeExpr IntType`, that is, an expression that __we know evaluates +to an integer__. + +The same is the case for `BoolLiteral` and `StringLiteral`, only they build +values of type `SafeExpr BoolType` and `SafeExpr StringType`, respectively. + +The more complicated case is that of `BinOperation`. Put simply, it takes +a binary function of type `a->b->c` (kind of), two `SafeExpr`s producing `a` and `b`, +and combines the values of those expressions using the function to generate +a value of type `c`. Since the whole expression returns `c`, `BinOperation` +builds a value of type `SafeExpr c`. + +That's almost it. Except, what's up with `repr`? We need it because `SafeExpr` +is parameterized by a __value__ of type `ExprType`. Thus, `a`, `b`, and `c` are +all values in the definition of `BinOperation`. However, in a function +`input->output`, both `input` and `output` have to be __types__, not values. +Thus, we define a function `repr` which converts values such as `IntType` into +the actual type that `eval` would yield when running our expression: + +{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 6 9 >}} + +The power of dependent types allows us to run `repr` inside the type +of `BinOp` to compute the type of the function it must accept.