Add initial draft of typesafe interpreter post

2020-02-27 23:09:51 -08:00 · 2020-02-27 23:09:51 -08:00 · 9e399ebe3c
commit 9e399ebe3c
parent eac1151616
2 changed files with 215 additions and 0 deletions
--- a/code/typesafe-interpreter/TypesafeIntr.idr
+++ b/code/typesafe-interpreter/TypesafeIntr.idr
@ -0,0 +1,64 @@
+data ExprType
+  = IntType
+  | BoolType
+  | StringType
+
+repr : ExprType -> Type
+repr IntType = Int
+repr BoolType = Bool
+repr StringType = String
+
+data Op
+  = Add
+  | Subtract
+  | Multiply
+  | Divide
+
+data Expr
+  = IntLit Int
+  | BoolLit Bool
+  | StringLit String
+  | BinOp Op Expr Expr
+
+data SafeExpr : ExprType -> Type where
+  IntLiteral : Int -> SafeExpr IntType
+  BoolLiteral : Bool -> SafeExpr BoolType
+  StringLiteral : String -> SafeExpr StringType
+  BinOperation : (repr a -> repr b -> repr c) -> SafeExpr a -> SafeExpr b -> SafeExpr c
+
+typecheckOp : Op -> (a : ExprType) -> (b : ExprType) -> Either String (c : ExprType ** repr a -> repr b -> repr c) 
+typecheckOp Add IntType IntType = Right (IntType ** (+))
+typecheckOp Subtract IntType IntType = Right (IntType ** (-))
+typecheckOp Multiply IntType IntType = Right (IntType ** (*))
+typecheckOp Divide IntType IntType = Right (IntType ** div)
+typecheckOp _ _ _ = Left "Invalid binary operator application"
+
+typecheck : Expr -> Either String (n : ExprType ** SafeExpr n)
+typecheck (IntLit i) = Right (_ ** IntLiteral i)
+typecheck (BoolLit b) = Right (_ ** BoolLiteral b)
+typecheck (StringLit s) = Right (_ ** StringLiteral s)
+typecheck (BinOp o l r) = do
+  (lt ** le) <- typecheck l
+  (rt ** re) <- typecheck r
+  (ot ** f) <- typecheckOp o lt rt
+  pure (_ ** BinOperation f le re)
+
+eval : {t : ExprType} -> SafeExpr t -> repr t
+eval (IntLiteral i) = i
+eval (BoolLiteral b) = b
+eval (StringLiteral s) = s
+eval (BinOperation f l r) = f (eval l) (eval r)
+
+resultStr : {t : ExprType} -> repr t -> String
+resultStr {t=IntType} i = show i
+resultStr {t=BoolType} b = show b
+resultStr {t=StringType} s = show s
+
+tryEval : Expr -> String
+tryEval ex =
+  case typecheck ex of
+    Left err => "Type error: " ++ err
+    Right (t ** e) => resultStr $ eval {t=t} e
+
+main : IO ()
+main = putStrLn $ tryEval $ BinOp Add (IntLit 6) (BinOp Multiply (IntLit 160) (IntLit 2))
--- a/content/blog/typesafe_interpreter.md
+++ b/content/blog/typesafe_interpreter.md
@ -0,0 +1,151 @@
+---
+title: Meaningfully Typechecking a Language in Idris
+date: 2020-02-27T21:58:55-08:00
+draft: true
+tags: ["Haskell", "Idris"]
+---
+
+This term, I'm a TA for Oregon State University's Programming Languages course.
+The students in the course are tasked with using Haskell to implement a programming
+language of their own design. One of the things they can do to gain points for the
+project is implement type checking, rejecting
+{{< sidenote "right" "ill-typed-note" "ill-typed programs or expressions" >}}
+Whether or not the below example is ill-typed actually depends on your language.
+Many languages (even those with a static type system, like C++ or Crystal)
+have a notion of "truthy" and "falsy" values. These values can be used 
+in the condition of an if-expression, and will be equivalent to "true" or "false",
+respectively. However, for simplicity, I will avoid including
+truthy and falsy values into the languages in this post. For the same reason, I will avoid
+reasoning about
+<a href="https://developer.mozilla.org/en-US/docs/Glossary/Type_coercion">type coercions</a>,
+which make expressions like <code>"Hello"+3</code> valid.
+{{< /sidenote >}} such as:
+
+```Haskell
+if "Hello" then 0 else 1
+```
+
+For instance, a student may have a function `typecheck`, with the following
+signature (in Haskell):
+
+```Haskell
+typecheck :: Expr -> Either TypeError ExprType
+```
+
+The function will return an error if something goes wrong, or, if everything
+goes well, the type of the given expression. So far, so good.
+
+A student asked, however:
+
+> Now that I ran type checking on my program, surely I don't need to include errors
+in my {{< sidenote "right" "valuation-function-note" "valuation function!" >}}
+I'm using "valuation function" here in the context of 
+<a href="https://en.wikibooks.org/wiki/Haskell/Denotational_semantics">denotational semantics</a>.
+In short, a
+<a href="http://www.inf.ed.ac.uk/teaching/courses/inf2a/readings/semantics-note.pdf">valuation function</a>
+takes an expression and assigns to it some
+representation of its meaning. For a language of arithmetic expression, the
+"meaning" of an expression is just a number (the result of simplifying the expression).
+For a language of booleans, <code>and</code>, and <code>or</code>, the "meaning" is a boolean
+for the same reason. Since an expression in the language can be ill-formed (like
+<code>list(5)</code> in Python), the "meaning" (<em>semantic domain</em>) of a
+complicated language tends to include the possibility of errors.
+{{< /sidenote >}} I should be able to make my function be of type `Expr -> Val`, and not
+`Expr -> Maybe Val`!
+
+Unfortunately, this is not quite true. It is true that if the student's type checking
+function is correct, then there will be no way for a type error to occur during
+the evaluation of an expression "validated" by said function. The issue is, though,
+that __the type system does not know about the expression's type-correctness__. Haskell
+doesn't know that an expression has been type checked; worse, since the function's type
+indicates that it accepts `Expr`, it must handle invalid expressions to avoid being [partial](https://wiki.haskell.org/Partial_functions). In short, even if we __know__ that the
+expressions we give to a function are type safe, we have no way of enforcing this.
+
+A potential solution offered in class was to separate the expressions into several
+data types, `BoolExpr`, `ArithExpr`, and finally, a more general `Expr'` that can
+be constructed from the first two. Operations such as `and` and `or`
+will then only be applicable to boolean expressions:
+
+```Haskell
+data BoolExpr = BoolLit Bool | And BoolExpr BoolExpr | Or BoolExpr BoolExpr
+```
+
+It will be a type error to represent an expression such as `true or 5`. Then,
+`Expr'` may have a constructor such as `IfElse` that only accepts a boolean
+expression as the first argument:
+
+```Haskell
+data Expr' = IfElse BoolExpr Expr' Expr' | ...
+```
+
+All seems well. Now, it's impossible to have a non-boolean condition, and thus,
+this error has been eliminated from the evaluator. Maybe we can even have
+our type checking function translate an unsafe, potentially incorrect `Expr` into
+a more safe `Expr'`:
+
+```Haskell
+typecheck :: Expr -> Either TypeError (Expr', ExprType)
+```
+
+However, we typically also want the branches of an if expression to both have the same
+type - `if x then 3 else False` may work sometimes, but not always, depending of the
+value of `x`. How do we encode this? Do we have two constructors, `IfElseBool` and
+`IfElseInt`, with one `BoolExpr` and the other in `ArithExpr`?  What if we add strings?
+We'll be copying functionality back and forth, and our code will suffer. Wouldn't it be
+nice if we could somehow tag our expressions with the type they produce? Instead of
+`BoolExpr` and `ArithExpr`, we would be able to have `Expr BoolType` and `Expr IntType`,
+which would share the `IfElse` constructor...
+
+It's not easy to do this in canonical Haskell, but it can be done in Idris!
+
+### Enter Dependent Types
+Idris is a language with support for [dependent types](https://en.wikipedia.org/wiki/Dependent_type). Wikipedia gives the following definition for "dependent type":
+
+> In computer science and logic, a dependent type is a type whose definition depends on a value.
+
+This is exactly what we want. In Idris, we can define the possible set of types in our
+language:
+
+{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 1 4>}}
+
+Then, we can define a `SafeExpr` type family, which is indexed by `ExprType`.
+Here's the
+{{< sidenote "right" "gadt-note" "code," >}}
+I should probably note that the definition of <code>SafeExpr</code> is that of
+a
+<a href="https://en.wikipedia.org/wiki/Generalized_algebraic_data_type">Generalized Algebraic Data Type</a>,
+or GADT for short. This is what allows each of our constructors to produce
+values of a different type: <code>IntLiteral</code> builds <code>SafeExpr IntType</code>,
+while <code>BoolLiteral</code> builds <code>SafeExpr BoolType</code>.
+{{</ sidenote >}} which we will discuss below:
+
+{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 23 27 >}}
+
+The first line of the above snippet says, "`SafeExpr` is a type constructor
+that requires a value of type `ExprType`". For example, we can have
+`SafeExpr IntType`, or `SafeExpr BoolType`. Next, we have to define constructors
+for `SafeExpr`. One such constructor is `IntLiteral`, which takes a value of
+type `Int` (which represents the value of the integer literal), and builds
+a value of `SafeExpr IntType`, that is, an expression that __we know evaluates
+to an integer__.
+
+The same is the case for `BoolLiteral` and `StringLiteral`, only they build
+values of type `SafeExpr BoolType` and `SafeExpr StringType`, respectively.
+
+The more complicated case is that of `BinOperation`. Put simply, it takes
+a binary function of type `a->b->c` (kind of), two `SafeExpr`s producing `a` and `b`,
+and combines the values of those expressions using the function to generate
+a value of type `c`. Since the whole expression returns `c`, `BinOperation`
+builds a value of type `SafeExpr c`.
+
+That's almost it. Except, what's up with `repr`? We need it because `SafeExpr`
+is parameterized by a __value__ of type `ExprType`. Thus, `a`, `b`, and `c` are
+all values in the definition of `BinOperation`. However, in a function
+`input->output`, both `input` and `output` have to be __types__, not values.
+Thus, we define a function `repr` which converts values such as `IntType` into
+the actual type that `eval` would yield when running our expression:
+
+{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 6 9 >}}
+
+The power of dependent types allows us to run `repr` inside the type
+of `BinOp` to compute the type of the function it must accept.