Add initial draft of typesafe interpreter post
All checks were successful
continuous-integration/drone/push Build is passing
All checks were successful
continuous-integration/drone/push Build is passing
This commit is contained in:
parent
eac1151616
commit
9e399ebe3c
64
code/typesafe-interpreter/TypesafeIntr.idr
Normal file
64
code/typesafe-interpreter/TypesafeIntr.idr
Normal file
|
@ -0,0 +1,64 @@
|
|||
data ExprType
|
||||
= IntType
|
||||
| BoolType
|
||||
| StringType
|
||||
|
||||
repr : ExprType -> Type
|
||||
repr IntType = Int
|
||||
repr BoolType = Bool
|
||||
repr StringType = String
|
||||
|
||||
data Op
|
||||
= Add
|
||||
| Subtract
|
||||
| Multiply
|
||||
| Divide
|
||||
|
||||
data Expr
|
||||
= IntLit Int
|
||||
| BoolLit Bool
|
||||
| StringLit String
|
||||
| BinOp Op Expr Expr
|
||||
|
||||
data SafeExpr : ExprType -> Type where
|
||||
IntLiteral : Int -> SafeExpr IntType
|
||||
BoolLiteral : Bool -> SafeExpr BoolType
|
||||
StringLiteral : String -> SafeExpr StringType
|
||||
BinOperation : (repr a -> repr b -> repr c) -> SafeExpr a -> SafeExpr b -> SafeExpr c
|
||||
|
||||
typecheckOp : Op -> (a : ExprType) -> (b : ExprType) -> Either String (c : ExprType ** repr a -> repr b -> repr c)
|
||||
typecheckOp Add IntType IntType = Right (IntType ** (+))
|
||||
typecheckOp Subtract IntType IntType = Right (IntType ** (-))
|
||||
typecheckOp Multiply IntType IntType = Right (IntType ** (*))
|
||||
typecheckOp Divide IntType IntType = Right (IntType ** div)
|
||||
typecheckOp _ _ _ = Left "Invalid binary operator application"
|
||||
|
||||
typecheck : Expr -> Either String (n : ExprType ** SafeExpr n)
|
||||
typecheck (IntLit i) = Right (_ ** IntLiteral i)
|
||||
typecheck (BoolLit b) = Right (_ ** BoolLiteral b)
|
||||
typecheck (StringLit s) = Right (_ ** StringLiteral s)
|
||||
typecheck (BinOp o l r) = do
|
||||
(lt ** le) <- typecheck l
|
||||
(rt ** re) <- typecheck r
|
||||
(ot ** f) <- typecheckOp o lt rt
|
||||
pure (_ ** BinOperation f le re)
|
||||
|
||||
eval : {t : ExprType} -> SafeExpr t -> repr t
|
||||
eval (IntLiteral i) = i
|
||||
eval (BoolLiteral b) = b
|
||||
eval (StringLiteral s) = s
|
||||
eval (BinOperation f l r) = f (eval l) (eval r)
|
||||
|
||||
resultStr : {t : ExprType} -> repr t -> String
|
||||
resultStr {t=IntType} i = show i
|
||||
resultStr {t=BoolType} b = show b
|
||||
resultStr {t=StringType} s = show s
|
||||
|
||||
tryEval : Expr -> String
|
||||
tryEval ex =
|
||||
case typecheck ex of
|
||||
Left err => "Type error: " ++ err
|
||||
Right (t ** e) => resultStr $ eval {t=t} e
|
||||
|
||||
main : IO ()
|
||||
main = putStrLn $ tryEval $ BinOp Add (IntLit 6) (BinOp Multiply (IntLit 160) (IntLit 2))
|
151
content/blog/typesafe_interpreter.md
Normal file
151
content/blog/typesafe_interpreter.md
Normal file
|
@ -0,0 +1,151 @@
|
|||
---
|
||||
title: Meaningfully Typechecking a Language in Idris
|
||||
date: 2020-02-27T21:58:55-08:00
|
||||
draft: true
|
||||
tags: ["Haskell", "Idris"]
|
||||
---
|
||||
|
||||
This term, I'm a TA for Oregon State University's Programming Languages course.
|
||||
The students in the course are tasked with using Haskell to implement a programming
|
||||
language of their own design. One of the things they can do to gain points for the
|
||||
project is implement type checking, rejecting
|
||||
{{< sidenote "right" "ill-typed-note" "ill-typed programs or expressions" >}}
|
||||
Whether or not the below example is ill-typed actually depends on your language.
|
||||
Many languages (even those with a static type system, like C++ or Crystal)
|
||||
have a notion of "truthy" and "falsy" values. These values can be used
|
||||
in the condition of an if-expression, and will be equivalent to "true" or "false",
|
||||
respectively. However, for simplicity, I will avoid including
|
||||
truthy and falsy values into the languages in this post. For the same reason, I will avoid
|
||||
reasoning about
|
||||
<a href="https://developer.mozilla.org/en-US/docs/Glossary/Type_coercion">type coercions</a>,
|
||||
which make expressions like <code>"Hello"+3</code> valid.
|
||||
{{< /sidenote >}} such as:
|
||||
|
||||
```Haskell
|
||||
if "Hello" then 0 else 1
|
||||
```
|
||||
|
||||
For instance, a student may have a function `typecheck`, with the following
|
||||
signature (in Haskell):
|
||||
|
||||
```Haskell
|
||||
typecheck :: Expr -> Either TypeError ExprType
|
||||
```
|
||||
|
||||
The function will return an error if something goes wrong, or, if everything
|
||||
goes well, the type of the given expression. So far, so good.
|
||||
|
||||
A student asked, however:
|
||||
|
||||
> Now that I ran type checking on my program, surely I don't need to include errors
|
||||
in my {{< sidenote "right" "valuation-function-note" "valuation function!" >}}
|
||||
I'm using "valuation function" here in the context of
|
||||
<a href="https://en.wikibooks.org/wiki/Haskell/Denotational_semantics">denotational semantics</a>.
|
||||
In short, a
|
||||
<a href="http://www.inf.ed.ac.uk/teaching/courses/inf2a/readings/semantics-note.pdf">valuation function</a>
|
||||
takes an expression and assigns to it some
|
||||
representation of its meaning. For a language of arithmetic expression, the
|
||||
"meaning" of an expression is just a number (the result of simplifying the expression).
|
||||
For a language of booleans, <code>and</code>, and <code>or</code>, the "meaning" is a boolean
|
||||
for the same reason. Since an expression in the language can be ill-formed (like
|
||||
<code>list(5)</code> in Python), the "meaning" (<em>semantic domain</em>) of a
|
||||
complicated language tends to include the possibility of errors.
|
||||
{{< /sidenote >}} I should be able to make my function be of type `Expr -> Val`, and not
|
||||
`Expr -> Maybe Val`!
|
||||
|
||||
Unfortunately, this is not quite true. It is true that if the student's type checking
|
||||
function is correct, then there will be no way for a type error to occur during
|
||||
the evaluation of an expression "validated" by said function. The issue is, though,
|
||||
that __the type system does not know about the expression's type-correctness__. Haskell
|
||||
doesn't know that an expression has been type checked; worse, since the function's type
|
||||
indicates that it accepts `Expr`, it must handle invalid expressions to avoid being [partial](https://wiki.haskell.org/Partial_functions). In short, even if we __know__ that the
|
||||
expressions we give to a function are type safe, we have no way of enforcing this.
|
||||
|
||||
A potential solution offered in class was to separate the expressions into several
|
||||
data types, `BoolExpr`, `ArithExpr`, and finally, a more general `Expr'` that can
|
||||
be constructed from the first two. Operations such as `and` and `or`
|
||||
will then only be applicable to boolean expressions:
|
||||
|
||||
```Haskell
|
||||
data BoolExpr = BoolLit Bool | And BoolExpr BoolExpr | Or BoolExpr BoolExpr
|
||||
```
|
||||
|
||||
It will be a type error to represent an expression such as `true or 5`. Then,
|
||||
`Expr'` may have a constructor such as `IfElse` that only accepts a boolean
|
||||
expression as the first argument:
|
||||
|
||||
```Haskell
|
||||
data Expr' = IfElse BoolExpr Expr' Expr' | ...
|
||||
```
|
||||
|
||||
All seems well. Now, it's impossible to have a non-boolean condition, and thus,
|
||||
this error has been eliminated from the evaluator. Maybe we can even have
|
||||
our type checking function translate an unsafe, potentially incorrect `Expr` into
|
||||
a more safe `Expr'`:
|
||||
|
||||
```Haskell
|
||||
typecheck :: Expr -> Either TypeError (Expr', ExprType)
|
||||
```
|
||||
|
||||
However, we typically also want the branches of an if expression to both have the same
|
||||
type - `if x then 3 else False` may work sometimes, but not always, depending of the
|
||||
value of `x`. How do we encode this? Do we have two constructors, `IfElseBool` and
|
||||
`IfElseInt`, with one `BoolExpr` and the other in `ArithExpr`? What if we add strings?
|
||||
We'll be copying functionality back and forth, and our code will suffer. Wouldn't it be
|
||||
nice if we could somehow tag our expressions with the type they produce? Instead of
|
||||
`BoolExpr` and `ArithExpr`, we would be able to have `Expr BoolType` and `Expr IntType`,
|
||||
which would share the `IfElse` constructor...
|
||||
|
||||
It's not easy to do this in canonical Haskell, but it can be done in Idris!
|
||||
|
||||
### Enter Dependent Types
|
||||
Idris is a language with support for [dependent types](https://en.wikipedia.org/wiki/Dependent_type). Wikipedia gives the following definition for "dependent type":
|
||||
|
||||
> In computer science and logic, a dependent type is a type whose definition depends on a value.
|
||||
|
||||
This is exactly what we want. In Idris, we can define the possible set of types in our
|
||||
language:
|
||||
|
||||
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 1 4>}}
|
||||
|
||||
Then, we can define a `SafeExpr` type family, which is indexed by `ExprType`.
|
||||
Here's the
|
||||
{{< sidenote "right" "gadt-note" "code," >}}
|
||||
I should probably note that the definition of <code>SafeExpr</code> is that of
|
||||
a
|
||||
<a href="https://en.wikipedia.org/wiki/Generalized_algebraic_data_type">Generalized Algebraic Data Type</a>,
|
||||
or GADT for short. This is what allows each of our constructors to produce
|
||||
values of a different type: <code>IntLiteral</code> builds <code>SafeExpr IntType</code>,
|
||||
while <code>BoolLiteral</code> builds <code>SafeExpr BoolType</code>.
|
||||
{{</ sidenote >}} which we will discuss below:
|
||||
|
||||
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 23 27 >}}
|
||||
|
||||
The first line of the above snippet says, "`SafeExpr` is a type constructor
|
||||
that requires a value of type `ExprType`". For example, we can have
|
||||
`SafeExpr IntType`, or `SafeExpr BoolType`. Next, we have to define constructors
|
||||
for `SafeExpr`. One such constructor is `IntLiteral`, which takes a value of
|
||||
type `Int` (which represents the value of the integer literal), and builds
|
||||
a value of `SafeExpr IntType`, that is, an expression that __we know evaluates
|
||||
to an integer__.
|
||||
|
||||
The same is the case for `BoolLiteral` and `StringLiteral`, only they build
|
||||
values of type `SafeExpr BoolType` and `SafeExpr StringType`, respectively.
|
||||
|
||||
The more complicated case is that of `BinOperation`. Put simply, it takes
|
||||
a binary function of type `a->b->c` (kind of), two `SafeExpr`s producing `a` and `b`,
|
||||
and combines the values of those expressions using the function to generate
|
||||
a value of type `c`. Since the whole expression returns `c`, `BinOperation`
|
||||
builds a value of type `SafeExpr c`.
|
||||
|
||||
That's almost it. Except, what's up with `repr`? We need it because `SafeExpr`
|
||||
is parameterized by a __value__ of type `ExprType`. Thus, `a`, `b`, and `c` are
|
||||
all values in the definition of `BinOperation`. However, in a function
|
||||
`input->output`, both `input` and `output` have to be __types__, not values.
|
||||
Thus, we define a function `repr` which converts values such as `IntType` into
|
||||
the actual type that `eval` would yield when running our expression:
|
||||
|
||||
{{< codelines "Idris" "typesafe-interpreter/TypesafeIntr.idr" 6 9 >}}
|
||||
|
||||
The power of dependent types allows us to run `repr` inside the type
|
||||
of `BinOp` to compute the type of the function it must accept.
|
Loading…
Reference in New Issue
Block a user