Update typesafe imperative language post draft.

2020-11-01 23:56:55 -08:00 · 2020-11-01 23:56:55 -08:00 · b459e9cbfe
commit b459e9cbfe
parent 52abe73ef7
2 changed files with 183 additions and 10 deletions
--- a/code/typesafe-imperative/TypesafeImp.idr
+++ b/code/typesafe-imperative/TypesafeImp.idr
@ -2,20 +2,20 @@ data Reg = A | B | R

 data Ty = IntTy | BoolTy

-RegState : Type
-RegState = (Ty, Ty, Ty)
+TypeState : Type
+TypeState = (Ty, Ty, Ty)

-getRegTy : Reg -> RegState -> Ty
+getRegTy : Reg -> TypeState -> Ty
 getRegTy A (a, _, _) = a
 getRegTy B (_, b, _) = b
 getRegTy R (_, _, r) = r

-setRegTy : Reg -> Ty -> RegState -> RegState
+setRegTy : Reg -> Ty -> TypeState -> TypeState
 setRegTy A a (_, b, r) = (a, b, r)
 setRegTy B b (a, _, r) = (a, b, r)
 setRegTy R r (a, b, _) = (a, b, r)

-data Expr : RegState -> Ty -> Type where
+data Expr : TypeState -> Ty -> Type where
  Lit : Int -> Expr s IntTy
  Load : (r : Reg) -> Expr s (getRegTy r s)
  Add : Expr s IntTy -> Expr s IntTy -> Expr s IntTy
@ -23,17 +23,17 @@ data Expr : RegState -> Ty -> Type where
  Not : Expr s BoolTy -> Expr s BoolTy

 mutual
-  data Stmt : RegState -> RegState -> RegState -> Type where
+  data Stmt : TypeState -> TypeState -> TypeState -> Type where
    Store : (r : Reg) -> Expr s t -> Stmt l s (setRegTy r t s)
    If : Expr s BoolTy -> Prog l s n -> Prog l s n -> Stmt l s n
    Loop : Prog s s s -> Stmt l s s
    Break : Stmt s s s

-  data Prog : RegState -> RegState -> RegState -> Type where
+  data Prog : TypeState -> TypeState -> TypeState -> Type where
    Nil : Prog l s s
    (::) : Stmt l s n -> Prog l n m -> Prog l s m

-initialState : RegState
+initialState : TypeState
 initialState = (IntTy, IntTy, IntTy)

 testProg : Prog Main.initialState Main.initialState Main.initialState
@ -64,7 +64,7 @@ repr : Ty -> Type
 repr IntTy = Int
 repr BoolTy = Bool

-data State : RegState -> Type where
+data State : TypeState -> Type where
  MkState : (repr a, repr b, repr c) -> State (a, b, c)

 getReg : (r : Reg) -> State s -> repr (getRegTy r s)
@ -97,3 +97,6 @@ mutual
  prog : Prog l s n -> State s -> Either (State l) (State n)
  prog Nil s = Right s
  prog (st::p) s = stmt st s >>= prog p
+
+run : Prog l s l -> State s -> State l
+run p s = either id id $ prog p s
--- a/content/blog/typesafe_imperative_lang.md
+++ b/content/blog/typesafe_imperative_lang.md
@ -246,7 +246,7 @@ The following (type-correct) program compiles just fine:

 {{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 36 47 >}}

-First, it loads a boolean (`True`, to be exact) into register `A`; then,
+First, it loads a boolean into register `A`; then,
 inside the `if/else` statement, it stores an integer into `A`. Finally,
 it stores another integer into `B`, and adds them into `R`. Even though
 `A` was a boolean at first, the type checker can deduce that it
@ -307,3 +307,173 @@ Type mismatch between IntTy and BoolTy
 And so, we have an encoding of our language that allows registers to
 be either integers or booleans, while still preventing
 type-incorrect programs!
+
+### Building an Interpreter
+A good test of such an encoding is the implementation
+of an interpreter. It should be possible to convince the
+typechecker that our interpreter doesn't need to
+handle type errors in the toy language, since they
+cannot occur.
+
+Let's start with something simple. First of all, suppose
+we have an expression of type `Expr InTy`. In our toy
+language, it produces an integer. Our interpreter, then,
+will probably want to use Idris' type `Int`. Similarly,
+an expression of type `Expr BoolTy` will produce a boolean
+in our toy language, which in Idris we can represent using
+the built-in `Bool` type. Despite the similar naming, though,
+there's no connection between Idris' `Bool` and our own `BoolTy`.
+We need to define a conversion from our own types -- which are
+values of type `Ty` -- into Idris types that result from evaluating
+expressions. We do so as follows:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 63 65 >}}
+
+Similarly, we want to convert our `TypeState` (a tuple describing the _types_
+of our registers) into a tuple that actually holds the values of each
+register, which we will call `State`. The value of each register at
+any point depends on its type. My first thought was to define
+`State` as a function from `TypeState` to an Idris `Type`:
+
+```Idris
+State : TypeState -> Type
+State (a, b, c) = (repr a, repr b, repr c)
+```
+
+Unfortunately, this doesn't quite cut it. The problem is that this
+function technically doesn't give Idris any guarantees that `State`
+will be a tuple. The most Idris knows is that `State` will be some
+`Type`, which could be `Int`, `Bool`, or anything else! This
+becomes a problem when we try to pattern match on states to get
+the contents of a particular register. Instead, we have to define
+a new data type:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 67 68 >}}
+
+In this snippet, `State` is still a (type level) function from `TypeState` to `Type`.
+However, by using a GADT, we guarantee that there's only one way to construct
+a `State (a,b,c)`: using a corresponding tuple. Now, Idris will accept our
+pattern matching:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 70 78 >}}
+
+The `getReg` function will take out the value of the corresponding
+register, returning `Int` or `Bool` depending on the `TypeState`.
+What's important is that if the `TypeState` is known, then so
+is the type of `getReg`: no `Either` is involved her, and we
+can directly use the integer or boolean stored in the
+register. This is exactly what we do:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 80 85 >}}
+
+This is pretty concise. Idris knows that `Lit i` is of type `Expr IntTy`,
+and it knows that `repr IntTy = Int`, so it also knows that
+`eval (Lit i)` produces an `Int`. Similarly, we wrote
+`Reg r` to have type `Expr s (getRegTy r s)`. Since `getReg`
+returns `repr (getRegTy r s)`, things check out here, too.
+A similar logic applies to the rest of the cases.
+
+The situation with statements is somewhat different. As we said, a statement
+doesn't return a value; it changes state. A good initial guess would
+be that to evaluate a statement that starts in state `s` and terminates in state `n`,
+we would take as input `State s` and return `State n`. However, things are not
+quite as simple, thanks to `Break`. Not only does `Break` require
+special case logic to return control to the end of the `Loop`, but
+it also requires some additional consideration: in a statement
+of type `Stmt l s n`, evaluating `Break` can return `State l`.
+
+To implement this, we'll use the `Either` type. The `Left` constructor
+will be contain the state at the time of evaluating a `Break`,
+and will indicate to the interpreter that we're breaking out of a loop.
+On the other hand, the `Right` constructor will contain the state
+as produced by all other statements, which would be considered
+{{< sidenote "right" "left-right-note" "the \"normal\" case." >}}
+We use <code>Left</code> for the "abnormal" case because of
+Idris' (and Haskell's) convention to use it as such. For
+instance, the two languages define a <code>Monad</code>
+instance for <code>Either a</code> where <code>(>>=)</code>
+behaves very much like it does for <code>Maybe</code>, with
+<code>Left</code> being treated as <code>Nothing</code>,
+and <code>Right</code> as <code>Just</code>. We will
+use this instance to clean up some of our computations.
+{{< /sidenote >}} Note that this doesn't indicate an error:
+we need to represent the two states (breaking out of a loop
+and normal execution) to define our language's semantics.
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 88 95 >}}
+
+First, note the type. We return an `Either` value, which will
+contain `State l` (in the `Left` constructor) if a `Break`
+was evaluated, and `State n` (in the `Right` constructor)
+if execution went on without breaking.
+
+The `Store` case is rather simple. We use `setReg` to update the result
+of the register `r` with the result of evaluating `e`. Because
+a store doesn't cause us to start breaking out of a loop,
+we use `Right` to wrap the new state.
+
+The `If` case is also rather simple. Its condition is guaranteed
+to evaluate to a boolean, so it's sufficient for us to use
+Idris' `if` expression. We use the `prog` function here, which
+implements the evaluation of a whole program. We'll get to it
+momentarily.
+
+`Loop` is the most interesting case. We start by evaluating
+the program `p` serving as the loop body. The result of this
+computation will be either a state after a break,
+held in `Left`, or as the normal execution state, held
+in `Right`. The `(>>=)` operation will do nothing in
+the first case, and feed the resulting (normal) state
+to `stmt (Loop p)` in the second case. This is exactly
+what we want: if we broke out of the current iteration
+of the loop, we shouldn't continue on to the next iteration.
+At the end of evaluating both `p` and the recursive call to
+`stmt`, we'll either have exited normally, or via breaking
+out. We don't want to continue breaking out further,
+so we return the final state wrapped in `Right` in both cases.
+Finally, `Break` returns the current state wrapped in `Left`,
+beginning the process of breaking out.
+
+The task of `prog` is simply to sequence several statements
+together. The monadic bind operator, `(>>=)`, is again perfect
+for this task, since it "stops" when it hits a `Left`, but
+continues otherwise. This is the implementation:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 97 99 >}}
+
+Awesome! Let's try it out, shall we? I defined a quick `run` function
+as follows:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 101 102 >}}
+
+We then have:
+
+```
+*TypesafeImp> run prodProg (MkState (0,0,0))
+MkState (0, 9, 63) : State (IntTy, IntTy, IntTy)
+```
+
+Which seems correct! The program multiplies seven by nine,
+and stops when register `A` reaches zero. Our test program
+runs, too:
+
+```
+*TypesafeImp> run testProg (MkState (0,0,0))
+MkState (1, 2, 3) : State (IntTy, IntTy, IntTy)
+```
+
+This is the correct answer: `A` ends up being set to
+`1` in the `then` branch of the conditional statement,
+`B` is set to 2 right after, and `R`, the sum of `A`
+and `B`, is rightly `3`.
+
+As you can see, we didn't have to write any error handling
+code! This is because the typechecker _knows_ that type errors
+aren't possible: our programs are guaranteed to be
+{{< sidenote "right" "termination-note" "type correct." >}}
+Our programs <em>aren't</em> guaranteed to terminate:
+we're lucky that Idris' totality checker is turned off by default.
+{{< /sidenote >}}
+
+This was a fun exercise, and I hope you enjoyed reading along!
+I hope to see you in my future posts.