From b459e9cbfe530a8bd9eebf0d5561389e4b38a810 Mon Sep 17 00:00:00 2001 From: Danila Fedorin Date: Sun, 1 Nov 2020 23:56:55 -0800 Subject: [PATCH] Update typesafe imperative language post draft. --- code/typesafe-imperative/TypesafeImp.idr | 21 +-- content/blog/typesafe_imperative_lang.md | 172 ++++++++++++++++++++++- 2 files changed, 183 insertions(+), 10 deletions(-) diff --git a/code/typesafe-imperative/TypesafeImp.idr b/code/typesafe-imperative/TypesafeImp.idr index 59785b6..17f6426 100644 --- a/code/typesafe-imperative/TypesafeImp.idr +++ b/code/typesafe-imperative/TypesafeImp.idr @@ -2,20 +2,20 @@ data Reg = A | B | R data Ty = IntTy | BoolTy -RegState : Type -RegState = (Ty, Ty, Ty) +TypeState : Type +TypeState = (Ty, Ty, Ty) -getRegTy : Reg -> RegState -> Ty +getRegTy : Reg -> TypeState -> Ty getRegTy A (a, _, _) = a getRegTy B (_, b, _) = b getRegTy R (_, _, r) = r -setRegTy : Reg -> Ty -> RegState -> RegState +setRegTy : Reg -> Ty -> TypeState -> TypeState setRegTy A a (_, b, r) = (a, b, r) setRegTy B b (a, _, r) = (a, b, r) setRegTy R r (a, b, _) = (a, b, r) -data Expr : RegState -> Ty -> Type where +data Expr : TypeState -> Ty -> Type where Lit : Int -> Expr s IntTy Load : (r : Reg) -> Expr s (getRegTy r s) Add : Expr s IntTy -> Expr s IntTy -> Expr s IntTy @@ -23,17 +23,17 @@ data Expr : RegState -> Ty -> Type where Not : Expr s BoolTy -> Expr s BoolTy mutual - data Stmt : RegState -> RegState -> RegState -> Type where + data Stmt : TypeState -> TypeState -> TypeState -> Type where Store : (r : Reg) -> Expr s t -> Stmt l s (setRegTy r t s) If : Expr s BoolTy -> Prog l s n -> Prog l s n -> Stmt l s n Loop : Prog s s s -> Stmt l s s Break : Stmt s s s - data Prog : RegState -> RegState -> RegState -> Type where + data Prog : TypeState -> TypeState -> TypeState -> Type where Nil : Prog l s s (::) : Stmt l s n -> Prog l n m -> Prog l s m -initialState : RegState +initialState : TypeState initialState = (IntTy, IntTy, IntTy) testProg : Prog Main.initialState Main.initialState Main.initialState @@ -64,7 +64,7 @@ repr : Ty -> Type repr IntTy = Int repr BoolTy = Bool -data State : RegState -> Type where +data State : TypeState -> Type where MkState : (repr a, repr b, repr c) -> State (a, b, c) getReg : (r : Reg) -> State s -> repr (getRegTy r s) @@ -97,3 +97,6 @@ mutual prog : Prog l s n -> State s -> Either (State l) (State n) prog Nil s = Right s prog (st::p) s = stmt st s >>= prog p + +run : Prog l s l -> State s -> State l +run p s = either id id $ prog p s diff --git a/content/blog/typesafe_imperative_lang.md b/content/blog/typesafe_imperative_lang.md index 5991ff7..88a5e68 100644 --- a/content/blog/typesafe_imperative_lang.md +++ b/content/blog/typesafe_imperative_lang.md @@ -246,7 +246,7 @@ The following (type-correct) program compiles just fine: {{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 36 47 >}} -First, it loads a boolean (`True`, to be exact) into register `A`; then, +First, it loads a boolean into register `A`; then, inside the `if/else` statement, it stores an integer into `A`. Finally, it stores another integer into `B`, and adds them into `R`. Even though `A` was a boolean at first, the type checker can deduce that it @@ -307,3 +307,173 @@ Type mismatch between IntTy and BoolTy And so, we have an encoding of our language that allows registers to be either integers or booleans, while still preventing type-incorrect programs! + +### Building an Interpreter +A good test of such an encoding is the implementation +of an interpreter. It should be possible to convince the +typechecker that our interpreter doesn't need to +handle type errors in the toy language, since they +cannot occur. + +Let's start with something simple. First of all, suppose +we have an expression of type `Expr InTy`. In our toy +language, it produces an integer. Our interpreter, then, +will probably want to use Idris' type `Int`. Similarly, +an expression of type `Expr BoolTy` will produce a boolean +in our toy language, which in Idris we can represent using +the built-in `Bool` type. Despite the similar naming, though, +there's no connection between Idris' `Bool` and our own `BoolTy`. +We need to define a conversion from our own types -- which are +values of type `Ty` -- into Idris types that result from evaluating +expressions. We do so as follows: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 63 65 >}} + +Similarly, we want to convert our `TypeState` (a tuple describing the _types_ +of our registers) into a tuple that actually holds the values of each +register, which we will call `State`. The value of each register at +any point depends on its type. My first thought was to define +`State` as a function from `TypeState` to an Idris `Type`: + +```Idris +State : TypeState -> Type +State (a, b, c) = (repr a, repr b, repr c) +``` + +Unfortunately, this doesn't quite cut it. The problem is that this +function technically doesn't give Idris any guarantees that `State` +will be a tuple. The most Idris knows is that `State` will be some +`Type`, which could be `Int`, `Bool`, or anything else! This +becomes a problem when we try to pattern match on states to get +the contents of a particular register. Instead, we have to define +a new data type: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 67 68 >}} + +In this snippet, `State` is still a (type level) function from `TypeState` to `Type`. +However, by using a GADT, we guarantee that there's only one way to construct +a `State (a,b,c)`: using a corresponding tuple. Now, Idris will accept our +pattern matching: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 70 78 >}} + +The `getReg` function will take out the value of the corresponding +register, returning `Int` or `Bool` depending on the `TypeState`. +What's important is that if the `TypeState` is known, then so +is the type of `getReg`: no `Either` is involved her, and we +can directly use the integer or boolean stored in the +register. This is exactly what we do: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 80 85 >}} + +This is pretty concise. Idris knows that `Lit i` is of type `Expr IntTy`, +and it knows that `repr IntTy = Int`, so it also knows that +`eval (Lit i)` produces an `Int`. Similarly, we wrote +`Reg r` to have type `Expr s (getRegTy r s)`. Since `getReg` +returns `repr (getRegTy r s)`, things check out here, too. +A similar logic applies to the rest of the cases. + +The situation with statements is somewhat different. As we said, a statement +doesn't return a value; it changes state. A good initial guess would +be that to evaluate a statement that starts in state `s` and terminates in state `n`, +we would take as input `State s` and return `State n`. However, things are not +quite as simple, thanks to `Break`. Not only does `Break` require +special case logic to return control to the end of the `Loop`, but +it also requires some additional consideration: in a statement +of type `Stmt l s n`, evaluating `Break` can return `State l`. + +To implement this, we'll use the `Either` type. The `Left` constructor +will be contain the state at the time of evaluating a `Break`, +and will indicate to the interpreter that we're breaking out of a loop. +On the other hand, the `Right` constructor will contain the state +as produced by all other statements, which would be considered +{{< sidenote "right" "left-right-note" "the \"normal\" case." >}} +We use Left for the "abnormal" case because of +Idris' (and Haskell's) convention to use it as such. For +instance, the two languages define a Monad +instance for Either a where (>>=) +behaves very much like it does for Maybe, with +Left being treated as Nothing, +and Right as Just. We will +use this instance to clean up some of our computations. +{{< /sidenote >}} Note that this doesn't indicate an error: +we need to represent the two states (breaking out of a loop +and normal execution) to define our language's semantics. + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 88 95 >}} + +First, note the type. We return an `Either` value, which will +contain `State l` (in the `Left` constructor) if a `Break` +was evaluated, and `State n` (in the `Right` constructor) +if execution went on without breaking. + +The `Store` case is rather simple. We use `setReg` to update the result +of the register `r` with the result of evaluating `e`. Because +a store doesn't cause us to start breaking out of a loop, +we use `Right` to wrap the new state. + +The `If` case is also rather simple. Its condition is guaranteed +to evaluate to a boolean, so it's sufficient for us to use +Idris' `if` expression. We use the `prog` function here, which +implements the evaluation of a whole program. We'll get to it +momentarily. + +`Loop` is the most interesting case. We start by evaluating +the program `p` serving as the loop body. The result of this +computation will be either a state after a break, +held in `Left`, or as the normal execution state, held +in `Right`. The `(>>=)` operation will do nothing in +the first case, and feed the resulting (normal) state +to `stmt (Loop p)` in the second case. This is exactly +what we want: if we broke out of the current iteration +of the loop, we shouldn't continue on to the next iteration. +At the end of evaluating both `p` and the recursive call to +`stmt`, we'll either have exited normally, or via breaking +out. We don't want to continue breaking out further, +so we return the final state wrapped in `Right` in both cases. +Finally, `Break` returns the current state wrapped in `Left`, +beginning the process of breaking out. + +The task of `prog` is simply to sequence several statements +together. The monadic bind operator, `(>>=)`, is again perfect +for this task, since it "stops" when it hits a `Left`, but +continues otherwise. This is the implementation: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 97 99 >}} + +Awesome! Let's try it out, shall we? I defined a quick `run` function +as follows: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 101 102 >}} + +We then have: + +``` +*TypesafeImp> run prodProg (MkState (0,0,0)) +MkState (0, 9, 63) : State (IntTy, IntTy, IntTy) +``` + +Which seems correct! The program multiplies seven by nine, +and stops when register `A` reaches zero. Our test program +runs, too: + +``` +*TypesafeImp> run testProg (MkState (0,0,0)) +MkState (1, 2, 3) : State (IntTy, IntTy, IntTy) +``` + +This is the correct answer: `A` ends up being set to +`1` in the `then` branch of the conditional statement, +`B` is set to 2 right after, and `R`, the sum of `A` +and `B`, is rightly `3`. + +As you can see, we didn't have to write any error handling +code! This is because the typechecker _knows_ that type errors +aren't possible: our programs are guaranteed to be +{{< sidenote "right" "termination-note" "type correct." >}} +Our programs aren't guaranteed to terminate: +we're lucky that Idris' totality checker is turned off by default. +{{< /sidenote >}} + +This was a fun exercise, and I hope you enjoyed reading along! +I hope to see you in my future posts.