diff --git a/code/typesafe-imperative/TypesafeImp.idr b/code/typesafe-imperative/TypesafeImp.idr index b40ab30..59785b6 100644 --- a/code/typesafe-imperative/TypesafeImp.idr +++ b/code/typesafe-imperative/TypesafeImp.idr @@ -23,20 +23,20 @@ data Expr : RegState -> Ty -> Type where Not : Expr s BoolTy -> Expr s BoolTy mutual - data Stmt : RegState -> RegState -> Type where - Store : (r : Reg) -> Expr s t -> Stmt s (setRegTy r t s) - If : Expr s BoolTy -> Prog s n -> Prog s n -> Stmt s n - Loop : Prog s s -> Stmt s s + data Stmt : RegState -> RegState -> RegState -> Type where + Store : (r : Reg) -> Expr s t -> Stmt l s (setRegTy r t s) + If : Expr s BoolTy -> Prog l s n -> Prog l s n -> Stmt l s n + Loop : Prog s s s -> Stmt l s s + Break : Stmt s s s - data Prog : RegState -> RegState -> Type where - Nil : Prog s s - Break : Prog s s - (::) : Stmt s n -> Prog n m -> Prog s m + data Prog : RegState -> RegState -> RegState -> Type where + Nil : Prog l s s + (::) : Stmt l s n -> Prog l n m -> Prog l s m initialState : RegState initialState = (IntTy, IntTy, IntTy) -testProg : Prog Main.initialState Main.initialState +testProg : Prog Main.initialState Main.initialState Main.initialState testProg = [ Store A (Lit 1 `Leq` Lit 2) , If (Load A) @@ -45,3 +45,55 @@ testProg = , Store B (Lit 2) , Store R (Add (Load A) (Load B)) ] + +prodProg : Prog Main.initialState Main.initialState Main.initialState +prodProg = + [ Store A (Lit 7) + , Store B (Lit 9) + , Store R (Lit 0) + , Loop + [ If (Load A `Leq` Lit 0) + [ Break ] + [ Store R (Load R `Add` Load B) + , Store A (Load A `Add` Lit (-1)) + ] + ] + ] + +repr : Ty -> Type +repr IntTy = Int +repr BoolTy = Bool + +data State : RegState -> Type where + MkState : (repr a, repr b, repr c) -> State (a, b, c) + +getReg : (r : Reg) -> State s -> repr (getRegTy r s) +getReg A (MkState (a, _, _)) = a +getReg B (MkState (_, b, _)) = b +getReg R (MkState (_, _, r)) = r + +setReg : (r : Reg) -> repr t -> State s -> State (setRegTy r t s) +setReg A a (MkState (_, b, r)) = MkState (a, b, r) +setReg B b (MkState (a, _, r)) = MkState (a, b, r) +setReg R r (MkState (a, b, _)) = MkState (a, b, r) + +expr : Expr s t -> State s -> repr t +expr (Lit i) _ = i +expr (Load r) s = getReg r s +expr (Add l r) s = expr l s + expr r s +expr (Leq l r) s = expr l s <= expr r s +expr (Not e) s = not $ expr e s + +mutual + stmt : Stmt l s n -> State s -> Either (State l) (State n) + stmt (Store r e) s = Right $ setReg r (expr e s) s + stmt (If c t e) s = if expr c s then prog t s else prog e s + stmt (Loop p) s = + case prog p s >>= stmt (Loop p) of + Right s => Right s + Left s => Right s + stmt Break s = Left s + + prog : Prog l s n -> State s -> Either (State l) (State n) + prog Nil s = Right s + prog (st::p) s = stmt st s >>= prog p diff --git a/content/blog/typesafe_imperative_lang.md b/content/blog/typesafe_imperative_lang.md index fc793c3..5991ff7 100644 --- a/content/blog/typesafe_imperative_lang.md +++ b/content/blog/typesafe_imperative_lang.md @@ -49,7 +49,7 @@ representation then, wouldn't it? I am pretty certain that a similar encoding in Haskell is possible. However, Haskell wasn't originally created for that kind of abuse of its type system, so it would probably not look very good. -{{< /sidenote >}} I am pretty certain that it _is_ possible to do this +{{< /sidenote >}} I am sure that it _is_ possible to do this in Idris, a dependently typed programming language. In this post I will talk about how to do that. @@ -109,12 +109,12 @@ types of each register. To do so, we add the state as a parameter to our `Expr` type. This would lead to types like the following: ```Idris --- An expression that produces a boolean --- when all the registers are integers. +-- An expression that produces a boolean when all the registers +-- are integers. Expr (IntTy, IntTy, IntTy) BoolTy --- An expression that produces an integer --- when A and B are integers, and R is a boolean. +-- An expression that produces an integer when A and B are integers, +-- and R is a boolean. Expr (IntTy, IntTy, BoolTy) IntTy ``` @@ -129,54 +129,115 @@ creates an expression that has `r`'s type in the current state `s`. Statements are a bit different. Unlike expressions, they don't evaluate to anything; rather, they do something. That "something" may very well be changing the current state. We could, for instance, set `A` to be a boolean, while it was -previously an integer. So, the `Stmt` type will take two arguments: the initial -state and the final state. This leads to the following definition: +previously an integer. This suggests equipping our `Stmt` type with two +arguments: the initial state (before the statement's execution), and the final +state (after the statement's execution). This would lead to types like this: -{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 29 >}} +```Idris +-- Statement that, when run while all registers contain integers, +-- terminates with registers B and R having been assigned boolean values. +Stmt (IntTy, IntTy, IntTy) (IntTy, BoolTy, BoolTy) +``` + +However, there's a problem with `loop` and `break`. When we run a loop, +we will require that the state at the end of one iteration is the +same as the state at its beginning. Otherwise, it would be possible +for a loop to keep changing the types of registers every iteration, +and it would become impossible for us to infer the final state +without actually running the program. In itself, this restriction +isn't a problem; most static type systems require both branches +of an `if/else` expression to be of the same type for a similar +reason. The problem comes from the interaction with `break`. + +By itself, the would-be type of `break` seems innocent enough. It +doesn't change any registers, so we could call it `Stmt s s`. +But consider the following program: + +``` +A := 0 +B := 0 +R := 0 +do + if 5 <= A then + B := 1 <= 1 + break + B := 0 + else + A := A + 1 + end +end +``` + +The loop starts with all registers having integer values. +As per our aforementioned loop requirement, the body +of the loop must terminate with all registers _still_ having +integer values. For the first five iterations that's exactly +what will happen. However, after we increment `A` the fifth time, +we will set `B` to a boolean value -- using a valid statement -- +and then `break`. The `break` statement will be accepted by +the typechecker, and so will the whole `then` branch. After all, +it seems as though we reset `B` back to an integer value. +But that won't be the case. We will have jumped to the end +of the loop, where we are expected to have an all-integer type, +which we will not have. + +The solution I came up with to address this issue was to +add a _third_ argument to `Stmt`, which contains the "context" +type. That is, it contains the type of the innermost loop surrounding +the statement. A `break` statement would only be permissible +if the current type matches the loop type. With this, we finally +write down a definition of `Stmt`: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 30 >}} The `Store` constructor takes a register `r` and an expression producing some type `t` in state `s`. From these, it creates a statement that starts in `s`, and finishes -in a state similar to `s`, but with `r` now having type `t`. +in a state similar to `s`, but with `r` now having type `t`. The loop +type `l` remains unaffected and unused; we are free to assign any register +any value. -The `If` constructor takes a condition, which starts in state `s` and _must_ produce +The `If` constructor takes a condition `Expr`, which starts in state `s` and _must_ produce a boolean. It also takes two programs (sequences of statements), each of which -start in `s` and finishes in another state `n`. Then, the `If` constructor creates +starts in `s` and finishes in another state `n`. This results in a statement that starts in state `s`, and finishes in state `n`. Conceptually, each branch of the `if/else` statement must result in the same final state (in terms of types); otherwise, we wouldn't know which of the states to pick when deciding the final -state of the `If` itself. +state of the `If` itself. As with `Store`, the loop type `l` is untouched here. +Individual statements are free to modify the state however they wish. -The `Loop` constructor is even more restrictive: it takes a single program (the sequence -of instructions that it will be repeating). This program starts _and_ ends in state `s`; -since the loop can repeat many times, and since we're repeating the same program, -we want to make sure that program is always run from the same initial state. +The `Loop` constructor is very restrictive. It takes a single program (the sequence +of instructions that it will be repeating). As we discussed above, this program +must start _and_ end in the same state `s`. Furthermore, this program's loop +type must also be `s`, since the loop we're constructing will be surrounding the +program. The resulting loop itself still has an arbitrary loop type `l`, since +it doesn't surround itself. -I chose __not__ to encode `Break` as a statement. This is because we don't want `Break`s occurring -in the middle of a program! Otherwise, it would be possible to write a program -that _seems_ like it will terminate in one state, but, because of a break in the middle, -terminates in another! Instead, we'll encode `Break` as a part of the `Prog` encoding. +Finally, `Break` can only be constructed when the loop state matches the current +state. Since we'll be jumping to the end of the innermost loop, the final state +is also the same as the loop state. + +These are all the constructors we'll be needing. It's time to move on to +whole programs! ### Programs -A program is basically a list of statements. However, we can't use a regular Idris list for two -reasons: +A program is simply a list of statements. However, we can't use a regular Idris list, +because a regular list wouldn't be able to represent the relationship between +each successive statement. In our program, we want the final state of one +statement to be the initial state of the following one, since they'll +be executed in sequence. To represent this, we have to define our own +list-like GADT. The definition of the type turns out fairly straightforward: -1. Our type is not as simple as `[Stmt]`. We want each statement to begin in the state that the -previous statement ended in; we will have to do some work to ensure that. -2. We have two ways of ending the sequence of statements: either with or without a `break`. -Thus, instead of having a single `Nil` constructor, we'll have two. - -The definition of the type turns out fairly straightforward: - -{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 31 34 >}} +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 32 34 >}} The `Nil` constructor represents an empty program (much like the built-in `Nil` represents an empty list). Since no actions are done, it creates a `Prog` that starts and ends in the same state: `s`. -The `Break` constructor is similar; however, it represents a `break` instruction, and thus, -must be distinct from the regular `End` constructor. Finally, the `(::)` constructor, much like -the built-in `(::)` constructor, takes a statement and another program. The statement begins in -state `s` and ends in state `n`; the program after that statement must then start in state `n`, -and end in some other state `m`. The combination of the statement and the program starts in state `s`, -and finishes in state `m`; thus, `(::)` yields `Prog s m`. +The `(::)` constructor, much like the built-in `(::)` constructor, takes a statement +and another program. The statement begins in state `s` and ends in state `n`; the program after +that statement must then start in state `n`, and end in some other state `m`. +The combination of the statement and the program starts in state `s`, and finishes in state `m`. +Thus, `(::)` yields `Prog s m`. None of the constructors affect the loop type `l`: we +are free to sequence any statements that we want, and it is impossible for us +to construct statements using `l` that cause runtime errors. This should be all! Let's try out some programs. @@ -192,7 +253,7 @@ it stores another integer into `B`, and adds them into `R`. Even though was reset back to an integer after the `if/else`, and the program is accepted. On the other hand, had we forgotten to set `A` to a boolean first: -``` +```Idris [ If (Load A) [ Store A (Lit 1) ] [ Store A (Lit 2) ] @@ -204,17 +265,45 @@ On the other hand, had we forgotten to set `A` to a boolean first: We would get a type error: ``` -Type mismatch between - getRegTy A (IntTy, IntTy, IntTy) -and - BoolTy +Type mismatch between getRegTy A (IntTy, IntTy, IntTy) and BoolTy ``` The type of register `A` (that is, `IntTy`) is incompatible with `BoolTy`. Our `initialState` says that `A` starts out as an integer, so it can't be used in an `if/else` right away! Similar errors occur if we make one of the branches of -the `if/else` empty, or if we set `B` to a boolean. And so, -we have an encoding of our language that allows registers to +the `if/else` empty, or if we set `B` to a boolean. + +We can also encode the example program from the beginning +of this post: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 49 61 >}} + +This program compiles just fine, too! It is a little reminiscent of +the program we used to demonstrate how `break` could break things +if we weren't careful. So, let's go ahead and try `break` in an invalid +state: + +```Idris + [ Store A (Lit 7) + , Store B (Lit 9) + , Store R (Lit 0) + , Loop + [ If (Load A `Leq` Lit 0) + [ Store B (Lit 1 `Leq` Lit 1), Break, Store B (Lit 0) ] + [ Store R (Load R `Add` Load B) + , Store A (Load A `Add` Lit (-1)) + ] + ] + ] +``` + +Again, the type checker complains: + +``` +Type mismatch between IntTy and BoolTy +``` + +And so, we have an encoding of our language that allows registers to be either integers or booleans, while still preventing type-incorrect programs!