Make the typesafe imperative language work properly.

2020-10-31 01:34:23 -07:00
parent f0fe481bcf
commit 52abe73ef7
2 changed files with 193 additions and 52 deletions
--- a/code/typesafe-imperative/TypesafeImp.idr
+++ b/code/typesafe-imperative/TypesafeImp.idr
@@ -23,20 +23,20 @@ data Expr : RegState -> Ty -> Type where
  Not : Expr s BoolTy -> Expr s BoolTy

 mutual
-  data Stmt : RegState -> RegState -> Type where
-    Store : (r : Reg) -> Expr s t -> Stmt s (setRegTy r t s)
-    If : Expr s BoolTy -> Prog s n -> Prog s n -> Stmt s n
-    Loop : Prog s s -> Stmt s s
+  data Stmt : RegState -> RegState -> RegState -> Type where
+    Store : (r : Reg) -> Expr s t -> Stmt l s (setRegTy r t s)
+    If : Expr s BoolTy -> Prog l s n -> Prog l s n -> Stmt l s n
+    Loop : Prog s s s -> Stmt l s s
+    Break : Stmt s s s

-  data Prog : RegState -> RegState -> Type where
-    Nil : Prog s s
-    Break : Prog s s
-    (::) : Stmt s n -> Prog n m -> Prog s m
+  data Prog : RegState -> RegState -> RegState -> Type where
+    Nil : Prog l s s
+    (::) : Stmt l s n -> Prog l n m -> Prog l s m

 initialState : RegState
 initialState = (IntTy, IntTy, IntTy)

-testProg : Prog Main.initialState Main.initialState
+testProg : Prog Main.initialState Main.initialState Main.initialState
 testProg =
  [ Store A (Lit 1 `Leq` Lit 2)
  , If (Load A)
@@ -45,3 +45,55 @@ testProg =
  , Store B (Lit 2)
  , Store R (Add (Load A) (Load B))
  ]
+
+prodProg : Prog Main.initialState Main.initialState Main.initialState
+prodProg =
+  [ Store A (Lit 7)
+  , Store B (Lit 9)
+  , Store R (Lit 0)
+  , Loop
+    [ If (Load A `Leq` Lit 0)
+      [ Break ]
+      [ Store R (Load R `Add` Load B)
+      , Store A (Load A `Add` Lit (-1))
+      ]
+    ]
+  ]
+
+repr : Ty -> Type
+repr IntTy = Int
+repr BoolTy = Bool
+
+data State : RegState -> Type where
+  MkState : (repr a, repr b, repr c) -> State (a, b, c)
+
+getReg : (r : Reg) -> State s -> repr (getRegTy r s)
+getReg A (MkState (a, _, _)) = a
+getReg B (MkState (_, b, _)) = b
+getReg R (MkState (_, _, r)) = r
+
+setReg : (r : Reg) -> repr t -> State s -> State (setRegTy r t s)
+setReg A a (MkState (_, b, r)) = MkState (a, b, r)
+setReg B b (MkState (a, _, r)) = MkState (a, b, r)
+setReg R r (MkState (a, b, _)) = MkState (a, b, r)
+
+expr : Expr s t -> State s -> repr t
+expr (Lit i) _ = i
+expr (Load r) s = getReg r s
+expr (Add l r) s = expr l s + expr r s
+expr (Leq l r) s = expr l s <= expr r s
+expr (Not e) s = not $ expr e s
+
+mutual
+  stmt : Stmt l s n -> State s -> Either (State l) (State n)
+  stmt (Store r e) s = Right $ setReg r (expr e s) s
+  stmt (If c t e) s = if expr c s then prog t s else prog e s
+  stmt (Loop p) s =
+    case prog p s >>= stmt (Loop p) of
+      Right s => Right s
+      Left s => Right s
+  stmt Break s = Left s
+
+  prog : Prog l s n -> State s -> Either (State l) (State n)
+  prog Nil s = Right s
+  prog (st::p) s = stmt st s >>= prog p
--- a/content/blog/typesafe_imperative_lang.md
+++ b/content/blog/typesafe_imperative_lang.md
@@ -49,7 +49,7 @@ representation then, wouldn't it?
 I am pretty certain that a similar encoding in Haskell is possible. However,
 Haskell wasn't originally created for that kind of abuse of its type system,
 so it would probably not look very good.
-{{< /sidenote >}} I am pretty certain that it _is_ possible to do this
+{{< /sidenote >}} I am sure that it _is_ possible to do this
 in Idris, a dependently typed programming language. In this post I will
 talk about how to do that.

@@ -109,12 +109,12 @@ types of each register. To do so, we add the state as a parameter to
 our `Expr` type. This would lead to types like the following:

 ```Idris
-- An expression that produces a boolean
-- when all the registers are integers.
+-- An expression that produces a boolean when all the registers
+-- are integers.
 Expr (IntTy, IntTy, IntTy) BoolTy

-- An expression that produces an integer
-- when A and B are integers, and R is a boolean.
+-- An expression that produces an integer when A and B are integers,
+-- and R is a boolean.
 Expr (IntTy, IntTy, BoolTy) IntTy
 ```

@@ -129,54 +129,115 @@ creates an expression that has `r`'s type in the current state `s`.
 Statements are a bit different. Unlike expressions, they don't evaluate to
 anything; rather, they do something. That "something" may very well be changing
 the current state. We could, for instance, set `A` to be a boolean, while it was
-previously an integer. So, the `Stmt` type will take two arguments: the initial
-state and the final state. This leads to the following definition:
+previously an integer. This suggests equipping our `Stmt` type with two
+arguments: the initial state (before the statement's execution), and the final
+state (after the statement's execution). This would lead to types like this:

-{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 29 >}}
+```Idris
+-- Statement that, when run while all registers contain integers,
+-- terminates with registers B and R having been assigned boolean values.
+Stmt (IntTy, IntTy, IntTy) (IntTy, BoolTy, BoolTy)
+```
+
+However, there's a problem with `loop` and `break`. When we run a loop,
+we will require that the state at the end of one iteration is the
+same as the state at its beginning. Otherwise, it would be possible
+for a loop to keep changing the types of registers every iteration,
+and it would become impossible for us to infer the final state
+without actually running the program. In itself, this restriction
+isn't a problem; most static type systems require both branches
+of an `if/else` expression to be of the same type for a similar
+reason. The problem comes from the interaction with `break`.
+
+By itself, the would-be type of `break` seems innocent enough. It
+doesn't change any registers, so we could call it `Stmt s s`.
+But consider the following program:
+
+```
+A := 0
+B := 0
+R := 0
+do
+  if 5 <= A then
+    B := 1 <= 1
+    break
+    B := 0
+  else
+    A := A + 1
+  end
+end
+```
+
+The loop starts with all registers having integer values.
+As per our aforementioned loop requirement, the body
+of the loop must terminate with all registers _still_ having
+integer values. For the first five iterations that's exactly
+what will happen. However, after we increment `A` the fifth time,
+we will set `B` to a boolean value -- using a valid statement --
+and then `break`. The `break` statement will be accepted by
+the typechecker, and so will the whole `then` branch. After all,
+it seems as though we reset `B` back to an integer value.
+But that won't be the case. We will have jumped to the end
+of the loop, where we are expected to have an all-integer type,
+which we will not have.
+
+The solution I came up with to address this issue was to
+add a _third_ argument to `Stmt`, which contains the "context"
+type. That is, it contains the type of the innermost loop surrounding
+the statement. A `break` statement would only be permissible
+if the current type matches the loop type. With this, we finally
+write down a definition of `Stmt`:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 30 >}}

 The `Store` constructor takes a register `r` and an expression producing some type `t` in state `s`.
 From these, it creates a statement that starts in `s`, and finishes
-in a state similar to `s`, but with `r` now having type `t`.
+in a state similar to `s`, but with `r` now having type `t`. The loop
+type `l` remains unaffected and unused; we are free to assign any register
+any value.

-The `If` constructor takes a condition, which starts in state `s` and _must_ produce
+The `If` constructor takes a condition `Expr`, which starts in state `s` and _must_ produce
 a boolean. It also takes two programs (sequences of statements), each of which
-start in `s` and finishes in another state `n`. Then, the `If` constructor creates
+starts in `s` and finishes in another state `n`. This results in
 a statement that starts in state `s`, and finishes in state `n`. Conceptually,
 each branch of the `if/else` statement must result in the same final state (in terms of types);
 otherwise, we wouldn't know which of the states to pick when deciding the final
-state of the `If` itself.
+state of the `If` itself. As with `Store`, the loop type `l` is untouched here.
+Individual statements are free to modify the state however they wish.

-The `Loop` constructor is even more restrictive: it takes a single program (the sequence
-of instructions that it will be repeating). This program starts _and_ ends in state `s`;
-since the loop can repeat many times, and since we're repeating the same program,
-we want to make sure that program is always run from the same initial state.
+The `Loop` constructor is very restrictive. It takes a single program (the sequence
+of instructions that it will be repeating). As we discussed above, this program
+must start _and_ end in the same state `s`. Furthermore, this program's loop
+type must also be `s`, since the loop we're constructing will be surrounding the
+program. The resulting loop itself still has an arbitrary loop type `l`, since
+it doesn't surround itself.

-I chose __not__ to encode `Break` as a statement. This is because we don't want `Break`s occurring
-in the middle of a program! Otherwise, it would be possible to write a program
-that _seems_ like it will terminate in one state, but, because of a break in the middle,
-terminates in another! Instead, we'll encode `Break` as a part of the `Prog` encoding.
+Finally, `Break` can only be constructed when the loop state matches the current
+state. Since we'll be jumping to the end of the innermost loop, the final state
+is also the same as the loop state. 
+
+These are all the constructors we'll be needing. It's time to move on to
+whole programs!

 ### Programs
-A program is basically a list of statements. However, we can't use a regular Idris list for two
-reasons:
+A program is simply a list of statements. However, we can't use a regular Idris list,
+because a regular list wouldn't be able to represent the relationship between
+each successive statement. In our program, we want the final state of one
+statement to be the initial state of the following one, since they'll
+be executed in sequence. To represent this, we have to define our own
+list-like GADT. The definition of the type turns out fairly straightforward:

-1. Our type is not as simple as `[Stmt]`. We want each statement to begin in the state that the
-previous statement ended in; we will have to do some work to ensure that.
-2. We have two ways of ending the sequence of statements: either with or without a `break`.
-Thus, instead of having a single `Nil` constructor, we'll have two.
-
-The definition of the type turns out fairly straightforward:
-
-{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 31 34 >}}
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 32 34 >}}

 The `Nil` constructor represents an empty program (much like the built-in `Nil` represents an empty list).
 Since no actions are done, it creates a `Prog` that starts and ends in the same state: `s`.
-The `Break` constructor is similar; however, it represents a `break` instruction, and thus,
-must be distinct from the regular `End` constructor. Finally, the `(::)` constructor, much like
-the built-in `(::)` constructor, takes a statement and another program. The statement begins in
-state `s` and ends in state `n`; the program after that statement must then start in state `n`,
-and end in some other state `m`. The combination of the statement and the program starts in state `s`,
-and finishes in state `m`; thus, `(::)` yields `Prog s m`.
+The `(::)` constructor, much like the built-in `(::)` constructor, takes a statement
+and another program. The statement begins in state `s` and ends in state `n`; the program after
+that statement must then start in state `n`, and end in some other state `m`.
+The combination of the statement and the program starts in state `s`, and finishes in state `m`.
+Thus, `(::)` yields `Prog s m`. None of the constructors affect the loop type `l`: we
+are free to sequence any statements that we want, and it is impossible for us
+to construct statements using `l` that cause runtime errors.

 This should be all! Let's try out some programs.

@@ -192,7 +253,7 @@ it stores another integer into `B`, and adds them into `R`. Even though
 was reset back to an integer after the `if/else`, and the program is accepted.
 On the other hand, had we forgotten to set `A` to a boolean first:

-```
+```Idris
  [ If (Load A)
    [ Store A (Lit 1) ]
    [ Store A (Lit 2) ]
@@ -204,17 +265,45 @@ On the other hand, had we forgotten to set `A` to a boolean first:
 We would get a type error:

 ```
-Type mismatch between
-        getRegTy A (IntTy, IntTy, IntTy)
-and
-        BoolTy
+Type mismatch between getRegTy A (IntTy, IntTy, IntTy) and BoolTy
 ```

 The type of register `A` (that is, `IntTy`) is incompatible
 with `BoolTy`. Our `initialState` says that `A` starts out as
 an integer, so it can't be used in an `if/else` right away!
 Similar errors occur if we make one of the branches of
-the `if/else` empty, or if we set `B` to a boolean. And so,
-we have an encoding of our language that allows registers to
+the `if/else` empty, or if we set `B` to a boolean.
+
+We can also encode the example program from the beginning
+of this post:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 49 61 >}}
+
+This program compiles just fine, too! It is a little reminiscent of
+the program we used to demonstrate how `break` could break things
+if we weren't careful. So, let's go ahead and try `break` in an invalid
+state:
+
+```Idris
+  [ Store A (Lit 7)
+  , Store B (Lit 9)
+  , Store R (Lit 0)
+  , Loop
+    [ If (Load A `Leq` Lit 0)
+      [ Store B (Lit 1 `Leq` Lit 1), Break, Store B (Lit 0) ]
+      [ Store R (Load R `Add` Load B)
+      , Store A (Load A `Add` Lit (-1))
+      ]
+    ]
+  ]
+```
+
+Again, the type checker complains:
+
+```
+Type mismatch between IntTy and BoolTy
+```
+
+And so, we have an encoding of our language that allows registers to
 be either integers or booleans, while still preventing
 type-incorrect programs!