Update typesafe imperative language post draft.

This commit is contained in:
Danila Fedorin 2020-11-01 23:56:55 -08:00
parent 52abe73ef7
commit b459e9cbfe
2 changed files with 183 additions and 10 deletions

View File

@ -2,20 +2,20 @@ data Reg = A | B | R
data Ty = IntTy | BoolTy
RegState : Type
RegState = (Ty, Ty, Ty)
TypeState : Type
TypeState = (Ty, Ty, Ty)
getRegTy : Reg -> RegState -> Ty
getRegTy : Reg -> TypeState -> Ty
getRegTy A (a, _, _) = a
getRegTy B (_, b, _) = b
getRegTy R (_, _, r) = r
setRegTy : Reg -> Ty -> RegState -> RegState
setRegTy : Reg -> Ty -> TypeState -> TypeState
setRegTy A a (_, b, r) = (a, b, r)
setRegTy B b (a, _, r) = (a, b, r)
setRegTy R r (a, b, _) = (a, b, r)
data Expr : RegState -> Ty -> Type where
data Expr : TypeState -> Ty -> Type where
Lit : Int -> Expr s IntTy
Load : (r : Reg) -> Expr s (getRegTy r s)
Add : Expr s IntTy -> Expr s IntTy -> Expr s IntTy
@ -23,17 +23,17 @@ data Expr : RegState -> Ty -> Type where
Not : Expr s BoolTy -> Expr s BoolTy
mutual
data Stmt : RegState -> RegState -> RegState -> Type where
data Stmt : TypeState -> TypeState -> TypeState -> Type where
Store : (r : Reg) -> Expr s t -> Stmt l s (setRegTy r t s)
If : Expr s BoolTy -> Prog l s n -> Prog l s n -> Stmt l s n
Loop : Prog s s s -> Stmt l s s
Break : Stmt s s s
data Prog : RegState -> RegState -> RegState -> Type where
data Prog : TypeState -> TypeState -> TypeState -> Type where
Nil : Prog l s s
(::) : Stmt l s n -> Prog l n m -> Prog l s m
initialState : RegState
initialState : TypeState
initialState = (IntTy, IntTy, IntTy)
testProg : Prog Main.initialState Main.initialState Main.initialState
@ -64,7 +64,7 @@ repr : Ty -> Type
repr IntTy = Int
repr BoolTy = Bool
data State : RegState -> Type where
data State : TypeState -> Type where
MkState : (repr a, repr b, repr c) -> State (a, b, c)
getReg : (r : Reg) -> State s -> repr (getRegTy r s)
@ -97,3 +97,6 @@ mutual
prog : Prog l s n -> State s -> Either (State l) (State n)
prog Nil s = Right s
prog (st::p) s = stmt st s >>= prog p
run : Prog l s l -> State s -> State l
run p s = either id id $ prog p s

View File

@ -246,7 +246,7 @@ The following (type-correct) program compiles just fine:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 36 47 >}}
First, it loads a boolean (`True`, to be exact) into register `A`; then,
First, it loads a boolean into register `A`; then,
inside the `if/else` statement, it stores an integer into `A`. Finally,
it stores another integer into `B`, and adds them into `R`. Even though
`A` was a boolean at first, the type checker can deduce that it
@ -307,3 +307,173 @@ Type mismatch between IntTy and BoolTy
And so, we have an encoding of our language that allows registers to
be either integers or booleans, while still preventing
type-incorrect programs!
### Building an Interpreter
A good test of such an encoding is the implementation
of an interpreter. It should be possible to convince the
typechecker that our interpreter doesn't need to
handle type errors in the toy language, since they
cannot occur.
Let's start with something simple. First of all, suppose
we have an expression of type `Expr InTy`. In our toy
language, it produces an integer. Our interpreter, then,
will probably want to use Idris' type `Int`. Similarly,
an expression of type `Expr BoolTy` will produce a boolean
in our toy language, which in Idris we can represent using
the built-in `Bool` type. Despite the similar naming, though,
there's no connection between Idris' `Bool` and our own `BoolTy`.
We need to define a conversion from our own types -- which are
values of type `Ty` -- into Idris types that result from evaluating
expressions. We do so as follows:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 63 65 >}}
Similarly, we want to convert our `TypeState` (a tuple describing the _types_
of our registers) into a tuple that actually holds the values of each
register, which we will call `State`. The value of each register at
any point depends on its type. My first thought was to define
`State` as a function from `TypeState` to an Idris `Type`:
```Idris
State : TypeState -> Type
State (a, b, c) = (repr a, repr b, repr c)
```
Unfortunately, this doesn't quite cut it. The problem is that this
function technically doesn't give Idris any guarantees that `State`
will be a tuple. The most Idris knows is that `State` will be some
`Type`, which could be `Int`, `Bool`, or anything else! This
becomes a problem when we try to pattern match on states to get
the contents of a particular register. Instead, we have to define
a new data type:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 67 68 >}}
In this snippet, `State` is still a (type level) function from `TypeState` to `Type`.
However, by using a GADT, we guarantee that there's only one way to construct
a `State (a,b,c)`: using a corresponding tuple. Now, Idris will accept our
pattern matching:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 70 78 >}}
The `getReg` function will take out the value of the corresponding
register, returning `Int` or `Bool` depending on the `TypeState`.
What's important is that if the `TypeState` is known, then so
is the type of `getReg`: no `Either` is involved her, and we
can directly use the integer or boolean stored in the
register. This is exactly what we do:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 80 85 >}}
This is pretty concise. Idris knows that `Lit i` is of type `Expr IntTy`,
and it knows that `repr IntTy = Int`, so it also knows that
`eval (Lit i)` produces an `Int`. Similarly, we wrote
`Reg r` to have type `Expr s (getRegTy r s)`. Since `getReg`
returns `repr (getRegTy r s)`, things check out here, too.
A similar logic applies to the rest of the cases.
The situation with statements is somewhat different. As we said, a statement
doesn't return a value; it changes state. A good initial guess would
be that to evaluate a statement that starts in state `s` and terminates in state `n`,
we would take as input `State s` and return `State n`. However, things are not
quite as simple, thanks to `Break`. Not only does `Break` require
special case logic to return control to the end of the `Loop`, but
it also requires some additional consideration: in a statement
of type `Stmt l s n`, evaluating `Break` can return `State l`.
To implement this, we'll use the `Either` type. The `Left` constructor
will be contain the state at the time of evaluating a `Break`,
and will indicate to the interpreter that we're breaking out of a loop.
On the other hand, the `Right` constructor will contain the state
as produced by all other statements, which would be considered
{{< sidenote "right" "left-right-note" "the \"normal\" case." >}}
We use <code>Left</code> for the "abnormal" case because of
Idris' (and Haskell's) convention to use it as such. For
instance, the two languages define a <code>Monad</code>
instance for <code>Either a</code> where <code>(>>=)</code>
behaves very much like it does for <code>Maybe</code>, with
<code>Left</code> being treated as <code>Nothing</code>,
and <code>Right</code> as <code>Just</code>. We will
use this instance to clean up some of our computations.
{{< /sidenote >}} Note that this doesn't indicate an error:
we need to represent the two states (breaking out of a loop
and normal execution) to define our language's semantics.
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 88 95 >}}
First, note the type. We return an `Either` value, which will
contain `State l` (in the `Left` constructor) if a `Break`
was evaluated, and `State n` (in the `Right` constructor)
if execution went on without breaking.
The `Store` case is rather simple. We use `setReg` to update the result
of the register `r` with the result of evaluating `e`. Because
a store doesn't cause us to start breaking out of a loop,
we use `Right` to wrap the new state.
The `If` case is also rather simple. Its condition is guaranteed
to evaluate to a boolean, so it's sufficient for us to use
Idris' `if` expression. We use the `prog` function here, which
implements the evaluation of a whole program. We'll get to it
momentarily.
`Loop` is the most interesting case. We start by evaluating
the program `p` serving as the loop body. The result of this
computation will be either a state after a break,
held in `Left`, or as the normal execution state, held
in `Right`. The `(>>=)` operation will do nothing in
the first case, and feed the resulting (normal) state
to `stmt (Loop p)` in the second case. This is exactly
what we want: if we broke out of the current iteration
of the loop, we shouldn't continue on to the next iteration.
At the end of evaluating both `p` and the recursive call to
`stmt`, we'll either have exited normally, or via breaking
out. We don't want to continue breaking out further,
so we return the final state wrapped in `Right` in both cases.
Finally, `Break` returns the current state wrapped in `Left`,
beginning the process of breaking out.
The task of `prog` is simply to sequence several statements
together. The monadic bind operator, `(>>=)`, is again perfect
for this task, since it "stops" when it hits a `Left`, but
continues otherwise. This is the implementation:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 97 99 >}}
Awesome! Let's try it out, shall we? I defined a quick `run` function
as follows:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 101 102 >}}
We then have:
```
*TypesafeImp> run prodProg (MkState (0,0,0))
MkState (0, 9, 63) : State (IntTy, IntTy, IntTy)
```
Which seems correct! The program multiplies seven by nine,
and stops when register `A` reaches zero. Our test program
runs, too:
```
*TypesafeImp> run testProg (MkState (0,0,0))
MkState (1, 2, 3) : State (IntTy, IntTy, IntTy)
```
This is the correct answer: `A` ends up being set to
`1` in the `then` branch of the conditional statement,
`B` is set to 2 right after, and `R`, the sum of `A`
and `B`, is rightly `3`.
As you can see, we didn't have to write any error handling
code! This is because the typechecker _knows_ that type errors
aren't possible: our programs are guaranteed to be
{{< sidenote "right" "termination-note" "type correct." >}}
Our programs <em>aren't</em> guaranteed to terminate:
we're lucky that Idris' totality checker is turned off by default.
{{< /sidenote >}}
This was a fun exercise, and I hope you enjoyed reading along!
I hope to see you in my future posts.