Update typesafe imperative language post draft.
This commit is contained in:
parent
52abe73ef7
commit
b459e9cbfe
@ -2,20 +2,20 @@ data Reg = A | B | R
|
||||
|
||||
data Ty = IntTy | BoolTy
|
||||
|
||||
RegState : Type
|
||||
RegState = (Ty, Ty, Ty)
|
||||
TypeState : Type
|
||||
TypeState = (Ty, Ty, Ty)
|
||||
|
||||
getRegTy : Reg -> RegState -> Ty
|
||||
getRegTy : Reg -> TypeState -> Ty
|
||||
getRegTy A (a, _, _) = a
|
||||
getRegTy B (_, b, _) = b
|
||||
getRegTy R (_, _, r) = r
|
||||
|
||||
setRegTy : Reg -> Ty -> RegState -> RegState
|
||||
setRegTy : Reg -> Ty -> TypeState -> TypeState
|
||||
setRegTy A a (_, b, r) = (a, b, r)
|
||||
setRegTy B b (a, _, r) = (a, b, r)
|
||||
setRegTy R r (a, b, _) = (a, b, r)
|
||||
|
||||
data Expr : RegState -> Ty -> Type where
|
||||
data Expr : TypeState -> Ty -> Type where
|
||||
Lit : Int -> Expr s IntTy
|
||||
Load : (r : Reg) -> Expr s (getRegTy r s)
|
||||
Add : Expr s IntTy -> Expr s IntTy -> Expr s IntTy
|
||||
@ -23,17 +23,17 @@ data Expr : RegState -> Ty -> Type where
|
||||
Not : Expr s BoolTy -> Expr s BoolTy
|
||||
|
||||
mutual
|
||||
data Stmt : RegState -> RegState -> RegState -> Type where
|
||||
data Stmt : TypeState -> TypeState -> TypeState -> Type where
|
||||
Store : (r : Reg) -> Expr s t -> Stmt l s (setRegTy r t s)
|
||||
If : Expr s BoolTy -> Prog l s n -> Prog l s n -> Stmt l s n
|
||||
Loop : Prog s s s -> Stmt l s s
|
||||
Break : Stmt s s s
|
||||
|
||||
data Prog : RegState -> RegState -> RegState -> Type where
|
||||
data Prog : TypeState -> TypeState -> TypeState -> Type where
|
||||
Nil : Prog l s s
|
||||
(::) : Stmt l s n -> Prog l n m -> Prog l s m
|
||||
|
||||
initialState : RegState
|
||||
initialState : TypeState
|
||||
initialState = (IntTy, IntTy, IntTy)
|
||||
|
||||
testProg : Prog Main.initialState Main.initialState Main.initialState
|
||||
@ -64,7 +64,7 @@ repr : Ty -> Type
|
||||
repr IntTy = Int
|
||||
repr BoolTy = Bool
|
||||
|
||||
data State : RegState -> Type where
|
||||
data State : TypeState -> Type where
|
||||
MkState : (repr a, repr b, repr c) -> State (a, b, c)
|
||||
|
||||
getReg : (r : Reg) -> State s -> repr (getRegTy r s)
|
||||
@ -97,3 +97,6 @@ mutual
|
||||
prog : Prog l s n -> State s -> Either (State l) (State n)
|
||||
prog Nil s = Right s
|
||||
prog (st::p) s = stmt st s >>= prog p
|
||||
|
||||
run : Prog l s l -> State s -> State l
|
||||
run p s = either id id $ prog p s
|
||||
|
@ -246,7 +246,7 @@ The following (type-correct) program compiles just fine:
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 36 47 >}}
|
||||
|
||||
First, it loads a boolean (`True`, to be exact) into register `A`; then,
|
||||
First, it loads a boolean into register `A`; then,
|
||||
inside the `if/else` statement, it stores an integer into `A`. Finally,
|
||||
it stores another integer into `B`, and adds them into `R`. Even though
|
||||
`A` was a boolean at first, the type checker can deduce that it
|
||||
@ -307,3 +307,173 @@ Type mismatch between IntTy and BoolTy
|
||||
And so, we have an encoding of our language that allows registers to
|
||||
be either integers or booleans, while still preventing
|
||||
type-incorrect programs!
|
||||
|
||||
### Building an Interpreter
|
||||
A good test of such an encoding is the implementation
|
||||
of an interpreter. It should be possible to convince the
|
||||
typechecker that our interpreter doesn't need to
|
||||
handle type errors in the toy language, since they
|
||||
cannot occur.
|
||||
|
||||
Let's start with something simple. First of all, suppose
|
||||
we have an expression of type `Expr InTy`. In our toy
|
||||
language, it produces an integer. Our interpreter, then,
|
||||
will probably want to use Idris' type `Int`. Similarly,
|
||||
an expression of type `Expr BoolTy` will produce a boolean
|
||||
in our toy language, which in Idris we can represent using
|
||||
the built-in `Bool` type. Despite the similar naming, though,
|
||||
there's no connection between Idris' `Bool` and our own `BoolTy`.
|
||||
We need to define a conversion from our own types -- which are
|
||||
values of type `Ty` -- into Idris types that result from evaluating
|
||||
expressions. We do so as follows:
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 63 65 >}}
|
||||
|
||||
Similarly, we want to convert our `TypeState` (a tuple describing the _types_
|
||||
of our registers) into a tuple that actually holds the values of each
|
||||
register, which we will call `State`. The value of each register at
|
||||
any point depends on its type. My first thought was to define
|
||||
`State` as a function from `TypeState` to an Idris `Type`:
|
||||
|
||||
```Idris
|
||||
State : TypeState -> Type
|
||||
State (a, b, c) = (repr a, repr b, repr c)
|
||||
```
|
||||
|
||||
Unfortunately, this doesn't quite cut it. The problem is that this
|
||||
function technically doesn't give Idris any guarantees that `State`
|
||||
will be a tuple. The most Idris knows is that `State` will be some
|
||||
`Type`, which could be `Int`, `Bool`, or anything else! This
|
||||
becomes a problem when we try to pattern match on states to get
|
||||
the contents of a particular register. Instead, we have to define
|
||||
a new data type:
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 67 68 >}}
|
||||
|
||||
In this snippet, `State` is still a (type level) function from `TypeState` to `Type`.
|
||||
However, by using a GADT, we guarantee that there's only one way to construct
|
||||
a `State (a,b,c)`: using a corresponding tuple. Now, Idris will accept our
|
||||
pattern matching:
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 70 78 >}}
|
||||
|
||||
The `getReg` function will take out the value of the corresponding
|
||||
register, returning `Int` or `Bool` depending on the `TypeState`.
|
||||
What's important is that if the `TypeState` is known, then so
|
||||
is the type of `getReg`: no `Either` is involved her, and we
|
||||
can directly use the integer or boolean stored in the
|
||||
register. This is exactly what we do:
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 80 85 >}}
|
||||
|
||||
This is pretty concise. Idris knows that `Lit i` is of type `Expr IntTy`,
|
||||
and it knows that `repr IntTy = Int`, so it also knows that
|
||||
`eval (Lit i)` produces an `Int`. Similarly, we wrote
|
||||
`Reg r` to have type `Expr s (getRegTy r s)`. Since `getReg`
|
||||
returns `repr (getRegTy r s)`, things check out here, too.
|
||||
A similar logic applies to the rest of the cases.
|
||||
|
||||
The situation with statements is somewhat different. As we said, a statement
|
||||
doesn't return a value; it changes state. A good initial guess would
|
||||
be that to evaluate a statement that starts in state `s` and terminates in state `n`,
|
||||
we would take as input `State s` and return `State n`. However, things are not
|
||||
quite as simple, thanks to `Break`. Not only does `Break` require
|
||||
special case logic to return control to the end of the `Loop`, but
|
||||
it also requires some additional consideration: in a statement
|
||||
of type `Stmt l s n`, evaluating `Break` can return `State l`.
|
||||
|
||||
To implement this, we'll use the `Either` type. The `Left` constructor
|
||||
will be contain the state at the time of evaluating a `Break`,
|
||||
and will indicate to the interpreter that we're breaking out of a loop.
|
||||
On the other hand, the `Right` constructor will contain the state
|
||||
as produced by all other statements, which would be considered
|
||||
{{< sidenote "right" "left-right-note" "the \"normal\" case." >}}
|
||||
We use <code>Left</code> for the "abnormal" case because of
|
||||
Idris' (and Haskell's) convention to use it as such. For
|
||||
instance, the two languages define a <code>Monad</code>
|
||||
instance for <code>Either a</code> where <code>(>>=)</code>
|
||||
behaves very much like it does for <code>Maybe</code>, with
|
||||
<code>Left</code> being treated as <code>Nothing</code>,
|
||||
and <code>Right</code> as <code>Just</code>. We will
|
||||
use this instance to clean up some of our computations.
|
||||
{{< /sidenote >}} Note that this doesn't indicate an error:
|
||||
we need to represent the two states (breaking out of a loop
|
||||
and normal execution) to define our language's semantics.
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 88 95 >}}
|
||||
|
||||
First, note the type. We return an `Either` value, which will
|
||||
contain `State l` (in the `Left` constructor) if a `Break`
|
||||
was evaluated, and `State n` (in the `Right` constructor)
|
||||
if execution went on without breaking.
|
||||
|
||||
The `Store` case is rather simple. We use `setReg` to update the result
|
||||
of the register `r` with the result of evaluating `e`. Because
|
||||
a store doesn't cause us to start breaking out of a loop,
|
||||
we use `Right` to wrap the new state.
|
||||
|
||||
The `If` case is also rather simple. Its condition is guaranteed
|
||||
to evaluate to a boolean, so it's sufficient for us to use
|
||||
Idris' `if` expression. We use the `prog` function here, which
|
||||
implements the evaluation of a whole program. We'll get to it
|
||||
momentarily.
|
||||
|
||||
`Loop` is the most interesting case. We start by evaluating
|
||||
the program `p` serving as the loop body. The result of this
|
||||
computation will be either a state after a break,
|
||||
held in `Left`, or as the normal execution state, held
|
||||
in `Right`. The `(>>=)` operation will do nothing in
|
||||
the first case, and feed the resulting (normal) state
|
||||
to `stmt (Loop p)` in the second case. This is exactly
|
||||
what we want: if we broke out of the current iteration
|
||||
of the loop, we shouldn't continue on to the next iteration.
|
||||
At the end of evaluating both `p` and the recursive call to
|
||||
`stmt`, we'll either have exited normally, or via breaking
|
||||
out. We don't want to continue breaking out further,
|
||||
so we return the final state wrapped in `Right` in both cases.
|
||||
Finally, `Break` returns the current state wrapped in `Left`,
|
||||
beginning the process of breaking out.
|
||||
|
||||
The task of `prog` is simply to sequence several statements
|
||||
together. The monadic bind operator, `(>>=)`, is again perfect
|
||||
for this task, since it "stops" when it hits a `Left`, but
|
||||
continues otherwise. This is the implementation:
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 97 99 >}}
|
||||
|
||||
Awesome! Let's try it out, shall we? I defined a quick `run` function
|
||||
as follows:
|
||||
|
||||
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 101 102 >}}
|
||||
|
||||
We then have:
|
||||
|
||||
```
|
||||
*TypesafeImp> run prodProg (MkState (0,0,0))
|
||||
MkState (0, 9, 63) : State (IntTy, IntTy, IntTy)
|
||||
```
|
||||
|
||||
Which seems correct! The program multiplies seven by nine,
|
||||
and stops when register `A` reaches zero. Our test program
|
||||
runs, too:
|
||||
|
||||
```
|
||||
*TypesafeImp> run testProg (MkState (0,0,0))
|
||||
MkState (1, 2, 3) : State (IntTy, IntTy, IntTy)
|
||||
```
|
||||
|
||||
This is the correct answer: `A` ends up being set to
|
||||
`1` in the `then` branch of the conditional statement,
|
||||
`B` is set to 2 right after, and `R`, the sum of `A`
|
||||
and `B`, is rightly `3`.
|
||||
|
||||
As you can see, we didn't have to write any error handling
|
||||
code! This is because the typechecker _knows_ that type errors
|
||||
aren't possible: our programs are guaranteed to be
|
||||
{{< sidenote "right" "termination-note" "type correct." >}}
|
||||
Our programs <em>aren't</em> guaranteed to terminate:
|
||||
we're lucky that Idris' totality checker is turned off by default.
|
||||
{{< /sidenote >}}
|
||||
|
||||
This was a fun exercise, and I hope you enjoyed reading along!
|
||||
I hope to see you in my future posts.
|
||||
|
Loading…
Reference in New Issue
Block a user