Make the typesafe imperative language work properly.

This commit is contained in:
Danila Fedorin 2020-10-31 01:34:23 -07:00
parent f0fe481bcf
commit 52abe73ef7
2 changed files with 193 additions and 52 deletions

View File

@ -23,20 +23,20 @@ data Expr : RegState -> Ty -> Type where
Not : Expr s BoolTy -> Expr s BoolTy
mutual
data Stmt : RegState -> RegState -> Type where
Store : (r : Reg) -> Expr s t -> Stmt s (setRegTy r t s)
If : Expr s BoolTy -> Prog s n -> Prog s n -> Stmt s n
Loop : Prog s s -> Stmt s s
data Stmt : RegState -> RegState -> RegState -> Type where
Store : (r : Reg) -> Expr s t -> Stmt l s (setRegTy r t s)
If : Expr s BoolTy -> Prog l s n -> Prog l s n -> Stmt l s n
Loop : Prog s s s -> Stmt l s s
Break : Stmt s s s
data Prog : RegState -> RegState -> Type where
Nil : Prog s s
Break : Prog s s
(::) : Stmt s n -> Prog n m -> Prog s m
data Prog : RegState -> RegState -> RegState -> Type where
Nil : Prog l s s
(::) : Stmt l s n -> Prog l n m -> Prog l s m
initialState : RegState
initialState = (IntTy, IntTy, IntTy)
testProg : Prog Main.initialState Main.initialState
testProg : Prog Main.initialState Main.initialState Main.initialState
testProg =
[ Store A (Lit 1 `Leq` Lit 2)
, If (Load A)
@ -45,3 +45,55 @@ testProg =
, Store B (Lit 2)
, Store R (Add (Load A) (Load B))
]
prodProg : Prog Main.initialState Main.initialState Main.initialState
prodProg =
[ Store A (Lit 7)
, Store B (Lit 9)
, Store R (Lit 0)
, Loop
[ If (Load A `Leq` Lit 0)
[ Break ]
[ Store R (Load R `Add` Load B)
, Store A (Load A `Add` Lit (-1))
]
]
]
repr : Ty -> Type
repr IntTy = Int
repr BoolTy = Bool
data State : RegState -> Type where
MkState : (repr a, repr b, repr c) -> State (a, b, c)
getReg : (r : Reg) -> State s -> repr (getRegTy r s)
getReg A (MkState (a, _, _)) = a
getReg B (MkState (_, b, _)) = b
getReg R (MkState (_, _, r)) = r
setReg : (r : Reg) -> repr t -> State s -> State (setRegTy r t s)
setReg A a (MkState (_, b, r)) = MkState (a, b, r)
setReg B b (MkState (a, _, r)) = MkState (a, b, r)
setReg R r (MkState (a, b, _)) = MkState (a, b, r)
expr : Expr s t -> State s -> repr t
expr (Lit i) _ = i
expr (Load r) s = getReg r s
expr (Add l r) s = expr l s + expr r s
expr (Leq l r) s = expr l s <= expr r s
expr (Not e) s = not $ expr e s
mutual
stmt : Stmt l s n -> State s -> Either (State l) (State n)
stmt (Store r e) s = Right $ setReg r (expr e s) s
stmt (If c t e) s = if expr c s then prog t s else prog e s
stmt (Loop p) s =
case prog p s >>= stmt (Loop p) of
Right s => Right s
Left s => Right s
stmt Break s = Left s
prog : Prog l s n -> State s -> Either (State l) (State n)
prog Nil s = Right s
prog (st::p) s = stmt st s >>= prog p

View File

@ -49,7 +49,7 @@ representation then, wouldn't it?
I am pretty certain that a similar encoding in Haskell is possible. However,
Haskell wasn't originally created for that kind of abuse of its type system,
so it would probably not look very good.
{{< /sidenote >}} I am pretty certain that it _is_ possible to do this
{{< /sidenote >}} I am sure that it _is_ possible to do this
in Idris, a dependently typed programming language. In this post I will
talk about how to do that.
@ -109,12 +109,12 @@ types of each register. To do so, we add the state as a parameter to
our `Expr` type. This would lead to types like the following:
```Idris
-- An expression that produces a boolean
-- when all the registers are integers.
-- An expression that produces a boolean when all the registers
-- are integers.
Expr (IntTy, IntTy, IntTy) BoolTy
-- An expression that produces an integer
-- when A and B are integers, and R is a boolean.
-- An expression that produces an integer when A and B are integers,
-- and R is a boolean.
Expr (IntTy, IntTy, BoolTy) IntTy
```
@ -129,54 +129,115 @@ creates an expression that has `r`'s type in the current state `s`.
Statements are a bit different. Unlike expressions, they don't evaluate to
anything; rather, they do something. That "something" may very well be changing
the current state. We could, for instance, set `A` to be a boolean, while it was
previously an integer. So, the `Stmt` type will take two arguments: the initial
state and the final state. This leads to the following definition:
previously an integer. This suggests equipping our `Stmt` type with two
arguments: the initial state (before the statement's execution), and the final
state (after the statement's execution). This would lead to types like this:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 29 >}}
```Idris
-- Statement that, when run while all registers contain integers,
-- terminates with registers B and R having been assigned boolean values.
Stmt (IntTy, IntTy, IntTy) (IntTy, BoolTy, BoolTy)
```
However, there's a problem with `loop` and `break`. When we run a loop,
we will require that the state at the end of one iteration is the
same as the state at its beginning. Otherwise, it would be possible
for a loop to keep changing the types of registers every iteration,
and it would become impossible for us to infer the final state
without actually running the program. In itself, this restriction
isn't a problem; most static type systems require both branches
of an `if/else` expression to be of the same type for a similar
reason. The problem comes from the interaction with `break`.
By itself, the would-be type of `break` seems innocent enough. It
doesn't change any registers, so we could call it `Stmt s s`.
But consider the following program:
```
A := 0
B := 0
R := 0
do
if 5 <= A then
B := 1 <= 1
break
B := 0
else
A := A + 1
end
end
```
The loop starts with all registers having integer values.
As per our aforementioned loop requirement, the body
of the loop must terminate with all registers _still_ having
integer values. For the first five iterations that's exactly
what will happen. However, after we increment `A` the fifth time,
we will set `B` to a boolean value -- using a valid statement --
and then `break`. The `break` statement will be accepted by
the typechecker, and so will the whole `then` branch. After all,
it seems as though we reset `B` back to an integer value.
But that won't be the case. We will have jumped to the end
of the loop, where we are expected to have an all-integer type,
which we will not have.
The solution I came up with to address this issue was to
add a _third_ argument to `Stmt`, which contains the "context"
type. That is, it contains the type of the innermost loop surrounding
the statement. A `break` statement would only be permissible
if the current type matches the loop type. With this, we finally
write down a definition of `Stmt`:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 30 >}}
The `Store` constructor takes a register `r` and an expression producing some type `t` in state `s`.
From these, it creates a statement that starts in `s`, and finishes
in a state similar to `s`, but with `r` now having type `t`.
in a state similar to `s`, but with `r` now having type `t`. The loop
type `l` remains unaffected and unused; we are free to assign any register
any value.
The `If` constructor takes a condition, which starts in state `s` and _must_ produce
The `If` constructor takes a condition `Expr`, which starts in state `s` and _must_ produce
a boolean. It also takes two programs (sequences of statements), each of which
start in `s` and finishes in another state `n`. Then, the `If` constructor creates
starts in `s` and finishes in another state `n`. This results in
a statement that starts in state `s`, and finishes in state `n`. Conceptually,
each branch of the `if/else` statement must result in the same final state (in terms of types);
otherwise, we wouldn't know which of the states to pick when deciding the final
state of the `If` itself.
state of the `If` itself. As with `Store`, the loop type `l` is untouched here.
Individual statements are free to modify the state however they wish.
The `Loop` constructor is even more restrictive: it takes a single program (the sequence
of instructions that it will be repeating). This program starts _and_ ends in state `s`;
since the loop can repeat many times, and since we're repeating the same program,
we want to make sure that program is always run from the same initial state.
The `Loop` constructor is very restrictive. It takes a single program (the sequence
of instructions that it will be repeating). As we discussed above, this program
must start _and_ end in the same state `s`. Furthermore, this program's loop
type must also be `s`, since the loop we're constructing will be surrounding the
program. The resulting loop itself still has an arbitrary loop type `l`, since
it doesn't surround itself.
I chose __not__ to encode `Break` as a statement. This is because we don't want `Break`s occurring
in the middle of a program! Otherwise, it would be possible to write a program
that _seems_ like it will terminate in one state, but, because of a break in the middle,
terminates in another! Instead, we'll encode `Break` as a part of the `Prog` encoding.
Finally, `Break` can only be constructed when the loop state matches the current
state. Since we'll be jumping to the end of the innermost loop, the final state
is also the same as the loop state.
These are all the constructors we'll be needing. It's time to move on to
whole programs!
### Programs
A program is basically a list of statements. However, we can't use a regular Idris list for two
reasons:
A program is simply a list of statements. However, we can't use a regular Idris list,
because a regular list wouldn't be able to represent the relationship between
each successive statement. In our program, we want the final state of one
statement to be the initial state of the following one, since they'll
be executed in sequence. To represent this, we have to define our own
list-like GADT. The definition of the type turns out fairly straightforward:
1. Our type is not as simple as `[Stmt]`. We want each statement to begin in the state that the
previous statement ended in; we will have to do some work to ensure that.
2. We have two ways of ending the sequence of statements: either with or without a `break`.
Thus, instead of having a single `Nil` constructor, we'll have two.
The definition of the type turns out fairly straightforward:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 31 34 >}}
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 32 34 >}}
The `Nil` constructor represents an empty program (much like the built-in `Nil` represents an empty list).
Since no actions are done, it creates a `Prog` that starts and ends in the same state: `s`.
The `Break` constructor is similar; however, it represents a `break` instruction, and thus,
must be distinct from the regular `End` constructor. Finally, the `(::)` constructor, much like
the built-in `(::)` constructor, takes a statement and another program. The statement begins in
state `s` and ends in state `n`; the program after that statement must then start in state `n`,
and end in some other state `m`. The combination of the statement and the program starts in state `s`,
and finishes in state `m`; thus, `(::)` yields `Prog s m`.
The `(::)` constructor, much like the built-in `(::)` constructor, takes a statement
and another program. The statement begins in state `s` and ends in state `n`; the program after
that statement must then start in state `n`, and end in some other state `m`.
The combination of the statement and the program starts in state `s`, and finishes in state `m`.
Thus, `(::)` yields `Prog s m`. None of the constructors affect the loop type `l`: we
are free to sequence any statements that we want, and it is impossible for us
to construct statements using `l` that cause runtime errors.
This should be all! Let's try out some programs.
@ -192,7 +253,7 @@ it stores another integer into `B`, and adds them into `R`. Even though
was reset back to an integer after the `if/else`, and the program is accepted.
On the other hand, had we forgotten to set `A` to a boolean first:
```
```Idris
[ If (Load A)
[ Store A (Lit 1) ]
[ Store A (Lit 2) ]
@ -204,17 +265,45 @@ On the other hand, had we forgotten to set `A` to a boolean first:
We would get a type error:
```
Type mismatch between
getRegTy A (IntTy, IntTy, IntTy)
and
BoolTy
Type mismatch between getRegTy A (IntTy, IntTy, IntTy) and BoolTy
```
The type of register `A` (that is, `IntTy`) is incompatible
with `BoolTy`. Our `initialState` says that `A` starts out as
an integer, so it can't be used in an `if/else` right away!
Similar errors occur if we make one of the branches of
the `if/else` empty, or if we set `B` to a boolean. And so,
we have an encoding of our language that allows registers to
the `if/else` empty, or if we set `B` to a boolean.
We can also encode the example program from the beginning
of this post:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 49 61 >}}
This program compiles just fine, too! It is a little reminiscent of
the program we used to demonstrate how `break` could break things
if we weren't careful. So, let's go ahead and try `break` in an invalid
state:
```Idris
[ Store A (Lit 7)
, Store B (Lit 9)
, Store R (Lit 0)
, Loop
[ If (Load A `Leq` Lit 0)
[ Store B (Lit 1 `Leq` Lit 1), Break, Store B (Lit 0) ]
[ Store R (Load R `Add` Load B)
, Store A (Load A `Add` Lit (-1))
]
]
]
```
Again, the type checker complains:
```
Type mismatch between IntTy and BoolTy
```
And so, we have an encoding of our language that allows registers to
be either integers or booleans, while still preventing
type-incorrect programs!