9.6 KiB
title | date | tags | draft | |
---|---|---|---|---|
A Typesafe Representation of an Imperative Language | 2020-10-30T17:19:59-07:00 |
|
true |
A recent homework assignment for my university's programming languages
course was to encode the abstract syntax for a small imperative language
into Haskell data types. The language consisted of very few constructs, and was very much a "toy".
On the expression side of things, it had three registers (A
, B
, and R
),
numbers, addition, comparison using "less than", and logical negation. It also
included a statement for storing the result of an expression into
a register, if/else
, and an infinite loop construct with an associated break
operation.
A sample program in the language which computes the product of two
numbers is as follows:
A := 7
B := 9
R := 0
do
if A <= 0 then
break
else
R := R + B;
A := A + -1;
end
end
The homework notes that type errors may arise in the little imperative language.
We could, for instance, try to add a boolean to a number: 3 + (1 < 2)
. Alternatively,
we could try use a number in the condition of an if/else
expression. A "naive"
encoding of the abstract syntax would allow for such errors.
However, assuming that registers could only store integers and not booleans, it is fairly easy to
separate the expression grammar into two nonterminals, yielding boolean
and integer expressions respectively. Since registers can only store integers,
the (:=)
operation will always require an integer expression, and an if/else
statement will always require a boolean expression. A matching Haskell encoding
would not allow "invalid" programs to compile. That is, the programs would be
type-correct by construction.
Then, a question arose in the ensuing discussion: what if registers could contain booleans? It would be impossible to create such a "correct-by-construction" representation then, wouldn't it? {{< sidenote "right" "haskell-note" "Although I don't know about Haskell," >}} I am pretty certain that a similar encoding in Haskell is possible. However, Haskell wasn't originally created for that kind of abuse of its type system, so it would probably not look very good. {{< /sidenote >}} I am pretty certain that it is possible to do this in Idris, a dependently typed programming language. In this post I will talk about how to do that.
Registers and Expressions
Let's start by encoding registers. Since we only have three registers, we can encode them using a simple data type declaration, much the same as we would in Haskell:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 1 1 >}}
Now that registers can store either integers or booleans (and only those two), we need to know which one is which. For this purpose, we can declare another data type:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 3 3 >}}
At any point in the (hypothetical) execution of our program, each of the registers will have a type, either boolean or integer. The combined state of the three registers would then be the combination of these three states; we can represent this using a 3-tuple:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 5 6 >}}
Let's say that the first element of the tuple will be the type of the register
A
, the second the type of B
, and the third the type of R
. Then,
we can define two helper functions, one for retrieving the type of
a register, and one for changing it:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 8 16 >}}
Now, it's time to talk about expressions. We know now that an expression
can evaluate to either a boolean or an integer value (because a register
can contain either of those types of values). Perhaps we can specify
the type that an expression evaluates to in the expression's own type:
Expr IntTy
would evaluate to integers, and Expr BoolTy
would evaluate
to booleans. Then, we could constructors as follows:
Lit : Int -> Expr IntTy
Not : Expr BoolTy -> Expr BoolTy
Sounds good! But what about loading a register?
Load : Reg -> Expr IntTy -- no; what if the register is a boolean?
Load : Reg -> Expr BoolTy -- no; what if the register is an integer?
Load : Reg -> Expr a -- no; a register access can't be either!
The type of an expression that loads a register depends on the current
state of the program! If we last stored an integer into a register,
then loading from that register would give us an integer. But if we
last stored a boolean into a register, then reading from it would
give us a boolean. Our expressions need to be aware of the current
types of each register. To do so, we add the state as a parameter to
our Expr
type. This would lead to types like the following:
-- An expression that produces a boolean
-- when all the registers are integers.
Expr (IntTy, IntTy, IntTy) BoolTy
-- An expression that produces an integer
-- when A and B are integers, and R is a boolean.
Expr (IntTy, IntTy, BoolTy) IntTy
In Idris, the whole definition becomes:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 18 23 >}}
The only "interesting" constructor is Load
, which, given a register r
,
creates an expression that has r
's type in the current state s
.
Statements
Statements are a bit different. Unlike expressions, they don't evaluate to
anything; rather, they do something. That "something" may very well be changing
the current state. We could, for instance, set A
to be a boolean, while it was
previously an integer. So, the Stmt
type will take two arguments: the initial
state and the final state. This leads to the following definition:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 29 >}}
The Store
constructor takes a register r
and an expression producing some type t
in state s
.
From these, it creates a statement that starts in s
, and finishes
in a state similar to s
, but with r
now having type t
.
The If
constructor takes a condition, which starts in state s
and must produce
a boolean. It also takes two programs (sequences of statements), each of which
start in s
and finishes in another state n
. Then, the If
constructor creates
a statement that starts in state s
, and finishes in state n
. Conceptually,
each branch of the if/else
statement must result in the same final state (in terms of types);
otherwise, we wouldn't know which of the states to pick when deciding the final
state of the If
itself.
The Loop
constructor is even more restrictive: it takes a single program (the sequence
of instructions that it will be repeating). This program starts and ends in state s
;
since the loop can repeat many times, and since we're repeating the same program,
we want to make sure that program is always run from the same initial state.
I chose not to encode Break
as a statement. This is because we don't want Break
s occurring
in the middle of a program! Otherwise, it would be possible to write a program
that seems like it will terminate in one state, but, because of a break in the middle,
terminates in another! Instead, we'll encode Break
as a part of the Prog
encoding.
Programs
A program is basically a list of statements. However, we can't use a regular Idris list for two reasons:
- Our type is not as simple as
[Stmt]
. We want each statement to begin in the state that the previous statement ended in; we will have to do some work to ensure that. - We have two ways of ending the sequence of statements: either with or without a
break
. Thus, instead of having a singleNil
constructor, we'll have two.
The definition of the type turns out fairly straightforward:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 31 34 >}}
The Nil
constructor represents an empty program (much like the built-in Nil
represents an empty list).
Since no actions are done, it creates a Prog
that starts and ends in the same state: s
.
The Break
constructor is similar; however, it represents a break
instruction, and thus,
must be distinct from the regular End
constructor. Finally, the (::)
constructor, much like
the built-in (::)
constructor, takes a statement and another program. The statement begins in
state s
and ends in state n
; the program after that statement must then start in state n
,
and end in some other state m
. The combination of the statement and the program starts in state s
,
and finishes in state m
; thus, (::)
yields Prog s m
.
This should be all! Let's try out some programs.
Trying it Out
The following (type-correct) program compiles just fine:
{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 36 47 >}}
First, it loads a boolean (True
, to be exact) into register A
; then,
inside the if/else
statement, it stores an integer into A
. Finally,
it stores another integer into B
, and adds them into R
. Even though
A
was a boolean at first, the type checker can deduce that it
was reset back to an integer after the if/else
, and the program is accepted.
On the other hand, had we forgotten to set A
to a boolean first:
[ If (Load A)
[ Store A (Lit 1) ]
[ Store A (Lit 2) ]
, Store B (Lit 2)
, Store R (Add (Load A) (Load B))
]
We would get a type error:
Type mismatch between
getRegTy A (IntTy, IntTy, IntTy)
and
BoolTy
The type of register A
(that is, IntTy
) is incompatible
with BoolTy
. Our initialState
says that A
starts out as
an integer, so it can't be used in an if/else
right away!
Similar errors occur if we make one of the branches of
the if/else
empty, or if we set B
to a boolean. And so,
we have an encoding of our language that allows registers to
be either integers or booleans, while still preventing
type-incorrect programs!