Add post about the typesafe imperative language.

2020-10-30 19:07:30 -07:00 · 2020-10-30 19:07:30 -07:00 · f0fe481bcf
commit f0fe481bcf
parent 222446a937
2 changed files with 267 additions and 0 deletions
--- a/code/typesafe-imperative/TypesafeImp.idr
+++ b/code/typesafe-imperative/TypesafeImp.idr
@ -0,0 +1,47 @@
+data Reg = A | B | R
+
+data Ty = IntTy | BoolTy
+
+RegState : Type
+RegState = (Ty, Ty, Ty)
+
+getRegTy : Reg -> RegState -> Ty
+getRegTy A (a, _, _) = a
+getRegTy B (_, b, _) = b
+getRegTy R (_, _, r) = r
+
+setRegTy : Reg -> Ty -> RegState -> RegState
+setRegTy A a (_, b, r) = (a, b, r)
+setRegTy B b (a, _, r) = (a, b, r)
+setRegTy R r (a, b, _) = (a, b, r)
+
+data Expr : RegState -> Ty -> Type where
+  Lit : Int -> Expr s IntTy
+  Load : (r : Reg) -> Expr s (getRegTy r s)
+  Add : Expr s IntTy -> Expr s IntTy -> Expr s IntTy
+  Leq : Expr s IntTy -> Expr s IntTy -> Expr s BoolTy
+  Not : Expr s BoolTy -> Expr s BoolTy
+
+mutual
+  data Stmt : RegState -> RegState -> Type where
+    Store : (r : Reg) -> Expr s t -> Stmt s (setRegTy r t s)
+    If : Expr s BoolTy -> Prog s n -> Prog s n -> Stmt s n
+    Loop : Prog s s -> Stmt s s
+
+  data Prog : RegState -> RegState -> Type where
+    Nil : Prog s s
+    Break : Prog s s
+    (::) : Stmt s n -> Prog n m -> Prog s m
+
+initialState : RegState
+initialState = (IntTy, IntTy, IntTy)
+
+testProg : Prog Main.initialState Main.initialState
+testProg =
+  [ Store A (Lit 1 `Leq` Lit 2)
+  , If (Load A)
+    [ Store A (Lit 1) ]
+    [ Store A (Lit 2) ]
+  , Store B (Lit 2)
+  , Store R (Add (Load A) (Load B))
+  ]
--- a/content/blog/typesafe_imperative_lang.md
+++ b/content/blog/typesafe_imperative_lang.md
@ -0,0 +1,220 @@
+---
+title: "A Typesafe Representation of an Imperative Language"
+date: 2020-10-30T17:19:59-07:00
+tags: ["Idris"]
+draft: true
+---
+
+A recent homework assignment for my university's programming languages
+course was to encode the abstract syntax for a small imperative language
+into Haskell data types. The language consisted of very few constructs, and was very much a "toy".
+On the expression side of things, it had three registers (`A`, `B`, and `R`),
+numbers, addition, comparison using "less than", and logical negation. It also
+included a statement for storing the result of an expression into
+a register, `if/else`, and an infinite loop construct with an associated `break` operation.
+A sample program in the language which computes the product of two
+numbers is as follows:
+
+```
+A := 7
+B := 9
+R := 0
+do
+  if A <= 0 then
+    break
+  else
+    R := R + B;
+    A := A + -1;
+  end
+end
+```
+
+The homework notes that type errors may arise in the little imperative language.
+We could, for instance, try to add a boolean to a number: `3 + (1 < 2)`. Alternatively,
+we could try use a number in the condition of an `if/else` expression. A "naive"
+encoding of the abstract syntax would allow for such errors.
+
+However, assuming that registers could only store integers and not booleans, it is fairly easy to
+separate the expression grammar into two nonterminals, yielding boolean
+and integer expressions respectively. Since registers can only store integers,
+the `(:=)` operation will always require an integer expression, and an `if/else`
+statement will always require a boolean expression. A matching Haskell encoding
+would not allow "invalid" programs to compile. That is, the programs would be
+type-correct by construction.
+
+Then, a question arose in the ensuing discussion: what if registers _could_
+contain booleans? It would be impossible to create such a "correct-by-construction"
+representation then, wouldn't it?
+{{< sidenote "right" "haskell-note" "Although I don't know about Haskell," >}}
+I am pretty certain that a similar encoding in Haskell is possible. However,
+Haskell wasn't originally created for that kind of abuse of its type system,
+so it would probably not look very good.
+{{< /sidenote >}} I am pretty certain that it _is_ possible to do this
+in Idris, a dependently typed programming language. In this post I will
+talk about how to do that.
+
+### Registers and Expressions
+Let's start by encoding registers. Since we only have three registers, we
+can encode them using a simple data type declaration, much the same as we
+would in Haskell:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 1 1 >}}
+
+Now that registers can store either integers or booleans (and only those two),
+we need to know which one is which. For this purpose, we can declare another
+data type:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 3 3 >}}
+
+At any point in the (hypothetical) execution of our program, each
+of the registers will have a type, either boolean or integer. The
+combined state of the three registers would then be the combination
+of these three states; we can represent this using a 3-tuple:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 5 6 >}}
+
+Let's say that the first element of the tuple will be the type of the register
+`A`, the second the type of `B`, and the third the type of `R`. Then,
+we can define two helper functions, one for retrieving the type of
+a register, and one for changing it:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 8 16 >}}
+
+Now, it's time to talk about expressions. We know now that an expression
+can evaluate to either a boolean or an integer value (because a register
+can contain either of those types of values). Perhaps we can specify
+the type that an expression evaluates to in the expression's own type:
+`Expr IntTy` would evaluate to integers, and `Expr BoolTy` would evaluate
+to booleans. Then, we could constructors as follows:
+
+```Idris
+Lit : Int -> Expr IntTy
+Not : Expr BoolTy -> Expr BoolTy
+```
+
+Sounds good! But what about loading a register?
+
+```Idris
+Load : Reg -> Expr IntTy -- no; what if the register is a boolean?
+Load : Reg -> Expr BoolTy -- no; what if the register is an integer?
+Load : Reg -> Expr a -- no; a register access can't be either!
+```
+
+The type of an expression that loads a register depends on the current
+state of the program! If we last stored an integer into a register,
+then loading from that register would give us an integer. But if we
+last stored a boolean into a register, then reading from it would
+give us a boolean. Our expressions need to be aware of the current
+types of each register. To do so, we add the state as a parameter to
+our `Expr` type. This would lead to types like the following:
+
+```Idris
+-- An expression that produces a boolean
+-- when all the registers are integers.
+Expr (IntTy, IntTy, IntTy) BoolTy
+
+-- An expression that produces an integer
+-- when A and B are integers, and R is a boolean.
+Expr (IntTy, IntTy, BoolTy) IntTy
+```
+
+In Idris, the whole definition becomes:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 18 23 >}}
+
+The only "interesting" constructor is `Load`, which, given a register `r`,
+creates an expression that has `r`'s type in the current state `s`.
+
+### Statements
+Statements are a bit different. Unlike expressions, they don't evaluate to
+anything; rather, they do something. That "something" may very well be changing
+the current state. We could, for instance, set `A` to be a boolean, while it was
+previously an integer. So, the `Stmt` type will take two arguments: the initial
+state and the final state. This leads to the following definition:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 29 >}}
+
+The `Store` constructor takes a register `r` and an expression producing some type `t` in state `s`.
+From these, it creates a statement that starts in `s`, and finishes
+in a state similar to `s`, but with `r` now having type `t`.
+
+The `If` constructor takes a condition, which starts in state `s` and _must_ produce
+a boolean. It also takes two programs (sequences of statements), each of which
+start in `s` and finishes in another state `n`. Then, the `If` constructor creates
+a statement that starts in state `s`, and finishes in state `n`. Conceptually,
+each branch of the `if/else` statement must result in the same final state (in terms of types);
+otherwise, we wouldn't know which of the states to pick when deciding the final
+state of the `If` itself.
+
+The `Loop` constructor is even more restrictive: it takes a single program (the sequence
+of instructions that it will be repeating). This program starts _and_ ends in state `s`;
+since the loop can repeat many times, and since we're repeating the same program,
+we want to make sure that program is always run from the same initial state.
+
+I chose __not__ to encode `Break` as a statement. This is because we don't want `Break`s occurring
+in the middle of a program! Otherwise, it would be possible to write a program
+that _seems_ like it will terminate in one state, but, because of a break in the middle,
+terminates in another! Instead, we'll encode `Break` as a part of the `Prog` encoding.
+
+### Programs
+A program is basically a list of statements. However, we can't use a regular Idris list for two
+reasons:
+
+1. Our type is not as simple as `[Stmt]`. We want each statement to begin in the state that the
+previous statement ended in; we will have to do some work to ensure that.
+2. We have two ways of ending the sequence of statements: either with or without a `break`.
+Thus, instead of having a single `Nil` constructor, we'll have two.
+
+The definition of the type turns out fairly straightforward:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 31 34 >}}
+
+The `Nil` constructor represents an empty program (much like the built-in `Nil` represents an empty list).
+Since no actions are done, it creates a `Prog` that starts and ends in the same state: `s`.
+The `Break` constructor is similar; however, it represents a `break` instruction, and thus,
+must be distinct from the regular `End` constructor. Finally, the `(::)` constructor, much like
+the built-in `(::)` constructor, takes a statement and another program. The statement begins in
+state `s` and ends in state `n`; the program after that statement must then start in state `n`,
+and end in some other state `m`. The combination of the statement and the program starts in state `s`,
+and finishes in state `m`; thus, `(::)` yields `Prog s m`.
+
+This should be all! Let's try out some programs.
+
+### Trying it Out
+The following (type-correct) program compiles just fine:
+
+{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 36 47 >}}
+
+First, it loads a boolean (`True`, to be exact) into register `A`; then,
+inside the `if/else` statement, it stores an integer into `A`. Finally,
+it stores another integer into `B`, and adds them into `R`. Even though
+`A` was a boolean at first, the type checker can deduce that it
+was reset back to an integer after the `if/else`, and the program is accepted.
+On the other hand, had we forgotten to set `A` to a boolean first:
+
+```
+  [ If (Load A)
+    [ Store A (Lit 1) ]
+    [ Store A (Lit 2) ]
+  , Store B (Lit 2)
+  , Store R (Add (Load A) (Load B))
+  ]
+```
+
+We would get a type error:
+
+```
+Type mismatch between
+        getRegTy A (IntTy, IntTy, IntTy)
+and
+        BoolTy
+```
+
+The type of register `A` (that is, `IntTy`) is incompatible
+with `BoolTy`. Our `initialState` says that `A` starts out as
+an integer, so it can't be used in an `if/else` right away!
+Similar errors occur if we make one of the branches of
+the `if/else` empty, or if we set `B` to a boolean. And so,
+we have an encoding of our language that allows registers to
+be either integers or booleans, while still preventing
+type-incorrect programs!