From f0fe481bcf6c90aba07a9a01b00e9cfa4513a93c Mon Sep 17 00:00:00 2001 From: Danila Fedorin Date: Fri, 30 Oct 2020 19:07:30 -0700 Subject: [PATCH] Add post about the typesafe imperative language. --- code/typesafe-imperative/TypesafeImp.idr | 47 +++++ content/blog/typesafe_imperative_lang.md | 220 +++++++++++++++++++++++ 2 files changed, 267 insertions(+) create mode 100644 code/typesafe-imperative/TypesafeImp.idr create mode 100644 content/blog/typesafe_imperative_lang.md diff --git a/code/typesafe-imperative/TypesafeImp.idr b/code/typesafe-imperative/TypesafeImp.idr new file mode 100644 index 0000000..b40ab30 --- /dev/null +++ b/code/typesafe-imperative/TypesafeImp.idr @@ -0,0 +1,47 @@ +data Reg = A | B | R + +data Ty = IntTy | BoolTy + +RegState : Type +RegState = (Ty, Ty, Ty) + +getRegTy : Reg -> RegState -> Ty +getRegTy A (a, _, _) = a +getRegTy B (_, b, _) = b +getRegTy R (_, _, r) = r + +setRegTy : Reg -> Ty -> RegState -> RegState +setRegTy A a (_, b, r) = (a, b, r) +setRegTy B b (a, _, r) = (a, b, r) +setRegTy R r (a, b, _) = (a, b, r) + +data Expr : RegState -> Ty -> Type where + Lit : Int -> Expr s IntTy + Load : (r : Reg) -> Expr s (getRegTy r s) + Add : Expr s IntTy -> Expr s IntTy -> Expr s IntTy + Leq : Expr s IntTy -> Expr s IntTy -> Expr s BoolTy + Not : Expr s BoolTy -> Expr s BoolTy + +mutual + data Stmt : RegState -> RegState -> Type where + Store : (r : Reg) -> Expr s t -> Stmt s (setRegTy r t s) + If : Expr s BoolTy -> Prog s n -> Prog s n -> Stmt s n + Loop : Prog s s -> Stmt s s + + data Prog : RegState -> RegState -> Type where + Nil : Prog s s + Break : Prog s s + (::) : Stmt s n -> Prog n m -> Prog s m + +initialState : RegState +initialState = (IntTy, IntTy, IntTy) + +testProg : Prog Main.initialState Main.initialState +testProg = + [ Store A (Lit 1 `Leq` Lit 2) + , If (Load A) + [ Store A (Lit 1) ] + [ Store A (Lit 2) ] + , Store B (Lit 2) + , Store R (Add (Load A) (Load B)) + ] diff --git a/content/blog/typesafe_imperative_lang.md b/content/blog/typesafe_imperative_lang.md new file mode 100644 index 0000000..fc793c3 --- /dev/null +++ b/content/blog/typesafe_imperative_lang.md @@ -0,0 +1,220 @@ +--- +title: "A Typesafe Representation of an Imperative Language" +date: 2020-10-30T17:19:59-07:00 +tags: ["Idris"] +draft: true +--- + +A recent homework assignment for my university's programming languages +course was to encode the abstract syntax for a small imperative language +into Haskell data types. The language consisted of very few constructs, and was very much a "toy". +On the expression side of things, it had three registers (`A`, `B`, and `R`), +numbers, addition, comparison using "less than", and logical negation. It also +included a statement for storing the result of an expression into +a register, `if/else`, and an infinite loop construct with an associated `break` operation. +A sample program in the language which computes the product of two +numbers is as follows: + +``` +A := 7 +B := 9 +R := 0 +do + if A <= 0 then + break + else + R := R + B; + A := A + -1; + end +end +``` + +The homework notes that type errors may arise in the little imperative language. +We could, for instance, try to add a boolean to a number: `3 + (1 < 2)`. Alternatively, +we could try use a number in the condition of an `if/else` expression. A "naive" +encoding of the abstract syntax would allow for such errors. + +However, assuming that registers could only store integers and not booleans, it is fairly easy to +separate the expression grammar into two nonterminals, yielding boolean +and integer expressions respectively. Since registers can only store integers, +the `(:=)` operation will always require an integer expression, and an `if/else` +statement will always require a boolean expression. A matching Haskell encoding +would not allow "invalid" programs to compile. That is, the programs would be +type-correct by construction. + +Then, a question arose in the ensuing discussion: what if registers _could_ +contain booleans? It would be impossible to create such a "correct-by-construction" +representation then, wouldn't it? +{{< sidenote "right" "haskell-note" "Although I don't know about Haskell," >}} +I am pretty certain that a similar encoding in Haskell is possible. However, +Haskell wasn't originally created for that kind of abuse of its type system, +so it would probably not look very good. +{{< /sidenote >}} I am pretty certain that it _is_ possible to do this +in Idris, a dependently typed programming language. In this post I will +talk about how to do that. + +### Registers and Expressions +Let's start by encoding registers. Since we only have three registers, we +can encode them using a simple data type declaration, much the same as we +would in Haskell: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 1 1 >}} + +Now that registers can store either integers or booleans (and only those two), +we need to know which one is which. For this purpose, we can declare another +data type: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 3 3 >}} + +At any point in the (hypothetical) execution of our program, each +of the registers will have a type, either boolean or integer. The +combined state of the three registers would then be the combination +of these three states; we can represent this using a 3-tuple: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 5 6 >}} + +Let's say that the first element of the tuple will be the type of the register +`A`, the second the type of `B`, and the third the type of `R`. Then, +we can define two helper functions, one for retrieving the type of +a register, and one for changing it: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 8 16 >}} + +Now, it's time to talk about expressions. We know now that an expression +can evaluate to either a boolean or an integer value (because a register +can contain either of those types of values). Perhaps we can specify +the type that an expression evaluates to in the expression's own type: +`Expr IntTy` would evaluate to integers, and `Expr BoolTy` would evaluate +to booleans. Then, we could constructors as follows: + +```Idris +Lit : Int -> Expr IntTy +Not : Expr BoolTy -> Expr BoolTy +``` + +Sounds good! But what about loading a register? + +```Idris +Load : Reg -> Expr IntTy -- no; what if the register is a boolean? +Load : Reg -> Expr BoolTy -- no; what if the register is an integer? +Load : Reg -> Expr a -- no; a register access can't be either! +``` + +The type of an expression that loads a register depends on the current +state of the program! If we last stored an integer into a register, +then loading from that register would give us an integer. But if we +last stored a boolean into a register, then reading from it would +give us a boolean. Our expressions need to be aware of the current +types of each register. To do so, we add the state as a parameter to +our `Expr` type. This would lead to types like the following: + +```Idris +-- An expression that produces a boolean +-- when all the registers are integers. +Expr (IntTy, IntTy, IntTy) BoolTy + +-- An expression that produces an integer +-- when A and B are integers, and R is a boolean. +Expr (IntTy, IntTy, BoolTy) IntTy +``` + +In Idris, the whole definition becomes: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 18 23 >}} + +The only "interesting" constructor is `Load`, which, given a register `r`, +creates an expression that has `r`'s type in the current state `s`. + +### Statements +Statements are a bit different. Unlike expressions, they don't evaluate to +anything; rather, they do something. That "something" may very well be changing +the current state. We could, for instance, set `A` to be a boolean, while it was +previously an integer. So, the `Stmt` type will take two arguments: the initial +state and the final state. This leads to the following definition: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 26 29 >}} + +The `Store` constructor takes a register `r` and an expression producing some type `t` in state `s`. +From these, it creates a statement that starts in `s`, and finishes +in a state similar to `s`, but with `r` now having type `t`. + +The `If` constructor takes a condition, which starts in state `s` and _must_ produce +a boolean. It also takes two programs (sequences of statements), each of which +start in `s` and finishes in another state `n`. Then, the `If` constructor creates +a statement that starts in state `s`, and finishes in state `n`. Conceptually, +each branch of the `if/else` statement must result in the same final state (in terms of types); +otherwise, we wouldn't know which of the states to pick when deciding the final +state of the `If` itself. + +The `Loop` constructor is even more restrictive: it takes a single program (the sequence +of instructions that it will be repeating). This program starts _and_ ends in state `s`; +since the loop can repeat many times, and since we're repeating the same program, +we want to make sure that program is always run from the same initial state. + +I chose __not__ to encode `Break` as a statement. This is because we don't want `Break`s occurring +in the middle of a program! Otherwise, it would be possible to write a program +that _seems_ like it will terminate in one state, but, because of a break in the middle, +terminates in another! Instead, we'll encode `Break` as a part of the `Prog` encoding. + +### Programs +A program is basically a list of statements. However, we can't use a regular Idris list for two +reasons: + +1. Our type is not as simple as `[Stmt]`. We want each statement to begin in the state that the +previous statement ended in; we will have to do some work to ensure that. +2. We have two ways of ending the sequence of statements: either with or without a `break`. +Thus, instead of having a single `Nil` constructor, we'll have two. + +The definition of the type turns out fairly straightforward: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 31 34 >}} + +The `Nil` constructor represents an empty program (much like the built-in `Nil` represents an empty list). +Since no actions are done, it creates a `Prog` that starts and ends in the same state: `s`. +The `Break` constructor is similar; however, it represents a `break` instruction, and thus, +must be distinct from the regular `End` constructor. Finally, the `(::)` constructor, much like +the built-in `(::)` constructor, takes a statement and another program. The statement begins in +state `s` and ends in state `n`; the program after that statement must then start in state `n`, +and end in some other state `m`. The combination of the statement and the program starts in state `s`, +and finishes in state `m`; thus, `(::)` yields `Prog s m`. + +This should be all! Let's try out some programs. + +### Trying it Out +The following (type-correct) program compiles just fine: + +{{< codelines "Idris" "typesafe-imperative/TypesafeImp.idr" 36 47 >}} + +First, it loads a boolean (`True`, to be exact) into register `A`; then, +inside the `if/else` statement, it stores an integer into `A`. Finally, +it stores another integer into `B`, and adds them into `R`. Even though +`A` was a boolean at first, the type checker can deduce that it +was reset back to an integer after the `if/else`, and the program is accepted. +On the other hand, had we forgotten to set `A` to a boolean first: + +``` + [ If (Load A) + [ Store A (Lit 1) ] + [ Store A (Lit 2) ] + , Store B (Lit 2) + , Store R (Add (Load A) (Load B)) + ] +``` + +We would get a type error: + +``` +Type mismatch between + getRegTy A (IntTy, IntTy, IntTy) +and + BoolTy +``` + +The type of register `A` (that is, `IntTy`) is incompatible +with `BoolTy`. Our `initialState` says that `A` starts out as +an integer, so it can't be used in an `if/else` right away! +Similar errors occur if we make one of the branches of +the `if/else` empty, or if we set `B` to a boolean. And so, +we have an encoding of our language that allows registers to +be either integers or booleans, while still preventing +type-incorrect programs!