blog-static/content/blog/typesafe_interpreter_tuples.md

10 KiB

title date tags
Meaningfully Typechecking a Language in Idris, With Tuples 2020-08-12T15:48:04-07:00
Idris

Some time ago, I wrote a post titled [Meaningfully Typechecking a Language in Idris]({{< relref "typesafe_interpreter.md" >}}). I then followed it up with [Meaningfully Typechecking a Language in Idris, Revisited]({{< relref "typesafe_interpreter_revisited.md" >}}). In these posts, I described a hypothetical way of 'typechecking' an expression data type Expr into a typesafe form SafeExpr. A SafeExpr can be evaluated without any code to handle type errors, since it's by definition impossible to construct ill-typed expressions using it. In the first post, we implemented the method only for simple arithmetic expressions; in my latter post, we extended this to support if-expressions. Near the end of the post, I made the following comment:

When we add polymorphic tuples and lists, we start being able to construct an arbitrary number of types: [a]. [[a]], and so on. Then, we cease to be able t enumerate all possible pairs of types, and require a recursive solution. I think that this leads us back to [our method].

Recently, I thought about this some more, and decided that it's rather simple to add tuples into our little language. The addition of tuples mean that our language will have an infinite number of possible types. We would have Int, (Int, Int), ((Int, Int), Int), and so on. This would make it impossible to manually test every possible case in our typechecker, but our approach of returning Dec (a = b) will work just fine.

Extending The Syntax

First, let's extend our existing language with expressions for tuples. For simplicity, let's use pairs (a,b) instead of general n-element tuples. This would make typechecking less cumbersome while still having the interesting effect of making the number of types in our language infinite. We can always represent the 3-element tuple (a,b,c) as ((a,b), c), after all. To be able to extract values from our pairs, we'll add the fst and snd functions into our language, which accept a tuple and return its first or second element, respectively.

Our Expr data type, which allows ill-typed expressions, ends up as follows:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 31 39 "hl_lines=7 8 9" >}}

I've highlighted the new lines. The additions consist of the Pair constructor, which represents the tuple expression (a, b), and the Fst and Snd constructors, which represent the fst e and snd e expressions, respectively. In a similar manner, we extend our SafeExpr GADT:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 41 49 "hl_lines=7 8 9" >}}

Finally, to provide the PairType constructor, we extend the ExprType and repr functions:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 1 11 "hl_lines=5 11" >}}

Implementing Equality

An important part of this change is the extension of the decEq function, which compares two types for equality. The kind folks over at #idris previously recommended the use of the Dec data type for this purpose. A value of type Dec P {{< sidenote "right" "decideable-note" "is either a proof that P is true, or a proof that P is false." >}} It's possible that a proposition P is not provable, and neither is \lnot P. It is therefore not possible to construct a value of type Dec P for any proposition P. Having a value of type Dec P, then, provides us nontrivial information. {{< /sidenote >}} Our decEq function, given two types a and b, returns Dec (a = b). Thus, it will return either a proof that a = b (which we can then use to convince the Idris type system that two SafeExpr values are, in fact, of the same type), or a proof of a = b -> Void (which tells us that a and b are definitely not equal). If you're not sure what the deal with (=) and Void is, check out [this section]({{< relref "typesafe_interpreter_revisited.md" >}}#curry-howard-correspondence) of the previous article.

A lot of the work in implementing decEq went into constructing proofs of falsity. That is, we needed to explicitly list every case like decEq IntType BoolType, and create a proof that IntType cannot equal BoolType. However, here's how we use decEq in the typechecking function:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV2.idr" 76 78 >}}

We always throw away the proof inequality! So, rather than spending the time constructing useless proofs like this, we can just switch decEq to return a Maybe (a = b). The Just case will tell us that the two types are equal (and, as before, provide a proof); the Nothing case will tell us that the two types are not equal, and provide no further information. Let's see the implementation of decEq now:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 13 23 >}}

Lines 14 through 16 are pretty simple; in this case, we can tell at a glance that the two types are equal, and Idris can infer an equality proof in the form of Refl. We return this proof by writing it in a Just. Line 23 is the catch-all case for any combination of types we didn't handle. Any combination of types we don't handle is invalid, and thus, this case returns Nothing.

What about lines 17 through 22? This is the case for handling the equality of two pair types, (lt1, lt2) and (rt1, rt2). The equality of the two types depends on the equality of their constituents. That is, if we know that lt1 = rt1 and lt2 = rt2, we know that the two pair types are also equal. If one of the two equalities doesn't hold, the two pairs obviously aren't equal, and thus, we should return Nothing. This should remind us of Maybe's monadic nature: we can first compute decEq lt1 rt1, and then, if it succeeds, compute decEq lt2 rt2. If both succeed, we will have in hand the two proofs, lt1 = rt1 and lt2 = rt2. We achieve this effect using do-notation, storing the sub-proofs into subEq1 and subEq2.

What now? Once again, we have to use replace. Recall its type:

replace : {a:_} -> {x:_} -> {y:_} -> {P : a -> Type} -> x = y -> P x -> P y

Given some proposition in terms of a, and knowing that a = b, replace returns the original proposition, but now in terms of b. We know for sure that:

PairType lt1 lt2 = PairType lt1 lt2

We can start from there. Let's handle one thing at a time, and try to replace the second lt1 with rt1. Then, we can replace the second lt2 with rt2, and we'll have our equality!

Easier said than done, though. How do we tell Idris which lt1 we want to substitute? After all, of the following are possible:

PairType rt1 lt2 = PairType lt1 lt2 -- First lt1 replaced
PairType lt1 lt2 = PairType rt1 lt2 -- Second lt1 replaced
PairType rt1 lt2 = PairType rt1 lt2 -- Both replaced

The key is in the signature, specifically the expressions P x and P y. We can think of P as a function, and of replace as creating a value of P applied to another argument. Thus, the substitution will occur exactly where the argument of P is used. Then, to achieve each of the above substitution, we can write P as follows:

1 2 3 t1 => PairType t1 lt2 = PairType lt1 lt2 t1 => PairType lt1 lt2 = PairType t1 lt2 t1 => PairType t1 lt2 = PairType t1 lt2

The second function (highlighted) is the one we'll need to use to attain the desired result. Since P is an implicit argument to replace, we can explicitly provide it with {P=...}, leading to the following line:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 20 20>}}

We now have a proof of the following proposition:

PairType lt1 lt2 = PairType rt1 lt2

We want to replace the second lt2 with rt2, which means that we write our P as follows:

t2 => PairType lt1 lt2 = PairType rt1 t2

Finally, we perform the second replacement, and return the result:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 21 22 >}}

Great! We have finished implement decEq.

Adjusting The Typechecker

It's time to make our typechecker work with tuples. First, we need to fix the IfElse case to accept Maybe (a=b) instead of Dec (a=b):

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 71 78 "hl_lines=7 8" >}}

Note that the only change is from Dec to Maybe; we didn't need to add new cases or even to know what sort of types are available in the language.

Next, we can write the cases for the new expressions in our language. We can start with Pair, which, given expressions of types a and b, creates an expression of type (a,b). As long as the arguments to Pair are well-typed, so is the Pair expression itself; thus, there are no errors to handle.

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 79 83 >}}

The case for Fst is more complicated. If the argument to Fst is a tuple of type (a, b), then Fst constructs from it an expression of type a. Otherwise, the expression is ill-typed, and we return an error.

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 84 89 >}}

The case for Snd is very similar:

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 90 96 >}}

Evaluation Function and Conclusion

We conclude with our final eval and resultStr functions, which now look as follows.

{{< codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 97 111 "hl_lines=7-9 13-15" >}}

As you can see, we require no error handling in eval; the expressions returned by typecheck are guaranteed to evaluate to valid Idris values. We have achieved our goal, with very little changes to typecheck other than the addition of new language constructs. In my opinion, this is a win!

As always, you can see the code on my Git server. Here's the latest Idris file, if you want to check it out (and maybe verify that it compiles). I hope you found this interesting!