parent
b96d405270
commit
4043a5c1a3
@ -0,0 +1,101 @@ |
||||
{-# LANGUAGE LambdaCase, DeriveFunctor, DeriveFoldable, MultiParamTypeClasses #-} |
||||
import Prelude hiding (length, sum, fix) |
||||
|
||||
length :: [a] -> Int |
||||
length [] = 0 |
||||
length (_:xs) = 1 + length xs |
||||
|
||||
lengthF :: ([a] -> Int) -> [a] -> Int |
||||
lengthF rec [] = 0 |
||||
lengthF rec (_:xs) = 1 + rec xs |
||||
|
||||
lengthF' = \rec -> \case |
||||
[] -> 0 |
||||
_:xs -> 1 + rec xs |
||||
|
||||
fix f = let x = f x in x |
||||
|
||||
length' = fix lengthF |
||||
|
||||
data MyList = MyNil | MyCons Int MyList |
||||
data MyListF a = MyNilF | MyConsF Int a |
||||
|
||||
newtype Fix f = Fix { unFix :: f (Fix f) } |
||||
|
||||
testList :: Fix MyListF |
||||
testList = Fix (MyConsF 1 (Fix (MyConsF 2 (Fix (MyConsF 3 (Fix MyNilF)))))) |
||||
|
||||
myOut :: MyList -> MyListF MyList |
||||
myOut MyNil = MyNilF |
||||
myOut (MyCons i xs) = MyConsF i xs |
||||
|
||||
myIn :: MyListF MyList -> MyList |
||||
myIn MyNilF = MyNil |
||||
myIn (MyConsF i xs) = MyCons i xs |
||||
|
||||
instance Functor MyListF where |
||||
fmap f MyNilF = MyNilF |
||||
fmap f (MyConsF i a) = MyConsF i (f a) |
||||
|
||||
mySumF :: MyListF Int -> Int |
||||
mySumF MyNilF = 0 |
||||
mySumF (MyConsF i rest) = i + rest |
||||
|
||||
mySum :: MyList -> Int |
||||
mySum = mySumF . fmap mySum . myOut |
||||
|
||||
myCata :: (MyListF a -> a) -> MyList -> a |
||||
myCata f = f . fmap (myCata f) . myOut |
||||
|
||||
myLength = myCata $ \case |
||||
MyNilF -> 0 |
||||
MyConsF _ l -> 1 + l |
||||
|
||||
myMax = myCata $ \case |
||||
MyNilF -> 0 |
||||
MyConsF x y -> max x y |
||||
|
||||
myMin = myCata $ \case |
||||
MyNilF -> 0 |
||||
MyConsF x y -> min x y |
||||
|
||||
myTestList = MyCons 2 (MyCons 1 (MyCons 3 MyNil)) |
||||
|
||||
pack :: a -> (Int -> a -> a) -> MyListF a -> a |
||||
pack b f MyNilF = b |
||||
pack b f (MyConsF x y) = f x y |
||||
|
||||
unpack :: (MyListF a -> a) -> (a, Int -> a -> a) |
||||
unpack f = (f MyNilF, \i a -> f (MyConsF i a)) |
||||
|
||||
class Functor f => Cata a f where |
||||
out :: a -> f a |
||||
|
||||
cata :: Cata a f => (f b -> b) -> a -> b |
||||
cata f = f . fmap (cata f) . out |
||||
|
||||
instance Cata MyList MyListF where |
||||
out = myOut |
||||
|
||||
data ListF a b = Nil | Cons a b deriving Functor |
||||
|
||||
instance Cata [a] (ListF a) where |
||||
out [] = Nil |
||||
out (x:xs) = Cons x xs |
||||
|
||||
sum :: Num a => [a] -> a |
||||
sum = cata $ \case |
||||
Nil -> 0 |
||||
Cons x xs -> x + xs |
||||
|
||||
data BinaryTree a = Node a (BinaryTree a) (BinaryTree a) | Leaf deriving (Show, Foldable) |
||||
data BinaryTreeF a b = NodeF a b b | LeafF deriving Functor |
||||
|
||||
instance Cata (BinaryTree a) (BinaryTreeF a) where |
||||
out (Node a l r) = NodeF a l r |
||||
out Leaf = LeafF |
||||
|
||||
invert :: BinaryTree a -> BinaryTree a |
||||
invert = cata $ \case |
||||
LeafF -> Leaf |
||||
NodeF a l r -> Node a r l |
@ -0,0 +1,381 @@ |
||||
--- |
||||
title: "Generalizing Folds in Haskell" |
||||
date: 2022-04-22T12:19:22-07:00 |
||||
draft: true |
||||
tags: ["Haskell"] |
||||
--- |
||||
|
||||
### Recursive Functions |
||||
Let's start off with a little bit of a warmup, and take a look at a simple recursive function: |
||||
`length`. Here's a straightforward definition: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 4 6 >}} |
||||
|
||||
Haskell is nice because it allows for clean definitions of recursive functions; `length` can |
||||
just reference itself in its definition, and everything works out in the end. In the underlying |
||||
lambda calculus, though, a function definition doesn't come with a name -- you only get anonymous |
||||
functions via the lambda abstraction. There's no way for such functions to just refer to themselves by |
||||
their name in their body. But the lambda calculus is Turing complete, so something is making recursive |
||||
definitions possible. |
||||
|
||||
The trick is to rewrite your recursive function in such a way that instead of calling itself by its name |
||||
(which, with anonymous functions, is hard to come by), it receives a reference to itself as an argument. |
||||
As a concrete example: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 8 10 >}} |
||||
|
||||
This new function can easily me anonymous; if we enable the `LambdaCase` extension, |
||||
we can write it using only lambda functions as: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 12 14 >}} |
||||
|
||||
This function is not equivalent to `length`, however. It expects "itself", or a function |
||||
which has type `[a] -> Int`, to be passed in as the first argument. Once fed this `rec` |
||||
argument, though, `lengthF` returns a length function. Let's try feed it something, then! |
||||
|
||||
```Haskell |
||||
lengthF _something |
||||
``` |
||||
|
||||
But if `lengthF` produces a length function when given this _something_, why can't we feed |
||||
this newly-produced length function back to it? |
||||
|
||||
```Haskell |
||||
lengthF (lengthF _something) |
||||
``` |
||||
|
||||
And again: |
||||
|
||||
```Haskell |
||||
lengthF (lengthF (lengthF _something)) |
||||
``` |
||||
|
||||
If we kept going with this process infinitely, we'd eventually have what we need: |
||||
|
||||
{{< latex >}} |
||||
\text{length} = \text{lengthF}(\text{lengthF}(\text{lengthF}(...))) |
||||
{{< /latex >}} |
||||
|
||||
But hey, the stuff inside the first set of parentheses is still an infinite sequence of applications |
||||
of the function $\\text{lengthF}$, and we have just defined this to be $\\text{length}$. Thus, |
||||
we can rewrite the above equation as: |
||||
|
||||
{{< latex >}} |
||||
\text{length} = \text{lengthF}(\text{length}) |
||||
{{< /latex >}} |
||||
|
||||
What we have just discovered is that the actual function that we want, `length`, is a [fixed point](https://mathworld.wolfram.com/FixedPoint.html) |
||||
of the non-recursive function `lengthF`. Fortunately, Haskell comes with a function that can find |
||||
such a fixed point. It's defined like this: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 16 16 >}} |
||||
|
||||
This definition is as declarative as can be; `fix` returns the \\(x\\) such that \\(x = f(x)\\). With |
||||
this, we finally write: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 18 18 >}} |
||||
|
||||
Loading up the file in GHCi, and running the above function, we get exactly the expected results. |
||||
|
||||
``` |
||||
ghci> Main.length' [1,2,3] |
||||
3 |
||||
``` |
||||
|
||||
You may be dissatisfied with the way we handled `fix` here; we went through and pretended that we didn't |
||||
have recursive function definitions, but then used a recursive `let`-expression in the body `fix`! |
||||
This is a valid criticism, so I'd like to briefly talk about how `fix` is used in the context of the |
||||
lambda calculus. |
||||
|
||||
In the untyped typed lambda calculus, we can just define a term that behaves like `fix` does. The |
||||
most common definition is the \\(Y\\) combinator, defined as follows: |
||||
|
||||
{{< latex >}} |
||||
Y = \lambda f. (\lambda x. f (x x)) (\lambda x. f (x x )) |
||||
{{< /latex >}} |
||||
|
||||
When applied to a function, this combinator goes through the following evaluation steps: |
||||
|
||||
{{< latex >}} |
||||
Y f = f (Y f) = f (f (Y f)) =\ ... |
||||
{{< /latex >}} |
||||
|
||||
This is the exact sort of infinite series of function applications that we saw above with \\(\\text{lengthF}\\). |
||||
|
||||
### Recursive Data Types |
||||
We have now seen how we can rewrite a recursive function as a fixed point of some non-recursive function. |
||||
Another cool thing we can do, though, is to transform recursive __data types__ in a similar manner! |
||||
Let's start with something pretty simple. |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 20 20 >}} |
||||
|
||||
Just like we did with functions, we can extract the recursive occurrences of `MyList` into a parameter. |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 21 21 >}} |
||||
|
||||
Just like `lengthF`, `MyListF` isn't really a list. We can't write a function `sum :: MyListF -> Int`. |
||||
`MyListF` requires _something_ as an argument, and once given that, produces a type of integer lists. |
||||
Once again, let's try feeding it: |
||||
|
||||
``` |
||||
MyListF a |
||||
``` |
||||
|
||||
From the definition, we can clearly see that `a` is where the "rest of the list" is in the original |
||||
`MyList`. So, let's try fill `a` with a list that we can get out of `MyListF`: |
||||
|
||||
``` |
||||
MyListF (MyListF a) |
||||
``` |
||||
|
||||
And again: |
||||
|
||||
``` |
||||
MyListF (MyListF (MyListF a)) |
||||
``` |
||||
|
||||
Much like we used a `fix` function to turn our `lengthF` into `length`, we need a data type, |
||||
which we'll call `Fix` (and which has been [implemented before](https://hackage.haskell.org/package/data-fix-0.3.2/docs/Data-Fix.html)). Here's the definition: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 23 23 >}} |
||||
|
||||
Looking past the constructors and accessors, we might write the above in pseudo-Haskell as follows: |
||||
|
||||
```Haskell |
||||
newtype Fix f = f (Fix f) |
||||
``` |
||||
|
||||
This is just like the lambda calculus \\(Y\\) combinator above! Unfortunately, we _do_ have to |
||||
deal with the cruft induced by the constructors here. Thus, to write down the list `[1,2,3]` |
||||
using `MyListF`, we'd have to produce the following: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 25 26 >}} |
||||
|
||||
This is actually done in practice when using some approaches to help address the |
||||
[expression problem](https://en.wikipedia.org/wiki/Expression_problem); however, |
||||
it's quite unpleasant to write code in this way, so we'll set it aside. |
||||
|
||||
Let's go back to our infinite chain of type applications. We've a similar pattern before, |
||||
with \\(\\text{length}\\) and \\(\\text{lengthF}\\). Just like we did then, it seems like |
||||
we might be able to write something like the following: |
||||
|
||||
{{< latex >}} |
||||
\begin{aligned} |
||||
& \text{MyList} = \text{MyListF}(\text{MyListF}(\text{MyListF}(...))) \\ |
||||
\Leftrightarrow\ & \text{MyList} = \text{MyListF}(\text{MyList}) |
||||
\end{aligned} |
||||
{{< /latex >}} |
||||
|
||||
In something like Haskell, though, the above is not quite true. `MyListF` is a |
||||
non-recursive data type, with a different set of constructors to `MyList`; they aren't |
||||
_really_ equal. Instead of equality, though, we use the next-best thing: isomorphism. |
||||
|
||||
{{< latex >}} |
||||
\text{MyList} \cong \text{MyListF}(\text{MyList}) |
||||
{{< /latex >}} |
||||
|
||||
Two types are isomorphic when there exist a |
||||
{{< sidenote "right" "fix-isomorphic-note" "pair of functions, \(f\) and \(g\)," >}} |
||||
Let's a look at the types of <code>Fix</code> and <code>unFix</code>, by the way. |
||||
Suppose that we <em>did</em> define <code>MyList</code> to be <code>Fix MyListF</code>. |
||||
Let's specialize the <code>f</code> type parameter of <code>Fix</code> to <code>MyListF</code> |
||||
for a moment, and check:<br><br> |
||||
In one direction, <code>Fix :: MyListF MyList -> MyList</code><br> |
||||
And in the other, <code>unFix :: MyList -> MyListF MyList</code><br><br> |
||||
The two mutual inverses \(f\) and \(g\) fall out of the definition of the <code>Fix</code> |
||||
data type! If we didn't have to deal with the constructor cruft, this would be more |
||||
ergonomic than writing our own <code>myIn</code> and <code>myOut</code> functions. |
||||
{{< /sidenote >}} |
||||
that take you from one type to the other (and vice versa), such that applying \\(f\\) after \\(g\\), |
||||
or \\(g\\) after \\(f\\), gets you right back where you started. That is, \\(f\\) and \\(g\\) |
||||
need to be each other's inverses. For our specific case, let's call the two functions `myOut` |
||||
and `myIn` (I'm matching the naming in [this paper](https://maartenfokkinga.github.io/utwente/mmf91m.pdf)). They are not hard to define: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 28 34 >}} |
||||
|
||||
By the way, when a data type is a fixed point of some other, non-recursive type constructor, |
||||
this second type constructor is called a __base functor__. We can verify that `MyListF` is a functor |
||||
by providing an instance (which is rather straightforward): |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 36 38 >}} |
||||
|
||||
### Recursive Functions with Base Functors |
||||
One neat thing you can do with a base functor is define recursive functions on the actual data type! |
||||
|
||||
Let's go back to the very basics. When we write recursive functions, we try to think of it as solving |
||||
a problem, assuming that we are given solutions to the sub-problems that make it up. In the more |
||||
specific case of recursive functions on data types, we think of it as performing a given operation, |
||||
assuming that we know how to perform this operation on the smaller pieces of the data structure. |
||||
Some quick examples: |
||||
|
||||
1. When writing a `sum` function on a list, we assume we know how to find the sum of |
||||
the list's tail (`sum xs`), and add to it the current element (`x+`). Of course, |
||||
if we're looking at a part of a data structure that's not recursive, we don't need to |
||||
perform any work on its constituent pieces. |
||||
```Haskell |
||||
sum [] = 0 |
||||
sum (x:xs) = x + sum xs |
||||
``` |
||||
2. When writing a function to invert a binary tree, we assume that we can invert the left and right |
||||
children of a non-leaf node. We might write: |
||||
```Haskell |
||||
invert Leaf = Leaf |
||||
invert (Node l r) = Node (invert r) (invert l) |
||||
``` |
||||
|
||||
What does this have to do with base functors? Well, recall how we arrived at `MyListF` from |
||||
`MyList`: we replaced every occurrence of `MyList` in the definition with a type parameter `a`. |
||||
Let me reiterate: wherever we had a sub-list in our definition, we replaced it with `a`. |
||||
The `a` in `MyListF` marks the locations where we _would_ have to use recursion if we were to |
||||
define a function on `MyList`. |
||||
|
||||
What if instead of a stand-in for the list type (as it was until now), we use `a` to represent |
||||
the result of the recursive call on that sub-list? To finish computing the sum of the list, then, |
||||
the following would suffice: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 40 42 >}} |
||||
|
||||
Actually, this is enough to define the whole `sum` function. First things first, let's use `myOut` |
||||
to unpack one level of the `Mylist` type: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 28 28 >}} |
||||
|
||||
We know that `MyListF` is a functor; we can thus use `fmap sum` to compute the sum of the |
||||
remaining list: |
||||
|
||||
```Haskell |
||||
fmap mySum :: MyListF MyList -> MyListF Int |
||||
``` |
||||
|
||||
Finally, we can use our `mySumF` to handle the last addition: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 40 40 >}} |
||||
|
||||
Let's put all of these together: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 44 45 >}} |
||||
|
||||
Notice, though, that the exact same approach would work for _any_ function |
||||
with type: |
||||
|
||||
```Haskell |
||||
MyListF a -> a |
||||
``` |
||||
|
||||
We can thus write a generalized version of `mySum` that, instead of using `mySumF`, |
||||
uses some arbitrary function `f` with the aforementioned type: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 47 48 >}} |
||||
|
||||
Let's use `myCata` to write a few other functions: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 50 60 >}} |
||||
|
||||
#### It's just a `foldr`! |
||||
When you write a function with the type `MyListF a -> a`, you are actually |
||||
providing two things: a "base case" element of type `a`, for when you match `MyNilF`, |
||||
and a "combining function" with type `Int -> a -> a`, for when you match `MyConsF`. |
||||
We can thus define: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 64 66 >}} |
||||
|
||||
We could also go in the opposite direction, by writing: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 68 69 >}} |
||||
|
||||
Hey, what was it that we said about types with two functions between them, which |
||||
are inverses of each other? That's right, `MyListF a -> a` and `(a, Int -> a -> a)` |
||||
are isomorphic. The function `myCata`, and the "traditional" definition of `foldr` |
||||
are equivalent! |
||||
|
||||
#### Base Functors for All! |
||||
We've been playing with `MyList` for a while now, but it's kind of getting boring: |
||||
it's just a list of integers! Furthermore, we're not _really_ getting anything |
||||
out of this new "generalization" procedure -- `foldr` is part of the standard library, |
||||
and we've just reinvented the wheel. |
||||
|
||||
But you see, we haven't quite. This is because, while we've only been working with `MyListF`, |
||||
the base functor for `MyList`, our approach works for _any recursive data type_, provided |
||||
an `out` function. Let's define a type class, `Cata`, which pairs a data type `a` with |
||||
its base functor `f`, and specifies how to "unpack" `a`: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 71 72 >}} |
||||
|
||||
We can now provide a more generic version of our `myCata`, one that works for all types |
||||
with a base functor: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 74 75 >}} |
||||
|
||||
Clearly, `MyList` and `MyListF` are one instance of this type class: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 77 78 >}} |
||||
|
||||
We can also write a base functor for Haskell's built-in list type, `[a]`: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 80 84 >}} |
||||
|
||||
We can use our `cata` function for regular lists to define a generic `sum`: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 86 89 >}} |
||||
|
||||
It works perfectly: |
||||
|
||||
``` |
||||
ghci> Main.sum [1,2,3] |
||||
6 |
||||
ghci> Main.sum [1,2,3.0] |
||||
6.0 |
||||
ghci> Main.sum [1,2,3.0,-1] |
||||
5.0 |
||||
``` |
||||
|
||||
What about binary trees, which served as our second example of a recursive data structure? |
||||
We can do that, too: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 91 96 >}} |
||||
|
||||
Given this, here's an implementation of that `invert` function we mentioned earlier: |
||||
|
||||
{{< codelines "Haskell" "catamorphisms/Cata.hs" 98 101 >}} |
||||
|
||||
#### What About `Foldable`? |
||||
If you've been around the Haskell ecosystem, you may know the `Foldable` type class. |
||||
Isn't this exactly what we've been working towards here? No, not at all. Take |
||||
a look at how the documentation describes [`Data.Foldable`](https://hackage.haskell.org/package/base-4.16.1.0/docs/Data-Foldable.html): |
||||
|
||||
> The Foldable class represents data structures that can be reduced to a summary value one element at a time. |
||||
|
||||
One at a time, huh? Take a look at the signature of `foldMap`, which is sufficient for |
||||
an instance of `Foldable`: |
||||
|
||||
```Haskell |
||||
foldMap :: Monoid m => (a -> m) -> t a -> m |
||||
``` |
||||
|
||||
A `Monoid` is just a type with an associative binary operation that has an identity element. |
||||
Then, `foldMap` simply visits the data structure in order, and applies this binary operation |
||||
pairwise to each monoid produced via `f`. Alas, |
||||
this function is not enough to be able to implement something like inverting a binary tree; |
||||
there are different configurations of binary tree that, when visited in-order, result in the |
||||
same sequence of elements. For example: |
||||
|
||||
``` |
||||
ghci> fold (Node "Hello" Leaf (Node ", " Leaf (Node "World!" Leaf Leaf))) |
||||
"Hello, World!" |
||||
ghci> fold (Node "Hello" (Node ", " Leaf Leaf) (Node "World!" Leaf Leaf)) |
||||
"Hello, World!" |
||||
``` |
||||
|
||||
As far as `fold` (which is just `foldMap id`) is concerned, the two trees are equivalent. They |
||||
are very much not equivalent for the purposes of inversion! Thus, whereas `Foldable` helps |
||||
us work with list-like data types, the `Cata` type class lets us express _any_ function on |
||||
a recursive data type similarly to how we'd do it with `foldr` and lists. |
||||
|
||||
#### Catamorphisms |
||||
Why is the type class called `Cata`, and the function `cata`? Well, a function that |
||||
performs a computation by recursively visiting the data structure is called a catamorphism. |
||||
Indeed, `foldr f b`, for function `f` an "base value" `b` is an example of a list catamorophism. |
||||
It's a fancy word, and there are some fancier descriptions of what it is, especially when |
||||
you step into category theory (check out the [Wikipedia entry](https://en.wikipedia.org/wiki/Catamorphism) |
||||
if you want to know what I mean). However, for our purposes, a catamorphism is just a generalization |
||||
of `foldr` from lists to any data type! |
Loading…
Reference in new issue