---
title: Search as a Polynomial
date: 2023-05-22T21:39:00-07:00
tags: ["Mathematics"]
discussionRooms: ["!qLoehEvJNRndNrdlyU:matrix.org"]
---

I read a really neat paper some time ago, and I've been wanting to write about it ever since. The paper is called Algebras for Weighted Search, and it is a tad too deep to dive into in a blog article -- readers of ICFP are rarely the target audience on this site. However, one particular insight I gleaned from the paper merits additional discussion and demonstration. I'm going to do that here.

In particular, the paper pointed out a connection between polynomials and a general concept of search. In the context of the paper, "search" simply referred to a way of finding various solutions to some problem, perhaps like "what are the ways of getting from one place to another?". In this case, a search would be a computation that explores the space of possible routes.

That all sounds very abstract, so let's start with a concrete example. Suppose that you're trying to get from city A to city B, and then from city B to city C. Also suppose that your trips are measured in one-hour intervals (maybe you round trip lengths, turning 2:45 into 3 hours), and that trips of equal duration are considered equivalent ("as long as it gets me there!"). Now, I give you a list of possible routes from city A to city B, and another list of possible routes from city B to city C, grouped by their length. Given these two lists, what are the possible routes from A to C?

Let's make this even more concrete, and start with some actual lists of routes. Maybe there are two routes from A to B that take two hours each, and one "quick" trip that takes only an hour. On top of this, there's one three-hour trip from B to C, and one two-hour trip. Given these building blocks, the list of possible trips from A to C is as follows.

  1. Two two-hour trips from A to B, followed up by the three-hour trip from B to C.
  2. Two two-hour trips from A to B, followed by the shorter two-hour trip from B to C.
  3. One one-hour trip from A to B, followed by the three-hour trip from B to C.
  4. One one-hour trip from A to B, followed by the shorter two-hour trip from B to C.

In the above, to figure out the various ways of getting from A to C, we had to examine all pairings of A-to-B routes with B-to-C routes. But then, multiple pairings end up having the same total length: the second and third items both describe trips that take four hours. Thus, to give our final report, we need to "combine like terms" - add up the trips from the two matching items, ending up with a total of three four-hour trips.

Does this feel a little bit familiar? To me, this bears a rather striking resemblance to an operation we've seen in high school algebra class: we're multiplying two binomials! Here's the corresponding multiplication:

{{< latex >}} \left(2x^2 + x\right)\left(x^3+x^2\right) = 2x^5 + 2x^4 + x^4 + x^3 = \underline{2x^5+3x^4+x^3} {{< /latex >}}

It's not just binomials that correspond to our combining paths between cities. We can represent any combination of trips of various lengths as a polynomial. Each term \(ax^n\) represents \(a\) trips of length \(n\). As we just saw, multiplying two polynomials corresponds to "sequencing" the trips they represent -- matching each trip in one with each of the trips in the other, and totaling them up.

What about adding polynomials, what does that correspond to? The answer there is actually quite simple: if two polynomials both represent (distinct) lists of trips from A to B, then adding them just combines the lists. If I know one trip that takes two hours (\(x^2\)) and someone else knows a shortcut (\(x\)), then we can combine that knowledge (\(x^2+x\)).
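Here's a minimal Python sketch of the correspondence so far. The representation is an assumption made for illustration, not something taken from the paper: a polynomial is a dictionary from exponent (trip length in hours) to coefficient (number of trips), and the names `add` and `multiply` are made up for this example.

```python
from collections import defaultdict

# A polynomial is a dict: exponent (hours) -> coefficient (number of trips).

def add(p, q):
    """Combine two lists of trips: coefficients of equal powers add up."""
    result = defaultdict(int)
    for poly in (p, q):
        for hours, count in poly.items():
            result[hours] += count
    return dict(result)

def multiply(p, q):
    """Sequence trips: pair every trip in p with every trip in q."""
    result = defaultdict(int)
    for h1, c1 in p.items():
        for h2, c2 in q.items():
            # Durations add, counts multiply; += "combines like terms".
            result[h1 + h2] += c1 * c2
    return dict(result)

a_to_b = {2: 2, 1: 1}  # 2x^2 + x
b_to_c = {3: 1, 2: 1}  # x^3 + x^2

a_to_c = multiply(a_to_b, b_to_c)
print(a_to_c)  # {5: 2, 4: 3, 3: 1}, i.e. 2x^5 + 3x^4 + x^3
```

Adding works the same way: `add({2: 1}, {1: 1})` merges a two-hour trip and a one-hour shortcut into the polynomial \(x^2+x\).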

{{< dialog >}} {{< message "question" "reader" >}} Wait a moment. Sure, we learned about polynomials in algebra class: they're functions! You put in a number for \(x\), and get another number out. But you haven't done that, and in fact you haven't even mentioned functions at all. What's going on? {{< /message >}} {{< message "answer" "Daniel" >}} In this article (and in the paper it's based on), polynomials are viewed in a more general way than you might be used to. The point isn't to think of them as defining functions on numbers, but to make use of their "shape": a sum of certain powers of \(x\), like \(ax^n+bx^m+...\) {{< /message >}} {{< message "question" "reader" >}} So we won't be plugging numbers in, or trying to graph the polynomials in this section? {{< /message >}} {{< message "answer" "Daniel" >}} That's right, we won't be. The sort of thing we're doing here is a bit closer to abstract algebra than to high school math. Don't worry if you're not familiar with the subject, though: I'm trying to explain everything from first principles. {{< /message >}} {{< /dialog >}}

Well, it's a neat little thing that tracking trips corresponds to adding and multiplying polynomials like that. We can push this observation a bit further, though. Since our trick relies on multiplying two polynomials, we'll need to better understand what that multiplication needs in order to behave as we expect. In particular, we'll need to know what the "bare minimum" is for working with polynomials: what arithmetic properties must we bring to the table? Let's take a look at that next.

Polynomials over Semirings

Let's watch what happens when we multiply two binomials, paying really close attention to the operations we're performing. The following (concrete) example should do.

{{< latex >}}
\begin{aligned}
& (x+1)(1-x)\\
=\ & (x+1)1+(x+1)(-x)\\
=\ & x+1-x^2-x \\
=\ & x-x+1-x^2 \\
=\ & 1-x^2
\end{aligned}
{{< /latex >}}

The first thing we do is distribute the multiplication over the addition, on the left. We then do that again, on the right this time. After this, we finally get some terms, but they aren't properly grouped together; an \(x\) is at the front, and a \(-x\) is at the very back. We use the fact that addition is commutative (\(a+b=b+a\)) and associative (\(a+(b+c)=(a+b)+c\)) to rearrange the expression, grouping the \(x\) and its negation together. This gives us \((1-1)x=0x=0\). That last step is important: we've used the fact that multiplication by zero gives zero. Another important property (though we didn't use it here) is that multiplication has to be associative, too.

So, what if we didn't use numbers, but rather any thing with two operations, one kind of like \((\times)\) and one kind of like \((+)\)?

{{< dialog >}} {{< message "question" "reader" >}} Here, it seems like you're saying that in the polynomials we've seen so far, it's numbers themselves that need to be commutative, associative, etc. {{< /message >}} {{< message "answer" "Daniel" >}} That's right, I am saying that. We need the \((+)\) and \((\times)\) operations on numbers to follow the laws I laid out above. {{< /message >}} {{< message "question" "reader" >}} Okay, but in your equations above, it's not just numbers that were moved around using commutativity and associativity: it was variables, like \(x\). Just earlier you said that we're thinking of the polynomials in terms of their "shape", and not as functions. If that's the case, why are we allowed to blur the lines between polynomial and number like that? {{< /message >}} {{< message "answer" "Daniel" >}} Good question. If you want to get really precise, in the abstract view, adding numbers is not quite the same as adding polynomials. Because of this, saying that addition commutes for numbers does not immediately tell us that it commutes for something like \(x\). However, also in the abstract view, we define how addition and multiplication on polynomials work using addition and multiplication on numbers. Thus, properties of numbers make their way into properties of polynomials. {{< /message >}} {{< /dialog >}}

As I was saying, what if we used some kind of thing other than numbers, together with notions of what it means to "add" and "multiply" this thing? As long as these operations satisfy the properties we have used so far, we should be able to create polynomials using them, and do this same sort of "combining paths" we did earlier. Before we get to that, let me just say that "things with addition and multiplication that work in the way we described" have an established name in math - they're called semirings.

A semiring is a set equipped with two operations, one called "multiplicative" (and thus carrying the symbol \(\times\)) and one called "additive" (and thus written as \(+\)). Both of these operations need to have an "identity element". The identity element for multiplication is usually {{< sidenote "right" "written-as-note" "written as 1," >}} And I do mean "written as": a semiring need not be over numbers. We could define one over graphs, sets, and many other things! Nevertheless, because most of us learn the properties of addition and multiplication much earlier than we learn about other more "esoteric" things, using numbers to stand for special elements seems to help with intuition. {{< /sidenote >}} and the identity element for addition is written as \(0\). Furthermore, a few equations hold. I'll present them in groups. First, multiplication is associative and multiplying by \(1\) does nothing; in mathematical terms, the set forms a monoid with multiplication and \(1\).
{{< latex >}}
\begin{array}{cl}
(a\times b)\times c = a\times(b\times c) & \text{(multiplication associative)}\\
1\times a = a = a \times 1 & \text{(1 is multiplicative identity)}\\
\end{array}
{{< /latex >}}

Similarly, addition is associative and adding \(0\) does nothing. Addition must also be commutative; in other words, the set forms a commutative monoid with addition and \(0\).
{{< latex >}}
\begin{array}{cl}
(a+b)+c = a+(b+c) & \text{(addition associative)}\\
0+a = a = a+0 & \text{(0 is additive identity)}\\
a+b = b+a & \text{(addition is commutative)}\\
\end{array}
{{< /latex >}}

Finally, a few equations determine how addition and multiplication interact.
{{< latex >}}
\begin{array}{cl}
0\times a = 0 = a \times 0 & \text{(annihilation)}\\
a\times(b+c) = a\times b + a\times c & \text{(left distribution)}\\
(a+b)\times c = a\times c + b\times c & \text{(right distribution)}\\
\end{array}
{{< /latex >}}

That's it, we've defined a semiring. First, notice that numbers do indeed form a semiring; all the equations above should be quite familiar from algebra class. When using polynomials with numbers to do our city path finding, we end up tracking how many different ways there are to get from one place to another in a particular number of hours. There are, however, other semirings we can use that yield interesting results, even though we continue to add and multiply polynomials.

One last thing before we look at other semirings: given a semiring \(R\), the polynomials using that \(R\), and written in terms of the variable \(x\), are denoted as \(R[x]\).
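To see how little polynomial arithmetic actually asks of its coefficients, here's a hedged Python sketch of \(R[x]\) for an arbitrary semiring \(R\). Everything named here (`Semiring`, `poly_add`, `poly_mul`, `NATURALS`) is hypothetical and invented for this illustration; polynomials are again dictionaries from exponent to coefficient, and zero coefficients are not pruned.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any                        # additive identity
    one: Any                         # multiplicative identity
    add: Callable[[Any, Any], Any]   # the "+" operation
    mul: Callable[[Any, Any], Any]   # the "x" operation

def poly_add(sr, p, q):
    """Add polynomials coefficient-wise, using the semiring's addition."""
    result = dict(p)
    for power, coeff in q.items():
        result[power] = sr.add(result.get(power, sr.zero), coeff)
    return result

def poly_mul(sr, p, q):
    """Multiply polynomials using only the semiring's operations."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            term = sr.mul(a, b)
            # Exponents always add; coefficients use the semiring.
            result[i + j] = sr.add(result.get(i + j, sr.zero), term)
    return result

# Ordinary counting: natural numbers with the usual + and *.
NATURALS = Semiring(zero=0, one=1,
                    add=lambda a, b: a + b,
                    mul=lambda a, b: a * b)

a_to_b = {2: 2, 1: 1}  # 2x^2 + x
b_to_c = {3: 1, 2: 1}  # x^3 + x^2
print(poly_mul(NATURALS, a_to_b, b_to_c))  # {5: 2, 4: 3, 3: 1}
```

Swapping in a different `Semiring` value changes what the coefficients mean without touching `poly_add` or `poly_mul`.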

The Semiring of Booleans, \(\mathbb{B}\)

Alright, it's time for our first non-number example. It will be a simple one, though - booleans (that's right, true and false from your favorite programming language!) form a semiring. In this case, addition is the "or" operation (aka ||), in which the result is true if either operand is true, and false otherwise.

{{< latex >}}
\begin{array}{c}
\text{true} + b = \text{true}\\
b + \text{true} = \text{true}\\
\text{false} + \text{false} = \text{false}
\end{array}
{{< /latex >}}

For addition, the identity element -- our \(0\) -- is \(\text{false}\).

Correspondingly, multiplication is the "and" operation (aka &&), in which the result is false if either operand is false, and true otherwise.

{{< latex >}}
\begin{array}{c}
\text{false} \times b = \text{false}\\
b \times \text{false} = \text{false}\\
\text{true} \times \text{true} = \text{true}
\end{array}
{{< /latex >}}

For multiplication, the identity element -- the \(1\) -- is \(\text{true}\).

It's not hard to see that both operations are commutative - the first and second equations for addition, for instance, can be combined to get \(\text{true}+b=b+\text{true}\), and the third equation clearly shows commutativity when both operands are false. The other properties are easy enough to verify by simple case analysis (there are 8 cases to consider). The set of booleans is usually denoted as \(\mathbb{B}\), which means polynomials using booleans are denoted by \(\mathbb{B}[x]\).
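The case analysis is small enough to hand to a computer. The following Python sketch (function names are made up for this example) tries all 8 assignments of three booleans against each of the semiring laws listed earlier.

```python
from itertools import product

def add(a, b):
    return a or b    # boolean "addition" is OR

def mul(a, b):
    return a and b   # boolean "multiplication" is AND

ZERO, ONE = False, True

def semiring_laws_hold():
    """Check every semiring law for every choice of a, b, c (8 cases)."""
    for a, b, c in product((False, True), repeat=3):
        laws = [
            mul(mul(a, b), c) == mul(a, mul(b, c)),          # mult. associative
            mul(ONE, a) == a == mul(a, ONE),                 # mult. identity
            add(add(a, b), c) == add(a, add(b, c)),          # add. associative
            add(ZERO, a) == a == add(a, ZERO),               # add. identity
            add(a, b) == add(b, a),                          # add. commutative
            mul(ZERO, a) == ZERO == mul(a, ZERO),            # annihilation
            mul(a, add(b, c)) == add(mul(a, b), mul(a, c)),  # left distribution
            mul(add(a, b), c) == add(mul(a, c), mul(b, c)),  # right distribution
        ]
        if not all(laws):
            return False
    return True

print(semiring_laws_hold())  # True
```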

Let's try some examples. We can't count how many ways there are to get from A to B in a certain number of hours anymore: booleans aren't numbers! Instead, what we can do is track whether or not there is a way to get from A to B in a certain number of hours (call it \(n\)). If we can, we write that as \(\text{true}\ x^n = 1x^n = x^n\). If we can't, we write that as \(\text{false}\ x^n = 0x^n = 0\). The polynomials corresponding to our introductory problem are \(x^2+x^1\) and \(x^3+x^2\). Multiplying them out gives:

{{< latex >}} (x^2+x^1)(x^3+x^2) = x^5 + x^4 + x^4 + x^3 = x^5 + x^4 + x^3 {{< /latex >}}

And that's right; if it's possible to get from A to B in either two hours or one hour, and then from B to C in either three hours or two hours, then it's possible to get from A to C in either five, four, or three hours. In a way, polynomials like this give us {{< sidenote "right" "homomorphism-note" "less information than our original ones" >}} In fact, we can construct a semiring homomorphism (kind of like a ring homomorphism, but for semirings) from \(\mathbb{N}[x]\) to \(\mathbb{B}[x]\) as follows:

{{< latex >}} \sum_{i=0}^n a_ix^i \mapsto \sum_{i=0}^n \text{clamp}(a_i)x^i {{< /latex >}}

Where the \(\text{clamp}\) function checks if its argument is non-zero. In the case of city path search, \(\text{clamp}\) asks the question "are there any routes at all?".

{{< latex >}}
\text{clamp}(n) = \begin{cases}
\text{false} & n = 0 \\
\text{true} & n > 0
\end{cases}
{{< /latex >}}

We can't construct the inverse of the above homomorphism (a mapping that would undo our clamping, and take polynomials in \(\mathbb{B}[x]\) to \(\mathbb{N}[x]\)). This fact gives us a more "mathematical" confirmation that we lost information, rather than gained it, by switching to boolean polynomials: we can always recover a boolean polynomial from the natural number one, but not the other way around. {{< /sidenote >}} (which were \(\mathbb{N}[x]\), polynomials over natural numbers \(\mathbb{N} = \{ 0, 1, 2, ... \}\)), so it's unclear why we'd prefer them. However, we're just warming up - there are more interesting semirings for us to consider!
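As a rough illustration, here's a Python sketch of boolean polynomials together with the clamping map. The representation is an assumption of this example: a boolean polynomial is a dictionary holding only its \(\text{true}\) terms, and the function names are made up.

```python
def mul_counts(p, q):
    """Multiply polynomials over the natural numbers (counting trips)."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            result[i + j] = result.get(i + j, 0) + a * b
    return result

def mul_bools(p, q):
    """Multiply boolean polynomials: OR for addition, AND for multiplication."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            if a and b:
                result[i + j] = True  # true OR anything is true
    return result

def clamp(p):
    """Forget the counts: keep only whether any route of each length exists."""
    return {power: True for power, count in p.items() if count > 0}

a_to_b = {2: 2, 1: 1}  # 2x^2 + x
b_to_c = {3: 1, 2: 1}  # x^3 + x^2

# Clamping after multiplying agrees with multiplying after clamping.
print(clamp(mul_counts(a_to_b, b_to_c)))        # {5: True, 4: True, 3: True}
print(mul_bools(clamp(a_to_b), clamp(b_to_c)))  # {5: True, 4: True, 3: True}

# Different counting polynomials can clamp to the same boolean one,
# which is why the clamping cannot be undone.
print(clamp({2: 2, 1: 1}) == clamp({2: 1, 1: 1}))  # True
```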

The Semiring of Sets of Paths, \(\mathcal{P}(\Pi)\)

Until now, we explicitly said that "all paths of the same length are equivalent". If we're giving directions, though, we might benefit from knowing not just that there is a way, but what roads that way is made up of!

To this end, we define the set of paths, \(\Pi\). This set will consist of the empty path (which we will denote \(\circ\), why not?), street names (e.g. \(\text{Mulholland Dr.}\) or \(\text{Sunset Blvd.}\)), and concatenations of paths, written using \(\rightarrow\). For instance, a path that first takes us on \(\text{Highway}\) and then on \(\text{Exit 4b}\) will be written as:

{{< latex >}} \text{Highway}\rightarrow\text{Exit 4b} {{< /latex >}}

Furthermore, it's not too much of a stretch to say that adding an empty path to the front or the back of another path doesn't change it. If we use the letter \(\pi\) to denote a path, this means the following equation:

{{< latex >}} \circ \rightarrow \pi = \pi = \pi \rightarrow \circ {{< /latex >}}

{{< sidenote "right" "paths-monoid-note" "So those are paths." 0.25 >}} Actually, if you clicked through the monoid link earlier, you might be interested to know that paths as defined here form a monoid with concatenation \(\rightarrow\) and the empty path \(\circ\) as a unit. {{< /sidenote >}} Paths alone, though, aren't enough for our polynomials; we're tracking different ways to get from one place to another. This is an excellent use case for sets!

Our next semiring will be that of sets of paths. Some example elements of this semiring are \(\varnothing\), also known as the empty set, \(\{\circ\}\), the set containing only the empty path, and the set containing a path via the highway, and another path via the suburbs:

{{< latex >}} \{\text{Highway}\rightarrow\text{Exit 4b}, \text{Suburb Rd.}\} {{< /latex >}}

So what are the addition and multiplication on sets of paths? Addition is the easier one: it's just the union of sets (the "triangle equal sign" symbol means "defined as"):

{{< latex >}} A + B \triangleq A \cup B {{< /latex >}}

It's well known (and not hard to verify) that set union is commutative and associative. The additive identity \(0\) is simply the empty set \(\varnothing\). Intuitively, adding "no paths" to another set of paths doesn't add anything, and thus leaves that other set unchanged.

Multiplication is a little bit more interesting, and uses the path concatenation operation we defined earlier. We will use this operation to describe path sequencing; given two sets of paths, \(A\) and \(B\), we'll create a new set of paths consisting of each path from \(A\) concatenated with each path from \(B\):

{{< latex >}} A \times B \triangleq \{ a \rightarrow b\ |\ a \in A, b \in B \} {{< /latex >}}

The fact that this definition of multiplication on sets is associative relies on the associativity of path concatenation; if path concatenation weren't associative, the second equality below would not hold.

{{< latex >}}
\begin{array}{rcl}
A \times (B \times C) & = & \{ a \rightarrow (b \rightarrow c)\ |\ a \in A, b \in B, c \in C \} \\
& \stackrel{?}{=} & \{ (a \rightarrow b) \rightarrow c \ |\ a \in A, b \in B, c \in C \} \\
& = & (A \times B) \times C
\end{array}
{{< /latex >}}

What's the multiplicative identity? Well, since multiplication concatenates all the combinations of paths from two sets, we could try making a set of elements that don't do anything when concatenating. Sound familiar? It should, that's \(\circ\), the empty path element! We thus define our multiplicative identity as \(\{\circ\}\), and verify that it is indeed the identity:

{{< latex >}}
\begin{gathered}
\{\circ\} \times A = \{ \circ \rightarrow a\ |\ a \in A \} = \{ a \ |\ a \in A \} = A \\
A \times \{\circ\}= \{ a\rightarrow \circ \ |\ a \in A \} = \{ a \ |\ a \in A \} = A
\end{gathered}
{{< /latex >}}

It's not too difficult to verify the annihilation and distribution laws for sets of paths, either; I won't do that here, though. Finally, let's take a look at an example. Like before, we'll try to make one that corresponds to our introductory description of paths from A to B and from B to C. Now we need to be a little bit creative, and come up with names for all these different roads between our hypothetical cities. Let's say that \(\text{Highway A}\) and \(\text{Highway B}\) are the two paths from A to B that take two hours each, and then \(\text{Shortcut}\) is the path that takes one hour. As for paths from B to C, let's just call them \(\text{Long}\) for the three-hour path, and \(\text{Short}\) for the two-hour path. Our two polynomials are then:

{{< latex >}}
\begin{array}{rcl}
P_1 & = & \{\text{Highway A}, \text{Highway B}\}x^2 + \{\text{Shortcut}\}x \\
P_2 & = & \{\text{Long}\}x^3 + \{\text{Short}\}x^2
\end{array}
{{< /latex >}}

Multiplying them gives:
{{< latex >}}
\begin{array}{rl}
& \{\text{Highway A} \rightarrow \text{Long}, \text{Highway B} \rightarrow \text{Long}\}x^5\\
+ & \{\text{Highway A} \rightarrow \text{Short}, \text{Highway B} \rightarrow \text{Short}, \text{Shortcut} \rightarrow \text{Long}\}x^4\\
+ & \{\text{Shortcut} \rightarrow \text{Short}\}x^3
\end{array}
{{< /latex >}}

This resulting polynomial gives us all the paths from city A to city C, grouped by their length!
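The same computation can be sketched in Python. The encoding below is an assumption made for illustration: a path is a tuple of street names (so the empty path \(\circ\) is `()` and \(\rightarrow\) is tuple concatenation), a set of paths is a `frozenset`, and the function names are invented.

```python
EMPTY_SET = frozenset()       # the additive identity, "no paths"
UNIT = frozenset({()})        # the multiplicative identity, {empty path}

def paths_mul(A, B):
    """Concatenate each path from A with each path from B."""
    return frozenset(a + b for a in A for b in B)

def poly_mul(p, q):
    """Multiply polynomials whose coefficients are sets of paths."""
    result = {}
    for i, A in p.items():
        for j, B in q.items():
            # Set union plays the role of addition.
            result[i + j] = result.get(i + j, EMPTY_SET) | paths_mul(A, B)
    return result

p1 = {2: frozenset({("Highway A",), ("Highway B",)}),
      1: frozenset({("Shortcut",)})}
p2 = {3: frozenset({("Long",)}),
      2: frozenset({("Short",)})}

product = poly_mul(p1, p2)
for hours in sorted(product, reverse=True):
    routes = sorted(" -> ".join(path) for path in product[hours])
    print(f"{hours} hours: {routes}")
```

The two identities behave as described above: `paths_mul(UNIT, A)` returns `A` unchanged, and `paths_mul(EMPTY_SET, A)` is empty, which is the annihilation law.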

The Tropical Semiring, \(\mathbb{R}\)

I only have one last semiring left to show you. It's a fun semiring though, as even its name might suggest: we'll take a look at a tropical semiring.

In this semiring, we go back to numbers; particularly, real numbers (e.g., \(1.34\), \(163\), \(e\), that kind of thing). We even use addition -- sort of. In the tropical semiring, addition serves as the multiplicative operation! This is even confusing to write, so I'm going to switch up notation; in the rest of this section, I'll use \(\otimes\) to represent the multiplicative operation in semirings, and \(\oplus\) to represent the additive one. The symbols \(\times\) and \(+\) will be used to represent the regular operations on real numbers. With that, the operations on our tropical semiring over real numbers are defined as follows:

{{< latex >}}
\begin{array}{rcl}
x \otimes y & \triangleq & x + y\\
x \oplus y & \triangleq & \min(x,y)
\end{array}
{{< /latex >}}

What is this new semiring good for? How about this: suppose that in addition to the duration of the trip, you'd like to track the distance you must travel for each route (shorter routes do sometimes have more traffic!). Let's watch what happens when we add and multiply polynomials over this semiring. When we add terms with the same power but different coefficients, like \(ax\oplus bx\), we end up with a term \(\min(a,b)x\). In other words, for each trip duration, we pick the shortest length. When we multiply two terms, like \(ax\otimes bx\), we get \((a+b)x^2\); in other words, when sequencing two trips, we add up the distances to get the combined distance (while the durations still add up in the exponent), just like we'd expect.

We can, of course, come up with a polynomial to match our initial example. Say that the trips from A to B are represented by \(2.0x^2\oplus1.5x\) (the shortest two-hour trip is \(2\) units of distance long, and the one-hour trip is \(1.5\) units long), and that the trips from B to C are represented by \(4.0x^3\oplus1.0x^2\). Multiplying the two polynomials out gives:

{{< latex >}}
\begin{array}{rcl}
(2.0x^2\oplus1.5x)(4.0x^3\oplus1.0x^2) & = & 6.0x^5 \oplus \min(2.0+1.0, 1.5+4.0)x^4 \oplus 2.5x^3 \\
& = & 6.0x^5 \oplus 3.0x^4 \oplus 2.5x^3
\end{array}
{{< /latex >}}

The only time we used the additive operation in this case was to pick between two trips of equal duration but different length (two-hour trip from A to B followed by a two-hour trip from B to C, or one-hour trip from A to B followed by a three-hour trip from B to C). The first trip wins out, since it requires only \(3.0\) units of distance.
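Here's a small Python sketch of the same calculation. The names are hypothetical, and one detail is an assumption the text above doesn't spell out: the sketch uses \(+\infty\) as the additive identity, since \(\min(a, \infty) = a\) for every real \(a\).

```python
import math

def trop_add(a, b):
    return min(a, b)   # tropical "addition" keeps the shorter distance

def trop_mul(a, b):
    return a + b       # tropical "multiplication" adds distances

TROP_ZERO = math.inf   # assumed additive identity: min(a, inf) == a
TROP_ONE = 0.0         # multiplicative identity: a + 0.0 == a

def poly_mul(p, q):
    """Multiply polynomials over the tropical semiring."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            best_so_far = result.get(i + j, TROP_ZERO)
            result[i + j] = trop_add(best_so_far, trop_mul(a, b))
    return result

a_to_b = {2: 2.0, 1: 1.5}  # 2.0x^2 (+) 1.5x
b_to_c = {3: 4.0, 2: 1.0}  # 4.0x^3 (+) 1.0x^2

print(poly_mul(a_to_b, b_to_c))  # {5: 6.0, 4: 3.0, 3: 2.5}
```

For the four-hour term, the code compares \(2.0+1.0\) against \(1.5+4.0\) and keeps \(3.0\), matching the calculation by hand.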

Anything but Routes

So far, all I've done can be reduced to variations on a theme: keeping track of some aspects of a trip between cities, using polynomials for structure. However, that's just the beginning. This sort of trick can be made even more powerful by further relaxing the notion of a "polynomial". By doing so, we can make our polynomials represent arbitrary effects (in the computer science sense -- things like errors, logging to a console, storing and accessing information from a database). Relying for just a little longer on our example of journeys between cities, we might be able to represent trips with random variation (traffic can be unpredictable!), or maybe cities where you will get stuck. But the point isn't routes: the same approach can be used to represent traversing a binary tree, performing Prolog-like proof search, or evaluating a non-deterministic program. The sky's the limit!

Unfortunately, doing so would require even more background and buildup, for which I just don't have space in this article. I'll save these things for next time, though -- stay tuned!