---
title: "Search as a Polynomial"
date: 2022-10-22T14:51:15-07:00
draft: true
tags: ["Mathematics"]
---

I read a really neat paper some time ago, and I've been wanting to write about
it ever since. The paper is called [Algebras for Weighted Search](https://dl.acm.org/doi/pdf/10.1145/3473577),
and it is a tad too deep to dive into in a blog article -- readers of ICFP are
rarely the target audience on this site. However, one particular insight I
gleaned from the paper merits additional discussion and demonstration. I'm
going to do that here.

We can start with something concrete. Suppose that you're trying to get from
city A to city B, and then from city B to city C. Also suppose that your
trips are measured in one-hour intervals, and that trips of equal duration are
considered equivalent. Given possible routes from A to B, and then given more
routes from B to C, what are the possible routes from A to C you can build up?

In many cases, starting with an example helps build intuition. Maybe there
are two routes from A to B that take two hours each, and one "quick" trip
that takes only an hour. On top of this, there's one three-hour trip from B
to C, and one two-hour trip. Given these building blocks, the list of
possible trips from A to C is as follows.

1. Two two-hour trips from A to B, followed by the three-hour trip from B to
C.
2. Two two-hour trips from A to B, followed by the shorter two-hour trip from B
to C.
3. One one-hour trip from A to B, followed by the three-hour trip from B to C.
4. One one-hour trip from A to B, followed by the shorter two-hour trip from B to C.

In the above, to figure out the various ways of getting from A to C, we had to
examine all pairings of A-to-B routes with B-to-C routes. But then, multiple
pairings end up having the same total length: the second and third bullet
points both describe trips that take four hours. Thus, to give
our final report, we need to "combine like terms" - add up the trips from
the two matching bullet points, ending up with a total of three four-hour trips.

Does this feel a little bit familiar? To me, this bears a rather striking
resemblance to an operation we've seen in algebra class: we're multiplying
two binomials! Here's the corresponding multiplication:

{{< latex >}}
\left(2x^2 + x\right)\left(x^3+x^2\right) = 2x^5 + 2x^4 + x^4 + x^3 = \underline{2x^5+3x^4+x^3}
{{< /latex >}}

It's not just binomials that correspond to our combining paths between cities.
We can represent any combination of trips of various lengths as a polynomial.
Each term \\(ax^n\\) represents \\(a\\) trips of length \\(n\\). As we just
saw, multiplying two polynomials corresponds to "sequencing" the trips they
represent -- matching each trip in one with each of the trips in the other,
and totaling them up.

What about adding polynomials? What does that correspond to? The answer there
is actually quite simple: if two polynomials both represent (distinct) lists of
trips from A to B, then adding them just combines the lists. If I know one trip
that takes two hours (\\(x^2\\)) and someone else knows a shortcut (\\(x\\)),
then we can combine that knowledge (\\(x^2+x\\)).

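
To make the two operations concrete, here's a minimal sketch in Python. It's my own illustration rather than anything from the paper: I'm assuming a polynomial is stored as a dictionary from exponents (hours) to coefficients (number of trips), and `poly_add` and `poly_mul` are names I made up for this example.

```python
def poly_add(p, q):
    """Pool two lists of trips: coefficients of equal powers are summed."""
    result = dict(p)
    for power, count in q.items():
        result[power] = result.get(power, 0) + count
    return result

def poly_mul(p, q):
    """Sequence trips: pair every trip in p with every trip in q."""
    result = {}
    for power_p, count_p in p.items():
        for power_q, count_q in q.items():
            # Durations add up, and the number of pairings multiplies.
            total = power_p + power_q
            result[total] = result.get(total, 0) + count_p * count_q
    return result

a_to_b = {2: 2, 1: 1}   # 2x^2 + x
b_to_c = {3: 1, 2: 1}   # x^3 + x^2

print(poly_mul(a_to_b, b_to_c))     # {5: 2, 4: 3, 3: 1}, i.e. 2x^5 + 3x^4 + x^3
print(poly_add({2: 1}, {1: 1}))     # {2: 1, 1: 1}, i.e. x^2 + x
```

The "combine like terms" step is the `result.get(total, 0) + ...` line: whenever two pairings land on the same total duration, their counts are added together.
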

Well, that's a neat little thing. But we can push this observation a bit
further. To generalize what we've already seen, however, we'll need to
figure out "the bare minimum" of what we need to make polynomial
multiplication work as we'd expect.

### Polynomials over Semirings
Let's watch what happens when we multiply two binomials, paying really close
attention to the operations we're performing. The following (concrete)
example should do.

{{< latex >}}
\begin{aligned}
& (x+1)(1-x)\\
=\ & (x+1)1+(x+1)(-x)\\
=\ & x+1-x^2-x \\
=\ & x-x+1-x^2 \\
=\ & 1-x^2
\end{aligned}
{{< /latex >}}

The first thing we do is _distribute_ the multiplication over the addition, on
the left. We then do that again, on the right this time. After this, we finally
get some terms, but they aren't properly grouped together; an \\(x\\) is at the
front, and a \\(-x\\) is at the very back. We use the fact that addition is
_commutative_ (\\(a+b=b+a\\)) and _associative_ (\\(a+(b+c)=(a+b)+c\\)) to
rearrange the equation, grouping the \\(x\\) and its negation together. This
gives us \\((1-1)x=0x=0\\). That last step is important: we've used the fact
that multiplication by zero gives zero. Another important property (though
we didn't use it here) is that multiplication has to be associative, too.

So, what if we didn't use numbers, but rather any _thing_ with two
operations, one kind of like \\((\\times)\\) and one kind of like \\((+)\\)?
As long as these operations satisfy the properties we have used so far, we
should be able to create polynomials using them, and do this same sort of
"combining paths" we did earlier. Before we get to that, let me just say
that "things with addition and multiplication that work in the way we
described" have an established name in math - they're called semirings.

A __semiring__ is a set equipped with two operations, one called
"multiplicative" (and thus carrying the symbol \\(\\times\\)) and one
called "additive" (and thus written as \\(+\\)). Both of these operations
need to have an "identity element". The identity element for multiplication
is usually
{{< sidenote "right" "written-as-note" "written as \(1\)," >}}
And I do mean "written as": a semiring need not be over numbers. We could
define one over <a href="https://en.wikipedia.org/wiki/Graph">graphs</a>,
sets, and many other things! Nevertheless, because most of us learn the
properties of addition and multiplication much earlier than we learn about
other more "esoteric" things, using numbers to stand for special elements
seems to help use intuition.
{{< /sidenote >}}
and the identity element for addition is written
as \\(0\\). Furthermore, a few equations hold. I'll present them in groups.
First, multiplication is associative and multiplying by \\(1\\) does nothing;
in mathematical terms, the set forms a [monoid](https://mathworld.wolfram.com/Monoid.html)
with multiplication and \\(1\\).
{{< latex >}}
\begin{array}{cl}
(a\times b)\times c = a\times(b\times c) & \text{(multiplication associative)}\\
1\times a = a = a \times 1 & \text{(1 is multiplicative identity)}\\
\end{array}
{{< /latex >}}

Similarly, addition is associative and adding \\(0\\) does nothing.
Addition must also be commutative; in other words, the set forms a
commutative monoid with addition and \\(0\\).
{{< latex >}}
\begin{array}{cl}
(a+b)+c = a+(b+c) & \text{(addition associative)}\\
0+a = a = a+0 & \text{(0 is additive identity)}\\
a+b = b+a & \text{(addition is commutative)}\\
\end{array}
{{< /latex >}}

Finally, a few equations determine how addition and multiplication interact.
{{< latex >}}
\begin{array}{cl}
0\times a = 0 = a \times 0 & \text{(annihilation)}\\
a\times(b+c) = a\times b + a\times c & \text{(left distribution)}\\
(a+b)\times c = a\times c + b\times c & \text{(right distribution)}\\
\end{array}
{{< /latex >}}

That's it, we've defined a semiring. First, notice that numbers do indeed
form a semiring; all the equations above should be quite familiar from algebra
class. When using polynomials with numbers to do our city path finding,
we end up tracking how many different ways there are to get from one place to
another in a particular number of hours. There are, however, other semirings
we can use that yield interesting results, even though we continue to add
and multiply polynomials.

One last thing before we look at other semirings: given a semiring \\(R\\),
the polynomials using that \\(R\\), and written in terms of the variable
\\(x\\), are denoted as \\(R[x]\\).

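
If it helps to see the definition as code, here's one possible sketch in Python. The `Semiring` record and the `poly_mul` function are hypothetical names of my own, and storing an element of \\(R[x]\\) as a dictionary from exponents to coefficients is just one convenient assumption. The point is that polynomial multiplication only ever touches the coefficients through the semiring's two operations.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any                        # additive identity, "0"
    one: Any                         # multiplicative identity, "1"
    add: Callable[[Any, Any], Any]   # the additive operation
    mul: Callable[[Any, Any], Any]   # the multiplicative operation

def poly_mul(sr, p, q):
    """Multiply two polynomials in R[x], stored as {exponent: coefficient}."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            # Exponents are ordinary integers and simply add up; only the
            # coefficients go through the semiring's operations.
            result[i + j] = sr.add(result.get(i + j, sr.zero), sr.mul(a, b))
    return result

# The familiar semiring of natural numbers.
naturals = Semiring(zero=0, one=1,
                    add=lambda a, b: a + b,
                    mul=lambda a, b: a * b)

print(poly_mul(naturals, {2: 2, 1: 1}, {3: 1, 2: 1}))   # {5: 2, 4: 3, 3: 1}
```

Swapping in a different `Semiring` value changes what the coefficients mean without changing `poly_mul` at all.
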

#### The Semiring of Booleans, \\(\\mathbb{B}\\)
Alright, it's time for our first non-number example. It will be a simple one,
though - booleans (that's right, `true` and `false` from your favorite
programming language!) form a semiring. In this case, addition is the
"or" operation (aka `||`), in which the result is true if either operand
is true, and false otherwise.

{{< latex >}}
\begin{array}{c}
\text{true} + b = \text{true}\\
b + \text{true} = \text{true}\\
\text{false} + \text{false} = \text{false}
\end{array}
{{< /latex >}}

For addition, the identity element -- our \\(0\\) -- is \\(\\text{false}\\).

Correspondingly, multiplication is the "and" operation (aka `&&`), in which the
result is false if either operand is false, and true otherwise.

{{< latex >}}
\begin{array}{c}
\text{false} \times b = \text{false}\\
b \times \text{false} = \text{false}\\
\text{true} \times \text{true} = \text{true}
\end{array}
{{< /latex >}}

For multiplication, the identity element -- the \\(1\\) -- is \\(\\text{true}\\).

It's not hard to see that _both_ operations are commutative - the first and
second equations for addition, for instance, can be combined to get
\\(\\text{true}+b=b+\\text{true}\\), and the third equation clearly shows
commutativity when both operands are false. The other properties are
easy enough to verify by simple case analysis (there are 8 cases to consider).
The set of booleans is usually denoted as \\(\\mathbb{B}\\), which means
polynomials using booleans are denoted by \\(\\mathbb{B}[x]\\).

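
The case analysis is mechanical enough to hand to a computer. The sketch below, in Python, brute-forces every semiring law over a finite collection of elements. `check_semiring` is a hypothetical helper I wrote for this post, and a check like this only works when the set is small enough to enumerate, which the booleans certainly are.

```python
from itertools import product

def check_semiring(elements, zero, one, add, mul):
    """Return True if every semiring law holds for every choice of elements."""
    for a in elements:
        if add(zero, a) != a or add(a, zero) != a:            # 0 is additive identity
            return False
        if mul(one, a) != a or mul(a, one) != a:              # 1 is multiplicative identity
            return False
        if mul(zero, a) != zero or mul(a, zero) != zero:      # annihilation
            return False
    for a, b in product(elements, repeat=2):
        if add(a, b) != add(b, a):                            # addition is commutative
            return False
    for a, b, c in product(elements, repeat=3):               # 8 cases for booleans
        if add(add(a, b), c) != add(a, add(b, c)):            # addition associative
            return False
        if mul(mul(a, b), c) != mul(a, mul(b, c)):            # multiplication associative
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):    # left distribution
            return False
        if mul(add(a, b), c) != add(mul(a, c), mul(b, c)):    # right distribution
            return False
    return True

booleans = (False, True)
print(check_semiring(booleans, False, True,
                     add=lambda a, b: a or b,
                     mul=lambda a, b: a and b))   # True
```

Picking the wrong identities (say, `True` as the \\(0\\) for "or") makes the same function return `False`, since \\(\\text{true}+\\text{false}\\) is not \\(\\text{false}\\).
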

Let's try some examples. We can't count how many ways there are to get from
A to B in a certain number of hours anymore: booleans aren't numbers!
Instead, what we _can_ do is track _whether or not_ there is a way to get
from A to B in a certain number of hours (call it \\(n\\)). If we can,
we write that as \\(\text{true}\ x^n = 1x^n = x^n\\). If we can't, we write
that as \\(\\text{false}\ x^n = 0x^n = 0\\). The polynomials corresponding
to our introductory problem are \\(x^2+x^1\\) and \\(x^3+x^2\\). Multiplying
them out gives:

{{< latex >}}
(x^2+x^1)(x^3+x^2) = x^5 + x^4 + x^4 + x^3 = x^5 + x^4 + x^3
{{< /latex >}}

And that's right; if it's possible to get from A to B in either two hours
or one hour, and then from B to C in either three hours or two hours, then
it's possible to get from A to C in either five, four, or three hours. In a
way, polynomials like this give us
{{< sidenote "right" "homomorphism-note" "less information than our original ones" >}}
In fact, we can construct a semiring homomorphism (kind of like a
<a href="https://en.wikipedia.org/wiki/Ring_homomorphism">ring homomorphism</a>,
but for semirings) from \(\mathbb{N}[x]\) to \(\mathbb{B}[x]\) as follows:

{{< latex >}}
\sum_{i=0}^n a_ix^i \mapsto \sum_{i=0}^n \text{clamp}(a_i)x^i
{{< /latex >}}

Where the \(\text{clamp}\) function checks if its argument is non-zero.
In the case of city path search, \(\text{clamp}\) asks the question
"are there any routes at all?".

{{< latex >}}
\text{clamp}(n) = \begin{cases}
\text{false} & n = 0 \\
\text{true} & n > 0
\end{cases}
{{< /latex >}}

We can't construct the inverse of the above homomorphism (a mapping
that would undo our clamping, and take polynomials in \(\mathbb{B}[x]\) to
\(\mathbb{N}[x]\)). This fact gives us a more "mathematical" confirmation
that we lost information, rather than gained it, by switching to
boolean polynomials: we can always recover a boolean polynomial from the
natural number one, but not the other way around.
{{< /sidenote >}}
(which were \\(\\mathbb{N}[x]\\), polynomials over natural numbers \\(\\mathbb{N} = \\{ 0, 1, 2, ... \\}\\)), so it's unclear why we'd prefer them. However,
we're just warming up - there are more interesting semirings for us to
consider!

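
Here's a small Python sketch of the boolean multiplication together with the \\(\\text{clamp}\\) mapping from the sidenote. As before, the dictionary representation and the function names are assumptions of mine. The last two lines spot-check, for our one example, that clamping and then multiplying agrees with multiplying and then clamping; that's what we'd expect of a homomorphism, though a single example is of course not a proof.

```python
def poly_mul(p, q, add, mul, zero):
    """Multiply {exponent: coefficient} polynomials using the given operations."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            result[i + j] = add(result.get(i + j, zero), mul(a, b))
    return result

def clamp(p):
    """Map a polynomial in N[x] to B[x]: is there any route at all?"""
    return {power: count > 0 for power, count in p.items()}

nat = dict(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0)
boolean = dict(add=lambda a, b: a or b, mul=lambda a, b: a and b, zero=False)

counts_ab = {2: 2, 1: 1}   # 2x^2 + x in N[x]
counts_bc = {3: 1, 2: 1}   # x^3 + x^2 in N[x]

# Clamp first, then multiply in B[x]...
direct = poly_mul(clamp(counts_ab), clamp(counts_bc), **boolean)
# ...or multiply in N[x] first, and clamp afterwards.
via_counts = clamp(poly_mul(counts_ab, counts_bc, **nat))

print(direct)                 # {5: True, 4: True, 3: True}
print(direct == via_counts)   # True
```
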

#### The Semiring of Sets of Paths, \\(\\mathcal{P}(\\Pi)\\)
Until now, we explicitly said that "all paths of the same length are
equivalent". If we're giving directions, though, we might benefit
from knowing not just that there _is_ a way, but what roads that
way is made up of!

To this end, we define the set of paths, \\(\\Pi\\). This set will consist
of the empty path (which we will denote \\(\\circ\\), why not?), street
names (e.g. \\(\\text{Mulholland Dr.}\\) or \\(\\text{Sunset Blvd.}\\)), and
concatenations of paths, written using \\(\\rightarrow\\). For instance,
a path that first takes us on \\(\\text{Highway}\\) and then on
\\(\\text{Exit 4b}\\) will be written as:

{{< latex >}}
\text{Highway}\rightarrow\text{Exit 4b}
{{< /latex >}}

Furthermore, it's not too much of a stretch to say that adding an empty path
to the front or the back of another path doesn't change it. If we use
the letter \\(\\pi\\) to denote a path, this means the following equation:

{{< latex >}}
\circ \rightarrow \pi = \pi = \pi \rightarrow \circ
{{< /latex >}}

{{< sidenote "right" "paths-monoid-note" "So those are paths." 0.25 >}}
Actually, if you clicked through the
<a href="https://mathworld.wolfram.com/Monoid.html">monoid</a>
link earlier, you might be interested to know that paths as defined here
form a monoid with concatenation \(\rightarrow\) and the empty path \(\circ\)
as a unit.
{{< /sidenote >}}
Paths alone, though, aren't enough for our polynomials; we're tracking
different ways to get from one place to another. This is an excellent
use case for sets!

Our next semiring will be that of _sets of paths_. Some example elements
of this semiring are \\(\\varnothing\\), also known as the empty set,
\\(\\{\\circ\\}\\), the set containing only the empty path, and the set
containing a path via the highway, and another path via the suburbs:

{{< latex >}}
\{\text{Highway}\rightarrow\text{Exit 4b}, \text{Suburb Rd.}\}
{{< /latex >}}


So what are the addition and multiplication on sets of paths? Addition
is the easier one: it's just the union of sets (the "triangle equal sign"
symbol means "defined as"):

{{< latex >}}
A + B \triangleq A \cup B
{{< /latex >}}

It's well known (and not hard to verify) that set union is commutative
and associative. The additive identity \\(0\\) is simply the empty set
\\(\\varnothing\\). Intuitively, adding "no paths" to another set of
paths doesn't add anything, and thus leaves that other set unchanged.

Multiplication is a little bit more interesting, and uses the path
concatenation operation we defined earlier. We will use this
operation to describe path sequencing; given two sets of paths,
\\(A\\) and \\(B\\), we'll create a new set of paths
consisting of each path from \\(A\\) concatenated with each
path from \\(B\\):

{{< latex >}}
A \times B \triangleq \{ a \rightarrow b\ |\ a \in A, b \in B \}
{{< /latex >}}

The fact that this definition of multiplication on sets is associative
relies on the associativity of path concatenation; if path concatenation
weren't associative, the second equality below would not hold.

{{< latex >}}
\begin{array}{rcl}
A \times (B \times C) & = & \{ a \rightarrow (b \rightarrow c)\ |\ a \in A, b \in B, c \in C \} \\
& \stackrel{?}{=} & \{ (a \rightarrow b) \rightarrow c \ |\ a \in A, b \in B, c \in C \} \\
& = & (A \times B) \times C
\end{array}
{{< /latex >}}

What's the multiplicative identity? Well, since multiplication concatenates
all the combinations of paths from two sets, we could try making a set of
elements that don't do anything when concatenating. Sound familiar? It should,
that's \\(\\circ\\), the empty path element! We thus define our multiplicative
identity as \\(\\{\\circ\\}\\), and verify that it is indeed the identity:

{{< latex >}}
\begin{gathered}
\{\circ\} \times A = \{ \circ \rightarrow a\ |\ a \in A \} = \{ a \ |\ a \in A \} = A \\
A \times \{\circ\}= \{ a\rightarrow \circ \ |\ a \in A \} = \{ a \ |\ a \in A \} = A
\end{gathered}
{{< /latex >}}

It's not too difficult to verify the annihilation and distribution laws for
sets of paths, either; I won't do that here, though. Finally, let's take
a look at an example. Like before, we'll try to make one that corresponds to
our introductory description of paths from A to B and from B to C. Now we need
to be a little bit creative, and come up with names for all these different
roads between our hypothetical cities. Let's say that \\(\\text{Highway A}\\)
and \\(\\text{Highway B}\\) are the two paths from A to B that take two hours
each, and then \\(\\text{Shortcut}\\) is the path that takes one hour. As for
paths from B to C, let's just call them \\(\\text{Long}\\) for the three-hour
path, and \\(\\text{Short}\\) for the two-hour path. Our two polynomials
are then:

{{< latex >}}
\begin{array}{rcl}
P_1 & = & \{\text{Highway A}, \text{Highway B}\}x^2 + \{\text{Shortcut}\}x \\
P_2 & = & \{\text{Long}\}x^3 + \{\text{Short}\}x^2
\end{array}
{{< /latex >}}

Multiplying them gives:
{{< latex >}}
\begin{array}{rl}
& \{\text{Highway A} \rightarrow \text{Long}, \text{Highway B} \rightarrow \text{Long}\}x^5\\
+ & \{\text{Highway A} \rightarrow \text{Short}, \text{Highway B} \rightarrow \text{Short}, \text{Shortcut} \rightarrow \text{Long}\}x^4\\
+ & \{\text{Shortcut} \rightarrow \text{Short}\}x^3
\end{array}
{{< /latex >}}

This resulting polynomial gives us all the paths from city A to city C,
grouped by their length!

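
The same computation can be sketched in Python. I'm making a few representation choices here that aren't part of the math: a path is a tuple of road names, the empty tuple `()` stands in for \\(\\circ\\), tuple concatenation stands in for \\(\\rightarrow\\), and sets of paths are `frozenset`s. The helper names are made up for this example.

```python
EMPTY_SET = frozenset()        # additive identity: no paths at all
UNIT = frozenset({()})         # multiplicative identity: only the empty path

def set_add(a, b):
    """Addition is set union."""
    return a | b

def set_mul(a, b):
    """Multiplication concatenates each path in a with each path in b."""
    return frozenset(x + y for x in a for y in b)

def poly_mul(p, q):
    """Multiply {hours: set of paths} polynomials."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            result[i + j] = set_add(result.get(i + j, EMPTY_SET), set_mul(a, b))
    return result

def paths(*names):
    """Build a set of single-road paths from road names."""
    return frozenset((name,) for name in names)

p1 = {2: paths("Highway A", "Highway B"), 1: paths("Shortcut")}
p2 = {3: paths("Long"), 2: paths("Short")}

for hours, routes in sorted(poly_mul(p1, p2).items(), reverse=True):
    print(hours, sorted(" -> ".join(route) for route in routes))
# 5 ['Highway A -> Long', 'Highway B -> Long']
# 4 ['Highway A -> Short', 'Highway B -> Short', 'Shortcut -> Long']
# 3 ['Shortcut -> Short']
```

Tuples make the monoid laws for paths come for free: concatenating with `()` changes nothing, and tuple concatenation is associative.
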

#### The Tropical Semiring, \\(\\mathbb{R}\\)
I only have one last semiring left to show you before we move on to something
other than paths between cities. It's a fun semiring though, as even its name
might suggest: we'll take a look at a _tropical semiring_.

In this semiring, we go back to numbers; particularly, real numbers (e.g.,
\\(1.34\\), \\(163\\), \\(e\\), that kind of thing). We even use addition --
sort of. In the tropical semiring, addition serves as the _multiplicative_
operation! This is even confusing to write, so I'm going to switch up notation;
in the rest of this section, I'll use \\(\\otimes\\) to represent the
multiplicative operation in semirings, and \\(\\oplus\\) to represent the
additive one. The symbols \\(\\times\\) and \\(+\\) will be used to represent
the regular operations on real numbers. With that, the operations on our
tropical semiring over real numbers are defined as follows:

{{< latex >}}
\begin{array}{rcl}
x \otimes y & \triangleq & x + y\\
x \oplus y & \triangleq & \min(x,y)
\end{array}
{{< /latex >}}

What is this new semiring good for? How about this: suppose that in addition to
the duration of the trip, you'd like to track the distance you must travel for
each route (shorter routes do sometimes have more traffic!). Let's watch what
happens when we add and multiply polynomials over this semiring.
When we add terms with the same power but different coefficients, like
\\(ax\oplus bx\\), we end up with a term \\(\min(a,b)x\\). In other words,
for each trip duration, we pick the shortest distance. When we multiply two
terms, like \\(ax\otimes bx\\), we get \\((a+b)x^2\\); in other words,
when sequencing two trips, we add up the distances to get the combined
distance, just like we'd expect.

We can, of course, come up with a polynomial to match our initial example.
Say that the trips from A to B are represented by \\(2.0x^2\oplus1.5x\\) (the
shortest two-hour trip is \\(2\\) units of distance long, and the one-hour
trip is \\(1.5\\) units long), and that the trips from B to C are represented
by \\(4.0x^3\oplus1.0x^2\\). Multiplying the two polynomials out gives:

{{< latex >}}
\begin{array}{rcl}
(2.0x^2\oplus1.5x)(4.0x^3\oplus1.0x^2) & = & 6.0x^5 \oplus \min(2.0+1.0, 1.5+4.0)x^4 \oplus 2.5x^3 \\
& = & 6.0x^5 \oplus 3.0x^4 \oplus 2.5x^3
\end{array}
{{< /latex >}}

The only time we used the additive operation in this case was to pick between
two trips of equal duration but different distance (a two-hour trip from A to B
followed by a two-hour trip from B to C, or a one-hour trip from A to B followed
by a three-hour trip from B to C). The first trip wins out, since it requires
only \\(3.0\\) units of distance.
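
As a sanity check, here's the tropical multiplication sketched in Python, again with polynomials as dictionaries from hours to coefficients. One assumption deserves a loud label: I use `math.inf` as the identity for \\(\\oplus\\), since \\(\\min(\\infty, a) = a\\). That's the usual convention for min-plus arithmetic, but it means the coefficients are really the real numbers with \\(\\infty\\) added, which the definitions above don't spell out.

```python
import math

INF = math.inf   # assumed identity for the additive operation: min(INF, a) == a

def trop_add(a, b):
    """The additive operation: keep the shorter distance."""
    return min(a, b)

def trop_mul(a, b):
    """The multiplicative operation: distances of sequenced trips add up."""
    return a + b

def poly_mul(p, q):
    """Multiply {hours: shortest distance} polynomials."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            result[i + j] = trop_add(result.get(i + j, INF), trop_mul(a, b))
    return result

a_to_b = {2: 2.0, 1: 1.5}   # 2.0x^2 (+) 1.5x
b_to_c = {3: 4.0, 2: 1.0}   # 4.0x^3 (+) 1.0x^2

print(poly_mul(a_to_b, b_to_c))   # {5: 6.0, 4: 3.0, 3: 2.5}
```

The four-hour entry is where `trop_add` does real work: it sees both \\(3.0\\) and \\(5.5\\) and keeps the smaller one.
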