Here comes the science.
Well, here comes the math, really, in which Matrix-Chain-Order gets a little bit sharper around the edges.
When you multiply a long list of matrices together, there’s a few (usually one, really) ways of doing that that are significantly faster than the others. That doesn’t mean you’re going to get the wrong answer, it’s just that, like much of life, going about it one way can be much, much faster than the others.
If you are not interested in this sort of thing, please, come back tomorrow, where I will try hard to be entertaining. Right now, you’re getting abstract algebra.
If you don’t remember your matrix multiplication from high-school algebra, it works like this: if you’ve got two matrices A1 and A2, you can only multiply them together if the number
of columns in the first one is the same as the number of rows in the second one – that is, if your A1 is a 2×5 matrix and your A2 is 5×3, for example. Then, when you multiply
them together, the thing that comes out will have as many columns as the first matrix, and as many rows as the second; for our given examples, the result will be a 2×3 matrix.
This is pretty straightforward stuff, if you’re just multiplying
two matrices. It’s tedious, you’ll note (assuming that you don’t
find this whole discussion tedious at this point anyway), because
to finish that multiplication, you’ll need to do five operations for
each of the 2×3 entries in the resulting matrix. Generally, if you’ve
got a matrix that’s X by Y and one that’s Y by Z, their product will
be a matrix of X by Z size, and take X*Y*Z steps to complete, the
“cost” of the operation. Still pretty straightforward, but it gets
marginally more interesting when you’re multiplying a whole bunch
of them in a row. Let me give you a somewhat exaggerated example,
so you can see what I’m getting at here.
Say you want to multiply three matrices together, A1 (a 2×100 matrix),
A2 (100×5) and A3 (5×1000). You have a
choice now, because you don’t have to do them in order. So, do you
do A1*A2 first, or A2*A3? If I multiply
A1 by A2 first, at a cost of 2x100x5 = 1000,
I then have to multiply the result by A3, costing me
2x5x1000 = 10,000, for a total cost of 11,000 operations. But, if
I do it the other way, doing A2 times A3 first
at a cost of 100x5x1000 = 500,000, and then multiply A1
by the result at a cost of 2x100x1000 = 200,000, I need to perform a
total of 700,000 operations to get the very same result. For these
numbers, the good way of doing the work takes approximately 1/700th the time of the bad way. Onerous!
That might not seem to you like a big deal, given how fast modern computers can be with problems this size, and often you’d be right – modern machines are blazingly fast, and will grind through half a million arithmetic operations horrifically quickly. But take my word for it when I say that matrices can get way bigger than that, and once you start multiplying a whole bunch of them in a row, the difference between the good way and the bad way can go from fractions of a second to minutes. And these are not minutes that this hardware was going to sit twiddling its thumbs, no sir; other muscular problems are queuing up outside while this one’s being worked over second after agonizing second. Sometimes making the effort to figure out the best way to do calculation-intensive stuff like this can be the difference between getting your answer in a fraction of a second and having the whole system get snowed under.
A digression, kplzthx.
Generally speaking, when the people who care about
algorithms talk about them, they’ll describe them in terms of
how long they’ll take to come back to you with an answer, for a
certain number of inputs. It’s called “Big-Oh” Notation, and it’s
not really a description of the time a process takes, but of how
that process “scales”. That is, how the amount of time it will
take increases with the size of its input. If I have a process
for sorting cards, say, and it takes about 10 seconds to sort 10
cards, but 40 seconds to sort 20 cards, and 90 seconds to sort 30
cards and so on, that’s called an O(n2) algorithm –
roughly speaking, for n inputs, it will take n2 units of time.
O(n2) processes aren’t great, but in the grand scheme
of things they’re not bad; compare that to the similar-looking
O(2n). 1002 is only 10,000;
2100 = 1267650600228229401496703205376, and doing that many of anything will cut into your beauty sleep something fierce.
Anyway, I can pretty much tell this is boring those stalwart few
of you who’ve read so far to death, so where I’m going with this is
two things, first that the “figuring out the right order” step I
hinted at earlier is a well-known O(n3) algorithm,
and that generally speaking the more superscripty your big numbers
get, the worse trouble you’re in. N3 really isn’t that bad, though, because the amount of time you’re going to save is pretty much always going to make doing it worthwhile.
Most of the time, I say! But not all the time! First off, this might not be new. But if it isn’t, well, I can’t find the paper that puts it forward despite my best googly, scifindery efforts, and nevertheless I came up with it all by my very lonesome. So don’t hate the player, or something. There is a specific degenerate case that I’d like to address here.
As discussed in CLRS, it is
trivial to construct an O(2n) algorithm to determine the optimal parenthetical ordering of a sequence of multiplication. It is further observed that this would suck, though not in so many words. They then provide an O(n3) algorithm as part of a dynamic programming exercise set before moving on to other things.
I think that I’ve got an edge case here, though, that can be solved much faster than the algorithm they’ve provided, so let me tell you what I’ve got:
For a sequence of matrices A1…An,
where the dimensions of the matrices in this sequence can be expressed as x0, x1 … xn, and such that (xa <= xb) for all (a < b) (A sequence of matrices of constantly increasing dimension, basically) it is always optimal to do them in order, from left to right.
I’m writing out a formal proof for this right now, but the gist of it is that by doing it in order, you carry the x0 element, the smallest in your list, through every step.
So the O(n3) algorithm in CLRS is not optimal in all cases – you should, in an O(n) first pass, determine if your matrix dimensions are constantly increasing (or decreasing, of course, and then you just do the multiplication from back to front – it’s a symmetric proof), and if they are, boom, you’re done, you can skip the rest.
Better yet, if you magically know ahead of time that you’re going to be multiplying matrices of this form, you can just skip the whole optimization step, and declare it optimized by pure, mathematically-backed fiat.
As mentioned above, I don’t see how this can possibly be new, or my own discovery, or whatever. But there it is; I found a way to do a thing faster than the guys in the book did, and I did it with my very own brain.