Here comes the science.

Well, here comes the math, really, in which Matrix-Chain-Order gets a little bit sharper around the edges.

When you multiply a long list of matrices together, there’s a few (usually one, really) ways of doing that that are significantly faster than the others. That doesn’t mean you’re going to get the wrong answer, it’s just that, like much of life, going about it one way can be much, much faster than the others.

If you are not interested in this sort of thing, please, come back tomorrow, where I will try hard to be entertaining. Right now, you’re getting abstract algebra.

If you don’t remember your matrix multiplication from high-school algebra, it works like this: if you’ve got two matrices A_{1} and A_{2}, you can only multiply them together if the number

of columns in the first one is the same as the number of rows in the second one – that is, if your A_{1} is a 2×5 matrix and your A_{2} is 5×3, for example. Then, when you multiply

them together, the thing that comes out will have as many columns as the first matrix, and as many rows as the second; for our given examples, the result will be a 2×3 matrix.

This is pretty straightforward stuff, if you’re just multiplying

two matrices. It’s tedious, you’ll note (assuming that you don’t

find this whole discussion tedious at this point anyway), because

to finish that multiplication, you’ll need to do five operations for

each of the 2×3 entries in the resulting matrix. Generally, if you’ve

got a matrix that’s X by Y and one that’s Y by Z, their product will

be a matrix of X by Z size, and take X*Y*Z steps to complete, the

“cost” of the operation. Still pretty straightforward, but it gets

marginally more interesting when you’re multiplying a whole bunch

of them in a row. Let me give you a somewhat exaggerated example,

so you can see what I’m getting at here.

Say you want to multiply three matrices together, A_{1} (a 2×100 matrix),

A_{2} (100×5) and A_{3} (5×1000). You have a

choice now, because you don’t have to do them in order. So, do you

do A_{1}*A_{2} first, or A_{2}*A_{3}? If I multiply

A_{1} by A_{2} first, at a cost of 2x100x5 = 1000,

I then have to multiply the result by A_{3}, costing me

2x5x1000 = 10,000, for a total cost of 11,000 operations. But, if

I do it the other way, doing A_{2} times A_{3} first

at a cost of 100x5x1000 = 500,000, and then multiply A_{1}

by the result at a cost of 2x100x1000 = 200,000, I need to perform a

total of 700,000 operations to get the very same result. For these

numbers, the good way of doing the work takes approximately *1/700th* the time of the bad way. Onerous!

That might not seem to you like a big deal, given how fast modern computers can be with problems this size, and often you’d be right – modern machines are blazingly fast, and will grind through half a million arithmetic operations horrifically quickly. But take my word for it when I say that matrices can get way bigger than that, and once you start multiplying a whole bunch of them in a row, the difference between the good way and the bad way can go from fractions of a second to minutes. And these are not minutes that this hardware was going to sit twiddling its thumbs, no sir; other muscular problems are queuing up outside while this one’s being worked over second after agonizing second. Sometimes making the effort to figure out the best way to do calculation-intensive stuff like this can be the difference between getting your answer in a fraction of a second and having the whole system get snowed under.

A digression, kplzthx.

Generally speaking, when the people who care about

algorithms talk about them, they’ll describe them in terms of

how long they’ll take to come back to you with an answer, for a

certain number of inputs. It’s called “Big-Oh” Notation, and it’s

not really a description of the time a process takes, but of how

that process “scales”. That is, how the amount of time it will

take increases with the size of its input. If I have a process

for sorting cards, say, and it takes about 10 seconds to sort 10

cards, but 40 seconds to sort 20 cards, and 90 seconds to sort 30

cards and so on, that’s called an O(n^{2}) algorithm –

roughly speaking, for n inputs, it will take n^{2}units of time.O(n

^{2}) processes aren’t great, but in the grand scheme

of things they’re not bad; compare that to the similar-looking

O(2^{n}). 100^{2}is only 10,000;

2^{100}= 1267650600228229401496703205376, and doing that many ofanythingwill cut into your beauty sleep something fierce.

Anyway, I can pretty much tell this is boring those stalwart few

of you who’ve read so far to death, so where I’m going with this is

two things, first that the “figuring out the right order” step I

hinted at earlier is a well-known O(n^{3}) algorithm,

and that generally speaking the more superscripty your big numbers

get, the worse trouble you’re in. N^{3} really isn’t that bad, though, because the amount of time you’re going to save is pretty much always going to make doing it worthwhile.

Most of the time, I say! But not all the time! First off, this might not be new. But if it isn’t, well, I can’t find the paper that puts it forward despite my best googly, scifindery efforts, and nevertheless I came up with it all by my very lonesome. So don’t hate the player, or something. There is a specific degenerate case that I’d like to address here.

As discussed in CLRS, it is

trivial to construct an O(2^{n}) algorithm to determine the optimal parenthetical ordering of a sequence of multiplication. It is further observed that this would suck, though not in so many words. They then provide an O(n^{3}) algorithm as part of a dynamic programming exercise set before moving on to other things.

I think that I’ve got an edge case here, though, that can be solved much faster than the algorithm they’ve provided, so let me tell you what I’ve got:

For a sequence of matrices A

_{1}…A_{n},

where the dimensions of the matrices in this sequence can be expressed as x_{0}, x_{1}… x_{n}, and such that (x_{a}<= x_{b}) for all (a < b) (A sequence of matrices of constantly increasing dimension, basically) it isalways optimalto do them in order, from left to right.

I’m writing out a formal proof for this right now, but the gist of it is that by doing it in order, you carry the x_{0} element, the smallest in your list, through every step.

So the O(n^{3}) algorithm in CLRS is not optimal in all cases – you should, in an O(n) first pass, determine if your matrix dimensions are constantly increasing (or decreasing, of course, and then you just do the multiplication from back to front – it’s a symmetric proof), and if they are, boom, you’re done, you can skip the rest.

Better yet, if you magically know ahead of time that you’re going to be multiplying matrices of this form, you can just skip the whole optimization step, and declare it optimized by pure, mathematically-backed fiat.

As mentioned above, I don’t see how this can possibly be new, or my own discovery, or whatever. But there it is; I found a way to do a thing faster than the guys in the book did, and I did it with my very own brain.

Go me.

## 11 Comments

Interesting stuff, it’s been a while since I’ve been forced to think about Big-O notation.

And I forgot to fill in the info part .. silly me.

Have you thought about running this by a math/CS prof, and possibly publishing it? (Or would that not be considered a normal practice for this type of thing?)

I really don’t know what the normal practice for this kind of thing is, but I’ll ask around.

Mike, this is more than a little nifty. I’m working up a few examples right now, and may be posting them later; if I do, I’ll link back over.

Also, for mathematical markup, you may want to check out ASCIIMathML, which is probably the least frivolous use of the language that I’ve seen.

Joe, are you the same Joe Hache who was once president of the CCSS? If so, I’ve got a couple of questions for you.

Also, what the heck are you people doing up at this hour on a Saturday? You’re all nuts.

Yes, it’s me. Email me if you’d like, but if it’s CCSS-related I haven’t been involved with them for a couple years.

Thank you, anonymous commenter – I’ve made the correction.

Like Joe, I forgot to fill in the info part … anonymous commenter, c’est moi.

If you feel like going over your proof, let me know.

Thank you, Sean Ross. And, I’d love to, and happy new year!

I’ve noticed with textbooks and examples from profs that most of the algorithms they provide are intended for the ‘general’ case. However, it’s optimizations like this which set our algorithms apart from the stock versions that people lookup online and plug into their programs, so hey, congrats on tweaking this one!

If I may, I’ll run through a similar thing I’ve found for d-CNF clauses.. (conjunctive normal form clauses having ‘d’ elements in each disjunction)

From class, “It’s not always possible to generate a boolean assignment for a d-CNF that yields a true value for the overall clause. However, it is often of interest to generate an assignment that yields a high number of true terms in the clause.”

Their solution is that randomly generating the boolean assignment will have a (2^d – 1)/(2^d) probability of being true anyway, yielding ‘n’ times that for the expected number of true terms in an n-term clause.

My solution is to count up the number of terms in which each boolean assignment occurs, and assign the necessary value to the individual variable that sets most of them true, then repeat this step for the remaining false clauses until you’ve made assignments for all ‘d’ variables. I’m not sure how to prove its correctness, but it seems like a sound argument for maximizing terms… thoughts?