August 6, 2002
Odds are, if you dealt with me this morning, you didn’t deal with
me at my personal best. I’ve borked my right thumb, and you really
don’t appreciate how much of the difference between you and, say, an
intellectually ambitious puppy, is due to those opposable thumbs. I’m
glad I’ve got one left.
Our connection to an important (“Mission-Critical” in the jargon of
the day) server went down today, or at least sometime over the long
weekend, and the amount of screwing around it took to resurrect it
was somewhat instructive. First off, let it be trumpeted to the
world that there are different ways of solving problems, some of
which are good and some of which are not. When I strolled into the
server room first thing this morning, it was clear that a very large
number of “fixes” had been applied to this particular system. It was
so thoroughly fixed, in fact, that it was almost impossible to fix it
again, or even to tell what the previous fixes were or what plugged
into what.
There’s a place for that kind of thing, of course – when
something has to be working now, get it working as soon as
possible. That’s cool, but the relative merit of a solution, I think,
is a function of time. If it becomes obvious that something keeps
falling down and it’s not clear why, it’s time for a plan. Important
parts of this plan are clarity and scalability – once it’s worked
out, it should be obvious not only where everything goes, but where
any new things should be put when they’re required. An example that
doesn’t fit those criteria (and this is purely hypothetical,
you understand) might include, say, having everything stuffed onto
a steel shelf in the back of the building with a pile of archived
documents and held together with garbage ties and scotch tape.
The other instructive thing I learned is that it is
very helpful to not hide boxes from your admins. This
mission-critical box was exactly where reason would dictate
it should be – under a desk in an unlit room at the opposite end
of the building from the server room. I had assumed, their setup
being what it was, that this server was plain old off-site, somebody
else’s problem entirely. It turns out that no, that’s not the case
– in fact, it had been unplugged, replugged, and was now flashing
“Press F1 to Continue”, once I got the monitor turned on.
The abject stupidity of the solution, I have since convinced myself,
was mitigated by the fact that I did clean up and reorg the server
room, replace a broken hub and diagram the hell out of everything.
Yes, mitigated. Yes it was.