Planet: Secure For Now

Elevation

This is a followup to a followup – hopefully the last one for a while – about Planet. First of all, I apologize to the community for taking this long to resolve it. It turned out to have a lot more moving parts than were visible at first, and I didn’t know enough about the problem’s context to be able to solve it quickly. I owe a number of people an apology for that, first among them Ehsan who originally brought it to my attention.

The root cause of the problem was that HTTPlib2 in Python 2.x doesn’t – and apparently will never – support Server Name Indication, an important part of Transport Layer Security on shared hosts. This is probably not a big deal for anyone who doesn’t need to make legacy web-facing Python utilities interact securely with modernity, but… well. It me, as the kids say. Here we are.

For some context, our particular SSL problems manifested themselves with error messages like “Error urllib2 Python. SSL: TLSV1_ALERT_INTERNAL_ERROR ssl.c:590” behind the scenes and “internal error” in Planet proper, and I think it’s fair to feel like those messages are less than helpful. I also – no slight on my colleagues in this – don’t have a lot of say in the infrastructure Planet is running on, and it’s equally fair to say I’m not much of a programmer. Python feature-backporting is kind of a rodeo too, and I had a hard time mapping from “I’m using this version of Python on this OS” to “therefore, I have these tools available to me.” Ultimately this combination of OS constraints, library opacity and learning how (and if, where and when) SSL works (or doesn’t, and why) while working in the dated idioms of a language I only half-know didn’t add up to the smoothest experience.

I had a few options open to me, or at least I thought I did. Refactoring for Python 3.x was a non-starter, but I spent far more time than I should have trying to rewrite Planet to work directly with Requests. That turned out to be harder than I’d expected, largely because Planet code has a lot of expectations all over it about HTTPlib2 and how it behaves. I mistakenly thought re-engineering that behavior would be straightforward, and I definitely wasn’t expecting the surprising number of rusty edge cases I’d run into when my assumptions hit the real live web.

Partway through this exercise, in a curious set of coincidences, Mike Connor and I were talking about an old line – misquoted by John F. Kennedy as “Don’t ever take a fence down until you know the reason why it was put up” – by G. K. Chesterton, that went:

In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

Infrastructure

One nice thing about ancient software is that it builds up these fences; they look like cruft, like junk you should tear out and throw away, until you really, really understand that your code, and you, are being tested. That conversation reminded me of this blog post from Joel Spolsky, about The Worst Thing you can do with software, which smelled suspiciously like what I was right in the middle of doing.

There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:

It’s harder to read code than to write it.

This is why code reuse is so hard. This is why everybody on your team has a different function they like to use for splitting strings into arrays of strings. They write their own function because it’s easier and more fun than figuring out how the old function works.

As a corollary of this axiom, you can ask almost any programmer today about the code they are working on. “It’s a big hairy mess,” they will tell you. “I’d like nothing better than to throw it out and start over.”

Why is it a mess?

“Well,” they say, “look at this function. It is two pages long! None of this stuff belongs in there! I don’t know what half of these API calls are for.”

[…] I know, it’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.

Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it’s like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.

When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.

The first one of these fences I hit was when I discovered that HTTPlib2.Response objects are (somehow) case-insensitive dictionaries because HTTP headers, per spec, are case-insensitive (though normal Python dictionaries very much not, even though examining Response objects with basic tools like “print” makes them look just like they’re a perfectly standard python dict(), nothing to see here move along. Which definitely has this kind of a vibe to it.) Another was hitting what might be a bug in Requests, where usually it gives you “200” as the HTTP “Everything’s Fine” response, which Python will happily and silently turn into the integer HTTPlib2 is expecting, but sometimes gives you “200 OK” which: kaboom.

On the bright side, I did get to spend a few minutes reminiscing fondly to myself about working with Dave Humphrey way back in the day; in hindsight he warned me about this kind of thing when we were working through a similar problem. “It’s the Web. You get whatever you get, whenever you get it, and you’ll like it.”

I was mulling over all of this earlier this week when I decided to take the best (and also worst, and also last) available option: I threw out everything I’d done up to that point and just started lying to the program until it did what I wanted.

This gist is the meat of that effort; the rest of it (swap out the HTTPlib2 calls for Requests and update your error handling) is straightforward, and running in production now. It boils down to taking a Requests object, giving it an imaginary friend, and then standing it on that imaginary friend’s shoulders, throwing a trenchcoat over it and telling it to act like a grownup. The content both calls returns is identical but the supplementary data – headers, response codes, etc – isn’t, so using this technique as a shim potentially makes Requests a drop-in replacement for HTTPlib2. On the off chance that you’re facing the same problems Planet was facing, I hope it’s useful to you.

Again, I apologize for the delay in sorting this out, and thank you for your patience.

2 Comments

  1. Posted April 8, 2017 at 2:13 pm | Permalink

    Pretty sure your comment that Python 2 will never support SNI is wrong. Some Googling supports my recollection that 2.7.9 includes a number of security-related backports, including SNI support for the built-in SSL routines. I’m pretty sure this means that httplib2 will gain the support, too.

  2. mhoye
    Posted April 8, 2017 at 2:37 pm | Permalink

    The situation is pretty confusing. HTTPLib2 is apparently only taking bugfixes for 2.x, but some OS vendors have backported fixes and support for various things, but I don’t know how to figure out where that list lives. Or where all of those lists live; I also don’t know how to figure out what I can or can’t have in Python that will be stable across OSes or vendors. If there’s a chart somewhere explaining all this, I haven’t seen it.

    This is the only approach I could find that let me ignore all that uncertainty and ship a solution today.