August 15, 2018

Time Dilation

Filed under: academic,digital,documentation,interfaces,lunacy,mozilla,science,work — mhoye @ 11:17 am

[ ]

I riffed on this a bit over at twitter some time ago; this has been sitting in the drafts folder for too long, and it’s incomplete, but I might as well get it out the door. Feel free to suggest additions or corrections if you’re so inclined.

You may have seen this list of latency numbers every programmer should know, and I trust we’ve all seen Grace Hopper’s classic description of a nanosecond at the top of this page, but I thought it might be a bit more accessible to talk about CPU-scale events in human-scale transactional terms. So: if a single CPU cycle on a modern computer was stretched out as long as one of our absurdly tedious human seconds, how long do other computing transactions take?

If a CPU cycle is 1 second long, then:

  • Getting data out of L1 cache is about the same as getting your data out of your wallet; about 3 seconds.
  • At 9 to 10 seconds, getting data from L2 cache is roughly like asking your friend across the table for it.
  • Fetching data from the L3 cache takes a bit longer – it’s roughly as fast as having an Olympic sprinter bring you your data from 400 meters away.
  • If your data is in RAM you can get it in about the time it takes to brew a pot of coffee; this is how long it would take a world-class athlete to run a mile to bring you your data, if they were running backwards.
  • If your data is on an SSD, though, you can have it six to eight days, equivalent to having it delivered from the far side of the continental U.S. by bicycle, about as fast as that has ever been done.
  • In comparison, platter disks are delivering your data by horse-drawn wagon, over the full length of the Oregon Trail. Something like six to twelve months, give or take.
  • Network transactions are interesting – platter disk performance is so poor that fetching data from your ISP’s local cache is often faster than getting it from your platter disks; at two to three months, your data is being delivered to New York from Beijing, via container ship and then truck.
  • In contrast, a packet requested from a server on the far side of an ocean might as well have been requested from the surface of the moon, at the dawn of the space program – about eight years, from the beginning of the Apollo program to Armstrong, Aldrin and Collin’s successful return to earth.
  • If your data is in a VM, things start to get difficult – a virtualized OS reboot takes about the same amount of time as has passed between the Renaissance and now, so you would need to ask Leonardo Da Vinci to secretly encode your information in one of his notebooks, and have Dan Brown somehow decode it for you in the present? I don’t know how reliable that guy is, so I hope you’re using ECC.
  • That’s all if things go well, of course: a network timeout is roughly comparable to the elapsed time between the dawn of the Sumerian Empire and the present day.
  • In the worst case, if a CPU cycle is 1 second, cold booting a racked server takes approximately all of recorded human history, from the earliest Indonesian cave paintings to now.

August 13, 2018

Licensing Edgecases

Filed under: digital,documentation,interfaces,linux,mozilla,work — mhoye @ 4:37 pm

While I’m not a lawyer – and I’m definitely not your lawyer – licensing questions are on my plate these days. As I’ve been digging into one, I’ve come across what looks like a strange edge case in GPL licensing compliance that I’ve been trying to understand. Unfortunately it looks like it’s one of those Affero-style, unforeseen edge cases that (as far as I can find…) nobody’s tested legally yet.

I spent some time trying to understand how the definition of “linking” applies in projects where, say, different parts of the codebase use disparate, potentially conflicting open source licenses, but all the code is interpreted. I’m relatively new to this area, but generally speaking outside of copying and pasting, “linking” appears to be the critical threshold for whether or not the obligations imposed by the GPL kick in and I don’t understand what that means for, say, Javascript or Python.

I suppose I shouldn’t be surprised by this, but it’s strange to me how completely the GPL seems to be anchored in early Unix architectural conventions. Per the GPL FAQ, unless we’re talking about libraries “designed for the interpreter”, interpreted code is basically data. Using libraries counts as linking, but in the eyes of the GPL any amount of interpreted code is just a big, complicated config file that tells the interpreter how to run.

At a glance this seems reasonable but it seems like a pretty strange position for the FSF to take, particularly given how much code in the world is interpreted, at some level, by something. And honestly: what’s an interpreter?

The text of the license and the interpretation proposed in the FAQ both suggest that as long as all the information that a program relies on to run is contained in the input stream of an interpreter, the GPL – and if their argument sticks, other open source licenses – simply… doesn’t apply. And I can’t find any other major free or open-source licenses that address this question at all.

It just seems like such a weird place for an oversight. And given the often-adversarial nature of these discussions, given the stakes, there’s no way I’m the only person who’s ever noticed this. You have to suspect that somewhere in the world some jackass with a very expensive briefcase has an untested legal brief warmed up and ready to go arguing that a CPU’s microcode is an “interpreter” and therefore the GPL is functionally meaningless.

Whatever your preferred license of choice, that really doesn’t seem like a place we want to end up; while this interpretation may be technically correct it’s also very-obviously a bad-faith interpretation of both the intent of the GPL and that of the authors in choosing it.

The position I’ve taken at work is that “are we technically allowed to do this” is a much, much less important question than “are we acting, and seen to be acting, as good citizens of the larger Open Source community”. So while the strict legalities might be blurry, seeing the right thing to do is simple: we treat the integration of interpreted code and codebases the same way we’d treat C/C++ linking, respecting the author’s intent and the spirit of the license.

Still, it seems like something the next generation of free and open-source software licenses should explicitly address.

Powered by WordPress