blarg?

August 7, 2019

Ten More Simple Rules

Untitled

The Public Library of Science‘s Ten Simple Rules series can be fun reading; they’re introductory papers intended to provide novices or non-domain-experts with a set of quick, evidence-based guidelines for dealing with common problems in and around various fields, and it’s become a pretty popular, accessible format as far as scientific publication goes.

Topic-wise, they’re all over the place: protecting research integrity, creating a data-management plan and taking advantage of Github are right there next to developing good reading habits, organizing an unconference or drawing a scientific comic, and lots of them are kind of great.

I recently had the good fortune to be co-author on one of them that’s right in my wheelhouse and has recently been accepted for publication: Ten Simple Rules for Helping Newcomers Become Contributors to Open Projects. They are, as promised, simple:

  1. Be welcoming.
  2. Help potential contributors evaluate if the project is a good fit.
  3. Make governance explicit.
  4. Keep knowledge up to date and findable.
  5. Have and enforce a code of conduct.
  6. Develop forms of legitimate peripheral participation.
  7. Make it easy for newcomers to get started.
  8. Use opportunities for in-person interaction – with care.
  9. Acknowledge all contributions, and
  10. Follow up on both success and failure.

You should read the whole thing, of course; what we’re proposing are evidence-based practices, and the details matter, but the citations are all there. It’s been a privilege to have been a small part of it, and to have done the work that’s put me in the position to contribute.

June 29, 2019

Blitcha

Blit

March 27, 2019

Defined By Prosodic And Morphological Properties

Filed under: academia,academic,awesome,interfaces,lunacy,science — mhoye @ 5:09 pm

Untitled

I am fully invested in these critical advances in memetic linguistics research:

[…] In this paper, we go beyond the aforementioned prosodic restrictions on novel morphology, and discuss gradient segmental preferences. We use morphological compounding to probe English speakers’ intuitions about the phonological goodness of long-distance vowel and consonant identity, or complete harmony. While compounding is a central mechanism for word-building in English, its phonology does not impose categorical vowel or consonant agreement patterns, even though such patterns are attested cross-linguistically. The compound type under investigation is a class of insult we refer to as shitgibbons, taking their name from the most prominent such insult which recently appeared in popular online media (§2). We report the results of three online surveys in which speakers rated novel shitgibbons, which did or did not instantiate long-distance harmonies (§3). With the ratings data established, we follow two lines of inquiry to consider their source: first, we compare shitgibbon harmony preferences with the frequency of segmental harmony in English compounds more generally, and conclude that the lexicon displays both vowel and consonant harmony (§4); second, we attribute the lack of productive consonant harmony in shitgibbons to the attested cross-linguistic harmonies, which we implement as a locality bias in MaxEnt grammar (§5).

You had me at “diddly-infixation”.

December 13, 2018

Looking Skyward

Filed under: awesome,beauty,documentation,flickr,future,life,science — mhoye @ 12:43 pm

PC050781

PC050776

Space

November 9, 2018

The Evolution Of Open

Filed under: digital,future,interfaces,linux,losers,mozilla,science,toys,vendetta,work — mhoye @ 5:00 pm

This started its life as a pair of posts to the Mozilla governance forum, about the mismatch between private communication channels and our principles of open development. It’s a little long-winded, but I think it broadly applies not just to Mozilla but to open source in general. This version of it interleaves those two posts into something I hope is coherent, if kind of rambly. Ultimately the only point I want to make here is that the nature of openness has changed, and while it doesn’t mean we need to abandon the idea as a principle or as a practice, we can’t ignore how much has changed or stay mired in practices born of a world that no longer exists.

If you’re up for the longer argument, well, you can already see the wall of text under this line. Press on, I believe in you.

Even though open source software has essentially declared victory, I think that openness as a practice – not just code you can fork but the transparency and accessibility of the development process – matters more than ever, and is in a pretty precarious position. I worry that if we – the Royal We, I guess – aren’t willing to grow and change our understanding of openness and the practical realities of working in the open, and build tools to help people navigate those realities, that it won’t be long until we’re worse off than we were when this whole free-and-open-source-software idea got started.

To take that a step further: if some of the aspirational goals of openness and open development are the ideas of accessibility and empowerment – that reducing or removing barriers to participation in software development, and granting people more agency over their lives thereby, is self-evidently noble – then I think we need to pull apart the different meanings of the word “open” that we use as if the same word meant all the same things to all the same people. My sense is that a lot of our discussions about openness are anchored in the notion of code as speech, of people’s freedom to move bits around and about the limitations placed on those freedoms, and I don’t think that’s enough.

A lot of us got our start when an internet connection was a novelty, computation was scarce and state was fragile. If you – like me – are a product of this time, “open” as in “open source” is likely to be a core part of your sense of personal safety and agency; you got comfortable digging into code, standing up your own services and managing your own backups pretty early, because that was how you maintained some degree of control over your destiny, how you avoided the indignities of data loss, corporate exploitation and community collapse.

“Open” in this context inextricably ties source control to individual agency. The checks and balances of openness in this context are about standards, data formats, and the ability to export or migrate your data away from sites or services that threaten to go bad or go dark. This view has very little to say about – and is often hostile to the idea of – granular access restrictions and the ability to impose them, those being the tools of this worldview’s bad actors.

The blind spots of this worldview are the products of a time where someone on the inside could comfortably pretend that all the other systems that had granted them the freedom to modify this software simply didn’t exist. Those access controls were handled, invisibly, elsewhere; university admission, corporate hiring practices or geography being just a few examples of the many, many barriers between the network and the average person.

And when we’re talking about blind spots and invisible social access controls, of course, what we’re really talking about is privilege. “Working in the open”, in a world where computation was scarce and expensive, meant working in front of an audience that was lucky enough to go to university or college, whose parents could afford a computer at home, who lived somewhere with broadband or had one of the few jobs whose company opened low-numbered ports to the outside world; what it didn’t mean was doxxing, cyberstalking, botnets, gamergaters, weaponized social media tooling, carrier-grade targeted-harassment-as-a-service and state-actor psy-op/disinformation campaigns rolling by like bad weather. The relentless, grinding day-to-day malfeasance that’s the background noise of this grudgefuck of a zeitgeist we’re all stewing in just didn’t inform that worldview, because it didn’t exist.

In contrast, a more recent turn on the notion of openness is one of organizational or community openness; that is, openness viewed through the lens of the accessibility and the experience of participation in the organization itself, rather than unrestricted access to the underlying mechanisms. Put another way, it puts the safety and transparency of the organization and the people in it first, and considers the openness of work products and data retention as secondary; sometimes (though not always) the open-source nature of the products emerges as a consequence of the nature of the organization, but the details of how that happens are community-first, code-second (and sometimes code-sort-of, code-last or code-never). “Openness” in this context is about accessibility and physical and emotional safety, about the ability to participate without fear. The checks and balances are principally about inclusivity, accessibility and community norms; codes of conduct and their enforcement.

It won’t surprise you, I suspect, to learn that environments that champion this brand of openness are much more accessible to women, minorities and otherwise marginalized members of society that make up a vanishingly small fraction of old-school open source culture. The Rust and Python communities are doing good work here, and the team at Glitch have done amazing things by putting community and collaboration ahead of everything else. But a surprising number of tool-and-platform companies, often in “pink-collar” fields, have taken the practices of open community building and turned themselves into something that, code or no, looks an awful lot like the best of what modern open source has to offer. If you can bring yourself to look past the fact that you can’t fork their code, Salesforce – Salesforce, of all the damn things – has one of the friendliest, most vibrant and supportive communities in all of software right now.

These two views aren’t going to be easy to reconcile, because the ideas of what “accountability” looks like in both contexts – and more importantly, the mechanisms of accountability built in to the systems born from both contexts – are worse than just incompatible. They’re not even addressing something the other worldview is equipped to recognize as a problem. Both are in some sense of the word open, both are to a different view effectively closed and, critically, a lot of things that look like quotidian routine to one perspective look insanely, unacceptably dangerous to the other.

I think that’s the critical schism the dialogue, the wildly mismatched understandings of the nature of risk and freedom. Seen in that light the recent surge of attention being paid to federated systems feels like a weirdly reactionary appeal to how things were better in the old days.

I’ve mentioned before that I think it’s a mistake to think of federation as a feature of distributed systems, rather than as consequence of computational scarcity. But more importantly, I believe that federated infrastructure – that is, a focus on distributed and resilient services – is a poor substitute for an accountable infrastructure that prioritizes a distributed and healthy community.  The reason Twitter is a sewer isn’t that Twitter is centralized, it’s that Jack Dorsey doesn’t give a damn about policing his platform and Twitter’s board of directors doesn’t give a damn about changing his mind. Likewise, a big reason Mastodon is popular with the worst dregs of the otaku crowd is that if they’re on the right instance they’re free to recirculate shit that’s so reprehensible even Twitter’s boneless, soporific safety team can’t bring themselves to let it slide.

That’s the other part of federated systems we don’t talk about much – how much the burden of safety shifts to the individual. The cost of evolving federated systems that require consensus to interoperate is so high that structural flaws are likely to be there for a long time, maybe forever, and the burden of working around them falls on every endpoint to manage for themselves. IRC’s (Remember IRC?) ongoing borderline-unusability is a direct product of a notion of openness that leaves admins few better tools than endless spammer whack-a-mole. Email is (sort of…) decentralized, but can you imagine using it with your junkmail filters off?

I suppose I should tip my hand at this point, and say that as much as I value the source part of open source, I also believe that people participating in open source communities deserve to be free not only to change the code and build the future, but to be free from the brand of arbitrary, mechanized harassment that thrives on unaccountable infrastructure, federated or not. We’d be deluding ourselves if we called systems that are just too dangerous for some people to participate in at all “open” just because you can clone the source and stand up your own copy. And I am absolutely certain that if this free software revolution of ours ends up in a place where asking somebody to participate in open development is indistinguishable from asking them to walk home at night alone, then we’re done. People cannot be equal participants in environments where they are subject to wildly unequal risk. People cannot be equal participants in environments where they are unequally threatened. And I’d have a hard time asking a friend to participate in an exercise that had no way to ablate or even mitigate the worst actions of the internet’s worst people, and still think of myself as a friend.

I’ve written about this before:

I’d like you to consider the possibility that that’s not enough.

What if we agreed to expand what freedom could mean, and what it could be. Not just “freedom to” but a positive defense of opportunities to; not just “freedom from”, but freedom from the possibility of.

In the long term, I see that as the future of Mozilla’s responsibility to the Web; not here merely to protect the Web, not merely to defend your freedom to participate in the Web, but to mount a positive defense of people’s opportunities to participate. And on the other side of that coin, to build accountable tools, systems and communities that promise not only freedom from arbitrary harassment, but even freedom from the possibility of that harassment.

More generally, I still believe we should work in the open as much as we can – that “default to open”, as we say, is still the right thing – but I also think we and everyone else making software need to be really, really honest with ourselves about what open means, and what we’re asking of people when we use that word. We’re probably going to find that there’s not one right answer. We’re definitely going to have to build a bunch of new tools.  But we’re definitely not going to find any answers that matter to the present day, much less to the future, if the only place we’re looking is backwards.

[Feel free to email me, but I’m not doing comments anymore. Spammers, you know?]

August 15, 2018

Time Dilation

Filed under: academic,digital,documentation,interfaces,lunacy,mozilla,science,work — mhoye @ 11:17 am


[ https://www.youtube.com/embed/JEpsKnWZrJ8 ]

I riffed on this a bit over at twitter some time ago; this has been sitting in the drafts folder for too long, and it’s incomplete, but I might as well get it out the door. Feel free to suggest additions or corrections if you’re so inclined.

You may have seen this list of latency numbers every programmer should know, and I trust we’ve all seen Grace Hopper’s classic description of a nanosecond at the top of this page, but I thought it might be a bit more accessible to talk about CPU-scale events in human-scale transactional terms. So: if a single CPU cycle on a modern computer was stretched out as long as one of our absurdly tedious human seconds, how long do other computing transactions take?

If a CPU cycle is 1 second long, then:

  • Getting data out of L1 cache is about the same as getting your data out of your wallet; about 3 seconds.
  • At 9 to 10 seconds, getting data from L2 cache is roughly like asking your friend across the table for it.
  • Fetching data from the L3 cache takes a bit longer – it’s roughly as fast as having an Olympic sprinter bring you your data from 400 meters away.
  • If your data is in RAM you can get it in about the time it takes to brew a pot of coffee; this is how long it would take a world-class athlete to run a mile to bring you your data, if they were running backwards.
  • If your data is on an SSD, though, you can have it six to eight days, equivalent to having it delivered from the far side of the continental U.S. by bicycle, about as fast as that has ever been done.
  • In comparison, platter disks are delivering your data by horse-drawn wagon, over the full length of the Oregon Trail. Something like six to twelve months, give or take.
  • Network transactions are interesting – platter disk performance is so poor that fetching data from your ISP’s local cache is often faster than getting it from your platter disks; at two to three months, your data is being delivered to New York from Beijing, via container ship and then truck.
  • In contrast, a packet requested from a server on the far side of an ocean might as well have been requested from the surface of the moon, at the dawn of the space program – about eight years, from the beginning of the Apollo program to Armstrong, Aldrin and Collin’s successful return to earth.
  • If your data is in a VM, things start to get difficult – a virtualized OS reboot takes about the same amount of time as has passed between the Renaissance and now, so you would need to ask Leonardo Da Vinci to secretly encode your information in one of his notebooks, and have Dan Brown somehow decode it for you in the present? I don’t know how reliable that guy is, so I hope you’re using ECC.
  • That’s all if things go well, of course: a network timeout is roughly comparable to the elapsed time between the dawn of the Sumerian Empire and the present day.
  • In the worst case, if a CPU cycle is 1 second, cold booting a racked server takes approximately all of recorded human history, from the earliest Indonesian cave paintings to now.

March 24, 2017

Mechanized Capital

Construction at Woodbine Station

Elon Musk recently made the claim that humans “must merge with machines to remain relevant in an AI age”, and you can be forgiven if that doesn’t make a ton of sense to you. To fully buy into that nonsense, you need to take a step past drinking the singularity-flavored Effective Altruism kool-aid and start bobbing for biblical apples in it.

I’ll never pass up a chance to link to Warren Ellis’ NerdGod Delusion whenever this posturing about AI as an existential threat comes along:

The Singularity is the last trench of the religious impulse in the technocratic community. The Singularity has been denigrated as “The Rapture For Nerds,” and not without cause. It’s pretty much indivisible from the religious faith in describing the desire to be saved by something that isn’t there (or even the desire to be destroyed by something that isn’t there) and throws off no evidence of its ever intending to exist.

… but I think there’s more to this silliness than meets the rightly-jaundiced eye, particularly when we’re talking about far-future crypto-altruism as pitched by present-day billionaire industrialists.

Let me put this idea to you: one byproduct of processor in everything is that it has given rise to automators as a social class, one with their own class interests, distinct from both labor and management.

Marxist class theory – to pick one framing; there are a few that work here, and Marx is nothing if not quotable – admits the existence of management, but views it as a supervisory, quasi-enforcement role. I don’t want to get too far into the detail weeds there, because the most important part of management across pretty much all the theories of class is the shared understanding that they’re supervising humans.

To my knowledge, we don’t have much in the way of political or economic theory written up about automation. And, much like the fundamentally new types of power structures in which automators live and work, I suspect those people’s class interests are very different than those of your typical blue or white collar worker.

For example, the double-entry bookkeeping of automation is: an automator writes some code that lets a machine perform a task previously done by a human, or ten humans, or ten thousand humans, freeing those humans to… do what?

If you’re an automator, the answer to that is “write more code”. If you’re one of the people whose job has been automated away, it’s “starve”. Unless we have an answer for what happens to the humans displaced by automation, it’s clearly not some hypothetical future AI that’s going to destroy humanity. It’s mechanized capital.

Maybe smarter people than me see a solution to this that doesn’t result in widespread starvation and crushing poverty, but I only see one: an incremental and ongoing reduction in the supply of human labor. And in a sane society, that’s pretty straightforward; it means the progressive reduction of maximum hours in a workweek, women with control over their own bodies, a steadily rising minimum wage and a large, sustained investments in infrastructure and the arts. But for the most part we’re not in one of those societies.

Instead, what it’s likely to mean is much, much more of what we already have: terrified people giving away huge amounts of labor for free to barter with the machine. You get paid for a 35 hours week and work 80 because if you don’t the next person in line will and you’ll get zero. Nobody enforces anything like safety codes or labor laws, because once you step off that treadmill you go to the back of the queue, and a thousand people are lined up in front of you to get back on.

This is the reason I think this singularity-infected enlightened-altruism is so pernicious, and morally bankrupt; it gives powerful people a high-minded someday-reason to wash their hands of the real problems being suffered by real people today, problems that they’re often directly or indirectly responsible for. It’s a story that lets the people who could be making a difference today trade it in for a difference that might matter someday, in a future their sitting on their hands means we might not get to see.

It’s a new faith for people who think they’re otherwise much too evolved to believe in the Flying Spaghetti Monster or any other idiot back-brain cult you care to suggest.

Vernor Vinge, the originator of the term, is a scientist and novelist, and occupies an almost unique space. After all, the only other sf writer I can think of who invented a religion that is also a science-fiction fantasy is L Ron Hubbard.
– Warren Ellis, 2008

May 27, 2016

Developers Are The New Mainframes

Filed under: documentation,future,interfaces,lunacy,mozilla,science,weird,work — mhoye @ 3:20 pm

This is another one of those rambling braindump posts. I may come back for some fierce editing later, but in the meantime, here’s some light weekend lunacy. Good luck getting through it. I believe in you.

I said that thing in the title with a straight face the other day, and not without reason. Maybe not good reasons? I like the word “reason”, I like the little sleight-of-hand it does by conflating “I did this on purpose” and “I thought about this beforehand”. It may not surprise you to learn that in my life at least those two things are not the same at all. In any case this post by Moxie Marlinspike was rattling around in the back of my head when somebody asked me on IRC why it’s hard-and-probably-impossible to make a change to a website in-browser and send a meaningful diff back to the site’s author, so I rambled for a bit and ended up here.

This is something I’ve asked for in the past myself: something like dom-diff and dom-merge, so site users could share changes back with creators. All the “web frameworks” I’ve ever seen are meant to make development easier and more manageable but at the end of the day what goes over the wire is a pile of minified angle-bracket hamburger that has almost no connection the site “at rest” on the filesystem. The only way share a usable change with a site author, if it can be done at all, is to stand up a containerized version of the entire site and edit that. This disconnect between the scale of the change and the work needed to make it is, to put it mildly, a huge barrier to somebody who wants to correct a typo, tweak a color or add some alt-text to an image.

I ranted about this for a while, about how JavaScript has made classic View Source obsolete and how even if you had dom-diff and dom-merge you’d need a carefully designed JS framework underneath designed specifically to support them, and how it makes me sad that I don’t have the skill set or free time to make that happen. But I think that if you dig a little deeper, there are some cold economics underneath that whole state of affairs that are worth thinking about.

I think that the basic problem here is the misconception that federation is a feature of distributed systems. I’m pretty confident that it’s not; specifically, I believe that federated systems are a byproduct of computational scarcity.

Building and deploying federated systems has a bunch of hard tradeoffs around development, control and speed of iteration that people are stuck with when computation is so expensive that no single organization can have or do enough of it to give a service global reach. Usenet, XMPP, email and so forth were products of this mainframe-and-minicomputer era; the Web is the last and best of them.

Protocol consensus is hard, but not as hard or expensive as a room full of $40,000 or $4,000,000 computers, so you do that work and accept the fact that what you gain in distributed stability you lose in iteration speed and design flexibility. The nature of those costs means the pressure to get it pretty close to right on the first try is very high, because real opportunities to revisit will be rare and costly. You’re fighting your own established success at that point, and nothing in tech has more inertia than a status quo whose supporters think is good enough. (See also: how IPV6 has been “right around the corner” for 20 years.)

But that’s just not true anymore. If you need a few thousand more CPUs, you twiddle the dials on your S3 page and go back to unified deployment, rapid experimental iteration and trying to stay ahead of everyone else who’s doing the same. That’s how WhatsApp can deploy end to end encryption with one software update, just like that. It’s how Facebook can update a billion users’ experiences whenever they feel like it, and presumably how Twitter does whatever the hell Twitter’s doing this week. They don’t ask permission or seek consensus because they don’t have to; they deploy, test and iterate.

So the work that used to enable, support and improve federated systems now mostly exists where domain-computation is still scarce and expensive: the development process itself. Specifically the inside of developers heads, developers who stubbornly and despite our best efforts remain expensive, high-maintenance and relatively low-bandwidth, with lots of context and application-reasoning locked up in their heads and poorly distributed.

Which is to say: developers are the new mainframes.

Right now great majority of what they’re “connected” to from a development-on-device perspective are de-facto dumb terminals. Apps, iPads, Android phones. Web pages you can’t meaningfully modify for values of “meaningful” that involve upstreaming a diff. From a development perspective those are the endpoints of one-way transmissions, and there’s no way to duplex that line to receive development-effort back.

So, if that’s the trend – that is, if in general centralized-then-federated systems get reconsolidated in socially-oriented verticals, (and that’s what issue trackers are when compared to mailing lists) – then development as a practice is floating around the late middle step, but development as an end product – via cheap CPU and hackable IoT devices – that’s just getting warmed up. The obvious Next Thing in that space will be a resurgence of something like the Web, made of little things that make little decisions – effectively distributing, commodifying and democratizing programming as a product, duplexing development across those newly commodified development-nodes.

That’s the real revolution that’s coming, not the thousand-dollar juicers or the bluetooth nosehair trimmers, but the mess of tiny hackable devices that start to talk to each other via decentralized, ultracommodified feedback loops. We’re missing a few key components – bug trackers aren’t quite source-code-managers or social-ey, IoT build tools aren’t one-click-to-deploy and so forth, but eventually there will be a single standard for how these things communicate and run despite everyone’s ongoing efforts to force users into the current and very-mainframey vendor lock-in, the same way there were a bunch of proprietary transport protocols before TCP/IP settled the issue. Your smarter long-game players will be the ones betting on JavaScript to come out on top there, though it’s possible there will be other contenders.

The next step will be the social one, though “tribal” might be a better way of putting it – the eventual recentralization of this web of thing-code into cultural-preference islands making choices about how they speak to the world around them and the world speaks back. Basically a hardware scripting site with a social aspect built in, communities and trusted sources building social/subscriber model out for IoT agency. What the Web became and is still in a lot of ways becoming as we figure the hard part – the people at scale part, out. The Web of How Stuff Works.

Anyway, if you want to know what the next 15-20 years will look like, that’s the broad strokes. Probably more like 8-12, on reflection. Stuff moves pretty quick these days, but like I said, building consensus is hard. The hard part is always people. This is one of the reasons I think Mozilla’s mission is only going to get more important for the foreseeable future; the Web was the last and best of the federated systems, worth fighting for on those grounds alone, and we’re nowhere close to done learning everything it’s got to teach us about ourselves, each other and what it’s possible for us to become. It might be the last truly open, participatory system we get, ever. Consensus is hard and maybe not necessary anymore, so if we can’t keep the Web and the lessons we’ve learned and can still learn from it alive long enough to birth its descendants, we may never get a chance to build another system like it.

[minor edits since first publication. -mhoye]

July 24, 2015

Hostage Situation

(This is an edited version of a rant that started life on Twitter. I may add some links later.)

Can we talk for a few minutes about the weird academic-integrity hostage situation going on in CS research right now?

We share a lot of data here at Mozilla. As much as we can – never PII, not active security bugs, but anyone can clone our repos or get a bugzilla account, follow our design and policy discussions, even watch people design and code live. We default to open, and close up only selectively and deliberately. And as part of my job, I have the enormous good fortune to periodically go to conferences where people have done research, sometimes their entire thesis, based on our data.

Yay, right?

Some of the papers I’ve seen promise results that would be huge for us. Predicting flaws in a patch prereview. Reducing testing overhead 80+% with a four-nines promise of no regressions and no loss of quality.

I’m excitable, I get that, but OMFG some of those numbers. 80 percent reductions of testing overhead! Let’s put aside the fact that we spend a gajillion dollars on the physical infrastructure itself, let’s only count our engineers’ and contributors’ time and happiness here. Even if you’re overoptimistic by a factor of five and it’s only a 20% savings we’d hire you tomorrow to build that for us. You can have a plane ticket to wherever you want to work and the best hardware money can buy and real engineering support to deploy something you’ve already mostly built and proven. You want a Mozilla shirt? We can get you that shirt! You like stickers? We have stickers! I’ll get you ALL THE FUCKING STICKERS JUST SHOW ME THE CODE.

I did mention that I’m excitable, I think.

But that’s all I ask. I go to these conferences and basically beg, please, actually show me the tools you’re using to get that result. Your result is amazing. Show me the code and the data.

But that never happens. The people I talk to say I don’t, I can’t, I’m not sure, but, if…

Because there’s all these strange incentives to hold that data and code hostage. You’re thinking, maybe I don’t need to hire you if you publish that code. If you don’t publish your code and data and I don’t have time to reverse-engineer six years of a smart kid’s life, I need to hire you for sure, right? And maybe you’re not proud of the code, maybe you know for sure that it’s ugly and awful and hacks piled up over hacks, maybe it’s just a big mess of shell scripts on your lab account. I get that, believe me; the day I write a piece of code I’m proud of before it ships will be a pretty good day.

But I have to see something. Because from our perspective, making a claim about software that doesn’t include the software you’re talking about is very close to worthless. You’re not reporting a scientific result at that point, however miraculous your result is; you’re making an unverifiable claim that your result exists.

And we look at that and say: what if you’ve got nothing? How can we know, without something we can audit and test? Of course, all the supporting research is paywalled PDFs with no concomitant code or data either, so by any metric that matters – and the only metric that matters here is “code I can run against data I can verify” – it doesn’t exist.

Those aren’t metrics that matter to you, though. What matters to you is either “getting a tenure-track position” or “getting hired to do work in your field”. And by and large the academic tenure track doesn’t care about open access, so you’re afraid that actually showing your work will actively hurt your likelihood of getting either of those jobs.

So here we are in this bizarro academic-research standoff, where I can’t work with you without your tipping your hand, and you can’t tip your hand for fear I won’t want to work with you. And so all of this work that could accomplish amazing things for a real company shipping real software that really matters to real people – five or six years of the best work you’ve ever done, probably – just sits on the shelf rotting away.

So I go to academic conferences and I beg people to publish their results and paper and data open access, because the world needs your work to matter. Because open access plus data/code as a minimum standard isn’t just important to the fundamental principles of repeatable experimental science, the integrity of your field, and your career. It’s important because if you want your work to matter to people, then you’d better put it somewhere that people can see it and use it and thank you for it and maybe even improve on it.

You did this as an undergrad. You insist on this from your undergrads, for exactly the same reasons I’m asking you to do the same: understanding, integrity and plain old better results. And it’s a few clicks and a GitHub account for you to do the same now. But I need you to do it one last time.

Full marks here isn’t “a job” or “tenure”. Your shot at those will be no worse, though I know you can’t see it from where you’re standing. But they’re still only a strong B. An A is doing something that matters, an accomplishment that changes the world for the better.

And if you want full marks, show your work.

October 3, 2014

Rogue Cryptocurrency Bootstrapping Robots

Cuban Shoreline

I tried to explain to my daughter why I’d had a strange day.

“Why was it strange?”

“Well… There’s a thing called a cryptocurrency. ‘Currency’ is another word for money; a cryptocurrency is a special kind of money that’s made out of math instead of paper or metal.”

That got me a look. Money that’s made out of made out of math, right.

“… and one of the things we found today was somebody trying to make a new cryptocurrency. Now, do you know why money is worth anything? It’s a coin or a paper with some ink on it – what makes it ‘money’?”

“… I don’t know.”

“The only answer we have is that it’s money if enough people think it is. If enough people think it’s real, it becomes real. But making people believe in a new kind of money isn’t easy, so what this guy did was kind of clever. He decided to give people little pieces of his cryptocurrency for making contributions to different software projects. So if you added a patch to one of the projects he follows, he’d give you a few of these math coins he’d made up.”

“Um.”

“Right. Kind of weird. And then whoever he is, he wrote a program to do that automatically. It’s like a little robot – every time you change one of these programs, you get a couple of math coins. But the problem is that we update a lot of those programs with our robots, too. Our scripts run, our robots, and then his robots try to give our robots some of his pretend money.”

“…”

“So that’s why my day was weird. Because we found somebody else’s programs trying to give our programs made-up money, in the hope that this made-up money would someday become real.”

“Oh.”

“What did you to today?”

“I painted different animals and gave them names.”

“What kind of names?”

“French names like zaval.”

“Cheval. Was it a good day?”

“Yeah, I like painting.”

“Good, good.”

(Charlie Stross warned us about this. It’s William Gibson’s future, but we still need to clean up after it.)

Older Posts »

Powered by WordPress