Citation Needed

I may revisit this later. Consider this a late draft. I’m calling this done.

“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.” — Stan Kelly-Bootle

Sometimes somebody says something to me, like a whisper of a hint of an echo of something half-forgotten, and it lands on me like an invocation. The mania sets in, and it isn’t enough to believe; I have to know.

I’ve spent far more effort than is sensible this month crawling down a rabbit hole disguised, as they often are, as a straightforward question: why do programmers start counting at zero?

Now: stop right there. By now your peripheral vision should have convinced you that this is a long article, and I’m not here to waste your time. But if you’re gearing up to tell me about efficient pointer arithmetic or binary addition or something, you’re wrong. You don’t think you’re wrong and that’s part of a much larger problem, but you’re still wrong.

For some backstory, on the off chance anyone still reading by this paragraph isn’t an IT professional of some stripe: most computer languages – including C/C++, Perl, Python, some (but not all!) versions of Lisp, and many others – are “zero-origin” or “zero-indexed”. That is to say, in an array A with 8 elements in it, the first element is A[0], and the last is A[7]. This isn’t universally true, though, and other languages from the same (and earlier!) eras are sometimes one-indexed, going from A[1] to A[8].
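If a code sketch is clearer than prose, here’s a minimal one in C, which is zero-indexed; the one-origin style has to be faked with an offset, and the values here are made up purely for illustration.

    #include <stdio.h>

    int main(void) {
        int a[8] = {10, 20, 30, 40, 50, 60, 70, 80};

        /* Zero-origin, as C actually works: first element a[0], last a[7]. */
        printf("first: %d, last: %d\n", a[0], a[7]);

        /* One-origin, as Lua or MATLAB would have it, faked with an offset. */
        for (int i = 1; i <= 8; i++)
            printf("element %d: %d\n", i, a[i - 1]);

        return 0;
    }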

While it’s a relatively rare practice in modern languages, one-origin arrays certainly aren’t dead; there’s a lot of blood pumping through Lua these days, not to mention MATLAB, Mathematica and a handful of others. If you’re feeling particularly adventurous, Haskell apparently lets you pick your poison at startup, and in what has to be the most lunatic thing I’ve seen on a piece of silicon since I found out the MIPS architecture had runtime-mutable endianness, Visual Basic (up to v6.0) featured the OPTION BASE flag, letting you flip that coin on a per-module basis. Zero- and one-origin arrays in different corners of the same program! It’s just software, why not?

All that is to say that starting at 1 is not an unreasonable position at all; to a typical human, thinking about the zeroth element of an array doesn’t make any more sense than trying to catch the zeroth bus that comes by, but we’ve clearly ended up here somehow. So what’s the story there?

The usual arguments involving pointer arithmetic and incrementing by sizeof(struct) and so forth describe features that are nice enough once you’ve got the hang of them, but they’re also post-facto justifications. This is obvious if you take the most cursory look at the history of programming languages; C inherited its array semantics from B, which inherited them in turn from BCPL, and though BCPL arrays are zero-origin, the language doesn’t support pointer arithmetic, much less data structures. On top of that, other languages that antedate BCPL and C aren’t zero-indexed. Algol 60 uses one-indexed arrays, and arrays in Fortran are arbitrarily indexed – they’re just a range from X to Y, and X and Y don’t even need to be positive integers.

So by the early 1960s, there are three different approaches to the data structure we now call an array (all three are sketched in code just after this list).

  • Zero-indexed, in which the array index carries no particular semantics beyond its implementation in machine code.
  • One-indexed, identical to the matrix notation people have been using for quite some time. It comes at the cost of a CPU instruction, or a disused word, to manage the offset; usability isn’t free.
  • Arbitrary indices, in which the range is significant with regard to the problem you’re up against.
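To make those three flavors concrete, here’s a minimal sketch in C. C only gives you the first flavor natively, so the other two are emulated by hand, complete with the offset bookkeeping they cost; the bounds are hypothetical, picked purely for illustration.

    #include <stdio.h>

    int main(void) {
        int a[8] = {0};

        /* 1. Zero-indexed: the subscript is the raw offset from the base. */
        a[0] = 100;

        /* 2. One-indexed: every access pays a "subtract one" (or you leave
           slot zero disused) - the CPU instruction mentioned above. */
        int i = 1;
        a[i - 1] = 200;

        /* 3. Arbitrary indices, Fortran-style: a range from X to Y. Here
           the (hypothetical) lower bound is -3. */
        int lo = -3;
        int j = -3;                 /* the first element, index -3 */
        a[j - lo] = 300;

        /* All three writes landed on the same underlying slot. */
        printf("%d\n", a[0]);       /* prints: 300 */
        return 0;
    }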

So if your answer started with “because in C…”, you’ve been repeating a good story you heard one time, without ever asking yourself if it’s true. It’s not about i = a + n*sizeof(x), because pointers and structs didn’t exist. And that’s the most coherent argument I can find; there are dozens of other arguments for zero-indexing involving “natural numbers” or “elegance” or some other unresearched hippie voodoo nonsense that are either wrong or too dumb to rise to the level of wrong.
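For reference, the identity that folklore leans on is real enough in modern C; it just postdates the decision by years. A minimal demonstration:

    #include <stdio.h>

    int main(void) {
        int a[4] = {1, 2, 3, 4};
        int n = 2;

        /* In C, a[n] is defined as *(a + n); the scaling by sizeof(int)
           happens implicitly in the pointer arithmetic. */
        printf("%d %d\n", a[n], *(a + n));   /* prints: 3 3 */
        return 0;
    }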

The fact of it is this: before pointers, structs, C and Unix existed, at a time when other languages with a lot of resources and (by the standard of the day) user populations behind them were one- or arbitrarily-indexed, somebody decided that the right thing was for arrays to start at zero.

So I found that person and asked him.

His name is Dr. Martin Richards; he’s the creator of BCPL, now almost 7 years into retirement; you’ve probably heard of one of his doctoral students, Eben Upton, creator of the Raspberry Pi. I emailed him to ask why he decided to start counting arrays from zero, way back then. He replied that…

As for BCPL and C subscripts starting at zero. BCPL was essentially designed as a typeless language close to machine code. Just as machine-code registers are typically all the same size and contain values that represent almost anything, such as integers, machine addresses, truth values, characters, etc., BCPL has typeless variables, just like machine registers, capable of representing anything. If a BCPL variable represents a pointer, it points to one or more consecutive words of memory. These words are the same size as BCPL variables. Just as machine code allows address arithmetic, so does BCPL: if p is a pointer, p+1 is a pointer to the next word after the one p points to. Naturally p+0 has the same value as p.

The monodic indirection operator ! takes a pointer as its argument and returns the contents of the word pointed to. If v is a pointer, !(v+i) will access the word pointed to by v+i. As i varies from zero upwards, we access consecutive locations starting at the one pointed to by v when i is zero. The dyadic version of ! is defined so that v!i = !(v+i). v!i behaves like a subscripted expression, with v being a one-dimensional array and i being an integer subscript. It is entirely natural for the first element of the array to have subscript zero. C copied BCPL’s approach using * for monodic ! and [ ] for array subscription. Note that, in BCPL, v!5 = !(v+5) = !(5+v) = 5!v. The same happens in C: v[5] = 5[v]. I can see no sensible reason why the first element of a BCPL array should have subscript one. Note that 5!v is rather like a field selector accessing a field in a structure pointed to by v.
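And no, that v[5] = 5[v] business is not a typo; it falls straight out of the definition above, and any C compiler will cheerfully accept it. A minimal demonstration:

    #include <stdio.h>

    int main(void) {
        int v[8] = {0, 10, 20, 30, 40, 50, 60, 70};

        /* v[5] is defined as *(v + 5); addition commutes, so *(5 + v),
           written 5[v], names the same word - just like BCPL's 5!v. */
        printf("%d %d\n", v[5], 5[v]);   /* prints: 50 50 */
        return 0;
    }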

Richards’ answer is interesting for a number of reasons, though I’ll leave their enumeration to your discretion. The one that I find most striking, though, is that this is the earliest example I can find of the understanding that a programming language is a user interface, and that there are difficult, subtle tradeoffs to make between resources and usability. Remember, all this was at a time when everything about the future of human-computer interaction was up in the air, from the shape of the keyboard and the glyphs on the switches and keycaps right down to how the ones and zeros were manifested in paper ribbon and bare metal; this note by the late Dennis Ritchie might give you a taste of the situation, where he mentions that five years later, one of the primary reasons C went with square-bracket array notation was that it was getting steadily easier to reliably find square brackets on the world’s keyboards.

“Now just a second, Hoye”, I can hear you muttering. “I’ve looked at the BCPL manual and read Dr. Richards’ explanation and you’re not fooling anyone. That looks a lot like the efficient-pointer-arithmetic argument you were frothing about, except with exclamation points.” And you’d be very close to right. That’s exactly what it is – the distinction is where those efficiencies take place, and why.

BCPL was first compiled on an IBM 7094 – here’s a picture of the console, though the entire computer took up a large room – running CTSS, the Compatible Time Sharing System, which antedates Unix much as BCPL antedates C. There’s no malloc() in that context, because there’s nobody to share the memory core with. You get the entire machine and the clock starts ticking, and when your wall-clock time block runs out, that’s it. But here’s the thing: in that context none of the offset-calculations we’re supposedly economizing are calculated at execution time. All that work is done ahead of time by the compiler.

You read that right. That sheet-metal, “wibble-wibble-wibble” noise your brain is making is exactly the right reaction.

Whatever justifications or advantages came along later – and it’s true, you do save a few processor cycles here and there, and that’s nice – the reason we started using zero-indexed arrays is that it shaved a couple of processor cycles off of a program’s compilation time. Not execution time; compile time.
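To be clear about where the savings land, here’s a sketch of the two address computations. This is illustrative C, not BCPL – the word size and the constant-folding are stand-ins for what the compiler does, not a claim about the 7094’s instruction set.

    #include <stdio.h>

    enum { WORD = 1 };   /* BCPL addresses words, not bytes (illustrative) */

    int main(void) {
        int base[8] = {0, 1, 2, 3, 4, 5, 6, 7};

        /* Zero-origin: for a constant subscript the whole offset is a
           compile-time constant - base + 3. Nothing left to do later. */
        int x = *(base + 3 * WORD);

        /* One-origin: the (3 - 1) folds away too, but the compiler had to
           do that subtraction *while compiling*. Those are the cycles
           being economized. */
        int y = *(base + (3 - 1) * WORD);

        printf("%d %d\n", x, y);   /* prints: 3 2 */
        return 0;
    }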

Does it get better? Oh, it gets better:

IBM had been very generous to MIT in the fifties and sixties, donating or discounting its biggest scientific computers. When a new top of the line 36-bit scientific machine came out, MIT expected to get one. In the early sixties, the deal was that MIT got one 8-hour shift, all the other New England colleges and universities got a shift, and the third shift was available to IBM for its own use. One use IBM made of its share was yacht handicapping: the President of IBM raced big yachts on Long Island Sound, and these boats were assigned handicap points by a complicated formula. There was a special job deck kept at the MIT Computation Center, and if a request came in to run it, operators were to stop whatever was running on the machine and do the yacht handicapping job immediately.

Jobs on the IBM 7090, one generation behind the 7094, were batch-processed, not timeshared; you queued up your job along with a wall-clock estimate of how long it would take, and if it didn’t finish it was pulled off the machine, the next job in the queue went in and you got to try again whenever your next block of allocated time happened to be. As in any economy, there is a social context as well as a technical context, and it isn’t just about managing cost, it’s also about managing risk. A programmer isn’t just racing the clock, they’re also racing the possibility that somebody will come along and bump their job and everyone else’s out of the queue.

I asked Tom Van Vleck, author of the above paragraph and also now retired, how that worked. He replied in part that on the 7090…

“User jobs were submitted on cards to the system operator, stacked up in a big tray, and a rudimentary system read, loaded, and ran jobs in sequence. Typical batch systems had accounting systems that read an ID card at the beginning of a user deck and punched a usage card at end of job. User jobs usually specified a time estimate on the ID card, and would be terminated if they ran over. Users who ran too many jobs or too long would use up their allocated time. A user could arrange for a long computation to checkpoint its state and storage to tape, and to subsequently restore the checkpoint and start up again.

The yacht handicapping job pertained to batch processing on the MIT 7090 at MIT. It was rare — a few times a year.”

So: the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.

There are a few points I want to make here.

The first thing is that as far as I can tell nobody has ever actually looked this up.

Whatever programmers think about themselves and these towering logic-engines we’ve erected, we’re a lot more superstitious than we realize. We tell and retell this collection of unsourced, inaccurate stories about the nature of the world without ever doing the research ourselves, and there’s no other word for that but “mythology”. Worse, by obscuring the technical and social conditions that led humans to make these technical and social decisions, by talking about the nature of computing as we find it today as though it’s an inevitable consequence of an immutable set of physical laws, we’re effectively denying any responsibility for how we got here. And worse than that, by refusing to dig into our history and understand the social and technical motivations for those choices, by steadfastly refusing to investigate the difference between a motive and a justification, we’re disavowing any agency we might have over the shape of the future. We just keep mouthing platitudes and pretending the way things are is nobody’s fault, and the more history you learn and the more you look at the sad state of modern computing, the more pathetic and irresponsible that sounds.

Part of the problem is access to the historical record, of course. I was in favor of Open Access publication before, but writing this up has cemented it: if you’re on the outside edge of academia, $20/paper for any research that doesn’t have a business case and a deep-pocketed backer is completely untenable, and speculative or historic research that might require reading dozens of papers to shed some light on longstanding questions is basically impossible. There might have been a time when this was OK and everyone who had access to or cared about computers was already an IEEE/ACM member, but right now the IEEE – both as a knowledge repository and a social network – is a single point of a lot of silent failure. “$20 for a forty-year-old research paper” is functionally indistinguishable from “gone”, and I’m reduced to emailing retirees to ask them what they remember from a lifetime ago because I can’t afford to read the source material.

The second thing is how profoundly resistant to change or growth this field is, and apparently has always been. If you haven’t seen Bret Victor’s talk about The Future Of Programming as seen from 1975 you should, because it’s exactly on point. Over and over again as I’ve dredged through this stuff, I kept finding programming constructs, ideas and approaches we call part of “modern” programming if we attempt them at all, sitting abandoned in 45-year-old demo code for dead languages. And to be clear: that was always a choice. Over and over again tools meant to make it easier for humans to approach big problems are discarded in favor of tools that are easier to teach to computers, and that decision is described as an inevitability.

This isn’t just Worse Is Better, this is “Worse Is All You Get Forever”. How many off-by-one disasters could we have avoided if the “foreach” construct that existed in BCPL had made it into C? How much more insight would all of us have into our code if we’d put the time into making Michael Chastain’s nearly-omniscient debugging framework – PTRACE_SINGLESTEP_BACKWARDS! – work in 1995? When I found this article by John Backus wondering if we can get away from Von Neumann architecture completely, I wondered where that ambition to rethink our underpinnings went. But the fact of it is that it didn’t go anywhere. Changing how you think is hard and the payoff is uncertain, so by and large we decided not to. Nobody wanted to learn how to play, much less build, Engelbart’s Violin, and instead everyone gets a box of broken kazoos.

In truth maybe somebody tried – maybe even succeeded! – but it would cost me hundreds of dollars to even start looking for an informed guess, so that’s the end of that.

It’s hard for me to believe that the IEEE’s membership isn’t going off a demographic cliff these days as their membership ages, and it must be awful knowing they’ve got decades of delicious, piping-hot research cooked up that nobody is ordering while the world’s coders are lining up to slurp watery gruel out of a Stack-Overflow-shaped trough and pretend they’re well-fed. You might not be surprised to hear that I’ve got a proposal to address both those problems; I’ll let you work out what it might be.


  1. Michael Anthony
    Posted November 18, 2013 at 4:42 pm | Permalink

    I’m ambivalent on this – well, I’m the sort of person that would give his right arm to be ambidextrous, but would I still be?

    People get lost in tall buildings because of zero indexing of the floors / levels / storeys.
    During the town planning stages, the Ground Floor is usually (in most countries) exempt from rateable floor area because it is there anyway without the building. So it gets to be floor 0.

    At construction (analogous to computer technicians) they just roll with whatever. I doubt the boffins would have had any advantage in starting at index 1 when interrupted for the yacht interrupt. But the builder has to build the 0 floor, and it has costs to pass on to the client. A 3-storey building under the above designation has 4 levels of cost. He sees the ground floor as Floor 1.

    The lift manufacturer gets handed the specification for marking the floors, depending on who that spec came from: the architect or the builder.

    So the end user, Joe Public, who is quite separate from the development (compiler) process, could either “naturally” see First-Floor as the first floor in the air, or as the one belonging to the first door she encounters. Pizza deliveries meant for the ninth floor get delivered to the eighth. It all depends on whether his or her perception is history-based (book learnin’) or present-analytical (free-thinker, fly-by-the-seat-of-m’-pants rebel type).

    I’m done

  2. dr2chase
    Posted November 20, 2013 at 10:36 pm | Permalink

    I think the comments about “close to the machine” and mentioning the number of indexing modes are on point; that is surely the impression I got of BCPL, way back when.

    A couple of other notes: it is my distinct recollection that LET A = VEC N would allocate memory for cells at addresses A+0 through A+N; that is, VEC N allocates N+1 cells.

    My memory of string addressing in BCPL is a little hazy, because cells were typically larger than bytes, but I have a distinct recollection that string lengths did not exceed 255, because byte zero contained the length.

    And though I recall a desire to be “close to the machine”, there was also INTCODE, for interpreting BCPL, and at one time students used a machine whose entire operating system (OS6) was executed as INTCODE (for a 15x performance penalty — _The Computer Journal_, 15:2, p. 118). My copies of other Historical Documents are at work, not here, but you might be amused to know that one summer I worked at IBM San Jose Research Labs, and hacked up an interpreter for Backus’s FP84 — in BCPL, on VM/CMS. The BCPL port to OS/360 and VM/CMS was quite nice, and gave clean access to many features of the OS.

    And lo, some of the Historical Documents are now available as PDF (sadly, not indexed for search — yet)

  3. Claude
    Posted November 21, 2013 at 1:30 am | Permalink

    Thanks for digging this up! From the comments it is certainly polarizing, as in 0 against 1.

    I think of 0-based indexes as pointing to the boundary between the array elements, and 1-based indexes as pointing to the center of the array elements.

  4. Ashley Yakeley
    Posted November 21, 2013 at 3:15 am | Permalink

    Clearly bytes should be one-based: they have 256 possible values, and they are most easily counted 1,2,3…256. I wonder who first came up with the idea of zero-based bytes?

    The Romans used one-based addition, for example, the day after tomorrow is three days away (those three days are today, tomorrow, the day after), while going up the diatonic musical scale an octave is, like its name suggests, an interval of eight notes. However we seem to have switched to a less intuitive zero-based system since…

  5. Martin Roller
    Posted November 21, 2013 at 4:50 am | Permalink

    Ah, I was waiting for the elevator argument. I remember in Siemens (who invented the electric elevator) office buildings they start numbering floors with 2 at the ground floor, even if there are no floors below. For Siemens employees this seemed very natural.

  6. foobar
    Posted November 21, 2013 at 5:24 am | Permalink

    As for IEEE and ACM … yes, it’s highly ironic how this field completely fails to leverage the technology it has created for advancing itself.

    As for your array index rant, though, it seems to me like you missed the point.

    First, let me point out a myth that you are perpetuating: “first” does not mean “the element with index 1”; you are mixing up ordinal and cardinal numbers there.

    “First” means “the element that has no predecessor”, which would be the element with index 20 for an array that is indexed starting at 20, the element with index 1 for an array that is indexed starting at 1, the element with index -42 for an array that is indexed starting at -42, and the element with index 0 for an array that is indexed starting at 0. Or to give an even more illustrative example: “A” is not the “Ath letter of the alphabet”, it’s the “first letter of the alphabet” – indices are simply names, possibly with some defined order, and there is no fixed relationship between those names and the ordinal numbers specifying their position in the respective order.

    Thus, there also is no such thing as “zeroth”, other than in some colloquial or highly confused use.

    Your confusion in that regard probably stems from the fact that the most common everyday use of ordinal numbers is for referring to elements of some collection of (material) objects, and it so happens that in a collection consisting of a single object, that object is the first object (its ordinal number), and the corresponding size of the collection is 1 (a cardinal number), if you add a second object, the size of the collection grows to 2, and so on. So, “1” and “first”, “2” and “second”, […] tend to coincide in this manner, but that really is not much more than a coincidence.

    Now, it also so happens that most programming languages do not distinguish explicitly between ordinal and cardinal numbers, but rather just use some subset type of the integers both for arithmetic and for naming elements of arrays. As such, they need to define some mapping between the two uses – which usually happens by simply designating some cardinal number to be the index of the first element of an array and using the same order for both. From that it may seem like any choice would really be as good as any other, it’s just an ordered set of names after all, isn’t it? Well, yes, and no.

    Many commenters refer to matrix algebra’s use of 1-based indices. If you think about it, that really is just an exercise in naming and ordering, you might as well use letters as indices, any sufficiently large ordered set would do. For this kind of use, it really doesn’t matter. It’s just one step up from a dictionary, in that the keys do have a defined order.

    But there is another use case for array indexing: Algorithms where you do arithmetic with indices – and algebraically, 0 and 1 are very different beasts indeed, and they are both very special, too, being the neutral elements of addition and multiplication and the absorbing element of multiplication. Whether indexing starting at 0 or starting at 1 is easier depends a bit on the data structure you are implementing on top of the array, though, depending on whether common relations between indices are multiplicative (trees) or additive (non-trees ;-).

    However, if you consider building an array on top of another array (which is a very common thing to do, even as part of tree implementations), which happens by specifying an offset of the abstract array relative to the underlying array, you will notice that the underlying computation is an addition, and so the neutral element of that abstraction is the 0. And that is what I would assume Mr. Richards really means – it just so happens that building a PL array on top of RAM is also essentially “building an array on top of an array”. It’s not that 0-based indexing is natural because the hardware works this way; rather, the hardware works this way because it is natural for this very common abstraction, common both in hardware and in software. And if it’s good for “hardware arrays” to work this way, why shouldn’t the same reason be a good reason for “software arrays” to work the same way? (Well, there might be reasons, of course, but my point is that the hardware is the way it is for a reason, not purely by accident.)

    And yes, this abstraction is essentially “pointer arithmetic”, however you don’t need any kind of dynamic memory allocation in your language to do “pointer arithmetic” – any array index is a pointer, just relative to an array at the language level instead of to “the process address space” or “the physical RAM” (the former also being just an array built upon the latter array, more or less). As soon as you have one array, you can build other arrays upon it by adding offsets, and you can do index (pointer) arithmetic for element access.

    That the folklore around this is commonly wildly misleading and not very well understood may still be true, though ;-)

  7. clord
    Posted November 21, 2013 at 12:04 pm | Permalink

    I work on compilers (C++ and Fortran) for my day-job (yes, people still get (barely) paid to write them) and I have a bit of perspective on the issue. It’s not just a compilation-time issue. Let’s tackle the reasons first.

    Zero-based indexing has some non-historical reasons. In my view it is a consequence of:

    1) how we assigned numbers to bit patterns,
    2) the desire to fully utilize all of our machine,
    3) simplicity of implementation at all levels

    In binary, zero is the *first* pattern. There is the origin of the off-by-one error, right there, in my opinion. Everything else flows out of that choice. And it’s pretty much unavoidable, since zero needs some representation in the machine. There are two domains here: the integers, and the bit patterns representing them. Decimal digits have this problem too – 0 is the first digit, 1 is the second, 2 is the third – but no one indexes with decimal digits the way we index with bit patterns. When we index arrays, we’re indexing with bit-patterns, not with integers.

    So why not just change the patterns to match how we expect? Registers hold bit-patterns, not numbers. And the number of bit-patterns a register can hold is a precious resource — using all of the bits in that register is important to people. If we 1-index (by doing a decrement on the index before dereference), then we throw out one whole bit-pattern (the zero) as meaningless during address computations. Because of the decrement, index 2^n is not representable in an n-bit register, which means not all of memory can be reached by pointer arithmetic. Some people say ‘use the zero-pattern for that last element’.

    That’s where #3 comes into play. The CPU designer says to that: “why should my CPU have to have special indexing hardware? Aren’t we claiming to be RISC? Just use math operators.” And then the language designer says “Well, the CPU provides these math operators, and I don’t want to impose any performance penalty on my users just for natural indexing, so I’ll pass the buck to the user.”

    If you want to design a language that fixes this, the Fortran way is correct IMO. Let the user pick their bounds in the domain of integers, and provide sensible defaults. The language can flatten out to the domain of bit-patterns in the machine. This way the compiler can constant-fold the transformation down to the underlying zero-based indexing. Also, this approach generalizes out to multi-dimensional matrices quite nicely (perhaps with annotations about how to map the indexing domain to the bit-pattern domain for cache-line reasons — an optimization our compiler does routinely for Fortran code, but which is impossible in C/C++ code, another reason Fortran is still the fastest language we ship).

  8. bmm6o
    Posted November 21, 2013 at 1:02 pm | Permalink

    Fascinating article. Count me among the crowd that would like to see further clarification of this:

    “But here’s the thing: in that context none of the offset-calculations we’re supposedly economizing are calculated at execution time. All that work is done ahead of time by the compiler”

    Are you saying that the compiler converts x!y to something other than the machine equivalent of “load x+y”? Surely it can’t always pre-compute the sum. Unless your point is that inserting the “subtract 1” instruction would slow down the compile time more than run time? I feel like I’m missing some point that you think is obvious.