ONLamp.com

How an Accident of Hardware Design Encouraged Open Source

by Mark Rosenthal
02/22/2007

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I,
I took the one less traveled by,
And that has made all the difference.
—Robert Frost

Back in the early 1970s, the hardware engineers at Digital Equipment Corporation made a decision about how their new computer, the PDP-11, would address memory. I believe their decision had the unintended, butterfly-effect consequence of helping to bring the open source software movement into existence.

A Bit of History

In the 1970s, IBM was the behemoth of computing. It sold big, expensive mainframe computers known as the IBM 360, and later the IBM 370. By contrast, Digital Equipment Corporation (DEC) was the leading seller of small, inexpensive minicomputers. DEC's PDP-8 was to computing what Ford's Model T had been to automobiles. Although very limited in its capabilities (its opcode was only 3 bits, and a typical configuration was either 8K or 12K 12-bit words), it was cheap in comparison to IBM's big iron. It was inexpensive enough that virtually every college in the country was able to afford several of them. DEC followed the PDP-8 with the PDP-11; its low price and nicely orthogonal instruction set made it DEC's biggest seller throughout the 1970s.

For computers produced prior to IBM's 360 and DEC's PDP-11, the smallest addressable unit of memory was whatever the processor's word size happened to be, which generally wasn't a multiple of 8 bits. One architecture used a 36-bit word; another used a 12-bit word. This made it cumbersome to write software that manipulated text. For example, Figure 1 shows the PDP-8's representation of the string DIGITAL. To write code to walk through the characters of a string, you had to fetch the low-order 8 bits of the first word of the string, then the low-order 8 bits of the second word of the string, then glue together the high-order 4 bits of the first word and the high-order 4 bits of the second word to form the third character. For the fourth character, you started the process all over again.

word address   high 4 bits             low 8 bits
0              0100 (G, high bits)     01000100 (D)
1              0111 (G, low bits)      01001001 (I)
2              0100 (A, high bits)     01001001 (I)
3              0001 (A, low bits)      01010100 (T)
4              0000 (unused)           01001100 (L)

Figure 1. Character string representation of "DIGITAL" as packed into PDP-8 12-bit words.
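
To make the cost of that packing concrete, here is a minimal sketch in C of the unpacking loop described above. It is an illustration only, not period PDP-8 code: the function name `unpack_pdp8` is hypothetical, and the sketch assumes each 12-bit word is held in the low bits of a `uint16_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack nchars 8-bit characters from 12-bit words laid out as in Figure 1.
 * Assumption of this sketch: each uint16_t holds one 12-bit word in its
 * low 12 bits. out must have room for nchars + 1 bytes. */
void unpack_pdp8(const uint16_t *words, size_t nchars, char *out)
{
    assert(words != NULL && out != NULL);

    for (size_t i = 0; i < nchars; i++) {
        /* Every group of three characters occupies two consecutive words. */
        size_t pair = (i / 3) * 2;
        unsigned c;

        switch (i % 3) {
        case 0:   /* low 8 bits of the first word of the pair */
            c = words[pair] & 0xFFu;
            break;
        case 1:   /* low 8 bits of the second word of the pair */
            c = words[pair + 1] & 0xFFu;
            break;
        default:  /* high 4 bits of both words, glued together */
            c = (((words[pair] >> 8) & 0xFu) << 4)
              | ((words[pair + 1] >> 8) & 0xFu);
            break;
        }
        out[i] = (char)c;
    }
    out[nchars] = '\0';
}
```

Fed the five words from Figure 1 (0x444, 0x749, 0x449, 0x154, 0x04C in hexadecimal) and a length of 7, this yields the string DIGITAL. On a byte-addressed machine the whole function collapses to reading one byte per character.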

Although the idea may seem self-evident today, giving each byte its own unique address was an important innovation. The 360 and the PDP-11 were the first machines IBM and DEC produced with memory organized into 8-bit groups known as bytes, and with an addressing scheme that allowed each byte of memory to be individually addressed. But the IBM and DEC hardware engineers made different decisions about how to number the bytes within words (larger groupings of bytes). The decision by the PDP-11's designers not to use the scheme used by the IBM 360 had far-reaching and unforeseeable consequences.

Little-Endian vs. Big-Endian Byte Order

The IBM 360 designers had numbered the bytes within a word (4 bytes on the 360) based on the way English words are written -- from left to right. So if text is stored in a word of memory, the first character is stored in the byte that would hold the most significant bits of an integer if that word were used to store a binary integer.

By convention, each bit in a word is numbered according to the power of 2 that it represents. Thus, in a four-byte word, the lowest order bit is numbered 0 (because its value is 1, that is 2^0), and the highest order bit is numbered 31 (because its value in an unsigned integer is 2^31) (see Figure 2).

Because the design that numbered the bytes left-to-right but numbered the bits right-to-left was IBM's mainframe architecture, this came to be known as big-endian byte order.
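
The big-endian layout can be sketched in C as follows. This is a hedged illustration rather than anything taken from IBM's design: the function name `store_be32` is hypothetical, and shifts are used so that the result does not depend on the byte order of the machine running the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value into four bytes in big-endian order:
 * byte address 0 receives bits 31..24 (the most significant bits),
 * byte address 3 receives bits 7..0 (the least significant bits). */
void store_be32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);

    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}
```

For the value 259 (256 + 3), bytes 0 and 1 come out as 0, byte 2 as 1, and byte 3 as 3, so the bytes read left to right in the same order as the digits of the number.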

byte address                      0         1         2         3
character data (the word UNIX)    U         N         I         X
binary data (the number 259)      00000000  00000000  00000001  00000011
bit number                        31-24     23-16     15-8      7-0

Figure 2. Bit-numbering and byte-numbering on the IBM-360, and the word
"UNIX" and the binary representation of 259 as stored in one word on the IBM-360

The PDP-11 designers, on the other hand, thought it more logical to number the bytes within a word in the same order as the bits within a byte. Being an inexpensive minicomputer, the PDP-11 had 16-bit words instead of 32-bit words. In a 2-byte integer, the byte that held the least significant bits had a lower numbered address than the byte that held the most significant bits.

byte address                      1         0
binary data (the number 259)      00000001  00000011
bit number                        15-8      7-0

Figure 3. Bit-numbering and byte-numbering on the DEC PDP-11, and the
binary representation of 259 as stored in one word on the DEC PDP-11.
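
The layout in Figure 3 can be expressed as a short C sketch. The name `store_le16` is hypothetical, and the code is an illustration of the byte numbering, not PDP-11 software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value into two bytes in little-endian order:
 * byte address 0 receives bits 7..0 (the least significant bits),
 * byte address 1 receives bits 15..8 (the most significant bits). */
void store_le16(uint16_t value, unsigned char out[2])
{
    assert(out != NULL);

    out[0] = (unsigned char)(value & 0xFFu);
    out[1] = (unsigned char)((value >> 8) & 0xFFu);
}
```

For the value 259, byte 0 holds 3 and byte 1 holds 1, so byte number n always carries the bits worth 256 to the power n.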

byte address                      3         2         1         0
character data (the word UNIX)    X         I         N         U
binary data (the number 259)      00000000  00000000  00000001  00000011
bit number                        15-8      7-0       15-8      7-0

Figure 4. Bit-numbering and byte-numbering on the DEC PDP-11, and the
word "UNIX" as stored in two words on the DEC PDP-11.

Because the design that numbered both the bytes and the bits right-to-left was DEC's minicomputer architecture, this came to be known as little-endian byte order.

The PDP-11's byte order being exactly backward from the IBM 360's byte order is what created the little-endian/big-endian byte-ordering nightmare. In the 1980s, I ported a lot of C code from one machine to another, and at least half of my effort went into fixing byte-order dependencies in the code.
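
As an illustration of one common kind of byte-order dependency (not necessarily the code described above), the sketch below contrasts a load that silently inherits the host's byte order with one that states the order explicitly. All three function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-order dependent: copies four bytes straight into an integer, so the
 * result depends on how the host machine numbers its bytes. */
uint32_t load_native32(const unsigned char *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Portable: the byte order (big-endian here) is spelled out with shifts,
 * so every host computes the same value from the same bytes. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Given the bytes 0, 0, 1, 3, `load_be32` returns 259 everywhere, while `load_native32` returns 259 only on a big-endian host; on a little-endian host it returns 0x03010000. Replacing loads of the first kind with loads of the second kind is one typical form such porting fixes take.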
