What Is A Byte?

Before exploring the various databending experiments I have pursued, it is helpful to understand a little bit about how data is stored on a computer's disk drive and how that data can later be read back and interpreted in meaningful ways for the human user.

Modern computers store and retrieve data in a binary encoding using only 0s and 1s. Each 0 or 1 is called a "bit" and each group of 8 bits is called a "byte". When saving a file to disk or transmitting data over a network connection, bytes are used; computers do not typically send, receive, or manipulate individual bits. Instead, each 8-bit byte represents one discrete datum in a computer. It is not possible to save data to a file in units smaller than one byte; if you only need 4 bits, 8 will still be written. When you view the details of a file, its size is shown in bytes, and there are (SIZE * 8) bits stored on disk for that file. The reasons for this are beyond the scope of this document.

Each file is just one long number, possibly with leading (left-most) zeros filling out one or more bytes, and can be treated and converted as such.
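To make the idea of a file as one long number concrete, here is a minimal Python sketch. The three bytes in `data` are made up for illustration and stand in for the contents of a real file; the variable names are my own.

```python
# Hypothetical three-byte "file" contents; any bytes object would do.
data = bytes([0x00, 0x01, 0x6C])

size_in_bytes = len(data)
size_in_bits = size_in_bytes * 8  # SIZE * 8 bits, as described above

# Read the whole byte string as one unsigned number, left-most byte first.
as_number = int.from_bytes(data, byteorder="big")

# Convert back, asking for the original length so the leading zero byte returns.
round_trip = as_number.to_bytes(size_in_bytes, byteorder="big")

print(size_in_bytes, size_in_bits, as_number)  # 3 24 364
print(round_trip == data)                      # True
```

Note that the leading zero byte does not change the number, which is why the original length must be supplied when converting back.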

Counting In Binary

Humans worldwide typically count using a decimal or "Base 10" numbering scheme because we typically have ten fingers and ten toes; it is easy for us to count this way. Thus we use ten digits 0,1,2,3,4,5,6,7,8,9 and when we reach the number ten we start over in the second position: "10".

It is important to recognize what is happening here: after exhausting all possible digits, we expand into another numeric column starting again at "1". We do the same when we exhaust the next column, and after "99" comes "100". Another way of describing the number ten is to say that it is the sum of (one ten + zero ones), and another way of describing the number one hundred twenty-eight is to write "128" or also (one hundred + two tens + eight ones).

Binary or "Base 2" works this way too, but with only two digits 0,1 instead of ten. Just as in decimal Base 10, the first numbers look the same: zero is written "0" and one is written "1". But then we have exhausted both digits and must expand to the next column, so two in binary is "10" and three is "11". Then the digits are exhausted again and we expand once more, meaning four is "100", five is "101", six is "110", seven is "111", and so on...
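The counting pattern above can be reproduced with a few lines of Python. This is only a sketch: `to_binary` is a name I made up, and Python's built-in `format(n, "b")` would give the same digits.

```python
def to_binary(n):
    """Return the Base 2 digits of a non-negative integer as a string."""
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits  # the remainder is the next right-most digit
        n //= 2                       # move on to the next column
    return digits

# Count from zero to seven, as in the paragraph above.
for n in range(8):
    print(n, to_binary(n))
```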

This example might help (I hope so):

| English | Decimal (Base 10) | In other words | Binary (Base 2) | In other words |
|---|---|---|---|---|
| Five | 5 | Five ones | 101 | One four + Zero twos + One one |
| Twelve | 12 | One ten + Two ones | 1100 | One eight + One four + Zero twos + Zero ones |
| One hundred eight | 108 | One hundred + Zero tens + Eight ones | 1101100 | One sixty-four + One thirty-two + Zero sixteens + One eight + One four + Zero twos + Zero ones |

Note that each column of a Base 10 decimal number is a power of ten (1000 = 10^3 = one thousand), while each column of a Base 2 binary number is a power of two (binary 1000 = 2^3 = eight).
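The column-by-column breakdown can be checked mechanically. The following Python sketch uses two helper names of my own invention, `place_values` and `total`, to pair each digit with the value of its column and then add everything back up.

```python
def place_values(digits, base):
    """Pair each digit with its column value, left-most column first."""
    width = len(digits)
    return [(int(d, base), base ** (width - 1 - i)) for i, d in enumerate(digits)]

def total(digits, base):
    """Sum digit * column value over all columns."""
    return sum(d * column for d, column in place_values(digits, base))

print(place_values("1100", 2))  # [(1, 8), (1, 4), (0, 2), (0, 1)]
print(total("1101100", 2))      # 108
print(total("1000", 10))        # 1000
print(total("1000", 2))         # 8
```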

Because computers store bytes rather than individual bits, each byte is a discrete representation of a value in the range (0 <= x <= 255): eight bits allow 2^8 = 256 distinct values.
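As a quick check of that range, Python's `bytes` type refuses any value that does not fit in a single byte. The helper name `fits_in_byte` below is hypothetical and exists only for this sketch.

```python
def fits_in_byte(value):
    """Return True if value can be stored in one unsigned byte."""
    try:
        bytes([value])  # raises ValueError outside 0..255
        return True
    except ValueError:
        return False

print(2 ** 8)             # 256 distinct values
print(fits_in_byte(0))    # True
print(fits_in_byte(255))  # True
print(fits_in_byte(256))  # False
```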

Storing Negative Signed Numbers

The examples above are all positive integer values, but what happens when we need to store negative numbers in binary?

In computing and mathematics, numbers having only non-negative values (0+) are called "unsigned" and numbers having a possible negative sign in front of them such as "-5" are called "signed". Because a computer data file uses only two digits or "glyphs" for EVERYTHING, how is it possible to store and retrieve a negative value using only "0" and "1" with no "-"? When a number is stored with a negative sign, one bit of the value is sacrificed for storing the sign: "0" for positive or "1" for negative. In this fashion, an unsigned byte is in the range (0 <= x < 256) while a signed byte may represent numbers in the range (-128 <= x < 128); there are still 256 possible values, but since one bit is reserved for the sign, the remaining seven bits represent only 0 to 127 (2^7 - 1) instead of 0 to 255 (2^8 - 1).

It is slightly more complex than explained here. Under the simple scheme just described, the binary byte "00000000" would be zero and the byte "10000000" would be negative zero. If you are curious, the common solution for this peculiarity is called two's complement, in which "10000000" represents -128 instead.
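To see how the same eight bits read differently as unsigned and signed values, here is a small Python sketch. It relies on `int.from_bytes`, whose `signed=True` option uses two's complement; the function name `interpret` is my own.

```python
def interpret(bits):
    """Return (unsigned, signed) readings of an 8-character bit string."""
    raw = int(bits, 2).to_bytes(1, byteorder="big")
    unsigned = int.from_bytes(raw, byteorder="big", signed=False)
    signed = int.from_bytes(raw, byteorder="big", signed=True)  # two's complement
    return unsigned, signed

for bits in ("00000000", "01111111", "10000000", "11111111"):
    print(bits, interpret(bits))
# 00000000 (0, 0)
# 01111111 (127, 127)
# 10000000 (128, -128)
# 11111111 (255, -1)
```

Whenever the left-most bit is "0" the two readings agree; they only diverge once that bit is set.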

My examples all use unsigned integer data interpretations, because it is much easier to understand what is happening this way.

Byte Ordering

Some computers, such as the popular Pentium CPU from Intel, store multi-byte values with the least significant byte first, an ordering often referred to as "Little-Endian" that looks reversed to a human reader. Other computers store the most significant byte first, as humans would write on paper, often called "Big-Endian". The byte ordering of a given CPU is called its endianness.

The specifics of byte-ordering are especially important when databending, because the bytes may not be in the order we expect as humans.
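A short Python sketch shows how much the reading of the same two bytes depends on the assumed ordering. The value 1234 and the variable names are chosen only for illustration.

```python
# The number 1234 is 0x04D2, which needs two bytes.
raw = bytes([0x04, 0xD2])

as_big = int.from_bytes(raw, byteorder="big")        # most significant byte first
as_little = int.from_bytes(raw, byteorder="little")  # least significant byte first

print(as_big)     # 1234
print(as_little)  # 53764, i.e. 0xD204

# Writing 1234 in Little-Endian order swaps the two bytes.
print((1234).to_bytes(2, byteorder="little") == bytes([0xD2, 0x04]))  # True
```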

For the examples and experiments in this project however, byte-ordering "endianness" is not so important. What IS important is the order of bits within each byte, called "significance". When we write a decimal Base 10 number on paper, the "Most Significant" digit is on the left and the "Least Significant" digit is on the right. For example, in the number one thousand two hundred thirty-four written "1234" the "1" is the Most Significant digit because it represents how many Thousands are included, while the "4" is the Least Significant digit because it represents how many Ones are included; thus changing the "1" to a "2" has a much greater effect on the overall value than changing the "4" to a "5". In other words, 2234 is much further away from 1234 than 1235 is.

The same is true in binary numbers; the left-most bit of a byte is called the Most Significant Bit or "MSB", while the right-most is the Least Significant Bit or "LSB".
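The difference in significance is easy to measure by flipping a single bit. In this Python sketch the starting byte 108 is arbitrary, and the variable names are mine.

```python
value = 0b01101100  # 108 as an unsigned byte

flip_lsb = value ^ 0b00000001  # toggle the right-most bit
flip_msb = value ^ 0b10000000  # toggle the left-most bit

print(flip_lsb, abs(flip_lsb - value))  # 109 1
print(flip_msb, abs(flip_msb - value))  # 236 128
```

Flipping the LSB moves the value by one, while flipping the MSB moves it by 128, half of the byte's entire range.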

This will become important to understand for Experiment Four.