ASCII & Binary Codes

1.  Overview

An SPSS systems file, one that is created with the SPSS Data Editor, stores data in two different formats: the American Standard Code for Information Interchange (ASCII) and binary numeric code. All text information (e.g., variable names, variable labels, value labels, and the values of variables of data type text) is stored using ASCII code, while numeric information (e.g., variables of data type numeric) is stored using binary numeric code.

A word processor stores all information using ASCII.  Word processing files are great for storing and manipulating text information, but ASCII codes are not convenient if you want to perform mathematical computations on numbers.

This section describes how letters, numbers, and symbols are translated into ASCII and how numeric information is translated into binary code. We will also look at binary arithmetic and the difficulties of doing arithmetic with ASCII values.

Finally, we will discuss why it is not a good idea to try to edit an SPSS systems file using a word processor.

2.  Bits, bytes, and ASCII Code

The memory in your computer can be thought of as a linear series of points that are either "on" or "off". Each point in the series is called a bit. In the following series of numbers, a "0" represents the "off" state and a "1" represents the "on" state for each bit of information.

 01001000 01100101 01101100 01101100 01101111

The challenge for early computer makers was to devise a way to store information in the computer using bits of information. Think about devising a coding scheme that uses only bits (0 and 1) for all the characters that appear on a standard typewriter keyboard. If you wanted to code the alphabet from A to Z in both upper and lower case you would need 52 different codes. Add in the numbers from 0 to 9 and you would need 62 different codes. Add in all the symbols above the numbers and you would need 72 different codes. Add in the miscellaneous other characters on the keyboard (e.g., period, comma, question mark, etc.) and you need another 20 or so codes, for a total of at least 92 different codes.

How many bits would you need to code those 92 or so different characters, numbers, and symbols? If you had one bit, you could code two characters, perhaps A and B. If you had two bits you could code four characters. Table 1 shows how many characters you can code for a given number of bits.

 # of bits  Codes possible  Decimal range    Binary codes
 1          2               0 through 1      0 1
 2          4               0 through 1      00 01
                            2 through 3      10 11
 3          8               0 through 3      000 001 010 011
                            4 through 7      100 101 110 111
 4          16              0 through 7      0000 0001 0010 0011 0100 0101 0110 0111
                            8 through 15     1000 1001 1010 1011 1100 1101 1110 1111
 5          32              0 through 15     00000 00001 00010 00011 ... 01111
                            16 through 31    10000 10001 10010 10011 ... 11111
 6          64              0 through 31     000000 000001 000010 000011 ... 011111
                            32 through 63    100000 100001 100010 100011 ... 111111
 7          128             0 through 63     0000000 0000001 0000010 ... 0111111
                            64 through 127   1000000 1000001 1000010 ... 1111111
 8          256             0 through 127    00000000 00000001 00000010 ... 01111111
                            128 through 255  10000000 10000001 10000010 ... 11111111
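The pattern in Table 1 is that each added bit doubles the number of available codes. As a minimal sketch (the function names `codes_possible` and `bits_needed` are made up for this illustration), the count and the smallest number of bits for a 92-character keyboard can be worked out like this:

```python
def codes_possible(bits):
    """Number of distinct codes that `bits` on/off positions can represent."""
    return 2 ** bits


def bits_needed(n_codes):
    """Smallest number of bits that gives at least `n_codes` distinct codes."""
    bits = 1
    while codes_possible(bits) < n_codes:
        bits += 1
    return bits


# Reproduce the first two columns of Table 1.
for bits in range(1, 9):
    print(bits, codes_possible(bits))

# 6 bits give only 64 codes, so 92 characters need 7 bits (128 codes).
print(bits_needed(92))  # 7
```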

You would need 7 bits just to code all the characters on the typewriter keyboard. You would have a few code numbers to spare. It is easier for the computer to keep track of a series of 8 bits rather than 7 bits so the ASCII codes are actually stored in 8 consecutive bits. This leaves room for an additional 128 characters to be coded. Eight bits are called a byte. The ASCII codes developed for the English language are shown in Table 2.

      0    1    2    3    4    5    6    7    8    9
   0                                     BEL  BS   TAB
  10  LF   VT   FF   CR
  20
  30            SP   !    "    #    $    %    &    '
  40  (    )    *    +    ,    -    .    /    0    1
  50  2    3    4    5    6    7    8    9    :    ;
  60  <    =    >    ?    @    A    B    C    D    E
  70  F    G    H    I    J    K    L    M    N    O
  80  P    Q    R    S    T    U    V    W    X    Y
  90  Z    [    \    ]    ^    _    `    a    b    c
 100  d    e    f    g    h    i    j    k    l    m
 110  n    o    p    q    r    s    t    u    v    w
 120  x    y    z    {    |    }    ~    DEL

BEL = bell, BS = backspace, TAB = tab, LF = newline, VT = vertical tab, FF = form feed, CR = return, SP = space, DEL = delete (code 127, octal \177).

The codes are read by adding the column indicator to the row indicator. For example, the capital letter A is in row 60 and column 5, so its ASCII code is 60 + 5, or 65. The ASCII code for the lower case letter a is 97.
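The row-plus-column reading can be checked with Python's built-in `ord` and `chr`, which convert between a character and its code. This is only a sketch; `table_position` and `code_from_position` are hypothetical helper names, not part of any library.

```python
def table_position(ch):
    """Return the (row, column) where a character sits in Table 2."""
    code = ord(ch)                      # the character's ASCII code
    return (code // 10) * 10, code % 10


def code_from_position(row, column):
    """Read a code off the table by adding the row and column indicators."""
    return row + column


print(table_position("A"))              # (60, 5)
print(chr(code_from_position(60, 5)))   # A
print(ord("a"))                         # 97
```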

Table 3 shows the ASCII codes for the word Hello in both decimal and binary representation.

           H         e         l         l         o
 decimal   72        101       108       108       111
 binary    01001000  01100101  01101100  01101100  01101111

The word "Hello" is stored in the memory of your computer in 5 consecutive bytes of memory, or 40 consecutive bits of memory (5 x 8 bits = 40 bits). The string of 40 bits (5 bytes) at the beginning of this section is the computer representation of "Hello."
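As a quick check of Table 3, the following sketch converts each character of a word to its decimal code and then to 8 bits. The helper names `to_ascii_codes` and `to_bit_string` are invented for this example.

```python
def to_ascii_codes(text):
    """Decimal ASCII code of each character, in order."""
    return [ord(ch) for ch in text]


def to_bit_string(text):
    """Each character as one byte (8 bits), separated by spaces."""
    return " ".join(format(ord(ch), "08b") for ch in text)


print(to_ascii_codes("Hello"))  # [72, 101, 108, 108, 111]
print(to_bit_string("Hello"))   # 01001000 01100101 01101100 01101100 01101111
```

The second line of output is the same string of 40 bits shown at the beginning of this section.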

3.  Binary numbers

How do you convert decimal numbers to binary numbers? Consider an eight-bit binary number. Each of the eight bits can be either on (1) or off (0). Table 4 shows the relationship between the position of a bit and the decimal value for that bit if it is on. Do you see the pattern for the decimal values of each of the bits?

 Bit position                8    7    6    5    4    3    2    1
 Decimal value of each bit   128  64   32   16   8    4    2    1
 e.g. 1                      0    0    0    0    0    0    0    1    = ?
 e.g. 2                      0    0    0    0    1    0    0    0    = ?
 e.g. 3                      0    0    0    0    0    1    1    1    = ?
 e.g. 4                      0    1    0    1    0    1    0    1    = ?
 e.g. 5                                                              = 72
 e.g. 6                                                              = 22

To find the decimal value of a binary number you merely sum the values of the bits that are on. The decimal value of example 1 (00000001) is 1. The decimal value of example 2 (00001000) is 8.

The decimal value for #3 (00000111) = _________.
The decimal value for #4 (01010101) = _________.
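The summing rule can be sketched in a few lines, which you can also use to check your answers for examples 3 and 4. The name `binary_to_decimal` is made up for this illustration, and the sketch assumes exactly eight bits.

```python
# Decimal value of each bit position, from position 8 down to position 1.
BIT_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def binary_to_decimal(bits):
    """Sum the values of the positions whose bit is on (1)."""
    if len(bits) != 8 or any(bit not in "01" for bit in bits):
        raise ValueError("expected a string of exactly eight 0s and 1s")
    return sum(value for bit, value in zip(bits, BIT_VALUES) if bit == "1")


print(binary_to_decimal("00000001"))  # 1 (example 1)
print(binary_to_decimal("00001000"))  # 8 (example 2)
```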

Can you figure out how to reverse the process? What is the binary representation of the decimal number 72 (example 5 in the above table)? To solve this problem, find the highest bit whose decimal value is equal to or smaller than 72 and set that bit to 1. In this example, set the bit at position 7 to on. That bit represents the decimal value 64, so you have represented 64 units of the number 72. Subtracting 64 from 72 leaves 8 units. Turn on the highest bit that is equal to or smaller than 8. That is, set the bit at position 4 to on. The bit at position 4 represents the decimal value 8, so nothing is left over. So, the binary representation of the decimal number 72 is 01001000.

The binary value for the decimal number 22 = _________.
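The subtract-the-largest-bit procedure described above can be sketched as follows. The function name `decimal_to_binary` is hypothetical, and the sketch is limited to values that fit in eight bits (0 through 255).

```python
def decimal_to_binary(number):
    """Turn on the highest bit that fits, subtract its value, and repeat."""
    if not 0 <= number <= 255:
        raise ValueError("an eight-bit number must be between 0 and 255")
    bits = []
    remaining = number
    for value in (128, 64, 32, 16, 8, 4, 2, 1):
        if value <= remaining:
            bits.append("1")        # this bit fits, so turn it on
            remaining -= value      # and account for the units it represents
        else:
            bits.append("0")
    return "".join(bits)


print(decimal_to_binary(72))  # 01001000
```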

Numeric values are stored in several consecutive bytes. Why? What is the range of numbers that you could code if you only used 8 bits? Looking back at Table 1, we see that we could store 256 different values, that is, the numbers from 0 to 255. Let's expand that table somewhat.

 Number of bits (bytes)   Number of values that can be coded
 8  (1)                   2**8  = 256
 16 (2)                   2**16 = 65,536
 32 (4)                   2**32 = 4,294,967,296
 64 (8)                   2**64 = 1.844674E19
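Following the doubling pattern from Table 1, where 8 bits give 256 values, the number of values for larger storage sizes can be computed directly. This is a small sketch that counts plain non-negative whole numbers only; `values_possible` is a name invented for the example.

```python
def values_possible(n_bytes):
    """Number of distinct values that fit in `n_bytes` bytes of 8 bits each."""
    n_bits = 8 * n_bytes
    return 2 ** n_bits


for n_bytes in (1, 2, 4, 8):
    count = values_possible(n_bytes)
    # The values run from 0 up to one less than the count.
    print(f"{8 * n_bytes} bits ({n_bytes} bytes): {count:,} values, 0 through {count - 1:,}")
```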

SPSS uses several consecutive bytes to store each numeric value. It appears to allot space to each value dynamically, depending upon the number of digits.

As was mentioned earlier, it is easy to do math on binary numbers. Let's add 2 + 2 using binary arithmetic.

 Bit position                8    7    6    5    4    3    2    1
 Decimal value of each bit   128  64   32   16   8    4    2    1
     2                       0    0    0    0    0    0    1    0
   + 2                       0    0    0    0    0    0    1    0
   = 4                       0    0    0    0    0    1    0    0

To understand this, think back to the decimal number system. If I have the value 9 and add 1 to it, the 9 turns to 0 and a 1 is carried one digit to the left, resulting in the value 10. It works the same way in the binary system, except that instead of having the values 0 through 9 you only have the values 0 and 1. If I have the binary value 1 and add 1 to it, the 1 turns to a 0 and a 1 is carried one digit to the left.
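The carrying rule can be written out as a short sketch that adds two bit strings from right to left, the way you would by hand. The name `add_binary` is made up here, and for simplicity the sketch drops any carry out of the leftmost bit.

```python
def add_binary(a, b):
    """Add two equal-length bit strings, carrying 1 to the left as needed."""
    if len(a) != len(b):
        raise ValueError("both numbers must have the same number of bits")
    result = []
    carry = 0
    # Work from bit position 1 (rightmost) toward the left.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))   # the digit that stays in this position
        carry = total // 2              # 1 if this position overflowed, else 0
    # A carry out of the leftmost bit is dropped in this sketch.
    return "".join(reversed(result))


print(add_binary("00000010", "00000010"))  # 00000100, that is, 2 + 2 = 4
```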

Compare the simplicity of that system with trying to do arithmetic on ASCII values. According to Table 2, the ASCII value of the character 2 is 50. So the binary representation of the ASCII value of 2 is 00110010. What happens if you try to add together the ASCII values of 2 + 2?

 Bit position                8    7    6    5    4    3    2    1
 Decimal value of each bit   128  64   32   16   8    4    2    1
     2                       0    0    1    1    0    0    1    0
   + 2                       0    0    1    1    0    0    1    0
   = d                       0    1    1    0    0    1    0    0

The result is the binary value 01100100, which is the decimal value 100, which is the ASCII code for the lower case letter "d." So, if you tried to add together 2 + 2 using their ASCII values you would get "d."
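You can reproduce this surprise with a small sketch. The helper name `add_ascii_codes` is invented for the example; it adds the ASCII codes of two characters and shows which character the sum stands for.

```python
def add_ascii_codes(a, b):
    """Add the ASCII codes of two characters and return the resulting character."""
    return chr(ord(a) + ord(b))


print(ord("2"))                   # 50
print(add_ascii_codes("2", "2"))  # d, because 50 + 50 = 100
print(int("2") + int("2"))        # 4, after converting the characters to numbers
```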

5.  A Caution

To recap: when you create a data file using the SPSS Data Editor, numeric values are stored using binary code and string variables are stored using ASCII code. The SPSS Data Editor uses the code that is most appropriate for the type of data that is being entered. This can be expressed in the form of an analogy:

ASCII : String :: binary : numeric

The contents of the file can be read directly into the memory of the computer and appropriate computations (numeric or string) can be easily carried out by the computer. If you create a data file using a word processor then both numeric and string values are coded using ASCII code. The computer must first translate the ASCII values of the numeric variables into binary code prior to running any computations on those values.
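The translation step for ASCII-coded numbers might look roughly like the sketch below, which builds a number one digit character at a time. It is an illustration of the idea for non-negative whole numbers only, not a description of how any particular program does it, and `ascii_digits_to_int` is a made-up name.

```python
def ascii_digits_to_int(text):
    """Translate a string of ASCII digit characters into a number."""
    value = 0
    for ch in text:
        if not "0" <= ch <= "9":
            raise ValueError(f"not a digit: {ch!r}")
        digit = ord(ch) - ord("0")   # "0" is code 48, "1" is 49, and so on
        value = value * 10 + digit   # shift earlier digits left one decimal place
    return value


# Only after translation can the values be added as numbers.
print(ascii_digits_to_int("72") + ascii_digits_to_int("22"))  # 94
```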

You should not try to edit a data file created by the SPSS Data Editor with a word processor. The word processor only knows ASCII, so it will try to interpret the binary values of numeric variables as ASCII code. The results will look strange in the word processor. If you try to edit the file you may change a byte in the middle of a numeric variable, drastically changing its value. Or worse, you might delete a byte, shifting over all remaining bytes in the data file and changing the values of all of the following data. Edit a data file created by the SPSS Data Editor only with the Data Editor itself.
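To see how much damage a single byte can do, here is a sketch using Python's standard `struct` module. It assumes, purely for illustration, that two numbers are stored as 4-byte little-endian integers; this is not the actual layout of an SPSS systems file.

```python
import struct

# Assumption for illustration only: two numbers stored as 4-byte
# little-endian integers. This is not the real SPSS file layout.
original = struct.pack("<2i", 72, 22)

# A word processor would treat every byte as a character code. The byte 72
# happens to display as "H"; the other bytes are unprintable control codes.
as_text = original.decode("latin-1")
print(repr(as_text))

# Overwrite one byte in the middle of the first number with the letter "A".
edited = bytearray(original)
edited[1] = ord("A")
changed = struct.unpack("<2i", bytes(edited))
print(changed)          # (16712, 22)

# Delete the first byte: every later byte shifts one place to the left.
shifted = original[1:] + b"\x00"
shifted_values = struct.unpack("<2i", shifted)
print(shifted_values)   # (369098752, 0)
```

Changing one byte turned 72 into 16712, and deleting one byte changed both stored values.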

© Lee A. Becker, 1997-1999 - revised 09/01/99