

ASCII

TRANSCRIPT

In this video we're going to move beyond the binary and number representations and start working with letters and characters.

So as a quick review, we know that computers can store and transmit digital information in binary (as ones and zeros) and that we can use binary to represent numbers.

To start, all we need to do is start assigning numbers to the different letters.

If I sat you down and asked you to come up with a way to number the different letters of the alphabet, you would probably come up with a scheme like 'A' is one, 'B' is two, 'C' is three, and so on.

What about lowercase letters? What about punctuation such as periods, spaces and exclamation marks? It wouldn't be a problem -- you would just start assigning larger and larger numbers to those symbols.

But what if I approached a friend of yours and gave them the same task? Would they come up with the exact same numbering system? Probably not.

In the early days of digital communication, this was a problem. Different organizations were using different numbering systems.

So in the nineteen sixties [1960s] a committee of white dudes got together and they developed an "official" numbering system. It was called the American Standard Code for Information Interchange, more commonly known by the acronym ASCII.

Before we continue, I want to introduce a term we use in computers: a character is any letter or punctuation mark or symbol, or almost anything you could type on a keyboard -- even emoji, a word that comes from the Japanese for "picture character". The word "character" comes from the same Greek roots as "making cake" (which sounds awesome and delicious) but it was really more related to making coins and how letters and symbols were printed on early coins.

So what ASCII does is assign each character a number, and this is what it looks like:

32 space  48 0      64 @      80 P       96 `      112 p  
33 !      49 1      65 A      81 Q       97 a      113 q  
34 "      50 2      66 B      82 R       98 b      114 r  
35 #      51 3      67 C      83 S       99 c      115 s  
36 $      52 4      68 D      84 T      100 d      116 t  
37 %      53 5      69 E      85 U      101 e      117 u  
38 &      54 6      70 F      86 V      102 f      118 v  
39 '      55 7      71 G      87 W      103 g      119 w  
40 (      56 8      72 H      88 X      104 h      120 x  
41 )      57 9      73 I      89 Y      105 i      121 y  
42 *      58 :      74 J      90 Z      106 j      122 z  
43 +      59 ;      75 K      91 [      107 k      123 {  
44 ,      60 <      76 L      92 \      108 l      124 |  
45 -      61 =      77 M      93 ]      109 m      125 }  
46 .      62 >      78 N      94 ^      110 n      126 ~  
47 /      63 ?      79 O      95 _      111 o

So uppercase A is sixty-five and lowercase a is ninety-seven. A space is thirty-two.
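If you want to check the table yourself, here is a minimal sketch in Python (the choice of language is mine, not something from the video). The built-in `ord` gives the number for a character and `chr` goes the other way; the sample text "Hi!" is just an illustration.

```python
# ord() maps a character to its code; chr() maps a code back to a character.
codes = {ch: ord(ch) for ch in ["A", "a", " "]}
print(codes)              # {'A': 65, 'a': 97, ' ': 32}

# Going the other way: from a number back to its character.
print(chr(65), chr(97))   # A a

# A whole piece of text is just a sequence of these numbers.
message_codes = list("Hi!".encode("ascii"))
print(message_codes)      # [72, 105, 33]
```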

In the nineteen sixties [1960s] these were all the printable characters that were being used on teletype machines. You are probably far too young to know what a teletype machine was, but newspapers, government offices and financial institutions used them to send information back and forth.

You'll notice that this table starts at thirty-two. The characters numbered zero through thirty-one [0 ... 31] are missing, as well as the number one hundred and twenty-seven [127]. That is because those were special "control characters" that had a special meaning for teletype machines. For example, the number seven would make a little bell noise, and the number ten was a "line feed", which made the paper advance a line. In the early days, if you wanted to troll someone you could send ten seven ten seven ten seven ten seven and their teletype machine would make lots of noise and waste paper.

Most of the control characters are obsolete now, but some are still used. When you hit "enter" on your keyboard, it is often represented as a line feed (ten).
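The control characters described above can be poked at with a short Python sketch. The helper `is_control` is a name I made up for illustration; it simply encodes the rule that codes zero through thirty-one, plus one hundred and twenty-seven, are control characters.

```python
BELL = chr(7)        # made a bell noise on a teletype
LINE_FEED = chr(10)  # advanced the paper by one line

def is_control(code: int) -> bool:
    """True for the ASCII control codes: 0 through 31, plus 127."""
    return 0 <= code <= 31 or code == 127

# Python writes the bell as "\a" and the line feed as "\n".
print(BELL == "\a", LINE_FEED == "\n")   # True True

# The line break inside this text is stored as the number ten.
text = "first line\nsecond line"
control_codes = [ord(ch) for ch in text if is_control(ord(ch))]
print(control_codes)                      # [10]

# Counting the control codes among all 128 ASCII codes.
control_count = sum(is_control(code) for code in range(128))
print(control_count)                      # 33
```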

One interesting thing to point out about the ASCII table is that this is how alphabetical order is determined on computers. The exclamation mark is the first visible character in alphabetical order.

You may have seen this trick before: some people put extra A's at the start of their contact names to make sure they appear at the top of their list (like "AAAMom"). You can now see that a better choice is the exclamation mark. You can try putting an exclamation mark at the front of your name in some apps like chat clients so you'll appear at the top.
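Here is a small sketch of that ordering in Python, whose default string sort compares character codes. The contact names are made up, and real apps may sort differently (for example, ignoring case), so treat this as an illustration of the ASCII ordering only.

```python
# Made-up contact names for illustration.
names = ["Zoe", "AAAMom", "!Mom", "adam", "Bob"]

# Python's default sort compares the character codes one by one.
sorted_names = sorted(names)
print(sorted_names)
# ['!Mom', 'AAAMom', 'Bob', 'Zoe', 'adam']

# The first character of each name explains the order.
first_codes = [ord(name[0]) for name in sorted_names]
print(first_codes)   # [33, 65, 66, 90, 97]
```

Notice that "adam" lands after "Zoe": every uppercase letter has a smaller number than every lowercase letter.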

One final note about ASCII is that the code goes from zero to one hundred and twenty-seven [0 ... 127], and so you only need seven bits per character to represent an ASCII code. They conveniently fit in one byte with a leftover bit that could be used to detect errors. Since each code is one byte, you can write them as two hex digits, and you often see that in ASCII tables. In fact, the numbering system makes a lot more sense in hex.
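To make the bits and hex concrete, here is a rough Python sketch. The function `with_even_parity` is a hypothetical helper showing one common way the leftover bit can be used (an even parity bit); the video does not say which error-detection scheme was used, so that part is an assumption.

```python
# Each ASCII code fits in seven bits and in two hex digits.
rows = [(ch, ord(ch), format(ord(ch), "07b"), format(ord(ch), "02X"))
        for ch in ["0", "A", "a"]]
for row in rows:
    print(row)
# ('0', 48, '0110000', '30')
# ('A', 65, '1000001', '41')
# ('a', 97, '1100001', '61')

# In hex the pattern is easier to see: upper and lower case differ by 0x20.
case_gap = ord("a") - ord("A")
print(hex(case_gap))   # 0x20

def with_even_parity(code: int) -> int:
    """Set the eighth bit so the byte has an even number of one bits.

    Assumption: even parity is just one possible use of the spare bit.
    """
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    ones = bin(code).count("1")
    return code | 0x80 if ones % 2 else code

print(format(with_even_parity(ord("A")), "08b"))  # 01000001
print(format(with_even_parity(ord("C")), "08b"))  # 11000011
```

If a single bit gets flipped in transit, the count of one bits becomes odd and the receiver knows something went wrong.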

This may be a spoiler, but in the two thousand and fifteen [2015] movie "The Martian", Matt Damon plays the character Mark Watney, who is stranded on Mars and needs a way to communicate with Earth. Initially, the only thing that the scientists on Earth can do is rotate a camera around, and so Watney sets up the sixteen hex digits in a circle. The scientists on Earth could then move the camera to point at the hex characters in sequence. Each pair of hex characters was an ASCII character, which Watney could then look up in a table and receive text messages from Earth.
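The decoding step can be sketched in a few lines of Python. The function name `decode_hex_pairs` and the sample message are made up for illustration; they are not taken from the movie.

```python
def decode_hex_pairs(hex_digits: str) -> str:
    """Turn a string of hex digits into text, two digits per ASCII character."""
    digits = hex_digits.replace(" ", "")
    if len(digits) % 2 != 0:
        raise ValueError("need an even number of hex digits")
    chars = []
    for i in range(0, len(digits), 2):
        pair = digits[i:i + 2]      # e.g. "48"
        code = int(pair, 16)        # hex pair -> number, e.g. 72
        chars.append(chr(code))     # number -> character, e.g. "H"
    return "".join(chars)

# A made-up message, as the camera might have spelled it out.
decoded = decode_hex_pairs("48 45 4C 4C 4F")
print(decoded)   # HELLO

# Going the other way: text to hex pairs.
encoded = "HELLO".encode("ascii").hex().upper()
print(encoded)   # 48454C4C4F
```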

Who knows? Maybe learning ASCII and hex could one day save your life.