Question 2

Deadline: Friday, September 15, 11:59pm
Name on Marmoset: Q2
To Submit: bits.cc OR bits.rkt

From Characters To Bits

Write a C++ or Racket program that reads a sequence of 0 and 1 ASCII characters from standard input, and writes the corresponding sequence of bits to standard output.

Why? (Motivation)

How many bits are used to represent this sequence of 0s and 1s?

11110001

The total number of 0 and 1 characters in this sequence is 8. But each 0 and 1 character itself is encoded using multiple bits! This web page uses ASCII character encoding, which traditionally used 7 bits per character, but nowadays uses 8 bits (1 byte) per character. Therefore, 64 bits (8 bytes) are actually used to represent this sequence of 0s and 1s. You could confirm this for yourself by downloading this HTML file and examining it in a binary viewer.
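The arithmetic above can be checked with a small sketch. The helper names `bits_used_by_text` and `ascii_code` are made up for illustration, and the code assumes one 8-bit byte per character, as described above.

```cpp
#include <cstddef>
#include <string>

// Number of bits a text string occupies when each character is stored in
// one 8-bit byte (an assumption; it matches the ASCII storage described above).
std::size_t bits_used_by_text(const std::string& text) {
    const std::size_t bits_per_byte = 8;
    return text.size() * bits_per_byte;
}

// The numeric code stored for a character. In ASCII, '0' is stored as the
// byte 00110000 (decimal 48) and '1' as 00110001 (decimal 49), not as a
// single bit each.
int ascii_code(char c) {
    return static_cast<unsigned char>(c);
}
```

Under these assumptions, `bits_used_by_text("11110001")` gives 64, even though the string only spells out 8 binary digits.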

This raises a question: What if we wanted to create a file that contains exactly the 8 bits 11110001? You wouldn't be able to do it by just typing 0s and 1s into a text editor, because the text editor will save each 0 and 1 using ASCII or some other text encoding, rather than translating them directly to individual bits.

You may have found a way to do this for a single byte if you already solved the Goose question, but was your method convenient for longer sequences of bytes?

The goal of this question is to write a specialized tool for this task, which lets us directly type 0s and 1s and produces the corresponding bit sequence.

A reference implementation of this question called cs241.bits is available, both as a web tool and in the Linux environment.

Using cs241.bits in the Linux environment

First execute the command source /u/cs241/setup. This command does not create or copy any folders or files. It simply sets some variables that allow you to use CS241 tools by typing their names directly on the command line.

After running the setup command, you can use cs241.bits by just typing its name. For example:

cs241.bits <<< "01001000011010010010000100001010" > output.bin

Clarifications

Is there any error checking required for this problem?

No.

Why is the length of the input sequence always a multiple of 8?

On most modern operating systems, files are stored as sequences of bytes and it is not actually possible to create a file whose bit length is not a multiple of 8. Programming languages on these operating systems do not allow users to output data in chunks smaller than one byte.

If you want your program to support input lengths that are not a multiple of 8, a reasonable choice would be to either discard incomplete bytes or pad incomplete bytes with 0s, and print a warning for the user. However, we won't test your program with input lengths that are not a multiple of 8, so this is not required.
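One possible way to handle the padding option is sketched below. The function name `pad_to_whole_bytes` is hypothetical, and padding on the right (the low-order end of the last byte) is an assumption; the question does not say which side to pad. The warning goes to standard error so that standard output still contains only the converted bytes.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Pads a string of '0'/'1' characters with trailing '0' characters until its
// length is a multiple of 8, and warns on standard error if padding was needed.
// Padding on the right is an illustrative choice, not a requirement.
std::string pad_to_whole_bytes(std::string bits) {
    const std::size_t remainder = bits.size() % 8;
    if (remainder != 0) {
        const std::size_t missing = 8 - remainder;
        std::fprintf(stderr, "warning: padding final byte with %zu zero bit(s)\n",
                     missing);
        bits.append(missing, '0');
    }
    return bits;
}
```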

What if the input contains characters other than 0 and 1?

The files we test your program with will not contain any characters other than 0 or 1. However, files that you create for your own testing might accidentally contain additional characters such as trailing newlines (text editors on Linux such as Vim and Nano typically add a newline to the end of files automatically). Thus, we recommend you design your program to ignore these characters, though it is not required.
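A minimal sketch of the "ignore other characters" recommendation follows. The helper name `keep_binary_digits` is hypothetical; a real program might instead skip unwanted characters as it reads them, rather than building a filtered string first.

```cpp
#include <string>

// Returns only the '0' and '1' characters of the input, in order.
// Everything else (newlines, spaces, stray letters) is silently dropped.
std::string keep_binary_digits(const std::string& input) {
    std::string digits;
    for (char c : input) {
        if (c == '0' || c == '1') {
            digits.push_back(c);
        }
    }
    return digits;
}
```

With this approach, an input file that ends in a newline added by Vim or Nano is treated the same as one without it.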

Examples & Hints

Input/Output Example

If the input sequence is

0100000100001010

then your program should output the character A followed by a line feed character (newline).

Wait, what?

The binary sequence 01000001 happens to be the ASCII encoding of uppercase A, and the binary sequence 00001010 happens to be the ASCII encoding of a line feed character.

If you properly convert the input sequence of ASCII 0s and 1s to a bit sequence, then your program should output two bytes. Those two bytes match the ASCII encodings of A and line feed, and your terminal will display those two bytes according to their ASCII interpretation.
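To make the conversion concrete, here is one way the arithmetic could be sketched. The names `byte_from_bits` and `bytes_from_bits` are hypothetical, the input is assumed to contain only 0 and 1 characters with a length that is a multiple of 8, and the first character of each group is assumed to be the most significant bit (which matches the A example above). This is an illustration of the idea, not the reference implementation.

```cpp
#include <cstddef>
#include <string>

// Builds one byte from exactly eight '0'/'1' characters, most significant
// bit first. Each step shifts the bits collected so far one place left and
// puts the new bit in the lowest position.
unsigned char byte_from_bits(const std::string& eight) {
    unsigned int value = 0;
    for (char c : eight) {
        value = (value << 1) | static_cast<unsigned int>(c - '0');
    }
    return static_cast<unsigned char>(value);
}

// Converts a whole sequence, eight characters at a time.
// Assumes bits.size() is a multiple of 8; any leftover characters are ignored.
std::string bytes_from_bits(const std::string& bits) {
    std::string bytes;
    for (std::size_t i = 0; i + 8 <= bits.size(); i += 8) {
        bytes.push_back(static_cast<char>(byte_from_bits(bits.substr(i, 8))));
    }
    return bytes;
}
```

For example, `byte_from_bits("01000001")` is 65, the ASCII code of A, and `bytes_from_bits("0100000100001010")` is the two-byte string "A\n".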

Please ask questions on Piazza if you are confused! The concepts at play here are fundamental to the course.

Testing your Program

If using C++, compile your program as follows:

g++ -g -Wall -std=c++17 bits.cc -o bits

The echo command with the -n option (suppress newline) can be used to produce an input sequence of 0s and 1s without any extraneous characters. For example:

echo -n "0100000100001010" | ./bits
A
echo -n "0100001100001010" | ./bits
C

If using Racket, replace ./bits with racket bits.rkt.

Useful Functions: C++

The <cstdio> header provides std::getchar, which reads a single byte from standard input, and std::putchar, which writes a single byte to standard output.

Somewhat counterintuitively, std::getchar returns an int instead of a char. A char can only hold a single byte, so every value a char can take is already a valid byte, and none is left over to signal failure. std::getchar returns either the byte read from standard input (as a value from 0 to 255) or the constant EOF (a negative value) when the input ends or an error occurs. The int return type is wide enough to distinguish EOF from every valid byte.
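The usual read loop that follows from this can be sketched as below. The function name `copy_bytes` is hypothetical. It uses std::getc and std::putc on explicit streams only so it can be tried on any file; std::getchar() and std::putchar(c) behave like std::getc(stdin) and std::putc(c, stdout).

```cpp
#include <cstdio>

// Copies every byte from `in` to `out` and returns how many bytes were copied.
long copy_bytes(std::FILE* in, std::FILE* out) {
    long count = 0;
    int c;  // int, not char: it must hold every byte value and also EOF
    while ((c = std::getc(in)) != EOF) {
        std::putc(c, out);
        ++count;
    }
    return count;
}
```

If `c` were declared as a char, a legitimate input byte such as 0xFF could become indistinguishable from EOF on some platforms, and the loop could stop early or never stop. Calling `copy_bytes(stdin, stdout)` from main would echo standard input unchanged.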

If you prefer C++ stream I/O, you can use std::cin with the std::noskipws manipulator to read one byte at a time. Without std::noskipws, the >> operator skips whitespace characters when reading a char.
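A stream-based read loop might look like the following sketch. The function names `read_all_bytes` and `read_all_bytes_from` are hypothetical; the second exists only so the loop can be tried on a string instead of a terminal. Passing std::cin to `read_all_bytes` would read standard input the same way.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads every remaining byte from the stream, including spaces and newlines.
std::string read_all_bytes(std::istream& in) {
    std::string bytes;
    char c;
    in >> std::noskipws;  // keep whitespace instead of silently skipping it
    while (in >> c) {     // the extraction fails, ending the loop, at end of input
        bytes.push_back(c);
    }
    return bytes;
}

// Convenience wrapper for experimenting: treats a string as the input stream.
std::string read_all_bytes_from(const std::string& text) {
    std::istringstream in(text);
    return read_all_bytes(in);
}
```

Here the stream's own state signals the end of input, so a plain char is enough and no EOF comparison is needed.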

Useful Functions: Racket

Racket has read-byte and write-byte functions. In Racket, a "byte" is defined as an integer in the range 0 to 255 (inclusive), so you can work with bytes in the same way you'd work with numbers in that range.