Deadline | Friday, September 15, 11:59pm |
Name on Marmoset | Q2 |
To Submit | bits.cc OR bits.rkt |
Write a C++ or Racket program that reads a sequence of
0 and 1 ASCII characters from standard input,
and writes the corresponding sequence of bits to
standard output.
Why? (Motivation)
How many bits are used to represent this sequence of 0s and 1s?
11110001
The total number of 0 and 1 characters in this sequence is 8. But each 0 and 1 character is itself encoded using multiple bits! This web page uses ASCII character encoding. ASCII was originally a 7-bit code, but nowadays each ASCII character is stored in 8 bits (1 byte). Therefore, 64 bits (8 bytes) are actually used to represent this sequence of 0s and 1s. You can confirm this for yourself by downloading this HTML file and examining it in a binary viewer.
This raises a question: What if we wanted to create a file that contains exactly the 8 bits 11110001? You wouldn't be able to do it by just typing 0s and 1s into a text editor, because the text editor will save each 0 and 1 using ASCII or some other text encoding, rather than translating them directly into individual bits.
You may have found a way to do this for a single byte if you already solved the Goose question, but was your method convenient for longer sequences of bytes?
The goal of this question is to write a specialized tool for this task, which lets us directly type 0s and 1s and produces the corresponding bit sequence.
A reference implementation of this question called cs241.bits is available, both as a web tool and in the Linux environment.
Using cs241.bits in the Linux environment
First execute the command
source /u/cs241/setup
This command does not create or copy any folders or files. It simply sets some variables that allow you to use CS241 tools by typing their names directly on the command line.
After running the setup command, you can use cs241.bits by just typing its name. For example:
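cs241.bits <<< "01001000011010010010000100001010" > output.bin
The 32 input bits form the four bytes 01001000, 01101001, 00100001 and 00001010, which are the ASCII encodings of H, i, ! and a line feed, so output.bin should contain the text Hi! followed by a newline. Note that the Bash here-string operator <<< appends a trailing newline to the input.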
Can the output have a length in bits that is not a multiple of 8?
No. On most modern operating systems, files are stored as sequences of bytes, and it is not possible to create a file whose length in bits is not a multiple of 8. The standard I/O facilities of programming languages on these systems likewise read and write data in whole bytes, never in smaller chunks.
If you want your program to support input lengths that are not a multiple of 8, a reasonable choice would be to either discard incomplete bytes or pad incomplete bytes with 0s, and print a warning for the user. However, we won't test your program with input lengths that are not a multiple of 8, so this is not required.
The files we test your program with will not contain any characters other than 0 or 1. However, files that you create for your own testing might accidentally contain additional characters such as trailing newlines (text editors on Linux such as Vim and Nano typically add a newline to the end of files automatically). Thus, we recommend you design your program to ignore these characters, though it is not required.
If the input sequence is
0100000100001010
your program should output the character A followed by a line feed character (newline).
Wait, what?
If you properly convert the input sequence of ASCII 0s and 1s to a bit sequence, then your program should output two bytes. The binary sequence 01000001 happens to be the ASCII encoding of uppercase A, and the binary sequence 00001010 happens to be the ASCII encoding of a line feed character. Your terminal displays those two bytes according to their ASCII interpretation, so you see an A followed by a line break.
Please ask questions on Piazza if you are confused! The concepts at play here are fundamental to the course.
If using C++, compile your program as follows:
g++ -g -Wall -std=c++17 bits.cc -o bits
The echo command with the -n option (suppress newline) can be used to produce an input sequence of 0s and 1s without any extraneous characters. For example:
echo -n "0100000100001010" | ./bits
A
echo -n "0100001100001010" | ./bits
C
If using Racket, replace ./bits with racket bits.rkt.
The <cstdio> header contains the functions std::getchar and std::putchar, which respectively read a single byte from standard input and write a single byte to standard output.
Somewhat counterintuitively, the std::getchar function returns an int instead of a char. This is because std::getchar must be able to return any of the 256 possible byte values read from standard input, or the constant EOF if no byte could be read (at end of input or on an error). A char can only hold a single byte, so it has no spare value left over for EOF. Instead, std::getchar returns each byte as a non-negative int (the byte's value as an unsigned char), while EOF is a negative int, so the two can never be confused.
If you prefer C++ stream I/O, you can use std::cin with the std::noskipws manipulator to read one byte at a time; without it, the >> operator silently skips whitespace characters such as spaces and newlines.
Racket has read-byte and write-byte functions. In Racket, a "byte" is defined as an integer in the range 0 to 255 (inclusive), so you can work with bytes in the same way you'd work with numbers in that range.