Assignments for CS 241 | ||
---|---|---|
← Assignment 2 | Assignment 3 | Assignment 4 → |
Friday, June 9th, 5:00 pm | ||
P1 • P2 • P3 • P4 • P5 |
On this assignment, you will write an assembler for the CS 241 dialect of MIPS assembly language.
Your assembler can be written in either C++ or Racket.
For both languages, starter code implementing a scanner is provided. See the Starter Code section below.
All problems on this assignment have some commonalities in terms of requirements. Please read the Assignment Overview and Requirements section below before starting on the problems.
If you've already read it, click
here to skip to the problem statements.
Testing this assignment can be tricky as there are lot of requirements, and some students get confused about the required output format. There is a lengthy section on Testing your Assembler at the end of this page.
The first stage of writing an assembler is scanning, transforming the input from a sequence of individual characters into meaningful chunks called tokens for easier processing. However, scanning is not covered until Module 3, and it takes a fair amount of work to write a scanner. Thus, we are providing starter code that implements a scanner for MIPS Assembly Language.
If using C++, you can compile and run the starter code with the following commands:
$ g++ -g -std=c++17 -Wall asm.cc scanner.cc -o asm $ ./asm < input.asm
Run the code as follows:
$ racket asm.rkt < input.asm
You do not need to pass the scanning.rkt
file
on the command line. As long as it is in the same directory
as asm.rkt
, Racket should pick it up.
scanner.cc
and
scanner.h
.
Token::toNumber
and in Racket it is called token-int-value
. You do not have
to write such a function yourself. However, read the comments for this
function carefully; the way it handles certain values may be unintuitive
to you (especially in C++, where numbers have a limited range).This assignment is broken down into five problems, where you implement different aspects of the MIPS assembler.
By Problem 5, your assembler must implement all the features specified on the following page:
This page specifies the form of a valid MIPS Assembly Language program. A program that deviates from this form is considered invalid.
For Problems 1 to 3, your assembler will not receive invalid input, and does not need to perform error checking. However, your assembler should never crash. Even if the crash happens after producing the correct output, you will fail the tests.
If using C++, we will also use Valgrind to check whether your assembler has any memory leaks or memory-related errors. Note that Valgrind does not only check for leaks, but also errors like use of uninitialized variables, or accessing out-of-range indices in a vector or array. If Valgrind detects any errors or leaks, you will fail the tests. A guide for debugging Valgrind errors is available.
For Problems 4 and 5, your assembler must perform error checking. The requirement that your program does not crash or encounter memory errors or leaks still applies in the case of invalid inputs.
Your assembler should produce an error message when it receives an invalid program. To ensure Marmoset can correctly detect that you produced an error message and exited normally, you must follow the following requirements:
std::cerr
for this. In Racket,
you can use eprintf
, or specify
(current-error-port)
as the output port for a
function such as displayln
that takes an optional
output port argument.error
function.
This will raise an exception of type exn:fail
, which
is also used by many standard library functions to report errors.
Thus, if you use this function, Marmoset cannot tell if your
program crashed or if you just intended to report an error.
Instead of using error
, you can either print to
standard error and then exit normally with exit
, or
you can use raise-user-error
,
which Marmoset has special handling for.
If your program takes quadratic time to run with respect to a measure of the "size" of the input (number of instructions, number of labels, number of characters, number of lines, etc.) then it is likely you will fail some tests.
Linear time or O(n log n) time (where n is the "size") is okay. Be sure to choose efficient data structures and use efficient algorithms. You are free to use whatever data structures you want from the standard library of the language you are using.
cs241.binasm
. They just need to contain the
string ERROR (case sensitive) somewhere, and be printed
on standard error.The expected filename of the main program for all problems is:
You can submit additional files to Marmoset (and you will need to if using the provided starter code, which we recommend!) The files must be combined together in a zip file. The main program should appear in the top level of the zip file, not in a subdirectory.
On the student Linux system, the marmoset_submit
command will
combine your files into a zip file for you if you just specify the files on
the command line. For example:
$ marmoset_submit cs241 A3P1 asm.cc scanner.cc scanner.h $ marmoset_submit cs241 A3P1 asm.rkt scanner.rkt
If you create additional files as part of your solution, you can just add them to the command.
Ultimately, however, you are expected to develop your solution on your own systems, and use the student Linux system only for final testing. Extensions will not be granted if the student Linux system goes down so long as the web tools and Marmoset remain available. You are expected to be able to develop code on your own systems. If using Windows, it would be wise to set up WSL and develop in the Linux environment that provides.
Note: From this problem onwards for the remainder of the course, when an assignment uses the description “release+secret”, what this means is that release tests are worth no marks, but each release test has a similar (but not identical) secret test which is worth marks. The goal of release+secret testing is to ensure that you test for yourself and don't overspecialize to our test cases. When marks are described just as “secret”, this means that there are secret test cases which are unrelated to release tests, and you must consider the full range of testing possibilities on your own for such marks, with no feedback.
Implement the .word
directive with numeric operands
(decimal and hexadecimal integers). That is, your assembler should
handle things like .word 241
and .word 0xf1
but not .word label
yet.
You will not be tested with invalid programs or programs that contain features you have not been asked to implement yet.
Although you do not need to implement labels until Problem 3, you might want to think about how you would implement them ahead of time, so that you do not need to do a major redesign later.
Implement the add
instruction,
and the beq
instruction with
immediate branch offsets (decimal and hexadecimal integers).
That is, your assembler should handle things like add $3, $2, $1
,
and beq $2, $4, -1
,
and beq $2, $4, 0xffff
,
but not beq $0, $0, label
yet.
You will not be tested with invalid programs or programs that contain features you have not been asked to implement yet.
Implement labels. This means:
label
:
.word
directive,
like .word label
.
beq
instruction, like beq $0, $0, label
.
You will not be tested with invalid programs or programs that contain features you have not been asked to implement yet.
Note that in a valid program, the places where label definitions can appear are limited. Each line can start with a sequence of zero or more label definitions. Once another type of token appears (such as an instruction identifier) there can be no more label definitions on that line.
Implement error checking. You will be tested with invalid programs.
You will not be tested with programs containing instructions you have not been asked to implement yet, meaning the instructions you implement in Problem 5. That does not mean all identifiers that appear in a “instruction position” will be valid. For example, the test programs will not contain lines like this, because this line uses an unimplemented instruction:
sub $3, $2, $1
But something like this, which is not a valid instruction at all, is fair game:
meow $3, $2, $1
Implement all the other instructions. Your assembler should correctly assemble all valid programs, and correctly reject all invalid programs.
The public and release tests for this problem only contain valid programs. You can earn some marks on this problem even if your error checking is not working.
For valid programs, your assembler should work exactly the same
as cs241.binasm
.
Thus, you can test your assembler by comparing the output with
cs241.binasm
using diff
. To do this, if the
output of cs241.binasm
is expected.mips
:
$ ./asm < input.asm > output.mips $ diff output.mips expected.mips
If using Racket, write racket asm.rkt
instead of ./asm
.
If the output is identical, then diff
will not output
anything. If it is different, it will say something like
Binary files output.mips and expected.mips differ
.
This is not very useful for figuring out what is different
about the two files. For this, try using cs241.binview
to
display the binary data in the files:
$ cs241.binview output.mips 00000000 00000000 00000000 00101010 $ cs241.binview expected.mips 00000000 00000000 00000000 11110001
This works fine for small files, but for larger files, it will be hard
to spot the differences. You may consider using cs241.binview
's
hexadecimal output option (-h
) for easier reading of larger files:
$ cs241.binview -h output.mips 00411820 00430822 8C84FFFC 03E00008 $ cs241.binview -h expected.mips 00411820 00411822 8C84FFFC 00000008
In the above example, it is a bit easier to see that the second and fourth lines are different, compared to if binary was used.
Alternatively, on the student Linux server, using Bash's process
substitution feature, we can display a
side-by-side diff
of the cs241.binview
output in a single command:
$ diff -W 75 -y <(cs241.binview output.mips) <(cs241.binview expected.mips) 00000000 01000001 00011000 00100000 00000000 01000001 00011000 00100000 00000000 01000011 00001000 00100010 | 00000000 01000001 00011000 00100010 10001100 10000100 11111111 11111100 10001100 10000100 11111111 11111100 00000011 11100000 00000000 00001000 | 00000000 00000000 00000000 00001000
The -y
option tells diff
to use side-by-side mode,
and the -W 75
option limits the width of the output to 75 columns
so that
it is more condensed. The output.mips
file appears on the left,
and the expected.mips
file appears on the right.
The bar characters (|
) indicate which lines
of the files are different.
Your error messages can include other text, as long
as they include ERROR, and this is recommended as it will make
debugging easier. Your error messages do not have to match the
ones produced by cs241.binasm
.
To check that your error messages are sent to standard error, redirect standard output to a file. Since standard error is sent to the terminal by default, if you redirect standard output to a file, then only standard error will be displayed.
$ ./asm < valid.asm > output.mips $ ./asm < invalid.asm > output.mips ERROR: Register operand $241 is out of range
For a valid program, you should see no output on the terminal if you're redirecting it to a file. For an invalid program, you should see some output on the terminal if you're properly sending it to standard error.
If you don't see your error message, open the output.mips file and check if it was accidentally sent to standard output. If not, then maybe your error detection is just incorrect.
If you want an automated way to check your error message was produced correctly (for example, for use in a Bash script) you can redirect standard error to a file:
$ ./asm < invalid.asm > output.mips 2> error.txt $ grep "ERROR" error.txt ERROR: label operand looop is undefined
The grep
command there will print all lines
in error.txt that contain ERROR. If such a line exists,
it will return a zero exit code (indicating success).
Otherwise, it will print nothing and return a non-zero
exit code (indicating failure). You could use this to
check if a error test case passed in an automated test suite.
If using C++, testing your program with Valgrind is recommended, as this matches how Marmoset tests your program. Testing with Valgrind will also help you spot subtle memory errors in your program, and debug things like segmentation faults.
Compiling your code with the -g
option will cause Valgrind
to produce more detailed information for debugging, like the line numbers
where the errors occur.
$ g++ -g -std=c++17 -Wall asm.cc scanner.cc -o asm
When running your code, prepend the executable name with valgrind
to use Valgrind.
$ valgrind ./asm < input.asm > output.mips
Valgrind will slow down your program due to the extra checks it performs. The slowdown is significant in some cases, especially for large inputs. The Marmoset tests do attempt to account for this slowdown when evaluating your program's efficiency.
To test the efficiency of your program, test your program with increasingly
larger and larger inputs and time how long it takes. You can use the
time
command for this.
$ time ./asm < input.asm > output.mips
Try repeatedly doubling the size of the input until you start noticing a significant difference in the running times between each run. Are the times roughly doubling as well? If so, that's a good sign. If the times begin increasing at an alarming, more-than-double rate, your program might be quadratic time.