CS 241 — Spring 2022 — Assignment 3

Due Friday, June 3, 5:00 pm / Out of 10 marks

On this assignment, you will write an assembler for the CS 241 dialect of MIPS assembly language.

Your assembler can be written in either C++ or Racket.

For both languages, starter code implementing a scanner is provided. See the Starter Code section below.

All problems on this assignment have some commonalities in terms of requirements. Please read the Assignment Overview and Requirements section below before starting on the problems.

If you've already read it, click here to skip to the problem statements.

Testing this assignment can be tricky as there are lot of requirements, and some students get confused about the required output format. There is a lengthy section on Testing your Assembler at the end of this page.

Starter Code

The first stage of writing an assembler is scanning, transforming the input from a sequence of individual characters into meaningful chunks called tokens for easier processing. However, scanning is not covered until Module 3, and it takes a fair amount of work to write a scanner. Thus, we are providing starter code that implements a scanner for MIPS Assembly Language.

C++ Scanner

Racket Scanner

Using the C++ Starter Code

If using C++, you can compile and run the starter code with the following commands:

$ g++ -g -std=c++17 -Wall asm.cc scanner.cc -o asm
$ ./asm < input.asm

Using the Racket Starter Code

Run the code as follows:

$ racket asm.rkt < input.asm

You do not need to pass the scanning.rkt file on the command line. As long as it is in the same directory as asm.rkt, Racket should pick it up.

Notes

Assignment Overview and Requirements

This assignment is broken down into five problems, where you implement different aspects of the MIPS assembler.

By Problem 5, your assembler must implement all the features specified on the following page:

The MIPS Assembly Language Specification

This page specifies the form of a valid MIPS Assembly Language program. A program that deviates from this form is considered invalid.

For Problems 1 to 3, your assembler will not receive invalid input, and does not need to perform error checking. However, your assembler should never crash. Even if the crash happens after producing the correct output, you will fail the tests.

If using C++, we will also use Valgrind to check whether your assembler has any memory leaks or memory-related errors. Note that Valgrind does not only check for leaks, but also errors like use of uninitialized variables, or accessing out-of-range indices in a vector or array. If Valgrind detects any errors or leaks, you will fail the tests. A guide for debugging Valgrind errors is available.

For Problems 4 and 5, your assembler must perform error checking. The requirement that your program does not crash or encounter memory errors or leaks still applies in the case of invalid inputs.

Your assembler should produce an error message when it receives an invalid program. To ensure Marmoset can correctly detect that you produced an error message and exited normally, you must follow the following requirements:

Efficiency Requirement

If your program takes quadratic time to run with respect to a measure of the "size" of the input (number of instructions, number of labels, number of characters, number of lines, etc.) then it is likely you will fail some tests.

Linear time or O(n log n) time (where n is the "size") is okay. Be sure to choose efficient data structures and use efficient algorithms. You are free to use whatever data structures you want from the standard library of the language you are using.

Clarifications

Problems and Submission Requirements

The expected filename of the main program for all problems is:

You can submit additional files to Marmoset (and you will need to if using the provided starter code, which we recommend!) The files must be combined together in a zip file. The main program should appear in the top level of the zip file, not in a subdirectory.

The marmoset_submit command will combine your files into a zip file for you if you just specify the files on the command line. For example:

$ marmoset_submit cs241 A3P1 asm.cc scanner.cc scanner.h
$ marmoset_submit cs241 A3P1 asm.rkt scanner.rkt

If you create additional files as part of your solution, you can just add them to the command.

Problem 1: One Directive [1 mark]

Implement the .word directive with numeric operands (decimal and hexadecimal integers). That is, your assembler should handle things like .word 241 and .word 0xf1 but not .word label yet.

You will not be tested with invalid programs or programs that contain features you have not been asked to implement yet.

Although you do not need to implement labels until Problem 3, you might want to think about how you would implement them ahead of time, so that you do not need to do a major redesign later.

Problem 2: Two Instructions [1 mark]

Implement the add instruction, and the beq instruction with immediate branch offsets (decimal and hexadecimal integers).

That is, your assembler should handle things like add $3, $2, $1, and beq $2, $4, -1, and beq $2, $4, 0xffff, but not beq $0, $0, label yet.

You will not be tested with invalid programs or programs that contain features you have not been asked to implement yet.

Problem 3: Labels [2 marks]

Implement labels. This means:

You will not be tested with invalid programs or programs that contain features you have not been asked to implement yet.

Note that in a valid program, the places where label definitions can appear are limited. Each line can start with a sequence of zero or more label definitions. Once another type of token appears (such as an instruction identifier) there can be no more label definitions on that line.

Problem 4: Error Checking [3 marks, 1 secret]

Implement error checking. You will be tested with invalid programs.

You will not be tested with programs containing instructions you have not been asked to implement yet, meaning the instructions you implement in Problem 5. That does not mean all identifiers that appear in a “instruction position” will be valid. For example, the test programs will not contain lines like this, because this line uses an unimplemented instruction:

sub $3, $2, $1

But something like this, which is not a valid instruction at all, is fair game:

meow $3, $2, $1

Problem 5: The Rest of the Assembler [3 marks, 1 secret]

Implement all the other instructions. Your assembler should correctly assemble all valid programs, and correctly reject all invalid programs.

The public and release tests for this problem only contain valid programs. You can earn some marks on this problem even if your error checking is not working.

Testing your Assembler

Valid Programs

For valid programs, your assembler should work exactly the same as cs241.binasm. Thus, you can test your assembler by comparing the output with cs241.binasm using diff:

$ ./asm < input.asm > output.mips
$ cs241.binasm < input.asm > expected.mips
$ diff output.mips expected.mips

If using Racket, write racket asm.rkt instead of ./asm.

If the output is identical, then diff will not output anything. If it is different, it will say something like Binary files output.mips and expected.mips differ.

This is not very useful for figuring out what is different about the two files. For this, try using cs241.binview to display the binary data in the files:

$ cs241.binview output.mips
00000000 00000000 00000000 00101010
$ cs241.binview expected.mips
00000000 00000000 00000000 11110001

This works fine for small files, but for larger files, it will be hard to spot the differences. You may consider using cs241.binview's hexadecimal output option (-h) for easier reading of larger files:

$ cs241.binview -h output.mips
00411820
00430822
8C84FFFC
03E00008
$ cs241.binview -h expected.mips
00411820
00411822
8C84FFFC
00000008

In the above example, it is a bit easier to see that the second and fourth lines are different, compared to if binary was used.

Alternatively, using Bash's process substitution feature, we can display a side-by-side diff of the cs241.binview output in a single command:

$ diff -W 75 -y <(cs241.binview output.mips) <(cs241.binview expected.mips)
00000000 01000001 00011000 00100000     00000000 01000001 00011000 00100000
00000000 01000011 00001000 00100010  |  00000000 01000001 00011000 00100010
10001100 10000100 11111111 11111100     10001100 10000100 11111111 11111100
00000011 11100000 00000000 00001000  |  00000000 00000000 00000000 00001000            

The -y option tells diff to use side-by-side mode, and the -W 75 option limits the width of the output to 75 columns so that it is more condensed. The output.mips file appears on the left, and the expected.mips file appears on the right. The bar characters (|) indicate which lines of the files are different.

Invalid Programs

For invalid programs, you must print an error message to standard error, and it must contain the string ERROR in all caps.

Your error messages can include other text, as long as they include ERROR, and this is recommended as it will make debugging easier. Your error messages do not have to match the ones produced by cs241.binasm.

To check that your error messages are sent to standard error, redirect standard output to a file. Since standard error is sent to the terminal by default, if you redirect standard output to a file, then only standard error will be displayed.

$ ./asm < valid.asm > output.mips
$ ./asm < invalid.asm > output.mips
ERROR: Register operand $241 is out of range

For a valid program, you should see no output on the terminal if you're redirecting it to a file. For an invalid program, you should see some output on the terminal if you're properly sending it to standard error.

If you don't see your error message, open the output.mips file and check if it was accidentally sent to standard output. If not, then maybe your error detection is just incorrect.

If you want an automated way to check your error message was produced correctly (for example, for use in a Bash script) you can redirect standard error to a file:

$ ./asm < invalid.asm > output.mips 2> error.txt
$ grep "ERROR" error.txt
ERROR: label operand looop is undefined

The grep command there will print all lines in error.txt that contain ERROR. If such a line exists, it will return a zero exit code (indicating success). Otherwise, it will print nothing and return a non-zero exit code (indicating failure). You could use this to check if a error test case passed in an automated test suite.

Valgrind and C++

If using C++, testing your program with Valgrind is recommended, as this matches how Marmoset tests your program. Testing with Valgrind will also help you spot subtle memory errors in your program, and debug things like segmentation faults.

Compiling your code with the -g option will cause Valgrind to produce more detailed information for debugging, like the line numbers where the errors occur.

$ g++ -g -std=c++17 -Wall asm.cc scanner.cc -o asm

When running your code, prepend the executable name with valgrind to use Valgrind.

$ valgrind ./asm < input.asm > output.mips

Valgrind will slow down your program due to the extra checks it performs. The slowdown is significant in some cases, especially for large inputs. The Marmoset tests do attempt to account for this slowdown when evaluating your program's efficiency.

Efficiency Testing

To test the efficiency of your program, test your program with increasingly larger and larger inputs and time how long it takes. You can use the time command for this.

$ time ./asm < input.asm > output.mips

Try repeatedly doubling the size of the input until you start noticing a significant difference in the running times between each run. Are the times roughly doubling as well? If so, that's a good sign. If the times begin increasing at an alarming, more-than-double rate, your program might be quadratic time.