Assignment 6: Code Generation (Part B) and End-of-Term Quiz

In this assignment, you will build on your Assignment 5 compiler and implement code generation (i386 assembly, Intel dialect) for the complete Joos 1W language, including its object-oriented features.

You will also implement register allocation, either using linear scan or graph coloring. By default, your compiler should perform register allocation. Your joosc compiler should also provide a command-line option, --opt-none, that disables optimizations including register allocation. You will evaluate the performance improvements enabled by register allocation on your own benchmarks.

You are encouraged to implement optimizations at the IR level, such as constant propagation and folding, method inlining, and common subexpression elimination. Groups whose compilers are the most correct and generate the fastest code may be rewarded.

Completing the Quiz

As part of this assignment, you will complete an online quiz that will test your understanding of the course material and your ability to apply it to new situations.

You should complete the quiz individually, rather than as a group. Discussing the contents of the quiz with anyone beyond the course staff before the quiz due time is considered misconduct and can result in significant penalties per the University's Policy 71.

Details on how to access the quiz will be announced at a later time.

Report Submission

Submit to Marmoset a report.pdf accompanying your code submissions for Assignments 5 and 6. Your report should follow the guidelines. The report should not exceed eight pages (excluding any appendices).

Your report should include the results of your performance evaluation.

Code Submission

Submit to Marmoset a .zip archive. It should include everything required to build and run your project. In particular, the .zip file must contain a file called Makefile. Marmoset will run make on this Makefile to compile your compiler. The Makefile must generate an executable (binary or shell script) called joosc. The joosc executable must accept multiple filenames as arguments. All of the files listed on the joosc command line, and only those files, are considered part of the program being compiled.

Unlike javac , your joosc compiler should not look for classes in .class files on the CLASSPATH; it should read only the Joos 1W source files listed on the command line. This means that all classes, including classes such as java.lang.Object, must be available in source form and must be specified on the joosc command line. Unlike javac, Joos does not care what directory a source file is in (i.e. it does not require the directory structure of the source code to match the package structure). However, the class declared in a file must still have the same name as the filename. For example, Java would require that the class java.lang.Object be declared in the file Object.java in the directory java/lang, whereas Joos only requires the file to be named Object.java, but otherwise allows it to be in any directory.

For the purposes of this course, a minimalist version of the Java standard library is provided. This library can be found in the directory /u/cs444/pub/stdlib/6.0 in the linux.student.cs environment. Marmoset will include all files in this library on the joosc command line for every test, in addition to other source file(s) specific to that test. The following versioning scheme is used to make it possible to correct errors and/or to extend the library for future assignments (although we aim to minimize the number of changes that will be required). The 6 in the directory name refers to Assignment 6, and the 0 is the first version of the library. Any corrections to the Assignment 6 version of the library will appear in the directories 6.1, 6.2, etc.

As in previous assignments, joosc should process the Joos 1W files given on the command line, produce appropriate diagnostic messages on standard error, and exit with one of the following Unix return codes:

If the input program is valid Joos 1W, your compiler should output, into the subdirectory output of the current working directory, one or more files with the extension .s containing the assembly language code implementing the program. It is recommended that the code implementing each class be generated into a separate .s file. You may assume that the output directory exists before your compiler runs, and that the directory is empty. After your compiler runs, each of the .s files in the directory will be assembled with the command:

/u/cs444/bin/nasm -O1 -f elf -g -F dwarf filename.s

After all of the files are successfully assembled, the file runtime.s from the standard library (see below for description) will also be assembled and placed in the output directory. Then, all of the .o files generated by nasm in the output directory will be linked using the command:

ld -melf_i386 -o main output/*.o

Finally, the generated executable main will be executed.

Your joosc compiler, the assembler and linker, and your final main executable will be run on one of the Linux CPU servers (e.g., linux028.student.cs).

One of the generated .s files must define the global symbol _start:

global _start
_start:

When your program is run, execution will start from this point.

Unlike in Java, the first method that begins executing is not static void main(String[]), but static int test(). All of the test inputs will have such a method. The class containing the test method will be listed first on the joosc command line, before any other compilation units. The code that you generate at _start should initialize all static fields, then call this method. When the method returns with return value x, your program should exit with exit code x using the sys_exit system call. To execute this system call, load the value 1 (indicating sys_exit) into register eax, load the exit code into register ebx, then execute the instruction int 0x80.

Java specifies a very precise but complicated order in which static fields must be initialized (JLS 12.4). For Joos, the order is specified by the following rules:

Note that Java and Joos require that any field without an explicit initializer be initialized to the value false, 0, or null, depending on its declared type.

Another simplification is that arrays in a Joos program inherit the clone() method from java.lang.Object. In Java, arrays are required to implement a different clone method as specified in Section 10.7 of the Java Language Specification.

The runtime.s file included with the standard library contains several utilities that are likely to be useful. In particular, assembly code generated by your compiler in this assignment will call the following functions:

Important note: Although most of the Marmoset tests do not use or test standard output, all of the secret tests, which are worth many marks, do. To get a non-zero mark on the secret tests, it is essential to pass the J1_Hello test for Assignment 6, which prints Hello, World! to standard output.

Marmoset tests the code generated by your compiler on a server running Linux. If you are using Windows or a Mac, the recommended way to test the generated code is either to copy it to linux.student.cs and run it there, or to run it using Linux in a virtual machine. Marmoset will test your compiler on the linux.student.cs servers with the runtime.s file that is included with the Joos standard library.

Before starting to implement these assignments, it is strongly recommended that you meet with your group to design and agree on conventions for

It is recommended that you document these conventions at this stage, and include this documentation in the report that you hand in. It is suggested that you modularize the implementation of these conventions in dedicated modules in your compiler, to ensure consistency between the different parts of your compiler that rely on the conventions.

Benchmarking your compiler optimizations

For each optimization <opt> you implement, you should create three benchmark programs that demonstrate the effectiveness of the optimization. It is recommended that for each of your benchmarks, the unoptimized version of the generated code takes ~1–3 seconds to run natively, so that the measurements are reliable but not too time-consuming. A good way is to create benchmarks that perform optimizable computations repeatedly. The benchmarks should be placed in the subdirectories benchmarks/<opt> in your code submission. The only required optimization is register allocation (i.e., <opt> being opt-reg-only).

Your Makefile should have a target bench that compiles the benchmarks (with the right command-line options), runs the generated code, and outputs a benchmarks/results.csv file containing the measured running times. For each benchmark, the csv file should contain a row in the following format:

<opt>,<benchmark name>,<time in ms w/ opt>,<time in ms w/o opt>,<speedup>

You will include your own measurements in your report. The course staff will run make bench to reproduce your evaluation results in the linux.student.cs environment.

Miscellaneous

The archive should include a file named a6.log showing the commit history of your Git repository.

The archive should not include any extraneous non-source files. It should not include any files that ought to be automatically generated by building or running your compiler.

We reserve the right to deduct points if your submission does not meet the requirements above.

The Marmoset tests for the assignments take several minutes to run. Do not submit more than one submission at a time to Marmoset. If Marmoset reports that your previous submission has not been tested yet, do not submit another one. Denial-of-service attacks on Marmoset will result in disciplinary action.