Assignment 5: Code Generation (Part A)
In this assignment, you will make your compiler generate
i386 assembly (Intel dialect) for the subset of the Joos 1W language
that does not include object-oriented features.
This subset does include arrays, static methods, and static fields.
Your compiler should first generate IR code, then lower it to canonical form,
and finally emit assembly code by tiling the IR syntax trees.
You are expected to make use of x86 features, including
memory operands and rich addressing modes, in designing your tiles.
You need not implement register allocation in this assignment;
it is acceptable to spill all variables to the stack.
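As a rough illustration of tiling with memory operands under the
spill-everything strategy, the sketch below tiles a single canonical MOVE
statement. The tuple-based IR encoding, the slot table mapping temporaries
to ebp offsets, and the function names are assumptions made for this
example; they are not the provided TIR classes.

```python
# Hypothetical miniature IR (not the course-provided TIR):
#   ("MOVE", dst, src), ("TEMP", name), ("CONST", n), ("ADD", left, right)

def operand(node, slot):
    """Return an x86 operand string for a leaf node, or None otherwise."""
    kind = node[0]
    if kind == "CONST":
        return str(node[1])
    if kind == "TEMP":
        # Every temporary lives in a stack slot (no register allocation).
        return f"dword [ebp{slot[node[1]]:+d}]"
    return None

def tile_move(stmt, slot):
    """Tile MOVE(TEMP t, e), where e is a leaf or ADD(leaf, leaf)."""
    _, dst, src = stmt
    assert dst[0] == "TEMP"
    dst_op = operand(dst, slot)
    if src[0] == "ADD":
        left, right = src[1], src[2]
        # Larger tile: t = t + c becomes one add with a memory destination.
        if left == dst and right[0] == "CONST":
            return [f"add {dst_op}, {right[1]}"]
        # General ADD tile: eax is scratch; the right operand may be memory.
        return [f"mov eax, {operand(left, slot)}",
                f"add eax, {operand(right, slot)}",
                f"mov {dst_op}, eax"]
    if src[0] == "CONST":
        return [f"mov {dst_op}, {operand(src, slot)}"]
    # x86 has no memory-to-memory mov, so copy through eax.
    return [f"mov eax, {operand(src, slot)}", f"mov {dst_op}, eax"]
```

For instance, with x stored at ebp-4, the statement x = x + 1 is covered by
the single instruction add dword [ebp-4], 1 instead of a load, an add, and
a store.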
A Java definition of the IR used in lectures can be found under
/u/cs444/pub/tir in the linux.student.cs environment.
AST types and an interpreter are included.
You can change its definition or reimplement it in your chosen
implementation language, but you should document your changes in your
report.
Your IR should not deviate too much from the provided intermediate
representation; for example, using LLVM IR or JVM bytecode as your IR
is not allowed.
Report Submission
You are not asked to submit a report for this assignment.
However, Assignment 6 will require a report covering
Assignments 5 and 6, so it is recommended that you start writing
such a document. Your report will follow the guidelines.
Code Submission
Submit a .zip archive to Marmoset. It should include everything required
to build and run your project. In particular, the .zip file must
contain a file called Makefile. Marmoset will run make on this
Makefile to compile your compiler. The Makefile must generate an
executable (binary or shell script) called joosc. The joosc
executable must accept multiple filenames as arguments. All of the files
listed on the joosc command line, and only those files, are
considered part of the program being compiled.
Unlike javac, your joosc compiler should not look for classes in .class
files on the CLASSPATH; it should read only the Joos 1W source files
listed on the command line. This means that all classes, including
classes such as java.lang.Object, must be available in
source form and must be specified on the joosc command line.
Unlike javac, Joos does not care what directory a source
file is in; that is, it does not require the directory structure
of the source code to match the package structure.
However, the class declared in a file must still have the same name
as the filename.
For example, Java would require that the class java.lang.Object
be declared in the file Object.java in the directory
java/lang, whereas Joos only requires the file to
be named Object.java, but otherwise allows it to
be in any directory.
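The filename rule can be checked with a few lines of code. The sketch
below is one possible check; the function name and its arguments are
hypothetical, and it assumes source files use the .java extension as in
the example above.

```python
import os

def class_matches_file(path, declared_class):
    """Joos rule: only the base filename must match the class name.

    The directory part of the path is deliberately ignored.
    """
    base = os.path.basename(path)
    return base == declared_class + ".java"
```

Under this check, any/dir/Object.java is accepted for a class named
Object, regardless of the package the class declares.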
For the purposes of this course, a minimalist version of the
Java standard library is provided. This library can be found
in the directory /u/cs444/pub/stdlib/5.0 in the linux.student.cs
environment. Marmoset will include all
files in this library on the joosc command line for
every test, in addition to other source file(s) specific to that
test. The following versioning scheme is used to make it possible
to correct errors and/or to extend the library for future assignments
(although we aim to minimize the number of changes that will be required).
The 5 in the directory name refers to Assignment 5, and the 0
is the first version of the library. Any corrections to the Assignment 5
version of the library will appear in the directories 5.1, 5.2, etc.
As in previous assignments, joosc
should process the Joos 1W files given on the command line,
produce appropriate diagnostic messages on standard error,
and exit with one of the following Unix return codes:
- 0: the input program is valid Joos 1W
- 42: the input program is not valid Joos 1W
- any other value: your compiler crashed
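One way to keep the return codes consistent is to funnel every run through
a single driver function. The sketch below assumes a hypothetical
CompileError exception that the front end raises for invalid input, and a
compile_program callable standing in for the rest of the compiler; neither
name comes from the assignment.

```python
import sys

class CompileError(Exception):
    """Hypothetical exception raised when the input is not valid Joos 1W."""

def run(paths, compile_program):
    """Return the Unix exit code for compiling exactly the given files."""
    try:
        compile_program(paths)
    except CompileError as err:
        # Diagnostics go to standard error, as the assignment requires.
        print(f"error: {err}", file=sys.stderr)
        return 42
    # Any other exception propagates, so the process exits with a code
    # other than 0 or 42 and is reported as a compiler crash.
    return 0
```

A joosc entry point could then end with sys.exit(run(sys.argv[1:], ...)),
passing the real compilation function as the second argument.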
If the input program is valid Joos 1W, your compiler should output,
into a subdirectory (named output) of the current working directory,
one or more files with the extension .s containing the assembly
code implementing the program.
You may assume that the output directory exists
before your compiler runs, and that the directory is empty.
After your compiler runs, each of the .s files in the
directory will be assembled with the command:
/u/cs444/bin/nasm -O1 -f elf -g -F dwarf filename.s
After all the files are successfully assembled, the
file runtime.s from the standard library (see below for description)
will also be assembled and placed in the output directory.
Then, all of the .o files generated by nasm in the
output directory will be linked using the command:
ld -melf_i386 -o main output/*.o
Finally, the generated executable main will be executed.
One of the generated .s files must define the global symbol _start:
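To reproduce these steps when testing locally, the commands can be
generated from the list of assembly files. The sketch below only builds
the command strings; it assumes that runtime.s has been copied to
output/runtime.s, that it is assembled with the same nasm flags as the
other files, and that the executable is started as ./main. None of these
three details is stated in this handout.

```python
NASM = "/u/cs444/bin/nasm -O1 -f elf -g -F dwarf"

def build_commands(asm_files):
    """Return the shell commands, in order, for the given .s files."""
    # One assembler invocation per generated file (plus runtime.s, if the
    # caller includes it in asm_files).
    cmds = [f"{NASM} {path}" for path in asm_files]
    # Link every object file in the output directory into ./main.
    cmds.append("ld -melf_i386 -o main output/*.o")
    # Assumption: the test harness runs the linked executable directly.
    cmds.append("./main")
    return cmds
```

Printing the result of build_commands(["output/A.s", "output/runtime.s"])
gives a script that can be pasted into a shell on linux.student.cs.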
global _start
_start:
When your program is run, execution will start from this point.
Unlike in Java, the first method that begins executing is not
static void main(String[]), but static int test().
All the test inputs will have such a method. The class containing
the test method will be listed first on the joosc command line,
before any other compilation units.
The code that you generate at _start should initialize all static fields,
then call this method.
When the method returns with return value x,
your program should exit with exit code x using the
sys_exit system call. To execute this system call,
load the value 1 (indicating sys_exit) into register
eax, load the exit code into register ebx, then execute the
instruction int 0x80.
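The startup sequence described above could be emitted along the following
lines. This is a sketch under several assumptions: each class has a
static-initializer routine with its own label, those routines and the
test method are defined in other .s files and declared global there, and
your calling convention returns the result of test() in eax. The label
names passed in are hypothetical.

```python
def emit_start(static_init_labels, test_label):
    """Return NASM lines for the program entry point."""
    # Assumption: all called labels live in other files, hence extern.
    externs = [f"extern {label}" for label in static_init_labels + [test_label]]
    lines = ["global _start"] + externs + ["section .text", "_start:"]
    # Initialize all static fields before test() runs.
    for label in static_init_labels:
        lines.append(f"    call {label}")
    lines += [
        f"    call {test_label}",
        "    mov ebx, eax",  # exit code = value returned by test() (assumed in eax)
        "    mov eax, 1",    # 1 selects sys_exit
        "    int 0x80",
    ]
    return lines
```

Note that eax must be copied into ebx before it is overwritten with the
system call number.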
Java specifies a very precise but complicated order in which
static fields must be initialized (JLS 12.4). For Joos, the order is
specified by the following rules:
- All static fields must be initialized before the startup
code calls the static int test() method.
- Static fields within the same class must be initialized
in the order in which they appear in the class.
- Static fields in different classes can be initialized
in any order.
Note that Java and Joos require that any field without an explicit
initializer be initialized to the value false, 0,
or null, depending on its declared type.
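The rules above can be turned into an initialization plan before any code
is emitted. The sketch below uses a hypothetical input shape (a list of
classes, each with its static fields in declaration order) and represents
initializers as plain strings; a real compiler would carry AST or IR nodes
instead.

```python
# Default values by declared type; any type not listed is a reference type.
DEFAULTS = {"boolean": "false", "int": "0", "short": "0", "byte": "0", "char": "0"}

def default_value(declared_type):
    """Return the default for a field without an explicit initializer."""
    return DEFAULTS.get(declared_type, "null")

def init_plan(classes):
    """classes: list of (class_name, [(field, type, initializer_or_None)]).

    Returns (qualified_field, value) pairs. Declaration order is kept
    within each class; the order between classes is simply the input
    order, which the rules leave unconstrained.
    """
    plan = []
    for class_name, fields in classes:
        for field, declared_type, init in fields:
            value = init if init is not None else default_value(declared_type)
            plan.append((f"{class_name}.{field}", value))
    return plan
```

The startup code can then walk this plan in order and emit one store per
entry before calling test().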
The runtime.s file included with the standard library
contains several utilities that are likely to be useful.
In particular, assembly code generated by your compiler in this assignment
will call the following functions:
- The function __malloc allocates a number of bytes of memory. The
number of bytes to be allocated must be in the register
eax before executing the instruction call __malloc. The address of
the beginning of the allocated memory can be found in register eax
after the call. There is no provision for freeing allocated memory;
you should not need it for the simple programs that we will be testing
with.
- The function __exception ends the program with exit
code 13. You should call this function in any situation in which the
equivalent Java code would throw an exception, such as a failed null
check, array bounds check, or cast check.
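To show how the two runtime functions fit together, the sketch below
emits code for allocating an int array. The array layout (a 4-byte length
header followed by 4-byte elements), the label scheme, and the function
name are assumptions for this example; your group should choose its own
layout. The emitting file would also need extern __malloc and
extern __exception declarations. This handout does not say whether
__malloc returns zeroed memory or which registers it preserves, so the
sketch saves the length on the stack, and a real implementation may need
to clear the elements explicitly.

```python
def emit_new_int_array(length_operand, uid):
    """Return NASM lines that leave the new array's address in eax.

    length_operand: x86 operand holding the requested length.
    uid: a number that makes the generated label unique.
    """
    ok = f"alloc_ok_{uid}"
    return [
        f"    mov eax, {length_operand}",
        "    cmp eax, 0",
        f"    jge {ok}",
        "    call __exception        ; negative length: Java would throw",
        f"{ok}:",
        "    push eax                ; save the length across the call",
        "    lea eax, [eax*4 + 4]    ; assumed layout: header + 4-byte elements",
        "    call __malloc           ; byte count in eax, address back in eax",
        "    pop ebx                 ; restore the length",
        "    mov [eax], ebx          ; store the length header",
    ]
```

Null checks and bounds checks follow the same pattern: compare, jump over
the failure path, and call __exception otherwise.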
Marmoset tests the code generated by your compiler on a server
running Linux. If you are using Windows or a Mac, the recommended
way to test the generated code is either to copy it to linux.student.cs
and run it there, or to run it using Linux in a virtual machine.
Marmoset will test your compiler on the linux.student.cs
servers with the runtime.s file that is included with the
Joos standard library.
Before starting to implement these assignments, it is strongly recommended
that you meet with your group to design and agree on conventions for
- parameter passing,
- local variable storage,
- array layout, and
- naming of labels for method implementations and data.
It is recommended that you document these conventions at this stage,
and include this documentation in the report that you hand in.
It is suggested that you implement these conventions in dedicated
modules of your compiler, to ensure consistency between the different
parts of your compiler that rely on them.
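As an example of such a module, the sketch below centralizes label naming
so that every part of the compiler produces the same label for the same
method or field. The encoding (length-prefixed name components, with _M,
_F, _P, and _A markers) is a hypothetical scheme chosen so that distinct
names and overloads do not collide; it is not a required convention.

```python
def encode_type(type_name):
    """Encode a possibly qualified, possibly array type name.

    Each dotted component is prefixed with its length, so "a.bc" and
    "ab.c" produce different encodings.
    """
    suffix = ""
    if type_name.endswith("[]"):
        type_name, suffix = type_name[:-2], "_A"
    return "".join(f"{len(p)}{p}" for p in type_name.split(".")) + suffix

def method_label(qualified_class, method, param_types):
    """Label for a method; parameter types distinguish overloads."""
    params = "".join("_P" + encode_type(t) for t in param_types)
    return "_M" + encode_type(qualified_class) + f"{len(method)}{method}" + params

def static_field_label(qualified_class, field):
    """Label for the storage of a static field."""
    return "_F" + encode_type(qualified_class) + f"{len(field)}{field}"
```

For example, method_label("java.lang.String", "charAt", ["int"]) yields
_M4java4lang6String6charAt_P3int, and both the call site and the method
definition obtain it from the same function.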
The archive should include all your test cases and test code that you used to
test your program. Be sure to mention where these files are in your
report. Do not include Marmoset public tests.
The archive should include a file named a5.log showing the commit history
of your Git repository.
The archive should not include any extraneous non-source files.
It should not include any files that ought to be automatically
generated by building or running your compiler.
Your build process should not transmit data to or from the internet
in any way.
We reserve the right to deduct points if your submission does not
meet the requirements above.
The Marmoset tests for the assignments take several minutes to run.
Do not submit more than one submission at a time to Marmoset.
If Marmoset reports that your previous submission has not been tested
yet, do not submit another one. Denial-of-service attacks on Marmoset
will result in disciplinary action.