Assignment 1: Scanning, Parsing, Weeding
In this assignment, you will implement lexical and syntactic
analysis. The scanner splits up the input into tokens and catches
lexical errors (inputs that cannot form valid tokens).
The parser checks that the input list of tokens conforms to a syntax
specified using a context-free grammar.
The weeding stage detects simple errors that generally could have been
checked by a context-free grammar, but this would make the grammar very
complicated.
It is recommended that your design follow the above three stages, though
it is not required, as long as you somehow implement the lexical and
syntactic specification of Joos 1W.
The following restrictions of the Joos 1W language must be
checked (either in the parser or in the weeder):
- All characters in the input program must be in the range of 7-bit ASCII (0 to 127).
- A class cannot be both abstract and final.
- A method has a body if and only if it is neither abstract nor native.
- An abstract method cannot be static or final.
- A static method cannot be final.
- A native method must be static.
- The type
void
may only be used as the return type of a method.
- A class/interface must be declared in a
.java
file with the same base name as the class/interface.
- An interface cannot contain fields or constructors.
- An interface method cannot be static, final, or native.
- An interface method cannot have a body.
- Every class must contain at least one explicit constructor.
- No field can be final.
- No multidimensional array types or multidimensional array creation expressions are allowed.
- A method or constructor must not contain explicit
this()
or super()
calls.
This phase of the compiler should also determine the values of all literals
in the program. For integer literals, it must check that the literal
is within the range of a Java int
. For character and
string literals, it must expand any escape sequences in the literal.
Your parser must be implemented using an LALR(1) parser generator, such
as CUP, Bison, and JLALR.
Debugging conflicts reported by a parser generator can be challenging,
especially when the parser generator reports only conflict items and
the lookahead symbol.
It is strongly recommended that you use a parser generator that provides
counterexamples which better explain parsing conflicts.
Bison
and an extended version of CUP currently
support counterexample generation.
Report Submission
You are not asked to submit a report for this assignment.
However, Assignment 4 will require a report covering
Assignments 1–4, so it is recommended that you start writing
such a document. Your report will follow the guidelines.
Code Submission
Submit to Marmoset a .zip
archive.
It should include everything required to build and run your project.
In particular, in addition to your source code, it should contain a
file named Makefile
. Marmoset will run make
on this Makefile
to build your compiler. The Makefile
must generate an executable
(binary or shell script) called joosc
.
When run with a single
argument, a filename, joosc
should process the given Joos 1W file,
produce appropriate diagnostic messages on standard error, and exit
with one of the following Unix return codes:
- 0: the input file is lexically and syntactically valid Joos 1W
- 42: the input file is not lexically or syntactically valid Joos 1W
- any other value: your compiler crashed
It should include all your test cases and test code that you used to
test your program. Be sure to mention where these files are in your
report. Do not include Marmoset public tests.
It should include a file named a1.log
showing the commit history
of your Git repository.
It should not include any derived, IDE, or configuration files
or directories such as .class
, .jar
, .classpath
, .project
,
.git
, .gitignore
, and .DS_Store
, unless they are precompiled versions of
third-party libraries.
Be sure to mention in your report which third-party libraries your
project depends on.
If you use a lexer/parser generator, please include the lexer/parser
input file. Your Makefile
should use this file to generate the
lexer/parser code, and your submission should not include the
generated lexer/parser code.
Your build process should not transmit data from/to the internet
in any way.
It is strongly encouraged that you use a command-line tool (e.g.,
zip
on Linux) to create the .zip
archive.
It is strongly encouraged that you write a small (shell) script to
create your .zip
submission; you will be using it repeatedly
throughout the course.
We reserve the right to deduct points if your submission does not
meet the requirements above.
The Marmoset tests for this assignment take several minutes to run.
Do not submit more than one submission at a time to Marmoset.
If Marmoset reports that your previous submission has not been tested
yet, do not submit another one. Denial-of-service attacks on Marmoset
will result in disciplinary action.