Assignment 1: Scanning, Parsing, Weeding

In this assignment, you will implement lexical and syntactic analysis. The scanner splits up the input into tokens and catches lexical errors (inputs that cannot form valid tokens). The parser checks that the input list of tokens conforms to a syntax specified using a context-free grammar. The weeding stage detects simple errors that generally could have been checked by a context-free grammar, but this would make the grammar very complicated. It is recommended that your design follow the above three stages, though it is not required, as long as you somehow implement the lexical and syntactic specification of Joos 1W.

The following restrictions of the Joos 1W language must be checked (either in the parser or in the weeder):

This phase of the compiler should also determine the values of all literals in the program. For integer literals, it must check that the literal is within the range of a Java int. For character and string literals, it must expand any escape sequences in the literal.

Your parser must be implemented using an LALR(1) parser generator, such as CUP, Bison, and JLALR. Debugging conflicts reported by a parser generator can be challenging, especially when the parser generator reports only conflict items and the lookahead symbol. It is strongly recommended that you use a parser generator that provides counterexamples which better explain parsing conflicts. Bison and an extended version of CUP currently support counterexample generation.

Report Submission

You are not asked to submit a report for this assignment. However, Assignment 4 will require a report covering Assignments 1–4, so it is recommended that you start writing such a document. Your report will follow the guidelines.

Code Submission

Submit to Marmoset a .zip archive. It should include everything required to build and run your project. In particular, in addition to your source code, it should contain a file named Makefile. Marmoset will run make on this Makefile to build your compiler. The Makefile must generate an executable (binary or shell script) called joosc.

When run with a single argument, a filename, joosc should process the given Joos 1W file, produce appropriate diagnostic messages on standard error, and exit with one of the following Unix return codes:

It should include all your test cases and test code that you used to test your program. Be sure to mention where these files are in your report. Do not include Marmoset public tests.

It should include a file named a1.log showing the commit history of your Git repository.

It should not include any derived, IDE, or configuration files or directories such as .class, .jar, .classpath, .project, .git, .gitignore, and .DS_Store, unless they are precompiled versions of third-party libraries. Be sure to mention in your report which third-party libraries your project depends on. If you use a lexer/parser generator, please include the lexer/parser input file. Your Makefile should use this file to generate the lexer/parser code, and your submission should not include the generated lexer/parser code.

Your build process should not transmit data from/to the internet in any way.

It is strongly encouraged that you use a command-line tool (e.g., zip on Linux) to create the .zip archive. It is strongly encouraged that you write a small (shell) script to create your .zip submission; you will be using it repeatedly throughout the course.

We reserve the right to deduct points if your submission does not meet the requirements above.

The Marmoset tests for this assignment take several minutes to run. Do not submit more than one submission at a time to Marmoset. If Marmoset reports that your previous submission has not been tested yet, do not submit another one. Denial-of-service attacks on Marmoset will result in disciplinary action.