Assignments for CS241
Due Friday, March 20th, 5:00 PM
In this assignment, you will implement the context-sensitive analysis phase (also known as semantic analysis) of the WLP4 compiler.
The main goal of this phase is to catch all remaning compile-time errors that were not caught by the scanner or parser. A secondary goal is to compute type information for expressions, which will be needed in the code generation phase.
In this assignment, like the assembler assignment, you are gradually building up one large program across multiple problems. You can submit the same program to all problems if you want.
For all problems, you must submit one of the following two files:
You can submit additional files if you wish to divide your program into multiple files, but a file with one of the above names must be present in your submission.
The input for all problems will be a
WLP4 Intermediate (.wlp4i) file
representing a parse tree, that is, the output of the parsing phase.
If you did not complete your own WLP4 parser
in the previous assignment, you can use the
provided
cs241.wlp4parse
tool to generate inputs. Note that cs241.wlp4parse
expects input in the format produced by the scanner
cs241.wlp4scan,
so you will need to run your WLP4 source code through both commands
to produce a .wlp4i file:
$ cs241.wlp4scan < program.wlp4 | cs241.wlp4parse > program.wlp4i
If the input program has an error in it, you simply need to produce
an error message containing ERROR on standard error,
as usual.
If the input program parse tree is correct, it should produce a WLP4 Typed Intermediate (.wlp4ti) file on standard output. This is a representation of the parse tree that has been annotated with type information. You can produce this output by traversing the tree and assigning a type to all expression nodes, then doing a second (preorder) traversal where you print out the tree along with the assigned types. An expression node is defined as either of the following:
NUM, NULL,
or ID token. ID tokens are only considered expression nodes if they represent a variable (including parameters), not if they represent a procedure.
expr, term, factor,
or lvalue.
The cs241.wlp4type tool can be used to convert a .wlp4i file to a .wlp4ti file. It performs context sensitive analysis and produces the type-annotated parse tree. You can use this to check the correctness of your output on the assignment.
In this problem, you will perform context-sensitive analysis on a heavily restricted class of WLP4 programs:
wain procedure is allowed.wain does not contain declarations or statements,
only a return expression.More formally, only the following grammar rules will appear in the parse tree of the program:
start → BOF procedures EOF
procedures → main
main → LONG WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
type → LONG
type → LONG STAR
dcl → type ID
dcls →
statements →
expr → term
term → factor
factor → NUM
factor → NULL
factor → LPAREN expr RPAREN
Your main task is to read in a .wlp4i file from standard input,
which is a representation of the parse tree that the parser constructed,
and rebuild the tree.
(Having to construct the tree in the parser
and then immediately reconstruct the tree here
may seem silly, but the assignment is
structured this way so that people who did not finish the WLP4 parser
can still do this assignment.)
After building the tree, you will need traverse it to check for semantic errors in the WLP4 program. There are only a few possible semantic errors in these restricted WLP4 programs:
wain have the same name.wain is not long type.wain is not long type.
If the program contains a semantic error, you should print
ERROR to standard error in all caps.
If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.
To produce this output, you will need to assign a type to each node that represents an expression. For these restricted WLP4 programs, this should not be too difficult. The only nodes representing expressions are the ID nodes in the dcl subtrees, and the nodes in the return expression subtree. For the dcl subtrees, the type is stored right beside the ID, and for the return expression subtree, the type of each node just depends on whether there is NUM or NULL at the leaf (and having NULL at the leaf is a semantic error).
For the following WLP4 program:
long wain(long a, long b) { return 241; }
The input would not be this program, but rather the
corresponding .wlp4i file
produced by passing the program through wlp4scan
and wlp4parse:
start BOF procedures EOF
BOF BOF
procedures main
main LONG WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
LONG long
WAIN wain
LPAREN (
dcl type ID
type LONG
LONG long
ID a
COMMA ,
dcl type ID
type LONG
LONG long
ID b
RPAREN )
LBRACE {
dcls .EMPTY
statements .EMPTY
RETURN return
expr term
term factor
factor NUM
NUM 241
SEMI ;
RBRACE }
EOF EOF
And the output would be the corresponding type-annotated .wlp4ti file:
start BOF procedures EOF
BOF BOF
procedures main
main LONG WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
LONG long
WAIN wain
LPAREN (
dcl type ID
type LONG
LONG long
ID a : long
COMMA ,
dcl type ID
type LONG
LONG long
ID b : long
RPAREN )
LBRACE {
dcls .EMPTY
statements .EMPTY
RETURN return
expr term : long
term factor : long
factor NUM : long
NUM 241 : long
SEMI ;
RBRACE }
EOF EOF
Notice that the expr, term,
factor, ID, and NUM
lines have been annotated with the corresponding type.
The wlp4type tool can be used to generate expected outputs. The input is a .wlp4i file (not WLP4 source code).
Extend your context-sensitive analysis to allow for
programs with declarations in the body of wain, and
programs where the
return expression of wain is a variable.
This means the following additional rules can appear in the parse tree:
dcls → dcls dcl BECOMES NUM SEMI
dcls → dcls dcl BECOMES NULL SEMI
factor → ID
The main change to your program in this problem is that you will need to create a symbol table to keep track of which variables have been declared and their types.
Your program now must detect the following semantic errors:
wain is not long type.wain does not match
the type of its initialization value.wain.wain is not long type.wain,
but was not declared in wain.
If the program contains a semantic error, you should print
ERROR to standard error in all caps.
If there are no semantic errors, you
are to print a .wlp4ti file
representing the parse tree of the
WLP4 program with type annotations
to standard output.
Extend your context-sensitive analysis to allow for binary arithmetic
operations (+ - * / %) and unary pointer operations
(* & new) in expressions.
This means you should handle the following additional grammar rules:
expr → expr PLUS term
expr → expr MINUS term
term → term STAR factor
term → term SLASH factor
term → term PCT factor
factor → AMP lvalue
factor → STAR factor
factor → NEW LONG LBRACK expr RBRACK
lvalue → ID
lvalue → STAR factor
lvalue → LPAREN lvalue RPAREN
The main change to your program
in this problem will be in your code for assigning types
to the nodes in the return expression subtree of wain.
This will
become significantly more complicated.
These new rules introduce many new possible semantic errors. You will need to handle all type errors in expressions that do not involve procedure calls.
If the program contains a semantic error, you should print
ERROR to standard error in all caps.
If there are no semantic errors, you
are to print a .wlp4ti file
representing the parse tree of the
WLP4 program with type annotations
to standard output.
factor → ID rule
in the case where the ID is a parameter of wain, so you will
need to at least keep track of those two parameter types.Extend your context-sensitive analysis to allow for procedures other than wain in the program, and to allow for procedure calls in expressions. Procedures are still not allowed to contain statements.
This means you should handle the following additional grammar rules:
procedures → procedure procedures
procedure → LONG ID LPAREN params RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
params →
params → paramlist
paramlist → dcl
paramlist → dcl COMMA paramlist
factor → ID LPAREN RPAREN
factor → ID LPAREN arglist RPAREN
factor → GETCHAR LPAREN RPAREN
arglist → expr
arglist → expr COMMA arglist
Your program will need several major changes:
wain), you
will need separate symbol tables for each procedure's
local variables.wain, you will need to process the return
expressions of every other procedure.variable()
or 1 + procedure.
If the program contains a semantic error, you should print
ERROR to standard error in all caps.
If there are no semantic errors, you
are to print a .wlp4ti file
representing the parse tree of the
WLP4 program with type annotations
to standard output.
Extend your context-sensitive analysis to allow for statements. You should now handle arbitrary WLP4 programs; all rules from the WLP4 grammar can appear in the parse tree. The following grammar rules are new for this problem:
statements → statements statement
statement → lvalue BECOMES expr SEMI
statement → IF LPAREN test RPAREN LBRACE statements RBRACE ELSE LBRACE statements RBRACE
statement → WHILE LPAREN test RPAREN LBRACE statements RBRACE
statement → PRINTLN LPAREN expr RPAREN SEMI
statement → PUTCHAR LPAREN expr RPAREN SEMI
statement → DELETE LBRACK RBRACK expr SEMI
test → expr EQ expr
test → expr NE expr
test → expr LT expr
test → expr LE expr
test → expr GE expr
test → expr GT expr
You will need to extend your analysis for expressions to process all expressions that appear in statements. You will also need to check for type errors in statements. Determining the full list of new semantic errors to check for is left as an exercise.
If the program contains a semantic error, you should print
ERROR to standard error in all caps.
If there are no semantic errors, you
are to print a .wlp4ti file
representing the parse tree of the
WLP4 program with type annotations
to standard output.
© University of Waterloo. | Design by TEMPLATED.