CS 241 — Winter 2024 — Assignment 6 / Out of 15 marks

Assignments for CS 241
← Assignment 5 Assignment 6 Assignment 7 →
Friday, Mar 22th at 5:00 pm
P1P2P3P4P5

In this assignment, you will implement the context-sensitive analysis phase (also known as semantic analysis) of the WLP4 compiler.

The main goal of this phase is to catch all remaning compile-time errors that were not caught by the scanner or parser. A secondary goal is to compute type information for expressions, which will be needed in the code generation phase.

Filenames

In this assignment, like the assembler assignment, you are gradually building up one large program across multiple problems. You can submit the same program to all problems if you want.

For all problems, you must submit one of the following two files:

You can submit additional files if you wish to divide your program into multiple files, but a file with one of the above names must be present in your submission.

Input and Output Format

The input for all problems will be a WLP4 Intermediate (.wlp4i) file representing a parse tree, that is, the output of the parsing phase. If you did not complete your own WLP4 parser in the previous assignment, you can use the provided wlp4parse tool to generate inputs. Note that wlp4parse expects input in the format produced by the scanner wlp4scan, so you will need to run your WLP4 source code through both commands to produce a .wlp4i file:

$ wlp4scan < program.wlp4 | wlp4parse > program.wlp4i

If the input program has an error in it, you simply need to produce an error message containing ERROR on standard error, as usual.

If the program is correct, should produce a WLP4 Typed Intermediate (.wlp4ti) file on standard output. This is a representation of the parse tree that has been annotated with type information. You can produce this output by traversing the tree and assigning a type to all expression nodes, then doing a second (preorder) traversal where you print out the tree along with the assigned types. An expression node is defined as follows:

The wlp4type tool can be used to convert a .wlp4i file to a .wlp4ti file. It performs context sensitive analysis and produces the type-annotated parse tree. You can use this to check the correctness of your output on the assignment.

Problem 1: Simple Programs [2 marks, 1 release+secret]

In this problem, you will perform context-sensitive analysis on a heavily restricted class of WLP4 programs:

More formally, only the following grammar rules will appear in the parse tree of the program:

start → BOF procedures EOF
procedures → main
main → INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE 
type → INT
type → INT STAR
dcl → type ID
dcls →
statements →
expr → term 
term → factor 
factor → NUM  
factor → NULL
factor → LPAREN expr RPAREN  

Your main task is to read in a .wlp4i file (from standard input) which is a representation of the parse tree that the parser constructed, and rebuild the tree.
(Having to construct the tree in the parser and then immediately reconstruct the tree here may seem silly, but the assignment is structured this way so that people who did not finish the WLP4 parser can still do this assignment.)

After building the tree, you will need traverse it to check for semantic errors in the WLP4 program. There are only a few possible semantic errors in these restricted WLP4 programs:

If the program contains a semantic error, you should print ERROR to standard error in all caps.

If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.

To produce this output, you will need to assign a type to each node that represents an expression. For these restricted WLP4 programs, this should not be too difficult. The only nodes representing expressions are the ID nodes in the dcl subtrees, and the nodes in the return expression subtree. For the dcl subtrees, the type is stored right beside the ID, and for the return expression subtree, the type of each node just depends on whether there is NUM or NULL at the leaf (and having NULL at the leaf is a semantic error).

Example

For the following WLP4 program:

int wain(int a, int b) { return 241; }

The input would not be this program, but rather the corresponding .wlp4i file produced by passing the program through wlp4scan and wlp4parse:

start BOF procedures EOF
BOF BOF
procedures main
main INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
INT int
WAIN wain
LPAREN (
dcl type ID
type INT
INT int
ID a
COMMA ,
dcl type ID
type INT
INT int
ID b
RPAREN )
LBRACE {
dcls .EMPTY
statements .EMPTY
RETURN return
expr term
term factor
factor NUM
NUM 241
SEMI ;
RBRACE }
EOF EOF

And the output would be the corresponding type-annotated .wlp4ti file:

start BOF procedures EOF
BOF BOF
procedures main
main INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
INT int
WAIN wain
LPAREN (
dcl type ID
type INT
INT int
ID a : int
COMMA ,
dcl type ID
type INT
INT int
ID b : int
RPAREN )
LBRACE {
dcls .EMPTY
statements .EMPTY
RETURN return
expr term : int
term factor : int
factor NUM : int
NUM 241 : int
SEMI ;
RBRACE }
EOF EOF

Notice that the expr, term, factor, ID, and NUM lines have been annotated with the corresponding type.

The wlp4type tool can be used to generate expected outputs. The input is a .wlp4i file (not WLP4 source code).

Problem 2: Building a Symbol Table [2 marks, 1 release+secret]

Extend your context-sensitive analysis to allow for programs with declarations in the body of wain, and programs where the return expression of wain is a variable.

This means the following additional rules can appear in the parse tree:

dcls → dcls dcl BECOMES NUM SEMI
dcls → dcls dcl BECOMES NULL SEMI
factor → ID

The main change to your program in this problem is that you will need to create a symbol table to keep track of which variables have been declared and their types.

Your program now must detect the following semantic errors:

If the program contains a semantic error, you should print ERROR to standard error in all caps.
If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.

Problem 3: Type Checking for Operations [3 marks, 2 release+secret]

Extend your context-sensitive analysis to allow for binary arithmetic operations (+ - * / %) and unary pointer operations (* & new) in expressions.

This means you should handle the following additional grammar rules:

expr → expr PLUS term
expr → expr MINUS term
term → term STAR factor
term → term SLASH factor
term → term PCT factor
factor → AMP lvalue
factor → STAR factor
factor → NEW INT LBRACK expr RBRACK
lvalue → ID
lvalue → STAR factor
lvalue → LPAREN lvalue RPAREN

The main change to your program in this problem will be in your code for assigning types to the nodes in the return expression subtree of wain. This will become significantly more complicated.

These new rules introduce many new possible semantic errors. You will need to handle all type errors in expressions that do not involve procedure calls.

If the program contains a semantic error, you should print ERROR to standard error in all caps.
If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.

Dependencies and Part Marks

Problem 4: Procedures [4 marks, 2 release+secret, 1 secret]

Extend your context-sensitive analysis to allow for procedures other than wain in the program, and to allow for procedure calls in expressions. Procedures are still not allowed to contain statements.

This means you should handle the following additional grammar rules:

procedures → procedure procedures
procedure → INT ID LPAREN params RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
params →
params → paramlist
paramlist → dcl
paramlist → dcl COMMA paramlist
factor → ID LPAREN RPAREN
factor → ID LPAREN arglist RPAREN
arglist → expr
arglist → expr COMMA arglist

Your program will need several major changes:

If the program contains a semantic error, you should print ERROR to standard error in all caps.
If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.

Dependencies and Part Marks

Problem 5: Statements [4 marks, 2 release+secret, 1 secret]

Extend your context-sensitive analysis to allow for statements. You should now handle arbitrary WLP4 programs; all rules from the WLP4 grammar can appear in the parse tree. The following grammar rules are new for this problem:

statements → statements statement
statement → lvalue BECOMES expr SEMI
statement → IF LPAREN test RPAREN LBRACE statements RBRACE ELSE LBRACE statements RBRACE
statement → WHILE LPAREN test RPAREN LBRACE statements RBRACE
statement → PRINTLN LPAREN expr RPAREN SEMI
statement → DELETE LBRACK RBRACK expr SEMI
test → expr EQ expr
test → expr NE expr
test → expr LT expr
test → expr LE expr
test → expr GE expr
test → expr GT expr

You will need to extend your analysis for expressions to process all expressions that appear in statements. You will also need to check for type errors in statements. Determining the full list of new semantic errors to check for is left as an exercise.

If the program contains a semantic error, you should print ERROR to standard error in all caps.
If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.

Dependencies and Part Marks