CS 241 — Winter 2021 — Assignment 6

Assignments for CS 241
← Assignment 5 Assignment 6 Assignment 7 →
Friday, March 5th at 5:00 pm Friday, March 19th at 5:00 pm Friday, March 26th at 5:00 pm
P1P2P3P4P5

In this assignment, you will write a general LR(1) parser, and then specialize it to obtain a parser for WLP4. You do not need to generate LR(1) DFAs, as these will be given to you along with the grammar as input to your program. Your task is just to implement the LR(1) parsing algorithm.

Marmoset Notes

For each question, Marmoset has public and release tests, and often has secret tests as well. When you release test a submission, Marmoset will only report on public and release tests. No information about secret tests will be visible until after the assignment deadline.

Problems 1 and 2 have no public tests, only release tests. Click "view" underneath "detailed test results" to release test your submission.

The public tests for Problems 4 and 5 are worth 0 marks. You must pass the public tests before you can release test your program, but your public test mark will be shown as "0/0" whether you pass or fail.

For this assignment, secret tests are worth 30% of the Marmoset test marks.

Problem 1 — 10 marks of 100 (filename: a6p1.cfg-r)

Write a .cfg-r file which solves A5P6.

You may find the cs241.cfgrl tool helpful for checking your solutions, which takes a .cfg-r file and produces a .cfg file with a forward leftmost derivation equivalent to the .cfg-r file's reversed rightmost derivation.

Problem 2 — 10 marks of 100 (filename: a6p2.cfg-r)

Write a .cfg-r file that solves A5P7.

Problem 3 — 40 marks of 100, 20 secret marks (filename: lr.cc or lr.rkt)

Write a Racket or C++ program that reads an LR1 File representing a context-free grammar, an LR(1) DFA, and a sequence to be recognized. If the sequence is in the language, output a derivation for the sequence using the unindented reversed rightmost derivation format described on the CFG-R page.

If the input is not recognized, output "ERROR at k" (followed by a single newline character) to standard error, where k is one greater than the number of tokens in the longest correct prefix of the input sequence. The longest correct prefix of an input sequence is the longest prefix which could possibly be a prefix of a string generated by the grammar. See the example below.

Caution: In previous assignments, Marmoset would ignore debugging output on standard error and just look for the string "ERROR". However, for this assignment, standard error must have the exact format specified above with no extraneous output.

For the sample LR1 file as input, the correct output is:

term id
expr term
term id
expr term
term ( expr )
expr expr - term
term id
expr expr - term
S BOF expr EOF

If you replace the last line of this file by:

BOF id - id ) - id EOF

the correct output (to standard error) is:

ERROR at 5

In this case, the longest correct prefix is BOF id - id since this prefix could possibly be extended into a string generated by the grammar, such as BOF id - id EOF. The next longest prefix is BOF id - id ) which cannot be extended into a string generated by the grammar, since the ) can never be matched. Since the longest corrext prefix has 4 tokens, the output is "ERROR at 5".

Your program should only take one sequence as input, but this sequence could be split over multiple lines. For example, if the last line of the sample LR1 file was replaced with the following sequence of lines:

BOF
id
-
(
id
)
-
id 
EOF

Then the correct output would be the same as the example shown above. The test inputs on Marmoset will use Unix-style newlines, with no extraneous whitespace at the end of a line, and the last line in the file shall be terminated with a newline.

You may test your program with the WLP4 grammar and parse table, in .lr1 format after adding a sequence to be recognized. (Note: if your browser has a automatic translation feature, disable it, as it may corrupt this file. Or simply download the file instead of copy-pasting it.)

You can also use the tool cs241.slr to generate the LR(1) DFA for any SLR(1) grammar. You can use it to generate LR(1) DFAs for your own grammars, allowing you to create additional test inputs to this problem. cs241.slr expects a CFG file file on standard input, and outputs an LR1 file on standard output.

Problem 4 — 15 marks of 100 (filename: wlp4parse.cc or wlp4parse.rkt)

Modify your solution to Problem 3 to always use the WLP4 grammar and parse table. It should accept as input the output from Assignment 5 Problem 1 and have the same output as Problem 3, except that each terminal is printed with its lexeme at the appropriate point in the derivation. For example, if the input to the program is (note the lack of BOF and EOF):

INT int
WAIN wain
LPAREN (
INT int
ID a
COMMA ,
INT int
ID b
RPAREN )
LBRACE {
RETURN return
NUM 42
SEMI ;
RBRACE }

The output should be:

BOF BOF
INT int
WAIN wain
LPAREN (
INT int
type INT
ID a
dcl type ID
COMMA ,
INT int
type INT
ID b
dcl type ID
RPAREN )
LBRACE {
dcls
statements
RETURN return
NUM 42
factor NUM
term factor
expr term
SEMI ;
RBRACE }
main INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
procedures main
EOF EOF
start BOF procedures EOF

Errors should be handled in the same way as Problem 3. Since BOF is not present in the input, it is not counted as a token when determining the length of the longest correct prefix.

On Marmoset, the public tests for this problem are worth 0 marks. You must pass the public tests before you can release test your program, but your public test mark will be shown as "0/0" whether you pass or fail.

Problem 5 — 25 marks of 100, 10 secret marks (filename: wlp4parse.cc or wlp4parse.rkt)

Modify your output to Problem 4 to follow the WLP4I format. The format of a .wlp4i file represents the derivation as well as the tokens in the program. You should make a parse tree and print it using a preorder traversal to achieve this.

You can use wlp4scan to help generate inputs to your program, and wlp4parse and wlp4icheck to help check your output.

For example, when using the input from the previous example, the corresponding output is:

start BOF procedures EOF
BOF BOF
procedures main
main INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
INT int
WAIN wain
LPAREN (
dcl type ID
type INT
INT int
ID a
COMMA ,
dcl type ID
type INT
INT int
ID b
RPAREN )
LBRACE {
dcls
statements
RETURN return
expr term
term factor
factor NUM
NUM 42
SEMI ;
RBRACE }
EOF EOF

Errors should be handled in the same way as Problem 4.

On Marmoset, the public tests for this problem are worth 0 marks. You must pass the public tests before you can release test your program, but your public test mark will be shown as "0/0" whether you pass or fail.

Click here to return to the top of the page.