Assignments for CS 241 | ||
---|---|---|
← Assignment 4 | Assignment 5 | Assignment 6 → |
Friday, Nov 8th at 5:00 pm | ||
P1 • P2 • P3 • P4 • P5 |
On this assignment, you will get some practice with CFGs, then implement the LR(1) parsing algorithm using an SLR(1) DFA. Finally, you will specialize your parser to obtain a parser for WLP4; this parser will operate on the output of the scanner you wrote in the previous assignment.
You can still do all problems on this assignment
even if you did not complete
the WLP4 scanner. The wlp4scan
tool can be used to produce inputs for your WLP4 parser.
A number of different file formats are used for the problems on this assignment. We have tried to structure these file formats in a uniform way so that they are relatively easy to read and write.
.
and is followed by a string in all caps,
such as .CFG
or .ACTIONS
.
read-chars
function
from the previous assignment's starter code, or just
use string-split
to split
the string on whitespace).
.
character.
.END
. Nothing follows this line, so you
can stop reading the file once you see this line.
We suggest the following approach to reading files made up of these components:
.END
, you do not need two different
cases for detecting "end of component" (reached a header line)
and "end of file" (no more input).
A reference page for the different types of components is available here.
Below is an ambiguous context-free grammar:
start → ( list ) list → value list list → ε value → object value → number value → animal object → KIT number → TEN animal → KIT TEN
And here is a representation of this grammar as a CFG component:
.CFG start ( list ) list value list list .EMPTY value object value number value animal object KIT number TEN animal KIT TEN
Create a file called animal.cfg
containing the above
CFG component, followed by two
DERIVATION components,
terminated by a .END
line.
The DERIVATION components should represent two different
leftmost derivations of the same string (thus proving
that the grammar is ambiguous).
cs241.cfgcheck
course tool can be used to verify if your CFG and derivations
are formatted correctly and produce the correct string.
Here is a WLP4 program:
int wain(int* array, int size) { *array = 241; return (1 + 2) / 3; }
And here it is as a sequence of token kinds:
INT WAIN LPAREN INT STAR ID COMMA INT ID RPAREN LBRACE STAR ID BECOMES NUM SEMI RETURN LPAREN NUM PLUS NUM RPAREN SLASH NUM SEMI RBRACE
Create a file called program.cfg
containing a CFG component,
followed by a DERIVATION component,
terminated by a .END
line.
The CFG component should be this CFG for the
(non-augmented) WLP4 grammar, and the DERIVATION component should be a
leftmost derivation of the above sequence of token kinds.
Note: If you are using a browser with an automatic translation feature, such as Google Chrome, it may attempt to "translate" the WLP4 grammar linked above and corrupt it! You should either download the file, or ensure automatic translation is disabled before copying the WLP4 grammar into your solution file.
In this problem, you will write a C++ or Racket program that performs bottom-up parsing. However, rather than making parsing decisions yourself, you will be given a preset sequence of "shift" and "reduce" actions corresponding to a successful parse, and you simply need to perform the parsing actions you are given. There is also a "print" action that prints out the current progress of the parse.
Suppose you are given the input below.
representing a CFG for balanced parentheses,
the input sequence BOF ( ) EOF
,
and a sequence of parsing and print actions.
Assume the rules in the CFG are numbered,
where the first rule is rule 0, the second
is rule 1, etc.
The text in italics on the right is not part of the input. It is just there to explain what each shift and reduce action is doing.
.CFG S BOF P EOF P ( P ) P P .EMPTY .INPUT BOF ( ) EOF .ACTIONS print shift shift reduce 2 shift reduce 2 print reduce 1 print shift print reduce 0 print .END |
Explanation Parsing Progress the . marks position in input . BOF ( ) EOF shift BOF BOF . ( ) EOF shift ( BOF ( . ) EOF reduce by P → ε BOF ( P . ) EOF shift ) BOF ( P ) . EOF reduce by P → ε BOF ( P ) P . EOF reduce by P → ( P ) P BOF P . EOF shift EOF BOF P EOF . reduce by S → BOF P EOF S . |
The expected output is obtained by printing the "parsing progress" at each "print" line:
. BOF ( ) EOF BOF ( P ) P . EOF BOF P . EOF BOF P EOF . S .
The input to your program will consist of a CFG component,
an INPUT component, and
an ACTIONS component, in that order,
terminated by a .END
line.
To parse the input, your program should maintain two pieces of information:
At the start of the parse, the reduction sequence should be empty, and the input sequence should contain the full sequence to be parsed from the INPUT component.
Your program should then perform each of the parsing actions specified in the ACTIONS component, in order.
.
), followed by the input sequence. There should be a
single space separating each printed element (including the dot) and
a line feed should be printed after the last element of the input
sequence. This action does not modify the reduction sequence or
input sequence, and is simply used to view the current progress of parsing.
For full marks, you should choose your data structures carefully in order to meet the following efficiency requirements:
The efficiency test is a secret test on Marmoset.
For the following input (which prints after every shift/reduce action to produce a full trace of the parse):
.CFG S BOF expr EOF expr expr + term expr term term term * factor term factor factor # factor id .INPUT BOF id + # * id EOF .ACTIONS print shift print shift print reduce 6 print reduce 4 print reduce 2 print shift print shift print reduce 5 print reduce 4 print shift print shift print reduce 6 print reduce 3 print reduce 1 print shift print reduce 0 print .END
The output should be:
. BOF id + # * id EOF BOF . id + # * id EOF BOF id . + # * id EOF BOF factor . + # * id EOF BOF term . + # * id EOF BOF expr . + # * id EOF BOF expr + . # * id EOF BOF expr + # . * id EOF BOF expr + factor . * id EOF BOF expr + term . * id EOF BOF expr + term * . id EOF BOF expr + term * id . EOF BOF expr + term * factor . EOF BOF expr + term . EOF BOF expr . EOF BOF expr EOF . S .
In this problem, you will modify your solution to the previous problem so that it determines whether to shift or reduce using an SLR(1) DFA.
The input to your program will consist of a CFG component,
an INPUT component,
a TRANSITIONS component,
and a REDUCTIONS component,
terminated by a .END
line.
The latter two components encode the transitions and reducible
items of an SLR(1) DFA.
The CFG and the input-to-be-parsed will be augmented. This means:
augmented-start-symbol BOF original-start-symbol EOFThe
augmented-start-symbol
and the original-start-symbol
may differ
between test grammars, but all test grammars will use the symbols
BOF and EOF to mark the beginning and end of input.
augmented-start-symbol
will never appear on the right hand side of a CFG rule.
Your program should work the same way as in the previous problem, except that it uses the SLR(1) DFA to decide whether to shift or reduce at each parsing step. Your program should perform the "print" action once at the start (before shifting or reducing), and also after every shift or reduce. In other words, your output should be a full trace of the parse.
Additionally, in this problem, it is possible that the parse will be unsuccessful. This happens if there is no transition on the next symbol of input when the DFA tells you to shift. If this occurs, output a single line consisting of the string ERROR at k (terminated with a line feed) to standard error, where k is one greater than the number of terminal symbols that were successfully shifted prior to the error.
Unlike in some earlier assignments, where your error message simply had to contain "ERROR", in this problem your error message must exactly match the required message. Extraneous output (such as debugging output) on standard error will cause you to fail some Marmoset tests.
.CFG S BOF expr EOF expr expr - term expr term term id term ( expr ) .INPUT BOF id - ( id - id ) - id EOF .TRANSITIONS 0 BOF 6 1 id 2 1 ( 3 1 term 8 3 id 2 3 ( 3 3 term 4 3 expr 7 6 id 2 6 ( 3 6 term 4 6 expr 10 7 - 1 7 ) 9 10 - 1 10 EOF 5 .REDUCTIONS 2 3 - 2 3 ) 2 3 EOF 4 2 - 4 2 ) 4 2 EOF 5 0 .ACCEPT 8 1 - 8 1 ) 8 1 EOF 9 4 - 9 4 ) 9 4 EOF .END |
The output should be:
. BOF id - ( id - id ) - id EOF BOF . id - ( id - id ) - id EOF BOF id . - ( id - id ) - id EOF BOF term . - ( id - id ) - id EOF BOF expr . - ( id - id ) - id EOF BOF expr - . ( id - id ) - id EOF BOF expr - ( . id - id ) - id EOF BOF expr - ( id . - id ) - id EOF BOF expr - ( term . - id ) - id EOF BOF expr - ( expr . - id ) - id EOF BOF expr - ( expr - . id ) - id EOF BOF expr - ( expr - id . ) - id EOF BOF expr - ( expr - term . ) - id EOF BOF expr - ( expr . ) - id EOF BOF expr - ( expr ) . - id EOF BOF expr - term . - id EOF BOF expr . - id EOF BOF expr - . id EOF BOF expr - id . EOF BOF expr - term . EOF BOF expr . EOF BOF expr EOF . S .
If the INPUT component was replaced with the following:
.INPUT BOF id - ( id - ) - id EOF
Then your program should print ERROR at 7
to
standard error.
This signifies that the error occurred after 6 symbols
were read successfully and therefore the 7th symbol
is the one that caused the error.
It does not matter what is printed to standard output;
for example,
it is fine if you produce a partial trace of the parse before the error.
cs241.slr
course tool can be used to augment a CFG component
and generate its SLR(1) DFA (TRANSITIONS component and
REDUCTIONS component). You can use this tool to generate
additional test inputs for your parser.
In this problem, you will modify your solution to the previous problem to parse WLP4 programs, and to produce a representation of a parse tree as output.
The input no longer contains a CFG or parsing DFA, and is no longer formatted into components. Instead, the input format is simply the output format of the WLP4 scanner.
The output format of the parser is also different. You should construct a parse tree for the program, in which the value stored at each node in the tree is either the production rule used to expand the nonterminal at the node (if it is an internal node), or the token kind and lexeme (if it is a leaf node). If the parse is successful, the print the node values, one per line, using a left-to-right preorder traversal of the tree:
The result of this preorder traversal is a .wlp4i (WLP4 Intermediate) file.
In this problem, it is possible that the parse will be unsuccessful. This happens if there is no transition on the kind of the next token when the parsing DFA tells you to shift. If this occurs, output a single line consisting of the string ERROR at k (terminated with a line feed) to standard error, where k is one greater than the number of WLP4 tokens that were successfully read from the input prior to the error.
Note that BOF and EOF are not part of the input, so they do not contribute to the value of k in this problem. Note also that "one token" means a kind and lexeme pair; so each line of the input contains one token.
Unlike in some earlier assignments, where your error message simply had to contain "ERROR", in this problem your error message must exactly match the required message. Extraneous output (such as debugging output) on standard error will cause you to fail some Marmoset tests.
.END
line..END
line.For the following input:
INT int WAIN wain LPAREN ( INT int ID a COMMA , INT int ID b RPAREN ) LBRACE { RETURN return NUM 42 SEMI ; RBRACE }
The output should be:
start BOF procedures EOF BOF BOF procedures main main INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE INT int WAIN wain LPAREN ( dcl type ID type INT INT int ID a COMMA , dcl type ID type INT INT int ID b RPAREN ) LBRACE { dcls .EMPTY statements .EMPTY RETURN return expr term term factor factor NUM NUM 42 SEMI ; RBRACE } EOF EOF
wlp4parse
to see if it is correct.