CS 241 — LR1 File Format (.lr1)

A .lr1 file has three components:

a context-free grammar
the states, transitions, and reduce actions of the LR(1) DFA
a sequence of terminal symbols to be parsed

Remark: We only learn how to construct SLR(1) DFAs in this course, and not general LR(1) DFAs, but the algorithm for parsing with them is the same. This file format can represent general LR(1) DFAs as well as SLR(1) DFAs, so we write LR(1) rather than SLR(1) throughout this page.

Context-free Grammar

Same as the CFG File Format with no derivations. The grammar must be augmented; that is, the start symbol must occur as the LHS of exactly one rule that begins and ends with a terminal.

LR(1) DFA

The LR(1) DFA representation has three components:

An integer s -- the number of states in the LR(1) DFA. States are numbered 0 to s-1. State 0 is the start state.
An integer t -- the number of shift and reduce actions encoded in the LR(1) DFA.
t lines, each containing one of:
- state terminal reduce rule (Rules (productions) are numbered from 0 in the order they appear in the CFG.)
- state₁ symbol shift state₂ (where symbol is either a terminal or a non-terminal)

A line of the form state terminal reduce rule is read as:
"If the DFA is in state, and terminal is first in the unread input, reduce by rule number rule."
Here terminal represents the lookahead symbol that is sometimes needed to resolve shift-reduce or reduce-reduce conflicts. (There will be no conflicts in the LR(1) DFAs used on the assignment.)

A line of the form state₁ symbol shift state₂ is read as:
"If the DFA is in state₁, shift symbol and transition to state₂."
Shift actions exist for both terminals and nonterminals; the shift actions involving nonterminals are used in the process of reducing by a rule.

Note that there is no reduce action for rule 0 in the LR(1) DFA, even though you are required to output the rule corresponding to this final reduction in your parser. Rule 0 will always be the unique rule which has the start symbol on the LHS.

Sequence to be Parsed

One or more tokens representing a single sequence to be parsed, with each pair of tokens separated by whitespace, including newlines.

Your parser must support inputs where tokens are separated by newlines. For example, the sample sequence BOF id - ( id ) - id EOF in the file below could also be written as:

BOF
id - ( id ) - id
EOF

Or even:

BOF 
id
-
(
id
)
-
id
EOF

Example .lr1 File (see bubble diagram)

6
BOF
EOF
id
-
(
)
3
S
expr
term
S
5
S BOF expr EOF
expr term
expr expr - term
term id
term ( expr )
11
28
8 EOF reduce 2
9 - reduce 4
7 - shift 1
1 id shift 2
6 ( shift 3
6 term shift 4
10 EOF shift 5
2 - reduce 3
4 ) reduce 1
3 id shift 2
4 EOF reduce 1
2 ) reduce 3
0 BOF shift 6
8 - reduce 2
2 EOF reduce 3
3 expr shift 7
9 ) reduce 4
9 EOF reduce 4
4 - reduce 1
1 term shift 8
3 term shift 4
3 ( shift 3
10 - shift 1
6 id shift 2
8 ) reduce 2
1 ( shift 3
7 ) shift 9
6 expr shift 10
BOF id - ( id ) - id EOF