CS 241 — Fall 2021 — Assignment 5

Assignments for CS 241
← Assignment 4 Assignment 5 Assignment 6 →
Friday, Feb 11th at 5:00 pm Friday, Feb 18th at 5:00 pm Friday, Mar 4th at 5:00 pm
P1P2P3P4P5P6P7P8P9

In this assignment, you will write a scanner for WLP4, and get some practice working with context-free grammars, derivations, and trees.

Some Advice

Students sometimes find the scanner (Problem 1) more difficult than the rest of the assignment. It is given as the first problem because this follows the order that the topics are presented in the course. However, you may wish to attempt the other problems first, particularly the Context-Free Grammar practice problems (Problems 2 through 7) which are much easier than the programming questions.

We consider it very important for learning purposes to attempt the scanner problem to the best of your ability, but it is not a prerequisite for future assignments. If your scanner ends up unfinished or faulty, you will be able to use a scanner that we provide to complete future assignments.

On the other hand, Problem 8 about constructing and traversing trees is more important than it might seem. We will be working with trees heavily in the later stages of the compiler. To get started on Assignment 7, you will need to write some code that is essentially a more complicated version of Assignment 5 Problem 8. So you will need to solve this problem eventually. We recommend solving it now.

Marmoset Notes

Problems 2, 3, 5, 6 and 7 only have a public test, no release tests.

For this assignment, secret tests are worth 30% of the Marmoset test marks.

Part I. Scanning

Problem 1 — 50 marks of 100, 25 secret marks (filename: wlp4scan.rkt or wlp4scan.cc)

Write a scanner (lexical analyzer) for the WLP4 programming language. Your scanner should read a WLP4 program from standard input and identify the tokens, in the order they appear in the input. For each token, your scanner should compute the kind of the token and the lexeme (the string of characters making up the token), and print it to standard output, one line per token.

For example, the correct output for this WLP4 program

//
// WLP4 program with two integer parameters, a and b
//   returns the sum of a and b
//
   int wain(int a, int b) {
     return a + b;   // unhelpful comment about summing a and b
   }
is:
INT int
WAIN wain
LPAREN (
INT int
ID a
COMMA ,
INT int
ID b
RPAREN )
LBRACE {
RETURN return
ID a
PLUS +
ID b
SEMI ;
RBRACE }

If the input cannot be scanned as a sequence of valid tokens (possibly separated by white space as detailed in the WLP4 language specification), your program should print a message containing the string ERROR (in upper case) to standard error. We recommend that you try to make the error message informative.

A reference implementation of the scanner called wlp4scan is available after running source /u/cs241/setup. When testing your program, you can compare its output to wlp4scan's to make sure it is correct. Run it as follows:

    wlp4scan < input.wlp4 > tokens.txt

Hint: You may wish to modify the starter code from Assignment 3 to solve this problem. The scanner in the starter code uses the Simplified Maximal Munch algorithm, together with a DFA for MIPS assembly language tokens, to repeatedly recognize the longest prefix of a string that is a token and extract the tokens. You should be able to replace the DFA with one which recognizes WLP4 tokens.

Part II. Context-Free Grammars

You are to create context-free grammars and derivations represented as CFG files each of which is a textual reprentation of a CFG, optionally followed by a textual representation of several derivations. The Linux environment provides a tool cs241.cfgcheck which checks the validity of the CFG and derivations.

Problem 2 — 2 marks of 100 (filename: a5p2.cfg)

Create a file representing the CFG with the following production rules:
S → BOF A EOF
A → x
A → A x

Problem 3 — 2 marks of 100 (filename: a5p3.cfg)

Add derivations for the following strings to the CFG file created for Problem 2:
BOF x EOF
BOF x x EOF
BOF x x x EOF

Problem 4 — 5 marks of 100 (filename: a5p4.cfg)

The following production rules yield an ambiguous CFG:
S → BOF A EOF
A → x
A → A x A
Find a string for which there are two or more distinct leftmost derivations. Compose a CFG file representing the CFG and two different derivations for your string.

Problem 5 — 3 marks of 100 (filename: a5p5.cfg)

Add a derivation to this context-free grammar for the following sequence of symbols:
BOF id EOF

You can remove the existing derivation in the file, but it is not necessary. Marmoset will accept either replacing the existing derivation, or adding a new derivation.

Problem 6 — 3 marks of 100 (filename: a5p6.cfg)

Add a derivation to this context-free grammar for the following sequence of symbols:
BOF ( id - ( id - id ) ) - id EOF

You can remove the existing derivation in the file, but it is not necessary. Marmoset will accept either replacing the existing derivation, or adding a new derivation.

Problem 7 — 5 marks of 100 (filename: a5p7.cfg)

Add a derivation to this augmented grammar for WLP4 for the following (augmented) sequence of WLP4 tokens:
BOF INT WAIN LPAREN INT ID COMMA INT ID RPAREN LBRACE RETURN ID PLUS ID STAR ID SEMI RBRACE EOF

Part III. Trees and Expressions

Problem 8 — 20 marks of 100 (filename: traverse.rkt or traverse.cc)

Write a Racket or C++ program that reads a pre-order traversal of a non-empty tree from standard input and prints the corresponding post-order traversal for that tree. Each line of both the input and output will consist of two non-negative integers:

<NODE-VALUE> <NUMBER-OF-CHILDREN>

For example the following input:

1 3
2 2
5 0
6 0
3 1
7 0
4 4
8 0
9 0
10 0
11 1
12 0

corresponds to this tree.

The output of the program given the above input would be:
5 0
6 0
2 2
7 0
3 1
8 0
9 0
10 0
12 0
11 1
4 4
1 3

The inputs given by Marmoset will be non-empty trees and they will be valid (i.e., there will not be "missing" children or random nonsense that doesn't follow the format described above). To get full marks, your solution should be reasonably efficient in terms of both time and memory usage. There is an efficiency test involving a very large tree that is worth 5 marks (25% of the marks for this question).

The purpose of this problem is to prepare you for working with trees in future assignments.
Therefore, you should solve this problem by building a tree data structure out of the input, and then recursively traversing the tree to produce the output.
It is possible to solve this problem by simply manipulating the input with a stack, and this might even be faster than actually building a tree, but this misses the point of the problem and is probably a waste of your time.

Note that the trees in this problem are not binary trees, or even n-ary trees for some n; each node can have an unbounded number of children. Keep this in mind when deciding how to represent the tree as a data structure.

Submit a file called traverse.rkt or traverse.cc containing the Racket or C++ source code for your program.

Hint: If using C++, you may be tempted to avoid using pointers when designing your tree data structure. You may think that this makes things "easier" since you don't have to worry about memory management. It does not make things easier. It simply introduces a different memory management problem: you have to avoid making unnecessary copies of your trees in memory. Copying pointers is cheap but copying entire tree structures wastes a lot of time and space. Accidental copying like this is the most common cause of inefficiency on this question.

Unless you understand C++ move semantics fairly well, you might want to just use pointers. You can use raw pointers or smart pointers, but if using smart pointers you should have a good understanding of them. Unique pointers (std::unique_ptr) will work but require some care to use correctly. Shared pointers (std::shared_ptr) are not appropriate for this problem, since each child of a tree should be separate from all other children and should not share memory with them. Shared pointers will "work" but maintaining the reference counts can potentially waste a lot of time.

Problem 9 — 10 marks of 100, 5 secret marks (filename: galaxy.rkt or galaxy.cc)

Write a Racket or C++ program that reads a CFG from standard input, which is assumed to contain the same grammar as for Problems 5 and 6, but with the example derivation replaced by another valid derivation for the same grammar.

Assume that each occurrence of the symbol id represents the value 42, that each occurrence of - represents subtraction, and that operations associate from left to right, except where overridden by parentheses. Output a single line giving the value of the derived expression.

The correct output for the sample grammar and derivation is a single line containing -42 followed by a newline.

Notes:

Click here to go back to the top of the page.