Assignments for CS 241 | ||
---|---|---|
← Assignment 3 | Assignment 4 | Assignment 5 → |
Friday, June 23rd at 5:00 pm | ||
P1 • P2 • P3 • P4 • P5 |
On this assignment, you will create DFAs for certain regular languages, write a tool that determines if a DFA recognizes a string, then extend this tool to a Simplified Maximal Munch scanner.
Then, you will be introduced to WLP4, a simple programming language that we will incrementally write a compiler for in this course. Your compiler will translate WLP4 source code into MIPS assembly language.
You will get some practice with WLP4 to better understand its various quirks and limitations, and then you will specialize your Simplified Maximal Munch scanner to work with WLP4 programs, completing the first component of the compiler.
For this problem, you are to construct two DFAs:
{a, l, f}
recognizing the language of strings that start with
alfalfa
.{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -}
recognizing the language of decimal integers between -128 and 255.
More precisely, a string is in the language
if and only if
it meets the
input requirements from
Assignment 1 Problem 5
except for the requirement that the string ends with a line feed.
To allow Marmoset to check the correctness of your DFAs, you must prepare them using a custom DFA file format we have created for the course.
Submit both DFA files together in a zip file. If you submit only one, Marmoset will still run the tests but you will only get marks for the one you submitted.
Here are some extra optional problems if you would like more practice with DFAs. You can submit the following files to A4P1 on Marmoset, and Marmoset will check that your DFA is correct.
These problems are not worth marks. There is no bonus for completing them.
{a,c,t}
recognizing
the language described by the regular expression (ca(a)*t)*
.{a,c,t}
recognizing
strings that do not start or end with a
, and for every a
in the string, the
letter before the a
is different from the letter after the a
.{a, l, f}
recognizing the language of strings that end with
alfalfa
.Write a program that reads a DFA file from standard input, and for each string in the input section of the file, determine whether the string is accepted by the DFA. Print a single line of output (to standard output) for each string:
true
.
false
.
To simplify your task, starter code is provided that reads a DFA file from standard input. The starter code does not actually do anything, but comments are provided at the locations where the code has finished reading certain pieces of information. You should fill in these sections with your own code that sets up appropriate data structures and implements the DFA recognition algorithm.
.EMPTY
to denote the empty string.
If .EMPTY
is given as input, then in your
output, you should print .EMPTY
(rather than
nothing) when asked to "print the string".
cs241.DFA
does error checking,
but you do not need to).
Compare your results to cs241.DFA
.
For this problem, Marmoset uses diff -Zb
to compare your
output, so that small differences in whitespace are ignored (e.g., extra
space at the end of the line).
For instance, on the student Linux system:
$ ./dfa < input.dfa > output.txt $ cs241.DFA < input.dfa > expect.txt $ diff -Zb output.txt expect.txt
An input/output example is given on the DFA file format page.
If you are given the following input:
.ALPHABET $ 0-9 .STATES start $ $n! .TRANSITIONS start $ $ $ 0-9 $n $n 0-9 $n .INPUT $2$4$1$241
Then your output should be:
$2 $4 $1 $241
For this problem, there is no sample executable to compare your output with. Try testing with simple DFAs where you can easily understand how the string should be broken down into tokens.
As with the previous problem, Marmoset will use
the -Zb
options with diff
to allow for some leniency in terms of whitespace.
However, you should
not really be printing whitespace other than
a single line feed after each token you output.
Starting from this assignment, you will begin writing a compiler that translates a C-like programming language called WLP4 into MIPS assembly language.
WLP4 is quite similar to a restricted subset of C, but it uses C++ style memory allocation (new/delete instead of malloc/free). The syntax is harshly restricted to make it easier to compile. For example:
More information about WLP4 is below:
Write a WLP4 procedure that computes the absolute value of an integer. You do not need to handle the case of -231 = -2147483648 (this integer has no positive counterpart in 32-bit two's complement).
Write a WLP4 procedure called tieBreak
that takes two distinct integers as parameters.
Write a WLP4 program that takes an array of integers
in the range -241 to 241 (inclusive) as input and
returns the integer that occurs most frequently.
If there is a tie for the most frequent integer,
break the tie using the rule from the tieBreak
procedure described in Warm-Up Exercise 2.
Your program should run in linear time (in the size of the input array). This likely means you will need to allocate a sufficiently large block of memory to store the number of times each integer occurs.
Suppose the input array is [1, 2, 2, 3, 3, 3]
. Your program should return 3.
Suppose the input array is [4, 4, 4, 2, 2, 2, -3, -3, -3]
.
There's a three way tie between 4, 2 and -3. Since 2 has the least absolute value,
your program should return 2.
Suppose the input array is [1, -1, 2, -2]
.
There's a four way tie between 1, -1, 2, and -2.
Among these, 1 and -1 tie for the smallest absolute value.
To break this tie, you choose the negative number, so you return -1.
Compile your program with wlp4c
, then run
the compiled code with mips.array
. Since
wlp4c
produces machine code directly,
you don't need to use cs241.binasm
.
Alternatively, on the student Linux system:
$ wlp4c < practice.wlp4 > practice.mips $ mips.array practice.mips
Please remember that the student Linux system is unreliable. Do your development and testing on your own system, using the web tools when possible, so that you won't be stuck when (rather than if) the student Linux servers go down.
The wain
function in your WLP4 program
should take an int*
as the first parameter
and an int
as the second parameter.
The first parameter will receive the value
that mips.array
places in $1
(a pointer to the array) and the second parameter
will receive the value it places in $2 (the size
of the array).
When the program terminates, the return value
of the wain
function will be stored
in $3.
Ensure the value in $3 is correct.
Write a scanner for WLP4. The scanner should read a WLP4 program from standard input and tokenize it, producing a sequence of lines, each representing one token of the program. Each line should be terminated by a line feed (ASCII 0x0A) and should contain the token kind (the all-caps name indicating the type of the token, such as ID, NUM, LPAREN, etc. as defined in the WLP4 Specification), followed by a single space, followed by the token lexeme (the actual string represented by the token).
Whitespace and comments are not considered tokens and should not be part of the output. However, tokens in the input might be separated by whitespace or comments.
If the input cannot be scanned into a sequence of valid tokens as per the requirements in the WLP4 Specification, your program must print an error message containing the string ERROR in all caps to standard error and exit normally.
For programs with a valid tokenization, your output should
be identical to that of the wlp4scan
tool.
open-input-string
(Racket).
std::ifstream
to open a file as read-only.
In Racket, open-input-file
should work.
If the input is this WLP4 program:
// // WLP4 program with two integer parameters, a and b // returns the sum of a and b // int wain(int a, int b) { return a + b; // unhelpful comment about summing a and b }
Then the correct output is:
INT int WAIN wain LPAREN ( INT int ID a COMMA , INT int ID b RPAREN ) LBRACE { RETURN return ID a PLUS + ID b SEMI ; RBRACE }
Compare your results to wlp4scan
. For
this problem, Marmoset again uses diff -Zb
, so that small
differences in whitespace are ignored (e.g., extra space at the end of the
line).
On the student Linux system:
$ ./wlp4scan < input.wlp4 > output.txt $ wlp4scan < input.wlp4 > expect.txt $ diff -Zb output.txt expect.txt
Be careful not to confuse wlp4scan
(the
course-provided reference implementation of the scanner) with your own program.
Perhaps name your own executable something different, like
mywlp4scan
.