Assignments for CS 241 | ||
---|---|---|
← Assignment 1 | Assignment 2 | Assignment 3 → |
Friday, Oct 4 at 5:00 pm | ||
P1 • P2 • P3 • P4 • P5 |
On this assignment, you will create DFAs for certain regular languages, write a tool that determines if a DFA recognizes a string, then extend this tool to a Simplified Maximal Munch scanner.
You will write DFAs in the cs241 DFA file format, and you will use the cs241.DFA
tool to test your DFAs in problems 1 and 2, and optionally in problems 3, 4, and 5.
The cs241.DFA
(web version) tool reads a DFA file and for a given test input (specified in the DFA file), the DFA tool tests the input against the DFA.
You will submit your solutions to Marmoset, which will run your solution against some automated tests. Each problem specifies a required filename that you must use for your Marmoset submission.
Note: From this problem onwards for the remainder of the course, when an assignment uses the description "release+secret", what this means is that release tests are worth no marks, but each release test has a similar (but not identical) secret test which is worth marks. The goal of release+secret testing is to ensure that you test for yourself and don't overspecialize to our test cases. When marks are described just as "secret", this means that there are secret test cases which are unrelated to release tests, and you must consider the full range of testing possibilities on your own for such marks, with no feedback.alfalfa.dfa
)Write a DFA with the alphabet {a, l, f} recognizing the language of strings that contain the string "alfalfa".
DFAs are specified in the course specific DFA file format we have created for the course.
Here are some extra optional problems if you would like more practice with DFAs. You can submit the following files to A2P1 on Marmoset, and Marmoset will check that your DFA is correct.
These problems are not worth marks. There is no bonus for completing them.
catstar.dfa:
Construct a DFA with alphabet {a,c,t}
recognizing the language described by the regular expression (ca(a)*t)*
.cattac.dfa:
Construct a DFA with alphabet {a,c,t}
recognizing strings that do not start or end with a
, and for every a
in the string, the letter before the a
is different from the letter after the a
.endalfalfa.dfa:
Construct a DFA with alphabet {a, l, f}
recognizing the language of strings that end with alfalfa
.Submit all DFA files together in a zip file. If you submit only one, Marmoset will still run the tests but you will only get marks for the one you submitted.
64-to-512.dfa
)Write a DFA with the alphabet {-, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
that recognizes the language of integers made of numeric characters (optionally started with a negative sign), between the range -64 and 512 (inclusive), with no leading 0s, and no negative 0 (i.e. "-0")
dfa.cpp
or dfa.rkt
)Write a program that reads a DFA file from standard input, and for each string in the input section of the file, determine whether the string is accepted by the DFA. Print a single line of output (to standard output) for each string:
true
false
To simplify your task, starter code is provided that reads a DFA file from standard input. The starter code does not actually do anything, but comments are provided at the locations where the code has finished reading certain pieces of information. You should fill in these sections with your own code that sets up appropriate data structures and implements the DFA recognition algorithm.
If you use the c++ starter code, you can compile the code with:
g++ -g -O0 -fPIC -std=gnu++20 -Wall -Wextra -pedantic-errors -Wno-unused-parameter -o dfa dfa.cpp
And run the code with:
./dfa < input.dfa
If you use the racket starter code, you can run the code with:
racket dfa.rkt < input.dfa
You are not required to use the starter code, and you are allowed to change the starter code in any way you see fit. However, the starter code does not purposefully contain any bugs or issues that will cause you to lose marks on this assignment. We strongly recommend that you use the starter code.
Note that the DFA file format uses the special keyword .EMPTY
to denote the empty string. If .EMPTY
is given as input, then in your output, you should print .EMPTY (rather than nothing).
Your program does not need to check errors in the DFA file. Each DFA file in the tests follows the DFA file format, and the specified DFA is a valid DFA. Note that inputs in the .INPUT
section are not a part of the definition of DFA.
Your program should behave the same as the cs241.DFA
tool aside from error checking (cs241.DFA
does error checking, but you do not need to).
For this problem, Marmoset uses diff -Zb
to compare your output, so that small differences in whitespace are ignored (e.g., extra space at the end of the line).
smm.cpp
or smm.rkt
)In this problem, you will extend your DFA recognizer to a Simplified Maximal Munch scanner.
As in the previous problem, your program will read a DFA file from standard input. However, it should treat the input section as a single string, rather than multiple whitespace-separated strings. Your program should attempt to tokenize the string using the provided DFA and the Simplified Maximal Munch algorithm.
The program should print the tokens obtained, in order, one per line, to standard output, as the algorithm identifies each token. Each line should be terminated by a line feed (ASCII 0x0A
).
If an error is detected, no additional output should be sent to standard out. A message should be printed to standard error. The message must start with ERROR
. The error message does not need to be anything more than ERROR
, but you might find it useful to include more information.
For instance, if you are given the following input:
.ALPHABET $ 0-9 .STATES start $ $n! .TRANSITIONS start $ $ $ 0-9 $n $n 0-9 $n .INPUT $2$4$1$241
Then your output must be:
$2 $4 $1 $241
The input string to be scanned will not be empty (i.e., the empty string: "") or epsilon (i.e., .EMPTY
). It will contain at least one character.
You do not need to output the state of the DFA that you reached (which often represents the "kind" of the token). You only need to output the actual string (the "lexeme" of the token).
You must implement Simplified Maximal Munch, not some other scanning algorithm. The tests are not just looking for any valid tokenization, but that you produce the specific tokenization that Simplified Maximal Munch would produce, or an error if Simplified Maximal Munch cannot find a tokenization.
For this problem, there is no sample executable to compare your output with. Try testing with simple DFAs where you can easily understand how the string should be broken down into tokens.
Your program does not need to check errors in the DFA file. Each DFA file in the tests follows the DFA file format, and the specified DFA is a valid DFA.
As with the previous problem, Marmoset will use the -Zb
options with diff to allow for some leniency in terms of whitespace. However, you should not really be printing whitespace other than a single line feed after each token you output.
asm.cpp
or asm.rkt
)Updates and Clarifications
Write a program that scans MIPS assembly from stdin, and outputs its tokenization.
Your program will output a list of tokens, one on each line, with the token kind in all caps, a space character, then the lexeme, and the newline character (ascii 0x10
).
Your token types must be:
DIRECTIVE
: a period followed by one alpha character, and then followed by zero or more alphanumeric characters, such as .p124
and .word
LABEL
: an alpha character, followed by zero or more alphanumeric characters, follow by a colon. E.g., p124:
ID
: an alpha characters, followed by zero or more alphanumeric characters. E.g, p
, p124
HEXINT
: The characters 0x
followed by 1 or more characters between 0-9
, a-f
, or A-F
REG
: The $
followed by one or two decimal digitsDEC
: a single 0 character xor ( an optional -
, followed by a character between 1-9
, optionally followed by zero or more characters between 0-9
)COMMA
: The ,
characterLPAREN
: The (
characterRPAREN
: The )
characterThe input to your scanner will have each assembly instruction on only one line.
Your scanner should not crash in any ways, even if the input is not a valid assembly language program. Your assembler should not silently ignore errors in the input program. C++ solutions cannot leak resources.
Your scanner will be tested by both valid mips assembly, invalid mips assembly, and input that doesn't scan. Invalid mips assembly can scan successfully. For instance add $1, $2, $3, $4
will successfully scan. Your program does not need to validate that what it scanned is a valid program. It only needs to output the tokens that it scans, stopping either when the end of input is reached, or when an error is detected.
If the input to your program is
add $1, $2, $3, $4
Then the output of your program must be:
ID add REG $1 COMMA , REG $2 COMMA , REG $3 COMMA , REG $4
In the earlier Simplified Maximal Munch scanner problem, you were provided a scanning DFA, followed by a string to scan using the DFA. For this problem, the input to your program is simply mips assembly from stdin. That is, the input will not include a DFA for mips tokens. You will likely have to create such a DFA yourself and embed it in your program somehow.
If you want to reuse your DFA reading code from the earlier problems, you could embed it as a multi-line string literal (these are called raw strings in C++ and here strings in Racket) and then read it from a std::stringstream
(C++) or using open-input-string
(Racket).
You could also submit the DFA as a separate file (zip it together with your program) and have your program read it using file I/O. However, note that Marmoset blocks you from writing to files. This means you should open the file as read-only; otherwise opening the file could fail and cause unexpected errors in your code. In C++, you can use std::ifstream
to open a file as read-only. In Racket, open-input-file should work.
You can embed a DFA file into a c/c++ by using xxd
to generate a raw string:
xxd -i mips.dfa > mips.cpp
In your asm.cpp implementation, you can "read" the file with:
#include <iostream> #include <sstream> unsigned char mips_dfa[] = ...; // Copied from mips.cpp unsigned int mips_dfa_len = ...; // Copied from mips.cpp int main() { std::string foo((char *) mips_dfa, mips_dfa_len); std::stringstream in(foo); }
You can compile your implementation with the following flag
g++ asm.cpp -g -O0 -fPIC -std=gnu++20 -Wall -Wextra -pedantic-errors -Wno-unused-parameter -o asm
You can test your program with
./asm < input.asm