Deadline |
P1A: Friday, October 6, 11:59pm P1B: Friday, November 3, 11:59pm |
Name on Marmoset | P1A, P1B |
To Submit |
P1A: mipsscan.cc OR mipsscan.rkt P1B: wlp4scan.cc OR wlp4scan.rkt
|
Marking Scheme |
P1A: 50% (14 release test marks, 36 secret test marks) P1B: 50% (22 release test marks, 28 secret test marks) |
For this project, you will write two scanners:
The lectures have covered enough material to complete both scanners. You need to understand DFAs, the simplified maximal munch algorithm, and the general concepts behind scanning. In terms of the course notes, you need to have read Chapter 3 in its entirety.
Both scanners are to read a sequence of ASCII characters from standard input, tokenize the sequence using simplified maximal munch, and output the tokenization, printing one token per line. Specifications for the valid tokens are given below.
ERROR
(in ALL CAPS)
to standard error.
ERROR
to standard error as above.
mipsscan
and
wlp4scan
, are available as
web tools and on
the Linux environment (after running /source/u/cs241/setup
).
You can use these to confirm what the expected output is for test inputs you create.
exit
function does not call destructors for stack-allocated objects, which may cause you to leak memory and fail tests. To exit after printing an error, throw an exception and catch it in main
, then exit normally with return
.
Or if the error happens in main
itself, you can just exit directly with return
.
error
function, and other runtime errors caused by mistakes in your code. To report errors, rather than using error
, print to standard error (e.g., with eprintf
) and then call exit
.
The following tables define the kinds and lexemes of the tokens your scanner must produce.
Restrictions on certain token kinds are also specified.
Restrictions:
Kinds, Lexemes, & Restrictions: MIPS Assembly
Kind
Description of Associated Lexemes
ID
Identifiers: Strings starting with an alphabetic character (
a-z
or A-Z
), followed by zero or more alphanumeric characters (a-z
or A-Z
or 0-9
).
DOTID
Dot-Identifiers: An
.
character followed by an ID lexeme.
LABELDEF
Label Definitions: An ID lexeme followed by a
:
character.
DECINT
Decimal Integers: Strings which optionally start with a
-
character, followed by one or more decimal digits (0-9
).
HEXINT
Hexadecimal Integers: Strings starting with
0x
, followed by one or more hexadecimal digits (0-9
or a-f
or A-F
).
REGISTER
Registers: A $
character followed by one or more decimal digits (0-9
). The numeric value of the string of decimal digits cannot exceed 31.
COMMA
Comma: A ,
character.
LPAREN
Left Parenthesis: A (
character.
RPAREN
Right Parenthesis: A )
character.
NEWLINE
Either a line feed character (ASCII 0x0a), or the two-character sequence carriage return, line feed (ASCII 0x0d, ASCII 0x0a).
When printing this token, do not print the lexeme. Print a line containing only the kind.
?WHITESPACE
Whitespace: A sequence of one or more space (ASCII 0x20) or tab (ASCII 0x09) characters.
?COMMENT
Comment: A ;
character, followed a sequence of zero or more ASCII characters that does not contain a line feed (ASCII 0x0a) or carriage return (ASCII 0x0d).
(In other words, once a ;
character is encountered, everything up to the next line feed or carriage return is a ?COMMENT.)
$
character must not
exceed 31.
A DFA file that recognizes the lexemes of valid
MIPS assembly tokens is provided. Note that the
DFA does not assign kinds to lexemes for you, nor verify
restrictions. It just recognizes strings that are valid lexemes.
Restrictions:
Kinds, Lexemes, & Restrictions: WLP4
Kind
Description of Associated Lexemes
ID
Identifiers: Strings starting with an alphabetic character (a-z or A-Z), followed by zero or more alphanumeric characters (a-z or A-Z or 0-9), not equal to any of the following keywords: "int", "wain", "if", "else", "while", "println", "return", "new", "delete", "NULL".
NUM
Numbers: Strings consisting of one or more decimal digits (
0-9
).
INT
Keyword
int
: The string "int".
WAIN
Keyword
wain
: The string "wain".
IF
Keyword
if
: The string "if".
ELSE
Keyword
else
: The string "else".
WHILE
Keyword
while
: The string "while".
PRINTLN
Keyword
println
: The string "println".
RETURN
Keyword
return
: The string "return".
NEW
Keyword
new
: The string "new".
DELETE
Keyword
delete
: The string "delete".
NULL
Keyword
NULL
: The string "NULL".
LPAREN
Left Parenthesis: The
(
character.
RPAREN
Right Parenthesis: The
)
character.
LBRACE
Left Brace: The
{
character.
RBRACE
Right Brace: The
}
character.
LBRACK
Left Bracket: The
[
character.
RBRACK
Right Bracket: The
]
character.
BECOMES
Equals: The
=
character.
PLUS
Plus: The
+
character.
MINUS
Minus: The
-
character.
STAR
Star: The
*
character.
SLASH
SLASH: The
/
character.
PCT
Percent: The
%
character.
AMP
Ampersand: The
&
character.
COMMA
Comma: The
,
character.
SEMI
Semicolon: The
;
character.
LT
Less Than: The
<
character.
GT
Greater Than: The
>
character.
LE
Less Than or Equal To: The string "<=".
GE
Greater Than or Equal To: The string ">=".
EQ
Equal To: The string "==".
NE
Not Equal To: The string "!=".
?WHITESPACE
Whitespace: A sequence of one or more of the following characters: space (ASCII 0x20), tab (ASCII 0x09), line feed (ASCII 0x0a) or carriage return (ASCII 0x0d).
?COMMENT
Comment: The string "//", followed a sequence of zero or more ASCII characters that does not contain a line feed (ASCII 0x0a) or a carriage return (ASCII 0x0d).
A DFA file for WLP4 lexemes is not provided.
If using C++, compile your MIPS scanner as follows:
g++ -g -Wall -std=c++17 mipsscan.cc -o mipsscanOr your WLP4 scanner as follows:
g++ -g -Wall -std=c++17 wlp4scan.cc -o wlp4scanIf your program uses other
.cc
files, you must include those in the compilation command as well.
In the examples below, if using Racket,
substitute ./mipsscan
with racket mipsscan.rkt
and
substitute ./wlp4scan
with racket wlp4scan.rkt
.
If you are using C++ and want to test with Valgrind,
add valgrind
before the name of your program.
For example: valgrind ./mipsscan
Suppose the file program.asm
contains the following data:
lis $4 .word 16 lw $3, 0xfffc($4) ; what does this do? jr $31If you run your MIPS scanner with this file, you should get the following output:
./mipsscan < program.asm ID lis REGISTER $4 NEWLINE DOTID .word DECINT 16 NEWLINE ID lw REGISTER $3 COMMA , HEXINT 0xfffc LPAREN ( REGISTER $4 RPAREN ) NEWLINE ID jr REGISTER $31 NEWLINE
Suppose the file program.asm
contains the following data:
jr $32If you run your MIPS scanner with this file, it should produce a message on standard error containing the string
ERROR
, in ALL CAPS.
You can confirm the message is being sent to standard error by
redirecting standard output to /dev/null
:
./mipsscan < program.asm > /dev/null ERROR: Invalid register number: $32The error message does not need to be informative, but it must contain
ERROR
in ALL CAPS
or else Marmoset won't be able to tell that you are producing an error.
Making your error messages informative
is still recommended because it can help you debug your program.
Suppose the file program.wlp4
contains the following data:
// Simple program that adds int wain(int a, int b) { // Just add the numbers together return a + b; }If you run your WLP4 scanner with this file, you should get the following output:
./wlp4scan < program.wlp4 INT int WAIN wain LPAREN ( INT int ID a COMMA , INT int ID b RPAREN ) LBRACE { RETURN return ID a PLUS + ID b SEMI ; RBRACE }
Suppose the file program.wlp4
contains the following data:
int factorial(int n) { return (n!); }If you run your WLP4 scanner with this file, it should produce a message on standard error containing the string
ERROR
, in ALL CAPS.
You can confirm the message is being sent to standard error by
redirecting standard output to /dev/null
:
./wlp4scan < program.wlp4 > /dev/null ERROR: Simplified maximal munch failed at this point: int factorial(int n) { return (n!)The error message does not need to be informative, but it must contain
ERROR
in ALL CAPS
or else Marmoset won't be able to tell that you are producing an error.
Making your error messages informative
is still recommended because it can help you debug your program.
You can use the reference implementations
mipsscan
and
wlp4scan
to see what the expected output should be for other inputs.
The diff
command is helpful for comparing your
output to the expected output.
Begin by downloading the following starter code (C++, Racket). The starter code for each language contains:
dfa-print.cc
/ dfa-print.rkt
)
defines a function that reads
a DFA file from an input stream/port and prints
information about the DFA.
dfa.h
and dfa.cc
)
or a single file for Racket (dfa.rkt
)
defining a string constant containing a DFA.
By default, the starter code is set up to print information about the DFA stored in the string constant.
Marmoset Note: When submitting to Marmoset, you must submit all files your program uses.
If submitting using the web interface, submit a ZIP file containing the files you want to submit.
If submitting using the command line, you can just type the names of the files to submit. For example,
if you renamed dfa-print.cc
to mipsscan.cc
, but your code
still uses dfa.h
and dfa.cc
, you would write:
/u/cs_build/bin/marmoset submit cs241 P1A mipsscan.cc dfa.h dfa.cc
Create a class or struct representing a DFA.
Using the function for printing information about a DFA
as a starting point,
write a function that reads a DFA file
and constructs an instance
of the DFA class or struct.
You can test it by replacing the string constant in
dfa.h
/ dfa.rkt
,
or by passing your own files to the function directly.
Using your DFA class or struct, write a "get next state" function which takes a state and a character as parameters. If there is an arrow leading out of the input state that is labelled with the input character, return the state pointed to by the arrow. Otherwise, return a value indicating there is no next state to go to.
Use the "get next state" function to implement a "simplified maximal munch" function, which takes a string (or an input stream/port), tokenizes it with simplified maximal munch, and returns a sequence of strings (token lexemes). Do not worry about token kinds or filtering out tokens at this stage.
You can test your implementation with the DFA provided in the starter code, the MIPS token lexeme DFA, and any other DFA files you have lying around.
Replace the DFA provided in the starter code with the MIPS token lexeme DFA. It is time to write the MIPS scanner.
Create a class or struct for tokens. Modify your "simplified maximal munch" function to return a sequence of tokens instead of a sequence of strings.
Now it is time to write the WLP4 scanner. Create a DFA for WLP4 tokens and replace the MIPS DFA with this new DFA.
Creating the WLP4 DFA may be a little tricky. If you are stuck, study the provided MIPS DFA and try to understand how it works.
Once you have the WLP4 DFA, modify the part of your "simplified maximal munch" function that checks restrictions on tokens, so that it checks the WLP4 restrictions instead of the MIPS restrictions.
Test your code thoroughly, and you're done!