CS241 Tool Summary

Various tools are used in CS241. Historically, they were all available only on the student Linux servers. They are now gradually being duplicated as browser-based versions, to avoid overloading the servers. The web-based versions of these tools are available here.

All of these tools are available on the student Linux servers (linux.student.cs.uwaterloo.ca), by running source /u/cs241/setup.

After you have run the setup command, simply type the name of the tool to use it. The tools are not copied to your home directory; the setup script only does some configuration that lets you run the tools from anywhere.

The setup command only lasts for one login session. If you do not want to manually enter it every time you log in, put the command in the hidden .bash_profile file in your home directory, or in .profile if .bash_profile does not exist.

List of CS241 Tools

Tool               Primarily Used In          Purpose
marmoset_submit    All Assignments            Command line submission to Marmoset
cs241.binview      Assignment 1, 2 & 3        Binary file viewer
cs241.wordasm      Assignment 1               ARM64 assembler (limited to .8byte directives)
cs241.binasm       Assignment 1, 2, 3, 7 & 8  ARM64 assembler
cs241.arm64emu     Assignment 1, 2, 7 & 8     ARM64 emulator
cs241.dfa          Assignment 3               DFA checker and recognizer
cs241.smm          Assignment 4               Run Simplified Maximal Munch using a DFA
cs241.wlp4c        Assignment 4, 6, 7 & 8     WLP4 semantic analyzer and compiler
cs241.wlp4scan     Assignment 4, 5, 6, 7 & 8  WLP4 scanner
cs241.cfgcheck     Assignment 5               CFG syntax checker
cs241.slr          Assignment 5               SLR(1) DFA generator
cs241.wlp4parse    Assignment 5, 6, 7 & 8     WLP4 parser
cs241.wlp4type     Assignment 6, 7 & 8        WLP4 semantic analyzer
cs241.linkasm      Assignment 7 & 8           ARM64 assembler (produces linkable ARMCOM files)
cs241.linker       Assignment 7 & 8           ARMCOM linker
cs241.striparmcom  Assignment 7 & 8           Strips ARMCOM metadata

Tool Information

marmoset_submit

Usage: marmoset_submit COURSE PROJECT FILE1 [FILE2] ...

This tool lets you submit to Marmoset from the command line. For this course the COURSE value should always be CS241. The PROJECT field should be filled with the name of the problem you are submitting to, like a4p2 or a9bonus. Problem names can be viewed on the Marmoset web interface. When using this tool the problem names must exactly match the name shown on the Marmoset web interface, respecting capitalization. After PROJECT, simply list out all files you wish to submit for the chosen problem. A sample is shown below:

        $ marmoset_submit CS241 a1p5 self-modifying.hex
      

For many problems you will just be submitting a single file, but you can provide multiple files on the command line. For example, if your solution for a question is split into several modules, you can submit those modules together as long as your main required executable is still present. In this course, even if the assignment only specifies one particular file name, you are allowed to submit other files alongside the one with that name. For C++ programs, Marmoset will simply pass all .cc files you submit to the compiler; a Makefile is not needed, and in fact any Makefile you submit will be ignored. (This is specific to the way CS241's Marmoset scripts are set up and may not apply to other courses.)

Note that this is just a wrapper script for the command /u/cs_build/bin/marmoset_submit. You can use the /u/cs_build/bin/marmoset_submit command directly without running the CS241 setup command. You can also use marmoset_submit (whether through our wrapper script or directly using the cs_build command) in other courses that use Marmoset by just replacing CS241 with that course name.

cs241.binview

Web version

Usage: cs241.binview [OPTION]... FILE

This tool lets you view the binary data stored in files in a human-readable form. If you view a file with cat or open it in a text editor like vim, your terminal will attempt to display the binary data as ASCII (or maybe Unicode) characters. For a text file, this is exactly what you want, but for something like ARM64 machine code it will probably look like gibberish.

The cs241.binview tool formats and prints the binary data as a sequence of 0 and 1 characters. It also has options for printing in decimal, hexadecimal, and ASCII.

Example

Suppose the file program.bin contains the machine code version of the following ARM64 program (translated using an assembler like cs241.binasm):

        add x1, x2, x3
        br x31
      

You can use cs241.binview with the --all option to view the machine code in binary, decimal, hexadecimal and ASCII form.

        $ cs241.binview --all program.bin
        #65      #96      #35      #139    
        0x41     0x60     0x23     0x8B    
        A        `        #        \x8B    
        01000001 01100000 00100011 10001011

        #224     #3       #31      #214    
        0xE0     0x03     0x1F     0xD6    
        \xE0     ^ETX     ^US      \xD6    
        11100000 00000011 00011111 11010110
      

If --all or other similar options are not used, the default is to produce binary output only.
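To make the output format concrete, here is a minimal Python sketch that renders bytes in the decimal, hexadecimal and binary styles shown above. It is not the implementation of cs241.binview: the function name render_bytes is made up for illustration, and the ASCII row is left out.

```python
def render_bytes(data: bytes) -> dict:
    """Return decimal, hexadecimal and binary renderings of each byte."""
    return {
        "decimal": [f"#{b}" for b in data],
        "hex": [f"0x{b:02X}" for b in data],
        "binary": [f"{b:08b}" for b in data],
    }


# First four bytes of program.bin from the example above.
rows = render_bytes(bytes([0x41, 0x60, 0x23, 0x8B]))
for name in ("decimal", "hex", "binary"):
    # Pad each cell to 8 characters so the columns line up.
    print(" ".join(f"{cell:<8}" for cell in rows[name]).rstrip())
```

Each byte is printed as one column, so the three rows describe the same four bytes in different bases.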

Notes:

  • Run cs241.binview without arguments to get a help message showing all the options.
  • When piping data to cs241.binview, you must pass the argument - to tell cs241.binview to read from standard input, like this: cs241.binasm < input.asm | cs241.binview -
  • Aside from viewing things like machine code, this tool can also be used to check if the output of a program you write has unnecessary characters (like whitespace or "null" bytes) that normally wouldn't show up with cat.
  • This program has similar functionality to the standard Unix tool xxd. It is essentially a version of xxd whose output format and options are tailored to the needs of CS241.

cs241.wordasm

Web version

Usage: cs241.wordasm < PROGRAM.hex

This is a very restricted ARM64 assembler that only supports the .8byte directive. No other instructions or features like labels are available.

The tool reads ASCII text from standard input. The text should consist of a series of lines, one line for each 64-bit word of binary output. Each line consists of the string ".8byte" followed by a string giving either the hexadecimal (prefixed with "0x") or decimal (no prefix) representation of the 64-bit word. Single-line comments start with //, as in the examples on this page.
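The input format described above can be sketched in a few lines of Python. This only illustrates the format and is not the actual cs241.wordasm implementation: the function name assemble_8byte is hypothetical, and the sketch strips both // and ; comments, which is an assumption about what the real tool accepts.

```python
import struct


def assemble_8byte(source: str) -> bytes:
    """Turn lines of '.8byte VALUE' into little-endian 64-bit words."""
    out = bytearray()
    for raw in source.splitlines():
        # Assumption: both '//' and ';' start a comment in this sketch.
        line = raw.split("//")[0].split(";")[0].strip()
        if not line:
            continue
        directive, value = line.split()
        if directive != ".8byte":
            raise ValueError(f"unsupported directive: {directive}")
        # int(..., 0) accepts '0x'-prefixed hexadecimal and plain decimal.
        out += struct.pack("<Q", int(value, 0))
    return bytes(out)


print(assemble_8byte(".8byte 65   // decimal\n.8byte 0x42 ; hexadecimal\n"))
```

Every directive contributes exactly 8 bytes, which is why the output length is always a multiple of 8.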

Example

Suppose the file cs241.hex contains the following text:

        .8byte 0x6f72203134325343   // "CS241 ro"
        .8byte 0x000000000a736b63   // "cks(newline) (4 null characters)"
      

This is the encoding of the string "CS241 rocks" (with a newline at the end).

Redirecting this file into cs241.wordasm gives the following output on the terminal:

        $ cs241.wordasm < cs241.hex
        CS241 rocks
      

Because ARM64 is little-endian, the bytes of each 8-byte value are stored in reverse order relative to how the number is written. Hence, when converting from the 8-byte value to the corresponding string, the characters must be read from right to left, as shown below.

        IN : 0x6f72203134325343
        ----------------------
        Byte: 6f 72 20 31 34 32 53 43
        Char: o  r     1  4  2  S  C
        ----------------------
        OUT: CS241 ro
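
The byte reversal can be checked with a short Python sketch. The helper names words_to_bytes and text_to_directives are hypothetical; the second one goes in the other direction and produces .8byte lines for arbitrary data, padding with null bytes to a multiple of 8.

```python
def words_to_bytes(words):
    """Lay out 64-bit values in memory order (least significant byte first)."""
    return b"".join(w.to_bytes(8, "little") for w in words)


def text_to_directives(data: bytes):
    """Produce '.8byte' lines whose little-endian encoding is `data`."""
    padded = data + b"\x00" * (-len(data) % 8)  # pad to a multiple of 8
    return [
        f".8byte 0x{int.from_bytes(padded[i:i + 8], 'little'):016x}"
        for i in range(0, len(padded), 8)
    ]


print(words_to_bytes([0x6F72203134325343]))
for line in text_to_directives(b"CS241 rocks\n"):
    print(line)
```

Running text_to_directives on "CS241 rocks" plus a newline reproduces the two directives from the example.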
      

You will mostly use cs241.wordasm to write ARM64 machine code programs, but as this example shows, it can be used to create any kind of binary data (as long as the length in bytes is a multiple of 8). In this example, ASCII text was produced as output. When producing machine code output, you will likely want to redirect the output to a file, like this:

        cs241.wordasm < input.hex > output.bin
      

cs241.binasm

Web version

Usage: cs241.binasm < PROGRAM.asm > OUTPUT.bin

This is an assembler that takes an ARM64 assembly language program as input (from standard input) and produces an ARM64 machine language program in binary as output (on standard output). Unlike cs241.wordasm, all the instructions and features of the CS241 ARM64 dialect are supported.

Example

Aside from having a much more flexible input format, the usage is identical to cs241.wordasm. Suppose input.asm contains the following text:

        add x1, x2, x3
        br x30
      

You can assemble this program directly as follows. It is not necessary to convert the program to .8byte directives like with cs241.wordasm.

        cs241.binasm < input.asm > output.bin
      

cs241.arm64emu

Web version

Usage: cs241.arm64emu [-a] [-i] PROGRAM.bin [x0 value] [x1 value] ...

This is an emulator for running compiled ARM64 machine language programs. Once you have produced a machine language program using one of our assemblers, you can run it with this tool.

To use the tool, give it the filename of the compiled machine language program you want to run as the first command line argument. By default, it will load the program at address 0x00. You can load numerical inputs into the emulator by passing in additional arguments, which will be loaded into the ARM64 registers before the program runs. For example, running

        $ cs241.arm64emu program.bin 10 3 5
      

will run the program program.bin with:

  • The value 10 in register x0
  • The value 3 in register x1
  • The value 5 in register x2

For longer inputs, the -a option stores the input arguments as an array in memory, then stores the address of the start of the array in x0 and the length of the array in x1.

The emulator supports reading from standard input. The -i option enables interactive mode, which allows you to step through your code's execution and debug any issues.

Example

Here we have prepared a simple ARM64 program in the format expected by cs241.wordasm. The source code is stored in the file addvalues.hex; it adds the values in x0 and x1 and stores the result in x0. We run it through cs241.wordasm to produce the binary machine language version. We then run the program with cs241.arm64emu:

        $ cat addvalues.hex
        .8byte 0xD61F03C08B216000 // add x0, x0, x1
                                  // br x30

        $ cs241.wordasm < addvalues.hex > prog.bin
        $ cs241.arm64emu prog.bin 4 5

        x0:  0x9		x16: 0x0
        x1:  0x5		x17: 0x0
        x2:  0x0		x18: 0x0
        x3:  0x0		x19: 0x0
        x4:  0x0		x20: 0x0
        x5:  0x0		x21: 0x0
        x6:  0x0		x22: 0x0
        x7:  0x0		x23: 0x0
        x8:  0x0		x24: 0x0
        x9:  0x0		x25: 0x0
        x10: 0x0		x26: 0x0
        x11: 0x0		x27: 0x0
        x12: 0x0		x28: 0x0
        x13: 0x0		x29: 0x1000000
        x14: 0x0		x30: 0xfffffe4
        x15: 0x0		sp:  0x1000000
        pc:  0xfffffe4		instr: hlt
        flags: vczn
        Program exited normally.
      

The emulator prints a list of register values to standard error when the program stops (whether it stops normally or crashes). Notice the following about the register values:

  • We ran the program with the value 4 in x0 and 5 in x1.
  • Adding those two values together produces 9, which is also 0x9 in hexadecimal.
  • At the final state we see that x0 contains 0x9, as desired.
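Note that the single .8byte directive in addvalues.hex holds two 32-bit instructions. The following Python sketch splits the 64-bit value into its two words in memory order, which shows why the add appears in the low half of the number even though it executes first. The function name split_instructions is hypothetical.

```python
import struct


def split_instructions(word64: int):
    """Return the two 32-bit words of a 64-bit value in memory order."""
    raw = struct.pack("<Q", word64)         # bytes as they appear in the file
    return list(struct.unpack("<2I", raw))  # two little-endian 32-bit words


first, second = split_instructions(0xD61F03C08B216000)
print(hex(first), hex(second))  # 0x8b216000 0xd61f03c0

# In the A64 encoding of br, bits 9..5 hold the register number.
print((second >> 5) & 0x1F)  # 30, matching "br x30"
```

The low 32 bits (0x8B216000, the add) sit at the lower address and run first; the high 32 bits (0xD61F03C0, the br x30) follow.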

Error Messages

Here are brief explanations of some of the error messages you might see from this tool. This is not necessarily a comprehensive list.

  • Unrecognized instruction [flags]: These errors mean that the ARM64 emulator tried to execute something that is not a valid instruction. Whether the message says "instruction" or "instruction flags" depends on the exact 32 bits it tried to execute. If the opcode is invalid, the error is "unrecognized instruction". If the opcode is valid but the flags don't correspond to any known instruction, the error is "unrecognized instruction flags". A very commonly reported cause is forgetting to assemble your program, so that the emulator attempts to execute the ASCII text source code instead of machine code. However, it can also happen in actual programs if you jump or branch to a location containing non-instruction data, or if you accidentally overwrite part of your running code with non-instruction data.
  • Misaligned memory access: This means the emulator tried to access a memory address that is not word-aligned (that is, not a multiple of 4). Usually the "access" is a store or load instruction (ldur, stur, ldr), but this can also happen if you branch (b, b.cond, br, blr) to an address that isn't a multiple of 4.
  • Out of bounds memory access: The ARM64 emulator only allows your program to access a certain area of memory, from 0x00000000 to 0x00ffffff (inclusive). Accessing memory addresses at 0x01000000 or larger gives this error. Because the stack pointer (x31 usually) starts at 0x01000000, this error is sometimes (but not always) related to incorrect stack management. As with "misaligned access", the "memory access" that causes the error could be a store or load, but it could also be a branch.
  • Division by zero: You tried to divide by zero in your program. Check that any registers that you use as divisors don't get set to 0, and/or are initialized to non-zero values before dividing another number.
  • std::invalid_argument...: This error does not happen when running the ARM64 program itself, but rather when entering numerical values into the emulator. Entering anything other than decimal integers will cause the emulator to crash.
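The alignment and bounds rules behind the two memory errors can be written down as a short Python sketch. This is a model of the rules as stated above, not the emulator's code: the function name check_access is hypothetical, and the order in which the two checks are made is an assumption.

```python
MEMORY_LIMIT = 0x01000000  # first address outside the accessible range
WORD_SIZE = 4              # addresses must be a multiple of 4


def check_access(address: int) -> str:
    """Classify an address according to the rules described above."""
    if address % WORD_SIZE != 0:
        return "Misaligned memory access"
    if not 0 <= address < MEMORY_LIMIT:
        return "Out of bounds memory access"
    return "ok"


for addr in (0x00FFFFFC, 0x01000000, 0x6):
    print(hex(addr), check_access(addr))
```

The second case shows why an access at the initial stack pointer value 0x01000000 is out of bounds: the last valid word starts at 0x00fffffc.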

cs241.dfa

Web version

Usage: cs241.dfa < FILE.dfa

This tool reads a file in the DFA file format from standard input, and checks the file for errors. If no errors are found, the strings in the input section of the file are processed with the DFA. The tool outputs one line per input string, containing the string followed by "true" if the string was accepted and "false" otherwise.
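As a rough illustration of what "processing a string with the DFA" means, here is a minimal Python recognizer. It does not parse the DFA file format; the transition table, state names and the function name accepts are all hypothetical, and the example DFA accepts one or more "a" characters followed by a single "b".

```python
def accepts(transitions, start, accepting, string):
    """Run the DFA on `string`; a missing transition means rejection."""
    state = start
    for ch in string:
        if (state, ch) not in transitions:
            return False
        state = transitions[(state, ch)]
    return state in accepting


# Hypothetical DFA for the language a+b.
TRANSITIONS = {("start", "a"): "as", ("as", "a"): "as", ("as", "b"): "done"}

for s in ["ab", "aab", "b", "a", "abb"]:
    # Same shape as the tool's output: the string, then true or false.
    print(s, str(accepts(TRANSITIONS, "start", {"done"}, s)).lower())
```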

cs241.smm

Web version

Usage: cs241.smm FILE.smm

This tool reads a file in the SMM file format, given as its command line argument. As with cs241.dfa, it checks the file for errors. If no errors are found, then characters are tokenized according to Simplified Maximal Munch using the provided DFA file. Characters are first read from the input section (which is optional) and then from standard input until EOF. Each time a token is accepted, its lexeme is printed to standard output, followed by a newline. Spaces and newlines are printed using the representation used in the SMM DFA format.
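The tokenization loop can be sketched in Python as follows. This is a generic version of Simplified Maximal Munch, not the tool's implementation: it does not read the SMM file format, and the DFA, state names and function name are hypothetical. The example DFA has identifiers (letters a or b, optionally followed by more letters or digits) and numbers (digits 0 or 1).

```python
def simplified_maximal_munch(transitions, start, accepting, text):
    """Split `text` into lexemes, always consuming as much as the DFA allows."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = start
        begin = pos
        # Follow transitions until the DFA is stuck or the input ends.
        while pos < len(text) and (state, text[pos]) in transitions:
            state = transitions[(state, text[pos])]
            pos += 1
        # No backtracking: the stopping state must be accepting.
        if state in accepting and pos > begin:
            tokens.append(text[begin:pos])
        else:
            raise ValueError(f"cannot tokenize input at position {begin}")
    return tokens


TRANSITIONS = {}
for c in "ab":
    TRANSITIONS[("start", c)] = "id"
    TRANSITIONS[("id", c)] = "id"
for c in "01":
    TRANSITIONS[("start", c)] = "num"
    TRANSITIONS[("num", c)] = "num"
    TRANSITIONS[("id", c)] = "id"

print(simplified_maximal_munch(TRANSITIONS, "start", {"id", "num"}, "01ab"))
print(simplified_maximal_munch(TRANSITIONS, "start", {"id", "num"}, "ab01"))
```

In the first call the number state gets stuck on "a", so "01" is emitted and scanning restarts; in the second call the identifier state can consume the digits, so the whole input is one token.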

cs241.wlp4c

Web version

Usage: cs241.wlp4c < PROGRAM.wlp4 > OUTPUT.bin

This tool reads a WLP4 program from standard input and produces on standard output a compiled ARM64 machine language program. The compiled program can be run directly using cs241.arm64emu.

The input should be WLP4 source code like below, not the result of the scanning or parsing phase. The cs241.wlp4c tool will do the scanning and parsing itself.

        int main(int a, int b) {
          return a;
        }
      

Note that cs241.wlp4c performs semantic analysis and will reject programs that do not follow the name and type rules. Thus you can use cs241.wlp4c to check the semantic validity of a WLP4 program.

cs241.wlp4scan

Web version

Usage: cs241.wlp4scan < PROGRAM.wlp4 > TOKENS.scan

This tool reads a WLP4 program from standard input, like cs241.wlp4c, but only performs the scanning phase of compilation. The result is a list of lines representing WLP4 tokens, with each line containing a "kind" (the type of token) followed by a "lexeme" (the string the token corresponds to).
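If you want to consume this output in your own code, a line-oriented reader is enough. The Python sketch below assumes one token per line with the kind and lexeme separated by whitespace, as described above; the function name parse_tokens is hypothetical and the token kinds in the sample are only illustrative.

```python
def parse_tokens(scan_output: str):
    """Convert 'KIND lexeme' lines into (kind, lexeme) pairs."""
    tokens = []
    for line in scan_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        kind, lexeme = line.split(maxsplit=1)
        tokens.append((kind, lexeme))
    return tokens


sample = "ID main\nLPAREN (\nNUM 241\n"
print(parse_tokens(sample))
```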

cs241.cfgcheck

Web version

Usage: cs241.cfgcheck < FILE.cfg

This tool reads (from standard input) a CFG component followed by a sequence of zero or more DERIVATION components. If the CFG and derivations are valid, it prints information about the CFG, and prints the rules used in each derivation. It then prints the terminal string arrived at by the derivation. If the CFG is malformed, or the derivation is invalid, an error message is printed, and the tool quits.

cs241.slr

Web version

Usage: cs241.slr < FILE.cfg > OUTPUT.slr1

This tool reads a CFG component representing a non-augmented grammar from standard input, augments the CFG, and produces a file representing the augmented CFG and its SLR(1) DFA, suitable for use as a test input for the SLR(1) parser you will write on Assignment 5.

The output contains, in order:

  • A CFG component representing the augmented version of the input CFG.
  • An INPUT component representing the augmented empty string "BOF EOF". You may replace this with your own string to test your parser.
  • A TRANSITIONS component representing the transitions of the SLR(1) DFA.
  • A REDUCTIONS component representing the reducible items of the SLR(1) DFA.

This tool uses the SLR(1) DFA construction, so the tool will not work for all grammars. It will only work if the SLR(1) construction happens to produce a conflict-free DFA for the grammar. In particular, the tool will not work for ambiguous grammars.

cs241.wlp4parse

Web version

Usage: cs241.wlp4parse < TOKENS.scan > PROGRAM.wlp4i

This tool takes a scanned WLP4 program (that is, a list of tokens produced by cs241.wlp4scan) on standard input, and produces on standard output a WLP4I file representing the parse tree for the program. A WLP4I file is essentially just a preorder traversal of the parse tree, so it encodes all the same information as the tree itself.

Typically you wouldn't call cs241.wlp4parse on its own, and you would instead pipe it the output from cs241.wlp4scan, since cs241.wlp4parse requires the input to be scanned first. For example, if program.wlp4 contains WLP4 source code (not scanned yet) you could run the following command:

        $ cs241.wlp4scan < program.wlp4 | cs241.wlp4parse > program.wlp4i
      

This avoids the need to create a temporary file to hold the scanned program.
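To see why a preorder traversal carries the same information as the tree, consider the Python sketch below. It uses a simplified, hypothetical line format (a rule line is "LHS RHS1 RHS2 ...", a terminal line is "KIND lexeme") and a made-up grammar; it is not a parser for real WLP4I files and ignores details such as empty rules.

```python
def preorder_lines(node):
    """Flatten a (label, children) tree into preorder lines."""
    label, children = node
    lines = [label]
    for child in children:
        lines.extend(preorder_lines(child))
    return lines


def rebuild(lines, terminals):
    """Rebuild the tree: a rule line tells us how many children follow."""
    it = iter(lines)

    def build():
        label = next(it)
        head, *rhs = label.split()
        if head in terminals:
            return (label, [])
        return (label, [build() for _ in rhs])

    return build()


tree = ("sum term PLUS term", [
    ("term ID", [("ID a", [])]),
    ("PLUS +", []),
    ("term ID", [("ID b", [])]),
])
lines = preorder_lines(tree)
print("\n".join(lines))
print(rebuild(lines, {"ID", "PLUS"}) == tree)  # True
```

Because each rule line lists its right-hand side, the reader always knows how many subtrees to read next, so no brackets or indentation are needed.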

cs241.wlp4type

Web version

Usage: cs241.wlp4type < PROGRAM.wlp4i > PROGRAM.wlp4ti

This tool can be used to convert a WLP4 Intermediate (.wlp4i) file to a WLP4 Typed Intermediate (.wlp4ti) file. The tool checks the program represented by the .wlp4i file for semantic errors, and if there are no errors, it outputs a .wlp4ti file representing the type-annotated parse tree.

To use cs241.wlp4type with WLP4 program source code, pass the source code through cs241.wlp4scan and cs241.wlp4parse first:

        cs241.wlp4scan < program.wlp4 | cs241.wlp4parse | cs241.wlp4type > program.wlp4ti
      

cs241.linkasm

Web version

Usage: cs241.linkasm < PROGRAM.asm > PROGRAM.com

This assembler has all the features of cs241.binasm and supports two additional directives: .import and .export. These can be used to import label definitions from assembled ARMCOM programs, or to export label definitions which allow other programs to import them. This allows the creation of ARM64 "libraries" which export useful procedures that other code can use.

This assembler produces ARMCOM files rather than plain machine code. If your program imports labels from other ARMCOM files, you will need to use cs241.linker to combine the ARMCOM files into a single usable program.

Example

Suppose we have the following assembly files that use the .import and .export directives:

        $ cat import.asm
        .import label
        ldr x0, 8
        b 12
        .8byte label
        br x0
        $ cat export.asm
        .export label
        label:
        br x30
      

These programs don't do anything interesting, and are just to give an example of the syntax. We can assemble these programs as follows:

        $ cs241.linkasm < import.asm > import.com
        $ cs241.linkasm < export.asm > export.com
      

Using cs241.binasm to assemble these would produce an error since it does not support .import or .export.

Note that here we have simply assembled two separate programs that don't know anything about each other. If we tried to execute import.com it would get stuck in an infinite loop, since the imported label has not been resolved, and unresolved labels default to the value 0. To resolve the import we would have to use cs241.linker.

cs241.linker

Web version

Usage: cs241.linker PROGRAM.com [PROGRAM2.com] ... > LINKED.com

This tool links two or more ARMCOM files together into a single ARMCOM file. The files are provided as command line arguments, not using standard input. For example, we could link the ARMCOM files produced in the cs241.linkasm example as follows:

        cs241.linker import.com export.com > linked.com
      

Note that the order of linkage matters; the first file passed to the linker will be placed first in the linked program, and as such will be the main entrypoint.
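The effect of link order on label addresses can be illustrated with a toy model in Python. This is not how ARMCOM files or cs241.linker actually work: the model ignores headers, relocation entries and all other metadata, and the function and field names are hypothetical. The sizes come from counting 4 bytes per instruction and 8 bytes for the .8byte directive in import.asm and export.asm from the cs241.linkasm example.

```python
def link(modules):
    """Place modules back to back and compute absolute label addresses."""
    symbols = {}
    base = 0
    for module in modules:
        for label, offset in module["exports"].items():
            symbols[label] = base + offset
        base += module["size"]
    return symbols


# Toy descriptions of the two modules (code only, no metadata).
import_module = {"name": "import", "size": 20, "exports": {}}
export_module = {"name": "export", "size": 4, "exports": {"label": 0}}

print(link([import_module, export_module]))  # {'label': 20}
print(link([export_module, import_module]))  # {'label': 0}
```

In this model, swapping the two files changes both the address of label and which module sits at the start of the linked program.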

cs241.striparmcom

(There is no separate web version of this tool. In the web version, its functionality is included in cs241.linker, above.)

Usage: cs241.striparmcom ADDRESS < PROGRAM.com > PROGRAM.bin

This tool reads an ARMCOM file from standard input and produces plain ARM64 code on standard output. It strips out the relocation and linking metadata, and relocates the program to run at the address specified by the command line argument. The ADDRESS in the above usage example should be replaced by a number; it does not mean the literal string "address".

Normally address 0 is used, since the ARM64 machine loads programs at address 0 by default, so typical usage would look like: cs241.striparmcom 0 < input.com > output.bin

The tool will produce information about the ARMCOM metadata on standard error. You can use this to programmatically compare the metadata for two ARMCOM files: sort the standard error output of each (since the ARMCOM files may have the same metadata entries in a different order) and run diff on the sorted results. If you do not want to see this metadata information, you can redirect standard error to /dev/null as follows: cs241.striparmcom 0 < input.com > output.bin 2> /dev/null
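The sort-then-compare idea can also be expressed in Python, assuming you have already captured the standard error output of two runs as strings. The function name metadata_difference is hypothetical, and the sample lines are placeholders rather than real cs241.striparmcom output.

```python
def metadata_difference(report_a: str, report_b: str):
    """Return the lines found only in the first and only in the second report."""
    lines_a = sorted(report_a.splitlines())
    lines_b = sorted(report_b.splitlines())
    only_a = [line for line in lines_a if line not in lines_b]
    only_b = [line for line in lines_b if line not in lines_a]
    return only_a, only_b


# Same entries in a different order: no difference is reported.
print(metadata_difference("entry A\nentry B\n", "entry B\nentry A\n"))
# One entry differs on each side.
print(metadata_difference("entry A\nentry B", "entry B\nentry C"))
```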

Note that ARMCOM files can usually be directly executed without needing to strip the metadata. However, for technical reasons, the memory allocation ARMCOM module you will use in some assignments requires the ARMCOM metadata to be stripped to work correctly. That is the main reason to use this tool, aside from the aforementioned trick of using it to compare the metadata of two ARMCOM files.