MIPS Assembly Language (CS 241 Dialect)

Version 20220520.0

Introduction

MIPS Assembly Language is a textual human-readable representation of MIPS Machine Language. A program that translates MIPS Assembly Language to MIPS Machine Language is called a MIPS assembler.

MIPS Machine Language

A MIPS program consists of a sequence of 32-bit instruction words, whose meanings and encoding are described by the MIPS Reference Sheet. The location of the first word is defined to be 0; the location of the next word is 4; and so on. Traditionally, locations are given in hexadecimal notation. So, for example, the location of the 11th word would be 0x28 (the hexadecimal representation of 40).
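As a quick check of the arithmetic above: the k-th word, counting from 1, has location 4(k-1). The following is a minimal Python sketch. The function name `word_location` is hypothetical and is used only for illustration.

```python
def word_location(k: int) -> str:
    """Return the location of the k-th word (counting from 1) in hexadecimal."""
    if k < 1:
        raise ValueError("word numbers start at 1")
    return hex(4 * (k - 1))


print(word_location(1))   # 0x0
print(word_location(2))   # 0x4
print(word_location(11))  # 0x28
```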

The MIPS CPU can only interpret valid MIPS instruction words; however, these words, being encoded in binary, are not easily manipulated by humans. For this reason we define an assembly language, which is easier for humans to understand and manipulate. There is a direct correspondence between assembly language statements and machine language instructions.

MIPS Assembly Language

A MIPS Assembly program is a Unix text file consisting of a number of lines. Each line has the general format

    labels instruction comment

Each of these components (labels, instruction, and comment) is optional; a particular line may have all three, any two, any one, or none at all. The components, if they appear, must appear in the given order; e.g., labels cannot come after the instruction on a line.

Every line with an instruction specifies a corresponding machine language instruction word. Lines without an instruction are called null lines and do not specify an instruction word. That is, an assembly language program with n non-null lines specifies a machine language program with n words, and the i-th non-null line specifies the i-th word.

Labels

The labels component lists zero or more labels, each followed by a colon (:). A label is a string of alphanumeric characters, the first of which must be a letter of the alphabet. For example, fred123x is a valid label but 123fred is not.
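The label syntax can be checked with a regular expression. This is a minimal Python sketch. It assumes that "alphanumeric" means the ASCII letters A-Z and a-z plus the digits 0-9, with no underscores. The names `LABEL_RE` and `is_valid_label` are hypothetical.

```python
import re

# Assumption: a letter followed by zero or more ASCII letters or digits.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_label(name: str) -> bool:
    """True if the whole string is a syntactically valid label."""
    return LABEL_RE.fullmatch(name) is not None


print(is_valid_label("fred123x"))  # True
print(is_valid_label("123fred"))   # False
```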

A label appearing in the labels component is said to be defined; a particular label may be defined at most once in an assembly language program. Labels are case-sensitive; that is, fred and Fred are distinct labels.

The location of a line in an assembly language program is 4n, where n is the number of non-null lines preceding it. The first line therefore has location 0. If the first line is non-null, the second line has location 4; if the first line is null, the second line also has location 0. Note that the location of any non-null line is exactly the same as the location of the machine language word that it specifies, and all null lines immediately preceding it have that same location.

The value of a label is defined to be the location of the line on which it is defined. If the machine language program were loaded at address 0, this value would be exactly the memory address of that location.
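The location and label-value rules combine naturally into a single pass over the program that builds a symbol table. The following is a minimal Python sketch. It assumes each line has already been split into a (labels, instruction) pair, with `None` standing in for a missing instruction. That input shape and the name `build_symbol_table` are hypothetical simplifications, since a real assembler would first tokenize the source text.

```python
from typing import Optional


def build_symbol_table(lines: list[tuple[list[str], Optional[str]]]) -> dict[str, int]:
    """Map each defined label to its value (the location of its line)."""
    symbols: dict[str, int] = {}
    location = 0  # always 4 * (number of non-null lines seen so far)
    for labels, instruction in lines:
        for label in labels:
            # Dictionary keys are case-sensitive, just like labels.
            if label in symbols:
                raise ValueError(f"label {label!r} defined more than once")
            symbols[label] = location
        if instruction is not None:
            location += 4  # only non-null lines advance the location
    return symbols


program = [
    (["start"], None),             # null line: location 0
    ([], "add $1, $2, $3"),        # location 0
    (["loop", "Loop"], "lis $6"),  # location 4
    (["end"], None),               # null line: location 8
]
print(build_symbol_table(program))  # {'start': 0, 'loop': 4, 'Loop': 4, 'end': 8}
```

Because `loop` and `Loop` are distinct labels, both can be defined on the same line. Defining either one a second time would raise an error.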

Comments

A comment is any sequence of characters beginning with a semicolon (;) and ending with the line feed character (ASCII 0x0A) that terminates the line. Comments have meaning only to the reader; they do not contribute to the specification of the equivalent machine language program.
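Labels, opcodes, and operands never contain a semicolon, so an assembler can discard a comment by cutting the line at its first `;`. This is a minimal Python sketch. It assumes lines are passed without their terminating line feed, as produced by `str.splitlines()`. The name `strip_comment` is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first ';' to the end of the line."""
    semicolon = line.find(";")
    return line if semicolon == -1 else line[:semicolon]


print(repr(strip_comment("loop: add $1, $2, $3 ; increment")))  # 'loop: add $1, $2, $3 '
print(repr(strip_comment("; a comment-only line")))             # ''
```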

Instructions

An instruction takes the form of an opcode followed by one or more operands. There can be at most one instruction per line.

The opcode may be add, sub, mult, multu, div, divu, mfhi, mflo, lis, lw, sw, slt, sltu, beq, bne, jr, jalr, or the pseudo-opcode .word.

An operand may be:

a non-negative decimal integer denoted by a string of digits 0-9,
a negative decimal integer denoted by a minus sign (-) followed by a non-negative decimal integer,
a hexadecimal number denoted by 0x followed by a string of hexadecimal digits 0-9 or a-f or A-F (case does not matter),
a register denoted by a dollar sign ($) followed by a non-negative decimal integer, which must be in the range 0 to 31,
a label, which must be defined somewhere in the program (it can be defined before or after its use as an operand).

An instruction often also contains punctuation characters, such as commas or parentheses, that separate the operands. Whitespace between opcodes, operands and punctuation is ignored. For example, add $1, $2, $3 is equivalent to add$1,$2,$3.

Leading zeroes are allowed in numeric values and do not change the meaning of the number. This includes register numbers, e.g., $007 is allowed.
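The operand kinds listed above can each be recognized with a small regular expression. This is a minimal Python sketch. It assumes each token has already been separated from commas, parentheses, and whitespace. The name `classify_operand` and the kind strings it returns are hypothetical. Note that Python's `int()` accepts leading zeros in strings, which matches the rule above.

```python
import re
from typing import Union

DEC_RE = re.compile(r"-?[0-9]+")
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")
REG_RE = re.compile(r"\$([0-9]+)")
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def classify_operand(token: str) -> tuple[str, Union[int, str]]:
    """Return (kind, value) for a single, already-separated operand token."""
    if HEX_RE.fullmatch(token):
        return ("hex", int(token, 16))
    if DEC_RE.fullmatch(token):
        return ("dec", int(token, 10))    # int("007", 10) == 7
    match = REG_RE.fullmatch(token)
    if match:
        number = int(match.group(1), 10)  # "$007" -> 7
        if number > 31:
            raise ValueError(f"register out of range: {token}")
        return ("reg", number)
    if LABEL_RE.fullmatch(token):
        return ("label", token)           # resolved later from the label values
    raise ValueError(f"malformed operand: {token}")


print(classify_operand("$007"))   # ('reg', 7)
print(classify_operand("-0012"))  # ('dec', -12)
print(classify_operand("0xFF"))   # ('hex', 255)
print(classify_operand("fred"))   # ('label', 'fred')
```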

The number of operands, meaning of operands, allowed types of operands, allowed values or ranges for operands, and required punctuation generally differ depending on the instruction.

Operand Format — add, sub, slt, sltu

These opcodes all take three register operands separated by commas. For example:

   add $1, $2, $3

The first operand is $d (the destination register) as specified in the MIPS Reference Sheet. The second and third operands are $s and $t respectively.

In the example above, d=1, s=2, and t=3, and the 5-bit representations of these values are encoded in the corresponding fields of the machine instruction.
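In this encoding, $s occupies bits 25-21, $t bits 20-16, and $d bits 15-11, and the low six bits select the operation. This is a minimal Python sketch. The funct values are assumed from the standard MIPS encoding and should be confirmed against the MIPS Reference Sheet. The names `FUNCT` and `encode_three_reg` are hypothetical.

```python
# Assumed funct codes (standard MIPS); verify against the MIPS Reference Sheet.
FUNCT = {"add": 0x20, "sub": 0x22, "slt": 0x2A, "sltu": 0x2B}


def encode_three_reg(opcode: str, d: int, s: int, t: int) -> int:
    """Encode 'opcode $d, $s, $t' as 000000 sssss ttttt ddddd 00000 ffffff."""
    for reg in (d, s, t):
        if not 0 <= reg <= 31:
            raise ValueError(f"register ${reg} is out of range")
    return (s << 21) | (t << 16) | (d << 11) | FUNCT[opcode]


# add $1, $2, $3  ->  d=1, s=2, t=3
print(f"0x{encode_three_reg('add', 1, 2, 3):08x}")  # 0x00430820
```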

For these opcodes and all other opcodes that take register parameters, the register numbers must be between 0 and 31, inclusive.

Operand Format — mult, multu, div, divu

These opcodes take two register operands corresponding to $s and $t. For example,

   mult $4, $5

specifies that s=4 and t=5 are encoded in the instruction word. $d is not used and is encoded as 0 in the instruction word.
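The same field layout applies here, with the $d field (bits 15-11) left as zero. This is a minimal Python sketch. The funct values are assumed from the standard MIPS encoding and should be checked against the MIPS Reference Sheet. The names `FUNCT` and `encode_two_reg` are hypothetical.

```python
# Assumed funct codes (standard MIPS); verify against the MIPS Reference Sheet.
FUNCT = {"mult": 0x18, "multu": 0x19, "div": 0x1A, "divu": 0x1B}


def encode_two_reg(opcode: str, s: int, t: int) -> int:
    """Encode 'opcode $s, $t'; the unused $d field stays 0."""
    for reg in (s, t):
        if not 0 <= reg <= 31:
            raise ValueError(f"register ${reg} is out of range")
    return (s << 21) | (t << 16) | FUNCT[opcode]


print(f"0x{encode_two_reg('mult', 4, 5):08x}")  # 0x00850018
```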

Operand Format — mfhi, mflo, lis

These opcodes have a single register operand, $d. For example:

   lis $6

d=6 is encoded in the instruction word, while $s and $t are not used and are encoded as 0 in the instruction word.
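For these opcodes, only the $d field (bits 15-11) and the funct bits are nonzero. This is a minimal Python sketch. The funct values are assumed from the standard MIPS encoding and should be checked against the MIPS Reference Sheet. The names `FUNCT` and `encode_dest_only` are hypothetical.

```python
# Assumed funct codes (standard MIPS); verify against the MIPS Reference Sheet.
FUNCT = {"mfhi": 0x10, "mflo": 0x12, "lis": 0x14}


def encode_dest_only(opcode: str, d: int) -> int:
    """Encode 'opcode $d'; the unused $s and $t fields stay 0."""
    if not 0 <= d <= 31:
        raise ValueError(f"register ${d} is out of range")
    return (d << 11) | FUNCT[opcode]


print(f"0x{encode_dest_only('lis', 6):08x}")  # 0x00003014
```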

Operand Format — lw, sw

These opcodes have two register operands, $s and $t, and in addition an immediate operand, i. They have the following syntax:

   opcode $t, i($s)

For example,

   lw $7, 8($9)

The parameters $s and $t are registers, and i may be a decimal number (possibly negative) or a hexadecimal number. Note that i cannot be a label or a register.

If specified in decimal, i must be in the range -32768 through 32767, and it is encoded as a 16-bit two's complement integer.

If specified in hexadecimal, i must not exceed 0xffff when viewed as a non-negative base 16 number. The hexadecimal value should be interpreted as representing a 16-bit binary sequence and that sequence should be encoded directly.
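The two immediate rules differ only in range checking: a decimal value is masked to 16 bits (two's complement), while a hexadecimal value is already a 16-bit pattern. This is a minimal Python sketch. The opcode values for lw and sw are assumed from the standard MIPS encoding and should be checked against the MIPS Reference Sheet. The names `OPCODE`, `immediate_field`, and `encode_load_store` are hypothetical.

```python
# Assumed opcodes (standard MIPS); verify against the MIPS Reference Sheet.
OPCODE = {"lw": 0x23, "sw": 0x2B}


def immediate_field(token: str) -> int:
    """16-bit i field for a decimal or 0x-hexadecimal immediate."""
    if token.startswith("0x"):
        value = int(token, 16)
        if value > 0xFFFF:
            raise ValueError(f"hex immediate exceeds 0xffff: {token}")
        return value                    # the bit pattern is encoded directly
    value = int(token, 10)
    if not -32768 <= value <= 32767:
        raise ValueError(f"decimal immediate out of range: {token}")
    return value & 0xFFFF               # 16-bit two's complement


def encode_load_store(opcode: str, t: int, imm: str, s: int) -> int:
    """Encode 'opcode $t, imm($s)' as oooooo sssss ttttt iiiiiiiiiiiiiiii."""
    for reg in (s, t):
        if not 0 <= reg <= 31:
            raise ValueError(f"register ${reg} is out of range")
    return (OPCODE[opcode] << 26) | (s << 21) | (t << 16) | immediate_field(imm)


print(f"0x{encode_load_store('lw', 7, '8', 9):08x}")       # 0x8d270008
print(immediate_field("-1") == immediate_field("0xffff"))  # True
```

The second line of output shows that -1 and 0xffff produce the same 16-bit field, even though they are range-checked differently.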

Operand Format — beq, bne

These opcodes take three operands: registers $s and $t, and an immediate operand i. They have the following syntax:

   opcode $s, $t, i

For example,

   beq $10, $11, 12

The parameters $s and $t are registers, and i may be a decimal number (possibly negative), a hexadecimal number, or a label.

If specified in decimal or hexadecimal, the requirements on the range of i and encoding of i are the same as for the lw and sw opcodes.

If i is a label, the value (labelValue-PC)/4 is encoded as a 16-bit two's complement integer. Here labelValue is the value of the label, and PC is the location of the beq or bne instruction plus 4 (that is, the value of the program counter when the branch instruction is executed, which is one word after the instruction itself). The value (labelValue-PC)/4 must be in the range -32768 through 32767; otherwise it cannot be encoded in 16-bit two's complement.
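Here is a worked example of the label rule. Suppose a bne at location 8 branches back to a label whose value is 0. Then PC = 12, and the offset is (0 - 12)/4 = -3, which is encoded as 0xfffd. This is a minimal Python sketch. The beq/bne opcode values are assumed from the standard MIPS encoding and should be checked against the MIPS Reference Sheet. The names `OPCODE`, `branch_field`, and `encode_branch` are hypothetical, and register range checks are omitted for brevity.

```python
# Assumed opcodes (standard MIPS); verify against the MIPS Reference Sheet.
OPCODE = {"beq": 0x04, "bne": 0x05}


def branch_field(label_value: int, branch_location: int) -> int:
    """16-bit i field for a beq/bne whose operand i is a label."""
    pc = branch_location + 4                # program counter while the branch executes
    offset = (label_value - pc) // 4        # exact: both are multiples of 4
    if not -32768 <= offset <= 32767:
        raise ValueError(f"branch offset {offset} does not fit in 16 bits")
    return offset & 0xFFFF                  # 16-bit two's complement


def encode_branch(opcode: str, s: int, t: int, field: int) -> int:
    """Encode 'opcode $s, $t, i' given an already-computed 16-bit field."""
    return (OPCODE[opcode] << 26) | (s << 21) | (t << 16) | field


field = branch_field(0, 8)
print(hex(field))                                    # 0xfffd
print(f"0x{encode_branch('bne', 1, 2, field):08x}")  # 0x1422fffd
print(f"0x{encode_branch('beq', 10, 11, 12):08x}")   # 0x114b000c
```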

Operand Format — jr, jalr

These opcodes have a single register operand, $s. For example:

   jalr $13

s=13 is encoded in the instruction word, while $d and $t are not used and are encoded as 0 in the instruction word.
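For jr and jalr, the register is placed in the $s field (bits 25-21) rather than the $d field (bits 15-11). This is a minimal Python sketch. The funct values are assumed from the standard MIPS encoding and should be checked against the MIPS Reference Sheet. The names `FUNCT` and `encode_jump_reg` are hypothetical.

```python
# Assumed funct codes (standard MIPS); verify against the MIPS Reference Sheet.
FUNCT = {"jr": 0x08, "jalr": 0x09}


def encode_jump_reg(opcode: str, s: int) -> int:
    """Encode 'opcode $s'; the unused $t and $d fields stay 0."""
    if not 0 <= s <= 31:
        raise ValueError(f"register ${s} is out of range")
    return (s << 21) | FUNCT[opcode]


print(f"0x{encode_jump_reg('jalr', 13):08x}")  # 0x01a00009
print(f"0x{encode_jump_reg('jr', 31):08x}")    # 0x03e00008
```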

Syntactically, these opcodes look the same as mfhi, mflo, and lis; the difference is only in the encoding. For jr and jalr, the single register operand corresponds to $s rather than $d.

Operand Format — .word

.word is not a true opcode, as it does not necessarily encode a MIPS instruction at all. It is a directive for the assembler, telling the assembler to encode a 32-bit word at the location of the directive.

.word has one operand i, which is either a number or a label.

If i is a decimal number, it must be in the range -2³¹ through 2³²-1 (that is, the union of the ranges for signed two's complement and unsigned 32-bit integers). If the value of i is non-negative, it is encoded as a 32-bit unsigned integer; if it is negative, it is encoded as a 32-bit two's complement integer.
If i is hexadecimal, it must not exceed 0xffffffff when viewed as a non-negative base 16 number. Its value is interpreted as a 32-bit binary sequence, and that sequence is encoded directly.
If i is a label, its value is encoded as a 32-bit unsigned integer.
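The three .word cases can be sketched in a single function. This is a minimal Python sketch. It assumes label values come from a dictionary built during an earlier pass over the program. The name `encode_word` is hypothetical. Label values are not range-checked here, consistent with the remark that follows.

```python
def encode_word(token: str, labels: dict[str, int]) -> int:
    """Return the 32-bit word for '.word token' as a non-negative int."""
    if token.startswith("0x"):
        value = int(token, 16)
        if value > 0xFFFFFFFF:
            raise ValueError(f"hex operand exceeds 0xffffffff: {token}")
        return value                      # the bit pattern is encoded directly
    if token[:1].isalpha():               # labels start with a letter
        if token not in labels:
            raise ValueError(f"undefined label: {token}")
        return labels[token]              # encoded as unsigned
    value = int(token, 10)
    if not -2**31 <= value <= 2**32 - 1:
        raise ValueError(f"decimal operand out of range: {token}")
    return value & 0xFFFFFFFF             # negative values -> two's complement


print(hex(encode_word("-1", {})))          # 0xffffffff
print(hex(encode_word("4294967295", {})))  # 0xffffffff
print(encode_word("fred", {"fred": 8}))    # 8
```

As the first two lines of output show, -1 and 4294967295 both encode to the same word, because the decimal range is the union of the signed and unsigned ranges.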

Remark: We will not test whether your assembler performs range checking for label operands of .word, because creating an out-of-range label value would require a program several gigabytes in size. Your assembler should perform range checking for all other operand types mentioned in this document (including label operands for branch instructions and integer operands for .word).