MIPS Assembly Language is a textual human-readable representation of MIPS Machine Language. A program that translates MIPS Assembly Language to MIPS Machine Language is called a MIPS assembler.
A MIPS program consists of a sequence of 32-bit instruction words, whose meanings and encoding are described by the MIPS Reference Sheet. The location of the first word is defined to be 0; the location of the next word is 4; and so on. Traditionally, locations are given in hexadecimal notation. So, for example, the location of the 11th word would be 0x28 (the hexadecimal representation of 40).
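As a quick check of the arithmetic above, the location of the k-th word (counting from 1) is 4(k-1). The following Python sketch is only an illustration; the function name word_location is made up here and is not part of the specification.

```python
def word_location(k: int) -> int:
    """Location of the k-th instruction word, counting from 1."""
    if k < 1:
        raise ValueError("word index starts at 1")
    return 4 * (k - 1)


# The 11th word sits at location 40, conventionally written in hexadecimal.
print(hex(word_location(11)))  # 0x28
```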
The MIPS CPU can only interpret valid MIPS instruction words; however, these words, being encoded in binary, are not easily manipulated by humans. For this reason we define an assembly language, which is easier for humans to understand and manipulate. There is a direct correspondence between assembly language statements and machine language instructions.
A MIPS Assembly program is a Unix text file consisting of a number of lines. Each line has the general format
labels instruction comment
Each of these components (labels, instruction, and comment) is optional; a particular line may have all three, any two, any one, or none at all. The components, if they appear, must appear in the given order, e.g., labels cannot come after the instruction on a line.
Every line with an instruction specifies a corresponding machine language instruction word. Lines without an instruction are called null lines and do not specify an instruction word. That is, an assembly language program with n non-null lines specifies a machine language program with n words, in 1-1 ordered correspondence.
The labels component lists zero or more labels, each followed by a colon (:). A label is a string of alphanumeric characters, the first of which must be a letter of the alphabet. For example, fred123x is a valid label but 123fred is not.
A label appearing in the labels component is said to be defined; a particular label may be defined at most once in an assembly language program. Labels are case-sensitive; that is, fred and Fred are distinct labels.
The location of a line in an assembly language program is 4n, where n is the number of non-null lines preceding it. The first line therefore has location 0. If the first line is non-null, the second line has location 4. On the other hand, if the first line is null, the location of the second line is also 0. Note that the location of any non-null line is exactly the same as the location of the machine language word that it specifies, and all null lines immediately preceding it have the same location.
The value of a label is defined to be the location of the line on which it is defined. This value is the memory address the label would refer to in the machine language program if that program were loaded at address 0.
A comment is any sequence of characters beginning with a semicolon (;) and ending with the line feed character (ASCII 0x0A) that terminates the line. Comments have meaning only to the reader; they do not contribute to the specification of the equivalent machine language program.
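The rules for labels, locations, and comments can be combined into a first pass that builds a table of label values. The following is a minimal Python sketch under those rules, not a definitive implementation; the names first_pass and LABEL_RE are hypothetical, and the sketch does not validate the instruction text itself.

```python
import re

# A label: a letter followed by alphanumerics, then a colon.
LABEL_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*)\s*:")


def first_pass(source: str):
    """Return (symbols, instructions).

    symbols maps each label to its value; instructions is a list of
    (location, text) pairs, one per non-null line.
    """
    symbols = {}
    instructions = []
    location = 0
    for line in source.split("\n"):
        line = line.split(";", 1)[0]  # drop the comment, if any
        pos = 0
        # Consume every label definition at the start of the line.
        while (m := LABEL_RE.match(line, pos)):
            name = m.group(1)
            if name in symbols:
                raise ValueError(f"label defined more than once: {name}")
            symbols[name] = location
            pos = m.end()
        rest = line[pos:].strip()
        if rest:  # a non-null line occupies one word
            instructions.append((location, rest))
            location += 4
    return symbols, instructions


program = "; a tiny loop\nstart:\nloop: add $1, $2, $3 ; first word\nbeq $1, $0, loop\nend:\n"
symbols, instructions = first_pass(program)
print(symbols)
```

Here start and loop both have value 0 because the null line defining start shares the location of the non-null line that follows it, and end has value 8 because two non-null lines precede it.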
An instruction takes the form of an opcode followed by one or more operands. There can be at most one instruction per line.
The opcode may be add, sub, mult, multu, div, divu, mfhi, mflo, lis, lw, sw, slt, sltu, beq, bne, jr, jalr, or the pseudo-opcode .word.
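An operand may be a register (written as $ followed by the register number), a decimal number, a hexadecimal number, or a label, depending on the opcode.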
An instruction will also often contain punctuation characters such as commas or parentheses which separate the operands. Whitespace between opcodes, operands and punctuation is ignored. For example, add $1, $2, $3 is equivalent to add$1,$2,$3.
Leading zeroes are allowed in numeric values and do not change the meaning of the number. This includes register numbers, e.g., $007 is allowed.
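One way to make whitespace irrelevant is to split the instruction text into tokens before interpreting it. The Python sketch below is an illustration only; the token categories, the names TOKEN_RE, tokenize, and register_number, and the acceptance of upper-case hexadecimal digits are assumptions made for this example rather than requirements stated in this document.

```python
import re

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<reg>\$[0-9]+)                   # register, e.g. $3 or $007
      | (?P<hex>0x[0-9a-fA-F]+)             # hexadecimal number
      | (?P<dec>-?[0-9]+)                   # decimal number, possibly negative
      | (?P<word>\.?[A-Za-z][A-Za-z0-9]*)   # opcode, .word, or label
      | (?P<punct>[,()])                    # punctuation
    )""", re.VERBOSE)


def tokenize(text: str):
    """Split instruction text into (kind, text) pairs, skipping whitespace."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at column {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def register_number(token_text: str) -> int:
    """Convert a register token such as $007 to its number."""
    n = int(token_text[1:], 10)  # leading zeroes do not change the value
    if not 0 <= n <= 31:
        raise ValueError(f"register out of range: {token_text}")
    return n


print(tokenize("add $1, $2, $3") == tokenize("add$1,$2,$3"))  # True
print(register_number("$007"))  # 7
```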
The number of operands, meaning of operands, allowed types of operands, allowed values or ranges for operands, and required punctuation generally differ depending on the instruction.
The opcodes add, sub, slt, and sltu all take three register operands separated by commas. For example:
add $1, $2, $3
The first operand is $d (the destination register) as specified in the MIPS Reference Sheet. The second and third operands are $s and $t respectively.
So in the example above we have d=1, s=2, and t=3, and the 5-bit representations of these values are encoded in the corresponding machine instruction. For these opcodes and all other opcodes that take register parameters, the register numbers must be between 0 and 31, inclusive.
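To illustrate how the three 5-bit register fields end up in the instruction word, here is a small Python sketch. The field positions and function codes follow the standard MIPS register-format encoding and are included here as assumptions; the MIPS Reference Sheet remains the authority. The names FUNCT and encode_three_register are hypothetical.

```python
# Function codes for the three-register opcodes (assumed; verify against
# the MIPS Reference Sheet).
FUNCT = {"add": 0x20, "sub": 0x22, "slt": 0x2A, "sltu": 0x2B}


def encode_three_register(opcode: str, d: int, s: int, t: int) -> int:
    """Encode `opcode $d, $s, $t` as a 32-bit instruction word."""
    for reg in (d, s, t):
        if not 0 <= reg <= 31:
            raise ValueError(f"register out of range: {reg}")
    # s occupies bits 25..21, t bits 20..16, d bits 15..11.
    return (s << 21) | (t << 16) | (d << 11) | FUNCT[opcode]


# add $1, $2, $3 has d=1, s=2, t=3.
print(f"0x{encode_three_register('add', 1, 2, 3):08x}")  # 0x00430820
```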
The opcodes mult, multu, div, and divu take two register operands corresponding to $s and $t. For example:
mult $4, $5
specifies that s=4 and t=5 are encoded in the instruction word. $d is not used and is encoded as 0 in the instruction word.
The opcodes mfhi, mflo, and lis have a single register operand, $d. For example:
lis $6
d=6 is encoded in the instruction word, while $s and $t are not used and are encoded as 0 in the instruction word.
The opcodes lw and sw have two register operands, $s and $t, and in addition an immediate operand, i. They have the following syntax:
opcode $t, i($s)
For example,
lw $7, 8($9)
The parameters $s and $t are registers, and i may be a decimal number (possibly negative) or a hexadecimal number. Note that i cannot be a label or a register.
If specified in decimal, i must be in the range -32768 through 32767, and it is encoded as a 16-bit two's complement integer.
If specified in hexadecimal, i must not exceed 0xffff when viewed as a non-negative base 16 number. The hexadecimal value should be interpreted as representing a 16-bit binary sequence and that sequence should be encoded directly.
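The two range rules for i can be sketched in Python as follows. This is an illustration under stated assumptions: the names parse_imm16, OPCODE, and encode_load_store are hypothetical, and the 6-bit opcode values and field positions follow the standard MIPS encoding rather than anything given in this document.

```python
def parse_imm16(text: str) -> int:
    """Return the 16-bit field for a decimal or hexadecimal immediate."""
    if text.lower().startswith("0x"):
        value = int(text, 16)
        if value > 0xFFFF:
            raise ValueError(f"hexadecimal immediate exceeds 0xffff: {text}")
        return value  # the bit pattern is encoded directly
    value = int(text, 10)
    if not -32768 <= value <= 32767:
        raise ValueError(f"decimal immediate out of range: {text}")
    return value & 0xFFFF  # 16-bit two's complement


# Opcode values (assumed; verify against the MIPS Reference Sheet).
OPCODE = {"lw": 0x23, "sw": 0x2B}


def encode_load_store(opcode: str, t: int, i_text: str, s: int) -> int:
    """Encode `opcode $t, i($s)` as a 32-bit instruction word."""
    for reg in (s, t):
        if not 0 <= reg <= 31:
            raise ValueError(f"register out of range: {reg}")
    return (OPCODE[opcode] << 26) | (s << 21) | (t << 16) | parse_imm16(i_text)


# lw $7, 8($9) has t=7, i=8, s=9.
print(f"0x{encode_load_store('lw', 7, '8', 9):08x}")  # 0x8d270008
```

Note that the decimal operand -1 and the hexadecimal operand 0xffff produce the same 16-bit field, while the decimal operand 65535 is rejected.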
The opcodes beq and bne take three operands: registers $s and $t, and an immediate operand i. They have the following syntax:
opcode $s, $t, i
For example,
beq $10, $11, 12
The parameters $s and $t are registers, and i may be a decimal number (possibly negative), a hexadecimal number, or a label.
If specified in decimal or hexadecimal, the requirements on the range of i and encoding of i are the same as for the lw and sw opcodes.
If i is a label, the value (labelValue-PC)/4 is encoded as a 16-bit two's complement integer. Here labelValue is the value of the label, and PC is the location of the beq or bne instruction plus 4 (that is, the location where the program counter would be when the branch instruction is executed, one word after the instruction itself). The value (labelValue-PC)/4 must be in the range -32768 through 32767 (otherwise it cannot be encoded in 16-bit two's complement).
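The label case can be worked through in a short Python sketch. The function names are hypothetical, and the opcode values and field positions are assumed from the standard MIPS encoding; only the offset formula and its range come from the text above.

```python
def branch_offset(label_value: int, branch_location: int) -> int:
    """Return the 16-bit field for a branch at branch_location to a label."""
    pc = branch_location + 4            # PC when the branch executes
    offset = (label_value - pc) // 4    # exact: both are multiples of 4
    if not -32768 <= offset <= 32767:
        raise ValueError(f"branch offset out of range: {offset}")
    return offset & 0xFFFF              # 16-bit two's complement


# Opcode values (assumed; verify against the MIPS Reference Sheet).
BRANCH_OPCODE = {"beq": 0x04, "bne": 0x05}


def encode_branch(opcode: str, s: int, t: int, field: int) -> int:
    """Encode `opcode $s, $t, i` given the already computed 16-bit field."""
    return (BRANCH_OPCODE[opcode] << 26) | (s << 21) | (t << 16) | field


# A branch at location 4 back to a label with value 0: (0 - 8) / 4 = -2.
print(f"0x{branch_offset(0, 4):04x}")                   # 0xfffe
# beq $10, $11, 12 with a literal immediate of 12.
print(f"0x{encode_branch('beq', 10, 11, 12):08x}")      # 0x114b000c
```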
The opcodes jr and jalr have a single register operand, $s. For example:
jalr $13
s=13 is encoded in the instruction word, while $d and $t are not used and are encoded as 0 in the instruction word.
The difference between these opcodes and mfhi, mflo, lis is only in the encoding. For jr and jalr, the single register parameter corresponds to $s, rather than $d.
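Since the two-register and one-register opcodes differ only in which fields their operands fill, a table-driven encoder is one possible design. The Python sketch below is an illustration; the names FORMATS, SHIFT, and encode_register_form are hypothetical, and the function codes (including the one for lis) are assumptions to be checked against the MIPS Reference Sheet.

```python
# For each opcode: (function code, fields filled by its operands in order).
# Function codes are assumed; verify against the MIPS Reference Sheet.
FORMATS = {
    "mult": (0x18, "st"), "multu": (0x19, "st"),
    "div": (0x1A, "st"), "divu": (0x1B, "st"),
    "mfhi": (0x10, "d"), "mflo": (0x12, "d"), "lis": (0x14, "d"),
    "jr": (0x08, "s"), "jalr": (0x09, "s"),
}
SHIFT = {"s": 21, "t": 16, "d": 11}  # bit position of each register field


def encode_register_form(opcode: str, *regs: int) -> int:
    """Encode an opcode whose operands are all registers."""
    funct, fields = FORMATS[opcode]
    if len(regs) != len(fields):
        raise ValueError(f"{opcode} takes {len(fields)} operand(s)")
    word = funct  # fields that are not used stay 0
    for field, reg in zip(fields, regs):
        if not 0 <= reg <= 31:
            raise ValueError(f"register out of range: {reg}")
        word |= reg << SHIFT[field]
    return word


print(f"0x{encode_register_form('mult', 4, 5):08x}")  # 0x00850018
print(f"0x{encode_register_form('lis', 6):08x}")      # 0x00003014
print(f"0x{encode_register_form('jalr', 13):08x}")    # 0x01a00009
```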
.word is not a true opcode, as it does not necessarily encode a MIPS instruction at all. It is a directive for the assembler, telling the assembler to encode a 32-bit word at the location of the directive.
.word has one operand i, which is either a number or a label. If specified in hexadecimal, i must not exceed 0xffffffff when viewed as a non-negative base 16 number.
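A rough Python sketch of how the operand of .word might be turned into a 32-bit value follows. The function name encode_word and the symbols dictionary are hypothetical. The hexadecimal limit comes from the text above; the decimal range used here (-2^31 through 2^32-1) is an assumption made for this example, since the text above does not state it.

```python
def encode_word(operand: str, symbols: dict) -> int:
    """Return the 32-bit value specified by `.word operand`."""
    if operand in symbols:                  # label: use its value
        value = symbols[operand]
    elif operand.lower().startswith("0x"):  # hexadecimal number
        value = int(operand, 16)
        if value > 0xFFFFFFFF:
            raise ValueError(f"hexadecimal value exceeds 0xffffffff: {operand}")
    else:                                   # decimal number
        value = int(operand, 10)
        # Assumed range: anything representable in 32 bits, signed or unsigned.
        if not -2**31 <= value <= 2**32 - 1:
            raise ValueError(f"decimal value out of range: {operand}")
    return value & 0xFFFFFFFF


symbols = {"end": 8}  # e.g. produced by an earlier pass over the program
print(f"0x{encode_word('0xdeadbeef', symbols):08x}")  # 0xdeadbeef
print(f"0x{encode_word('-1', symbols):08x}")          # 0xffffffff
print(f"0x{encode_word('end', symbols):08x}")         # 0x00000008
```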
Remark: We will not test whether your assembler performs range checking for label operands of .word. This is because it would require a program that is several gigabytes in size to create a label value that is out of range. Your assembler should perform range checking for all other operand types mentioned in this document (including label operands for branch instructions, and integer operands for .word).