System/161 MIPS Processor

The 32-bit MIPS is the simplest "real" 32-bit processor for which development tools are readily available; furthermore, the MIPS architecture is already widely used for teaching in various contexts. This makes it a natural choice for System/161.

The specific dialect of MIPS processor found in System/161, which for lack of a better term we'll refer to as MIPS-161, is essentially a cache-coherent r3000, or MIPS-I.

User Mode

In user mode, the MIPS-161 behaves the same as any other 32-bit MIPS. All user instructions are fully interlocked and there are no pipeline hazards. All MIPS-I instructions are supported. MIPS-II and higher instructions are not supported, except for the ll and sc instructions used for multiprocessor synchronization. Please consult your favorite MIPS reference for further details.

Kernel Mode

In kernel mode, the MIPS-161 is mostly a MIPS-I, with a few differences and a few extensions borrowed from later MIPS versions. For completeness, the following sections define the complete kernel mode interface.

Kernel Instructions

The WAIT instruction has been borrowed from MIPS-II. This operation puts the processor into a low-power state and suspends execution until some external event occurs, such as an interrupt. Since the exact behavior of WAIT is not clearly specified anywhere I could find, the MIPS-161 behavior is as follows:

Regarding the TLBR, TLBWR, TLBWI, and TLBP instructions, see the MMU section below. Regarding the RFE instruction, see the trap handling section, also below. Regarding the MFC0 and MTC0 instructions, see the next section.

Kernel Registers

The MIPS-161 has 10 supervisor registers in coprocessor 0. These may be accessed with the MFC0 (move from coprocessor 0) and MTC0 (move to coprocessor 0) instructions, as follows:

	mfc0 $4, $12
loads the contents of supervisor register 12 (STATUS) into general-purpose register 4 (a0), and
	mtc0 $4, $12
does the reverse.

The supervisor registers are:

with the following bit patterns:
(Bits run from 31 on the left to 0 on the right; the fields of each register are listed from most significant to least significant.)

Register   Fields
INDEX      P | 0 | SLOT | 0
RANDOM     0 | SLOT | 0
TLBLO      PPAGE | N | D | V | G | 0
CONTEXT    PTBASE | VSHIFT | 0
BADVADDR   BADVADDR
TLBHI      VPAGE | ASID | 0
STATUS     d | c | b | a | 0 | B | T | E | M | Z | S | I | H H H H H H | F F | 0 | KUo | IEo | KUp | IEp | KUc | IEc
CAUSE      BD | 0 | CE | 0 | H H H H H H | F F | 0 | EXC | 0
EPC        EPC
PRID       0 | PRID
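As an illustration of how software might pick fields out of these registers, here is a minimal C sketch. The table above gives only the order of the fields, so the exact bit positions used here are assumptions taken from the conventional R3000 layout, and all macro and function names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed bit positions (conventional R3000 layout, not stated in the
 * table above). All names are hypothetical.
 */
#define M161_TLBHI_VPAGE_MASK  0xfffff000u  /* VPAGE: bits 31..12 */
#define M161_TLBHI_ASID_SHIFT  6            /* ASID:  bits 11..6  */
#define M161_TLBHI_ASID_MASK   0x3fu
#define M161_INDEX_SLOT_SHIFT  8            /* SLOT:  bits 13..8  */
#define M161_INDEX_SLOT_MASK   0x3fu
#define M161_CAUSE_BD          0x80000000u  /* BD:    bit 31      */
#define M161_CAUSE_EXC_SHIFT   2            /* EXC:   bits 5..2   */
#define M161_CAUSE_EXC_MASK    0xfu

/* Virtual page address (low 12 bits cleared) held in a TLBHI value. */
static inline uint32_t tlbhi_vpage(uint32_t tlbhi)
{
    return tlbhi & M161_TLBHI_VPAGE_MASK;
}

/* Address space ID held in a TLBHI value. */
static inline uint32_t tlbhi_asid(uint32_t tlbhi)
{
    return (tlbhi >> M161_TLBHI_ASID_SHIFT) & M161_TLBHI_ASID_MASK;
}

/* TLB slot number held in an INDEX value. */
static inline uint32_t index_slot(uint32_t index)
{
    return (index >> M161_INDEX_SLOT_SHIFT) & M161_INDEX_SLOT_MASK;
}

/* Exception code held in a CAUSE value. */
static inline uint32_t cause_exc(uint32_t cause)
{
    return (cause >> M161_CAUSE_EXC_SHIFT) & M161_CAUSE_EXC_MASK;
}

/* Nonzero if the BD (branch delay) flag is set in a CAUSE value. */
static inline int cause_bd(uint32_t cause)
{
    return (cause & M161_CAUSE_BD) != 0;
}
```

In a kernel, the argument to each helper would be a value previously read with mfc0.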

Trap Handling

When an exception occurs, the following things happen:

  • The PC where the exception occurred is loaded into the EPC register.
  • If this was in a branch delay slot, the EPC register is set to the address of the branch (that is, 4 is subtracted) and the BD flag in the CAUSE register is set. Software need not examine the BD flag unless the exact address of the faulting instruction is wanted, e.g. for disassembly and analysis.
  • The EXC field of the CAUSE register is set to reflect what happened. The exception codes are listed below.
  • For coprocessor-related exceptions the CE field of the CAUSE register is set.
  • For interrupts the H and F bits of the CAUSE register are set to reflect the interrupt(s) that are active.
  • For MMU exceptions the BADVADDR register is loaded with the failing address. A masked and shifted form suitable for indexing a page table is placed in the VSHIFT field of the CONTEXT register.
  • The bottom six bits of the STATUS register are shifted left by two. The "o" (old) bits are lost; the "p" (previous) bits become the old bits; the "c" (current) bits become the previous bits; and the current bits are set to 0. This disables interrupts and puts the processor in kernel mode.
  • Execution continues from one of five hardwired addresses according to what happened and the setting of the B (boot) bit in the STATUS register.

    The exception handler addresses are:
     
    B  Trap     Address
    1  General  0xbfc0 0180
    1  UTLB     0xbfc0 0100
    1  Reset    0xbfc0 0000
    0  General  0x8000 0080
    0  UTLB     0x8000 0000
    A UTLB exception is a TLB miss that occurs against the user address space (0x0000 0000 - 0x7fff ffff) and occurs because no matching TLB entry was found. Other TLB exceptions go through the General vector. This allows a fast-path TLB refill handler. See below.
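The vector selection described above can be written out as a small C model. This is only a sketch for illustration, for example for use in a simulator or a test harness; the type and function names are hypothetical. The table lists the Reset vector only with B set, so the model returns that address for a reset regardless of the argument, which is an assumption.

```c
#include <stdint.h>

/* Hypothetical names; the kinds correspond to the "Trap" column above. */
enum trap_kind { TRAP_GENERAL, TRAP_UTLB, TRAP_RESET };

/* Address at which execution continues, given the B (boot) bit. */
static inline uint32_t trap_vector(int boot_bit, enum trap_kind kind)
{
    if (kind == TRAP_RESET) {
        /* Assumption: the table lists Reset only with B = 1. */
        return 0xbfc00000u;
    }
    if (boot_bit) {
        return kind == TRAP_UTLB ? 0xbfc00100u : 0xbfc00180u;
    }
    return kind == TRAP_UTLB ? 0x80000000u : 0x80000080u;
}

/*
 * A TLB exception uses the UTLB vector only if the address is in the
 * user address space and no matching entry was found at all.
 */
static inline enum trap_kind classify_tlb_fault(uint32_t vaddr,
                                                int matching_entry_found)
{
    if (vaddr <= 0x7fffffffu && !matching_entry_found) {
        return TRAP_UTLB;
    }
    return TRAP_GENERAL;
}
```

For instance, a miss on a user address with no matching entry and B clear would continue at trap_vector(0, classify_tlb_fault(vaddr, 0)), which is 0x8000 0000.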

    To return from an exception, one executes the following sequence:

    	jr k0
    	rfe
    
    where the k0 register has been loaded with the desired exception return address, either the value previously retrieved from the EPC register or some other address chosen by software. The RFE instruction is not a jump; it occurs in the delay slot of a jump. It shifts the six bottom bits of the status register right by two, undoing the shift done at exception entry time. This returns the processor to whatever interrupt state and mode (user/kernel) it was in when the exception occurred.

    Because there are three pairs of state bits, the processor can take two nested exceptions without losing state, if one is careful. This is to facilitate the fast-path TLB refill handler. See below.
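The shifting of the bottom six STATUS bits can be modeled in a few lines of C. This is a sketch under stated assumptions, not a description of the hardware implementation: the function names are hypothetical, and the model assumes (as on the R3000) that RFE leaves the "old" pair unchanged, which the text above does not specify.

```c
#include <stdint.h>

/*
 * Bottom six bits of STATUS, from bit 5 down to bit 0:
 * KUo IEo KUp IEp KUc IEc. Function names are hypothetical.
 */

/* Exception entry: shift the low six bits left by two; current becomes 0. */
static inline uint32_t status_on_exception(uint32_t status)
{
    return (status & ~0x3fu) | ((status << 2) & 0x3fu);
}

/*
 * RFE: previous becomes current and old becomes previous.
 * Assumption: the old pair itself is left unchanged.
 */
static inline uint32_t status_on_rfe(uint32_t status)
{
    return (status & ~0x0fu) | ((status >> 2) & 0x0fu);
}
```

Starting from user mode with interrupts enabled (low bits 0x03), two nested exceptions give 0x0c and then 0x30; two RFEs then bring the current pair back to user mode with interrupts enabled. A third nested exception before the first RFE would shift the original pair out and lose it.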

    The two soft interrupt lines can be activated by writing to the CAUSE register.

    The exception codes:

    The IBE and DBE exceptions are not MMU exceptions and do not set BADVADDR.

    MMU

    The MMU is the MIPS-I MMU, with a 64-entry fully associative TLB where each entry maps one 4K virtual page to one 4K physical page. The paired pages setup of later MIPS processors is not present, and there is no support for superpages.

    The processor's virtual address space is divided into four segments:
     
    Name   Description
    kseg2  Supervisor mode only; TLB-mapped, cacheable
    kseg1  Supervisor mode only; direct-mapped, uncached
    kseg0  Supervisor mode only; direct-mapped, cached
    kuseg  User and supervisor mode; TLB-mapped, cacheable
     
    The mapped segments are translated via a translation lookaside buffer (TLB) with software refill. The direct-mapped segments are both mapped, without use of the TLB, to the first 512 megabytes of the physical memory space. Typically the kernel lives in kseg0, hardware devices are accessed through kseg1, and user-mode programs are run in kuseg.
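To make the segment layout concrete, the following C sketch classifies a virtual address and performs the direct mapping for kseg0 and kseg1. The kuseg range comes from the text above; the boundaries of the three kernel segments (0x8000 0000, 0xa000 0000, 0xc000 0000) are the standard MIPS-I values and are assumed here rather than taken from this document. All names are hypothetical.

```c
#include <stdint.h>

enum segment { KUSEG, KSEG0, KSEG1, KSEG2 };

/* Which segment a virtual address falls in (assumed MIPS-I boundaries). */
static inline enum segment segment_of(uint32_t vaddr)
{
    if (vaddr < 0x80000000u) return KUSEG;
    if (vaddr < 0xa0000000u) return KSEG0;
    if (vaddr < 0xc0000000u) return KSEG1;
    return KSEG2;
}

/*
 * For a direct-mapped address, store the physical address in *paddr and
 * return 1. For a TLB-mapped address, return 0 and leave *paddr alone.
 * Both direct-mapped segments cover the first 512 MB of physical memory,
 * so the translation simply keeps the low 29 bits.
 */
static inline int direct_map(uint32_t vaddr, uint32_t *paddr)
{
    enum segment seg = segment_of(vaddr);

    if (seg == KSEG0 || seg == KSEG1) {
        *paddr = vaddr & 0x1fffffffu;
        return 1;
    }
    return 0;
}
```

Under these assumptions the same physical address is reachable both cached, through kseg0, and uncached, through kseg1.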

    There are four MMU-related instructions:

    The SLOT field of the RANDOM register ranges from 8 to 63; it is incremented on every instruction executed, which is not very random but apparently adequate for the purpose, which is to fill the TLB rapidly and effectively. Entries 0 through 7 of the TLB are never touched by TLBWR and can be used for reserved or special mappings.
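The behavior of the RANDOM register's slot value can be modeled as a simple counter. This is a hedged sketch: the text gives the range and says the value is incremented on every instruction, and the model assumes that it wraps from 63 back to 8. The names are hypothetical.

```c
#include <stdint.h>

#define M161_RANDOM_MIN  8u   /* entries 0..7 are never chosen by TLBWR */
#define M161_RANDOM_MAX  63u

/* Slot value after one more instruction has executed (assumed wraparound). */
static inline uint32_t random_step(uint32_t slot)
{
    return slot >= M161_RANDOM_MAX ? M161_RANDOM_MIN : slot + 1u;
}
```

Because the value never drops below 8, a kernel can keep fixed mappings in entries 0 through 7 by writing them with TLBWI and using TLBWR for everything else.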

    The processor is built to support a fast-path TLB refill handler, which is invoked via the UTLB exception vector (see above). The idea is that the OS maintains page tables in virtual memory using the kseg2 region (see above) and loads the base address of the page table into the PTBASE field of the CONTEXT register. Each page table entry is a 4-byte quantity suitable for loading directly into the TLBLO register; 1024 of these fit on a 4K page, so each page table page maps 4MB, and it takes 512 pages, or 2MB of virtual space, to map the whole 2GB user address space. (Since these are placed in virtual memory, only the page table pages that are used need be materialized.)

    With this setup, the UTLB exception handler can read the CONTEXT register and use the resulting value to load directly from the page table. If this fails because that section of the page table is not materialized, a second (non-UTLB) exception occurs. Careful register usage and the three-deep nesting of the bottom part of the STATUS register allow the general-purpose exception handler to recover from this condition and proceed as desired.

    On success, the UTLB handler can then unconditionally write the PTE it got into the TLB. If the V (valid) bit is not set, another exception will occur on return from the UTLB handler; however, because a matching (though not valid) TLB entry exists, this will not be a UTLB exception, and the general exception handler will get control and can schedule a page-in or take whatever other action is appropriate.
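The page table arithmetic above, and the value the UTLB handler would read from the CONTEXT register, can be checked with a short C sketch. The field widths are assumptions based on the R3000 layout (PTBASE in bits 31..21, VSHIFT in bits 20..2), which is consistent with a 2MB-aligned page table of 4-byte entries; the names are hypothetical.

```c
#include <stdint.h>

#define M161_PAGE_SIZE       4096u
#define M161_PTE_SIZE        4u
#define M161_PTES_PER_PAGE   (M161_PAGE_SIZE / M161_PTE_SIZE)        /* 1024 */
#define M161_BYTES_PER_PTPG  (M161_PTES_PER_PAGE * M161_PAGE_SIZE)   /* 4MB  */
#define M161_USER_SPACE      0x80000000u                             /* 2GB  */
#define M161_PT_PAGES        (M161_USER_SPACE / M161_BYTES_PER_PTPG) /* 512  */
#define M161_PT_BYTES        (M161_PT_PAGES * M161_PAGE_SIZE)        /* 2MB  */

/*
 * Model of the CONTEXT register contents after a miss on badvaddr:
 * the page table base in the top bits, plus the virtual page number
 * scaled by the PTE size (vaddr >> 12, then << 2). Field widths are
 * assumed, not taken from the text.
 */
static inline uint32_t context_value(uint32_t ptbase, uint32_t badvaddr)
{
    return (ptbase & 0xffe00000u) | ((badvaddr >> 10) & 0x001ffffcu);
}
```

With a page table based at 0xc000 0000, a miss on user address 0x0040 1000 (virtual page 0x401) yields 0xc000 1004, which is the address of that page's entry; the handler can load the PTE from there and write it to the TLB.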

    There are a number of possible other ways to use the UTLB handler, of course. One simple way is to just have it jump to the general-purpose exception handler.

    As noted above, the V (valid) bit does not prevent a TLB entry from being "matching". A TLB entry is matching if both of the following are true:

    One must never load the TLB in such a fashion that two (or more) entries can match the same virtual address. If this happens, the processor shuts down and is unrecoverable except by hard reset. Since there is no way to prevent entries from matching, one should clear the TLB by loading each entry with a distinct VPAGE, and use VPAGEs from the kseg0 or kseg1 regions that will never be presented to the MMU for translation. To reset the TLB at startup, since it is not cleared by processor reset, one should use a second, potentially larger, set of distinct VPAGEs and check that each is not already present before loading it.
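One way to generate the distinct VPAGEs described above is to derive each from its slot number, as in the following C sketch. The choice of kseg0 base address 0x8000 0000, the bounds used to check for kseg0/kseg1, and all names are assumptions for illustration; the actual TLB writes (loading TLBHI, TLBLO, and INDEX and executing TLBWI) are not shown.

```c
#include <stdint.h>

#define M161_NUM_TLB  64u

/*
 * TLBHI value for an unused entry: a distinct kseg0 page per slot.
 * kseg0 is direct-mapped, so these addresses are never looked up in
 * the TLB. The base address is an assumed, conventional choice.
 */
static inline uint32_t tlb_invalid_hi(uint32_t slot)
{
    return 0x80000000u + (slot << 12);
}

/* Check that all 64 values are distinct and lie in kseg0 or kseg1. */
static inline int tlb_invalid_set_ok(void)
{
    uint32_t i, j;

    for (i = 0; i < M161_NUM_TLB; i++) {
        uint32_t hi = tlb_invalid_hi(i);

        if (hi < 0x80000000u || hi >= 0xc0000000u) {
            return 0;
        }
        for (j = 0; j < i; j++) {
            if (tlb_invalid_hi(j) == hi) {
                return 0;
            }
        }
    }
    return 1;
}
```

This covers clearing a TLB whose contents are known. At startup, where the existing contents are undefined, each candidate VPAGE should additionally be checked for presence before it is loaded, as described above.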

    There is no way to tell if a TLB entry has been used, or how recently it has been used. Nor is there a direct way to tell if a TLB entry has been used for writing. The D ("dirty") bit can be used for this purpose with software support, as follows:

    The MMU exceptions are as follows:

    Cache Control

    The MIPS-I has a remarkably painful cache and cache control architecture. While the MIPS-161 exhibits the same cache control bits in the STATUS register, it is in fact cache-coherent and there is no need to flush, examine, or otherwise touch the cache subsystem. In fact, doing any of these things in the MIPS-I fashion will result in undefined behavior.

    Since normal MIPS processors have split instruction and data caches, and future System/161 releases may include more cache handling, it is recommended that all necessary flushes of the instruction cache be included and stubbed out.

    Out-of-Order Execution

    Even with cache coherence, or when using uncached memory regions, processors that support out-of-order execution may require so-called memory barrier instructions to ensure that memory accesses occur on the external bus (and become visible to other processors or devices) in the same order they are issued in machine code.

    The MIPS-II SYNC instruction can be used to wait until all pending load and store operations are complete. This guarantees that all memory updates before the SYNC become visible before any memory updates after the SYNC. All SYNC instructions executed on all cores and processors within a system occur in a single well defined global order.

    The MIPS-161 currently has no support for out-of-order execution and the SYNC instruction is not supported. This may change in the future.

    Cores

    Each MIPS-161 processor has only one core on the die. However, as noted elsewhere, System/161 supports up to 32 processors on the mainboard. There is little software-visible difference between 32 single-core processors on one mainboard and 32 cores in one processor; most of the effects that do exist are cache-related and are not modeled by System/161 in any event.

    Startup

    On CPU reset, execution begins from the Reset vector defined above. The processor starts out in an almost completely undefined state. The cache is in an undefined state (although on the MIPS-161 this does not matter), the TLB is in an undefined state, and the contents of the general and kernel-mode registers are all undefined, except as follows:

    The code at the Reset vector must in general sort out the processor state before it can do anything else.

    In System/161, the boot ROM takes care of these issues and loads a kernel as described in the LAMEbus documentation. However, the state guaranteed by the boot ROM is only slightly more flexible: the boot ROM guarantees that the cache is in a workable state, and it provides a stack and an argument string in the a0 register. The TLB is still in an undefined state and the contents of other general and kernel-mode registers and register fields are still undefined.

    Identifying the Processor

    Currently, code that knows it is running on System/161 may assume it has a MIPS-161 and proceed accordingly.

    Code that wants to run unchanged on a variety of MIPS platforms without a System/161-specific startup wedge is likely to run into problems: there is no safe way to identify that one is running on System/161 as such, and distinguishing the MIPS-161 from an arbitrarily chosen MIPS-I is likely to be problematic. The MIPS-161 sets the PRID register to 0x0000 beef; however, my understanding is that the contents of the PRID register for early MIPS models (where the upper 16 bits are defined as 0) cannot even be used reliably to distinguish real deployed hardware. It might be possible to distinguish the MIPS-161 based on its cache (non-)behavior; however, this is probably dangerous and not recommended.