
DIVISION OF ENGINEERING AND APPLIED SCIENCES
HARVARD UNIVERSITY
CS 161. Operating Systems

Matt Welsh
Spring 2005
MIPS r2000/r3000 Architecture
Architecture/assembler summary


(This is not intended to be either a comprehensive reference or a tutorial. More information is available from www.mips.com.)

Registers
Instructions
Synthetic instructions
Delay slots
Exceptions
Segments
Registers

There are 32 general-purpose registers and 3 special registers on the MIPS r2k itself. There are also up to 32 registers each on up to four coprocessors. For CS161 purposes, there is only one coprocessor, coprocessor 0, which is the "system coprocessor"; it takes care of exceptions and virtual memory issues.
Register	Symbolic name	Saved by	Description
General registers
$0	z0, ZERO	N/A	Always contains 0, no matter what's written to it.
$1	AT	caller	Assembler temporary. See below.
$2	v0	caller	Value 0. Used for computations; function return value is placed here. Also holds the system call number on syscall entry.
$3	v1	caller	Value 1. Used for computations; upper word of 64-bit return value is placed here.
$4	a0	caller	Argument 0. First function argument goes here.
$5	a1	caller	Argument 1. Second function argument goes here.
$6	a2	caller	Argument 2. Third function argument goes here.
$7	a3	caller	Argument 3. Fourth function argument goes here. Also used as a flag value on system call return.
$8	t0	caller	General-purpose temporary register.
$9	t1	caller	General-purpose temporary register.
$10	t2	caller	General-purpose temporary register.
$11	t3	caller	General-purpose temporary register.
$12	t4	caller	General-purpose temporary register.
$13	t5	caller	General-purpose temporary register.
$14	t6	caller	General-purpose temporary register.
$15	t7	caller	General-purpose temporary register.
$16	s0	callee	General-purpose saved register.
$17	s1	callee	General-purpose saved register.
$18	s2	callee	General-purpose saved register.
$19	s3	callee	General-purpose saved register.
$20	s4	callee	General-purpose saved register.
$21	s5	callee	General-purpose saved register.
$22	s6	callee	General-purpose saved register.
$23	s7	callee	General-purpose saved register.
$24	t8	caller	General-purpose temporary register.
$25	t9	caller	General-purpose temporary register.
$26	k0	nobody	Kernel scratch register.
$27	k1	nobody	Kernel scratch register.
$28	gp	global	Global pointer. Constant for any given process.
$29	sp	N/A	Stack pointer.
$30	s8	callee	Saved register #8 - conventionally, but not always, a frame pointer.
$31	ra	caller	Return address of function.
Special registers
HI	-	caller	High-order word of 64-bit multiply result, or remainder of divide result.
LO	-	caller	Low-order word of 64-bit multiply result, or quotient of divide result.
PC	-	N/A	Program counter.
Coprocessor 0
cop0 $0	c0_index	N/A	TLB entry index register.
cop0 $1	c0_random	N/A	TLB randomized access register.
cop0 $2	c0_entrylo	N/A	Low-order word of "current" TLB entry.
cop0 $4	c0_context	N/A	Page-table lookup address.
cop0 $8	c0_vaddr	N/A	Virtual address associated with certain exceptions.
cop0 $10	c0_entryhi	N/A	High-order word of "current" TLB entry.
cop0 $12	c0_status	N/A	Processor status register.
cop0 $13	c0_cause	N/A	Exception cause register.
cop0 $14	c0_epc	N/A	PC at which exception occurred.
Any of the 32 general-purpose registers can be used in any instruction that takes register operands. The special registers are accessed using special instructions; the coprocessor registers can be accessed by using special coprocessor instructions to move their values to general registers and back.
Register $31 is the "link register". Most of the instructions for calling subroutines are hardwired to store the return address into this register. (The jalr instruction is an exception: it names its destination register explicitly, although that register is normally $31.)

The coprocessor 0 registers have various bit fields in them. These are:

c0_index
Bits	Name	Description
31	P	Set by the tlbp instruction if the probe fails.
14-30	unused
8-13	Index	TLB entry number for tlbwi, tlbr, and tlbp.
0-7	unused
c0_random
Bits	Name	Description
14-31	unused
8-13	Random	Semi-random TLB entry number used by tlbwr. Updated by processor. Never holds a value from 0 to 7, so tlbwr never replaces TLB entries 0-7.
0-7	unused
c0_entrylo
Bits	Name	Description
12-31	PFN	Physical page number (bits 12-31 of address) for VM mapping.
11	N	Non-cacheable; if set, accesses to this page bypass the cache.
10	D	Dirty; if set, page may be written to.
9	V	Valid; if set, page may be accessed.
8	G	Global; if set, valid in every address space.
0-7	unused
c0_context
Bits	Name	Description
21-31	PTEBase	Base address of page table. Untouched by hardware; maintained by software.
2-20	BadVPN	Offset into page table for a kuseg fault (bits 12-30 of c0_vaddr), set by hardware.
0-1	unused
c0_vaddr
Bits	Name	Description
0-31	vaddr	Failing virtual address; set by certain exceptions.
c0_entryhi
Bits	Name	Description
12-31	VPN	Virtual page number (bits 12-31 of address) for VM mapping.
6-11	PID	ID of address space in which virtual address exists.
0-5	unused
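As a rough illustration of how the c0_entryhi and c0_entrylo fields fit together, the C sketch below packs a virtual page, an address-space ID, and a physical page into the two words that tlbwi and tlbwr copy into the TLB. Only the bit positions come from the tables above; the macro and function names are made up for this example, and 4 KB pages are assumed, matching the 12-bit page offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the c0_entryhi/c0_entrylo tables. */
#define ENTRYHI_VPN_MASK  0xfffff000u  /* bits 12-31 */
#define ENTRYHI_PID_SHIFT 6            /* bits 6-11 */
#define ENTRYHI_PID_MASK  0x00000fc0u
#define ENTRYLO_PFN_MASK  0xfffff000u  /* bits 12-31 */
#define ENTRYLO_N         0x00000800u  /* bit 11: non-cacheable */
#define ENTRYLO_D         0x00000400u  /* bit 10: dirty (writable) */
#define ENTRYLO_V         0x00000200u  /* bit 9: valid */
#define ENTRYLO_G         0x00000100u  /* bit 8: global */

/* Build c0_entryhi from a virtual address and a 6-bit address-space ID. */
static uint32_t make_entryhi(uint32_t vaddr, uint32_t pid)
{
    assert(pid < 64);  /* the PID field is only 6 bits wide */
    return (vaddr & ENTRYHI_VPN_MASK) | (pid << ENTRYHI_PID_SHIFT);
}

/* Build c0_entrylo from a physical address plus N/D/V/G flag bits. */
static uint32_t make_entrylo(uint32_t paddr, uint32_t flags)
{
    assert((flags & ~(ENTRYLO_N | ENTRYLO_D | ENTRYLO_V | ENTRYLO_G)) == 0);
    return (paddr & ENTRYLO_PFN_MASK) | flags;
}

/* Pull the fields back out of the two words. */
static uint32_t entryhi_vpn(uint32_t hi) { return hi >> 12; }
static uint32_t entryhi_pid(uint32_t hi) { return (hi & ENTRYHI_PID_MASK) >> ENTRYHI_PID_SHIFT; }
static uint32_t entrylo_pfn(uint32_t lo) { return lo >> 12; }
```

For example, make_entryhi(0x00401234, 5) gives 0x00401140: the page-offset bits are dropped and the PID lands in bits 6-11.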
c0_status
Bits	Name	Description
28-31	CU	If these bits are set, the corresponding coprocessors are usable. If clear, use of said coprocessors will generate a coprocessor unusable exception.
23-27	unused
22	BEV	If set the "bootstrap" exception handler addresses are used.
21	TS	If set to 1, the processor is dead in the water and needs to be reset.
20	PE	Set to 1 if a cache parity error occurs. Clear by writing 1.
19	CM	Set to 1 if the most recent data cache load missed, but only if IsC is set.
18	PZ	If set to 1, uses space parity for outgoing data.
17	SwC	If set, the cache control lines affect the instruction cache rather than the data cache.
16	IsC	If set, the data cache is detached from main memory. (For flushing.)
8-15	IntMask	Per-interrupt enable bits. While a bit is clear, the corresponding interrupt is masked and does not cause interrupt exceptions; set the bit to allow that interrupt.
6-7	unused
5	KUo	Old kernel/user mode bit (1 = user mode)
4	IEo	Old interrupt enable bit (0 = mask all interrupts)
3	KUp	Previous kernel/user mode bit (1 = user mode)
2	IEp	Previous interrupt enable bit (0 = mask all interrupts)
1	KUc	Current kernel/user mode bit (1 = user mode)
0	IEc	Current interrupt enable bit (0 = mask all interrupts)
c0_cause
Bits	Name	Description
31	BD	Set if last exception occurred in a branch delay slot.
30	unused
28-29	CE	Coprocessor number resulting from a coprocessor unusable exception.
16-27	unused
10-15	IP	Bits reflecting the state of the external hardware interrupt lines. Bit 10 is irq 0.
8-9	Sw	Software interrupts. Like IP, but controlled by software.
7	unused
2-6	ExcCode	An exception code, from the list below.
0-1	unused
c0_epc
Bits	Name	Description
0-31	epc	Program counter for restarting after exception.
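The c0_status and c0_cause registers work together when deciding whether an interrupt is delivered. The C sketch below decodes c0_cause and applies the usual R3000 rule, stated here as an assumption: an interrupt is taken only when IEc is 1 and some line's bit is set both in IntMask (c0_status bits 8-15, where a 1 enables that line) and in Sw/IP (c0_cause bits 8-15). It also assumes ExcCode occupies bits 2-6 of c0_cause. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAUSE_BD        0x80000000u  /* bit 31: exception in a branch delay slot */
#define CAUSE_CE_SHIFT  28           /* bits 28-29: coprocessor number */
#define CAUSE_IRQ_MASK  0x0000ff00u  /* bits 8-15: Sw (8-9) and IP (10-15) */
#define CAUSE_EXC_MASK  0x0000007cu  /* bits 2-6: ExcCode */
#define STATUS_IEC      0x00000001u  /* bit 0: current interrupt enable */
#define STATUS_INTMASK  0x0000ff00u  /* bits 8-15: per-line enables */

struct cause_info {
    bool in_delay_slot;  /* BD */
    unsigned coproc;     /* CE, meaningful only for "coprocessor unusable" */
    unsigned exccode;    /* ExcCode, e.g. 8 for syscall */
    unsigned pending;    /* Sw/IP shifted down: bit 0 = Sw0, bit 2 = irq 0 */
};

static struct cause_info decode_cause(uint32_t cause)
{
    struct cause_info ci;
    ci.in_delay_slot = (cause & CAUSE_BD) != 0;
    ci.coproc = (cause >> CAUSE_CE_SHIFT) & 0x3u;
    ci.exccode = (cause & CAUSE_EXC_MASK) >> 2;
    ci.pending = (cause & CAUSE_IRQ_MASK) >> 8;
    return ci;
}

/* True if this status/cause pair would produce an interrupt exception. */
static bool interrupt_taken(uint32_t status, uint32_t cause)
{
    return (status & STATUS_IEC) != 0 &&
           (status & cause & STATUS_INTMASK) != 0;
}
```

For instance, a c0_cause value of 0x00000020 decodes to ExcCode 8, a syscall.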

Instructions

This table uses the following symbols:
RD, RS, RT	Up to three general registers ($0-$31)
HI, LO	The special "hi" and "lo" registers
HI:LO	"hi" and "lo" as a single 64-bit value
C0_REG	A coprocessor 0 register
signed-IMM	Immediate value IMM, sign-extended to 32 bits
unsigned-IMM	Immediate value IMM, zero-extended to 32 bits
offset	Branch or memory-access offset (always signed)
signed-	Value is interpreted as signed
unsigned-	Value is interpreted as unsigned
address	Immediate address for jump
These are the instructions. A few are not listed, including all the floating-point operations, but this should include anything we'll see in CS161.
In the opcode names, "u" means "unsigned" (for addu, addiu, and subu it really means "no exception on overflow"); "i" means immediate; the "al" in some jump instructions means "and link", meaning "function call".
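The add/addu distinction and the sltiu rule in the table below are easy to get wrong, so here is a small C model of them. add and addu compute the same 32-bit result and differ only in whether signed overflow traps. This is a sketch of the semantics only: the function names are hypothetical, and the overflow exception is modeled as a false return value that leaves RD unwritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* addu (and addiu, after sign-extending the immediate): wraparound add. */
static uint32_t model_addu(uint32_t rs, uint32_t rt)
{
    return rs + rt;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* add (and addi): same result bits as addu, but signed overflow raises an
 * exception and RD is left unchanged; modeled here as returning false. */
static bool model_add(uint32_t rs, uint32_t rt, uint32_t *rd)
{
    uint32_t sum = rs + rt;
    /* Overflow iff both operands have the same sign and the sum's differs. */
    bool overflow = (~(rs ^ rt) & (rs ^ sum) & 0x80000000u) != 0;
    if (!overflow)
        *rd = sum;
    return !overflow;
}

/* sltiu: sign-extend the 16-bit immediate, then compare as unsigned. */
static uint32_t model_sltiu(uint32_t rs, uint16_t imm)
{
    uint32_t simm = (uint32_t)(int32_t)(int16_t)imm;
    return rs < simm ? 1u : 0u;
}
```

With imm = 0xffff the comparison is against 0xffffffff, so sltiu RT, RS, -1 yields 1 for every RS except 0xffffffff.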

Instruction	Operation	Notes
add RD, RS, RT	RD = RS + RT; exception on overflow
addi RT, RS, IMM	RT = RS + signed-IMM; exception on overflow
addiu RT, RS, IMM	RT = RS + signed-IMM
addu RD, RS, RT	RD = RS + RT
and RD, RS, RT	RD = RS & RT
andi RT, RS, IMM	RT = RS & unsigned-IMM
beq RS, RT, branch-offset	if (RS == RT) NEXTPC += (branch-offset << 2)
bgez RS, branch-offset	if (signed-RS >= 0) NEXTPC += (branch-offset << 2)
bgezal RS, branch-offset	$31 = NEXTPC; if (signed-RS >= 0) NEXTPC += (branch-offset << 2)
bgtz RS, branch-offset	if (signed-RS > 0) NEXTPC += (branch-offset << 2)
blez RS, branch-offset	if (signed-RS <= 0) NEXTPC += (branch-offset << 2)
bltz RS, branch-offset	if (signed-RS < 0) NEXTPC += (branch-offset << 2)
bltzal RS, branch-offset	$31 = NEXTPC; if (signed-RS < 0) NEXTPC += (branch-offset << 2)
bne RS, RT, branch-offset	if (RS != RT) NEXTPC += (branch-offset << 2)
break	breakpoint (immediate breakpoint exception) with no delay slot
div RS, RT	LO = signed-RS / signed-RT; HI = signed-RS % signed-RT
divu RS, RT	LO = unsigned-RS / unsigned-RT; HI = unsigned-RS % unsigned-RT
j address	NEXTPC = (NEXTPC & 0xf0000000) | (address << 2)
jal address	$31 = NEXTPC; NEXTPC = (NEXTPC & 0xf0000000) | (address << 2)
jalr RD, RS	RD = NEXTPC; NEXTPC = RS. RD is normally $31.
jr RS	NEXTPC = RS
lb RT, offset(RS)	RT = signed-8-memory[RS + offset]
lbu RT, offset(RS)	RT = unsigned-8-memory[RS + offset]
lh RT, offset(RS)	RT = signed-16-memory[RS + offset]
lhu RT, offset(RS)	RT = unsigned-16-memory[RS + offset]
lui RT, IMM	RT = unsigned-IMM << 16
lw RT, offset(RS)	RT = 32-memory[RS + offset]
lwl RT, offset(RS)	RT = unaligned-32-memory[RS + offset]	1
lwr RT, offset(RS)	RT = unaligned-32-memory[RS + offset]	1
mfc0 RT, C0_REG	RT = C0_REG
mfhi RD	RD = HI
mflo RD	RD = LO
mtc0 RT, C0_REG	C0_REG = RT
mthi RS	HI = RS
mtlo RS	LO = RS
mult RS, RT	HI:LO = signed-RS * signed-RT
multu RS, RT	HI:LO = unsigned-RS * unsigned-RT
nor RD, RS, RT	RD = ~(RS | RT)
or RD, RS, RT	RD = RS | RT
ori RT, RS, IMM	RT = RS | unsigned-IMM
rfe	return from exception	2
sb RT, offset(RS)	8-memory[RS + offset] = RT
sh RT, offset(RS)	16-memory[RS + offset] = RT
sll RD, RT, IMM	RD = RT << unsigned-IMM
sllv RD, RT, RS	RD = RT << RS
slt RD, RS, RT	RD = signed-RS < signed-RT
slti RT, RS, IMM	RT = signed-RS < signed-IMM
sltiu RT, RS, IMM	RT = unsigned-RS < unsigned-signed-IMM	Yes, according to my reference it actually takes the 16-bit immediate, sign-extends it, and then reinterprets it as an unsigned value. Don't ask me.
sltu RD, RS, RT	RD = unsigned-RS < unsigned-RT
sra RD, RT, IMM	RD = signed-RT >> unsigned-IMM
srav RD, RT, RS	RD = signed-RT >> RS
srl RD, RT, IMM	RD = unsigned-RT >> unsigned-IMM
srlv RD, RT, RS	RD = unsigned-RT >> RS
sub RD, RS, RT	RD = RS - RT; exception on overflow
subu RD, RS, RT	RD = RS - RT
sw RT, offset(RS)	32-memory[RS + offset] = RT
swl RT, offset(RS)	unaligned-32-memory[RS + offset] = RT	1
swr RT, offset(RS)	unaligned-32-memory[RS + offset] = RT	1
syscall	make system call; immediate syscall exception with no delay slot
tlbp	probe tlb: search TLB for entry matching c0_entryhi; set probe-failed bit and index field in c0_index.	3
tlbr	read tlb entry: load the TLB entry named by the index field of c0_index into c0_entryhi and c0_entrylo.	3
tlbwi	write tlb entry indexed: store c0_entryhi and c0_entrylo into the TLB entry named by the index field of c0_index.	3
tlbwr	write tlb entry "random": store c0_entryhi and c0_entrylo into the TLB entry named by the index field of c0_random.	3
xor RD, RS, RT	RD = RS ^ RT
xori RT, RS, IMM	RT = RS ^ unsigned-IMM
Notes:
1. lwl/lwr and swl/swr are for accessing unaligned words in memory. The actual specification is complicated, but what it boils down to is that
lwl RT, offset(RS)
lwr RT, (offset+3)(RS)
loads the 32-bit value starting at RS+offset, no matter what the alignment of that address is. swl/swr behave analogously.

2. RFE shifts the lower six bits of the status register right by two, so the "previous" interrupt/user-mode state becomes the current state and the "old" state is copied into the "previous" state (the "old" bits themselves are left unchanged). This undoes what happens on an exception. RFE is normally found in the delay slot of a jump instruction of some kind.

3. For an explanation of the TLB instructions, see the comments in src/kern/arch/mips/include/tlb.h.
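The lwl/lwr idiom above can be modeled in C to show how the two partial loads combine. This sketch assumes a big-endian machine (System/161, which OS/161 runs on, is big-endian; on a little-endian MIPS the roles of lwl and lwr are swapped), models memory as a plain byte array indexed by address, and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Read the aligned big-endian word that contains byte address addr. */
static uint32_t aligned_word(const uint8_t *mem, uint32_t addr)
{
    uint32_t a = addr & ~3u;
    return ((uint32_t)mem[a] << 24) | ((uint32_t)mem[a + 1] << 16) |
           ((uint32_t)mem[a + 2] << 8) | (uint32_t)mem[a + 3];
}

/* lwl (big-endian): bytes from addr to the end of its aligned word go into
 * the most significant bytes of rt; the remaining low bytes are unchanged. */
static uint32_t model_lwl(const uint8_t *mem, uint32_t addr, uint32_t rt)
{
    unsigned shift = 8 * (addr & 3u);         /* 0, 8, 16 or 24 */
    uint32_t keep = (1u << shift) - 1u;       /* low bytes to preserve */
    return (aligned_word(mem, addr) << shift) | (rt & keep);
}

/* lwr (big-endian): bytes from the start of the aligned word up to addr go
 * into the least significant bytes of rt; the high bytes are unchanged. */
static uint32_t model_lwr(const uint8_t *mem, uint32_t addr, uint32_t rt)
{
    unsigned shift = 8 * (3u - (addr & 3u));  /* 24, 16, 8 or 0 */
    uint32_t load = 0xffffffffu >> shift;     /* low bytes to replace */
    return (rt & ~load) | (aligned_word(mem, addr) >> shift);
}

/* lwl at addr followed by lwr at addr + 3 loads the word at addr. */
static uint32_t load_unaligned(const uint8_t *mem, uint32_t addr)
{
    uint32_t rt = 0;
    rt = model_lwl(mem, addr, rt);
    rt = model_lwr(mem, addr + 3, rt);
    return rt;
}
```

With memory holding the bytes 00 11 22 33 44 55 66 77, load_unaligned at address 1 returns 0x11223344: lwl supplies 11 22 33 and lwr supplies 44.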
Synthetic instructions

Because all instructions are exactly 32 bits wide, some operations, such as loading an arbitrary 32-bit constant, cannot be performed in a single instruction. The assembler covers for this by emitting multiple actual instructions as needed.
For instance, the "li" (load immediate) and "la" (load address) instructions, both of which load 32-bit constants, will be expanded by the assembler into a "lui" instruction to load the upper half of the word, and then usually an "ori" or "addiu" to set the lower half of the word.
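The two-instruction expansion can be sketched in C to show why the choice between ori and addiu matters. Since ori zero-extends its immediate, the halves are just the upper and lower 16 bits; since addiu sign-extends, the upper half has to be rounded up whenever bit 15 of the constant is set. (This is the same adjustment GNU as applies for its %hi/%lo operators; the struct and function names here are hypothetical.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two 16-bit immediates the assembler emits. */
struct split16 {
    uint16_t hi;  /* immediate for lui */
    uint16_t lo;  /* immediate for ori or addiu */
};

/* What the CPU computes for "lui r, hi" followed by "ori r, r, lo". */
static uint32_t run_lui_ori(struct split16 s)
{
    return ((uint32_t)s.hi << 16) | s.lo;                 /* ori zero-extends */
}

/* What the CPU computes for "lui r, hi" followed by "addiu r, r, lo". */
static uint32_t run_lui_addiu(struct split16 s)
{
    uint32_t sext_lo = (uint32_t)(int32_t)(int16_t)s.lo;  /* addiu sign-extends */
    return ((uint32_t)s.hi << 16) + sext_lo;
}

/* Split for lui/ori: the halves are used as-is. */
static struct split16 split_for_ori(uint32_t value)
{
    struct split16 s = { (uint16_t)(value >> 16), (uint16_t)(value & 0xffffu) };
    assert(run_lui_ori(s) == value);
    return s;
}

/* Split for lui/addiu: if bit 15 is set, the sign-extended low half is
 * 0x10000 too small, so round the high half up by adding 0x8000 first. */
static struct split16 split_for_addiu(uint32_t value)
{
    struct split16 s = { (uint16_t)((value + 0x8000u) >> 16), (uint16_t)(value & 0xffffu) };
    assert(run_lui_addiu(s) == value);
    return s;
}
```

For 0x12348000, the lui/ori pair uses 0x1234 and 0x8000, while the lui/addiu pair needs 0x1235 and 0x8000.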

Some of these combinations require an extra register to hold intermediate values. Register $1 is reserved for this purpose. You can prevent the assembler from using $1 by putting ".set noat" in the assembler source.

Delay slots

The MIPS is a pipelined architecture, and certain aspects of the pipeline are exposed to the programmer. In general, "slow" instructions are not finished until the instruction *two* spaces after them is being fetched. The instruction in between is referred to as a "delay slot".
There is no pipeline stall logic; the delay slots must be filled out appropriately in the machine code. If they aren't, the behavior is undefined.

The assembler will attempt to fill delay slots for you; however, it isn't very bright about it and usually inserts nops. Also, in some cases it cannot tell what you mean and can silently mangle code that you thought was using delay slots efficiently. For this reason, when coding OS/161, I turned off this behavior with ".set noreorder".

Delay slots apply chiefly to two classes of instructions:

Loads and stores involving memory.
			lw $9, 0($8)		# load value into $9
			nop			# $9 won't be ready here
			addiu $10, $9, 1	# now we can use $9
Branches and jumps.
			jal myfunc		# call function
			move $a0, $s0		# executes BEFORE jump happens
			addu $s0, $s0, $v0	# executes AFTER function returns
The interaction between branch delay slots and exception handling is extremely unpleasant and you'll be happier if you don't think about it.
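One consequence of branch delay slots is that branch and jump targets are computed relative to the delay-slot instruction rather than the branch itself, and the return address saved by a call skips over the delay slot. The C sketch below spells out these standard MIPS rules in terms of pc, the address of the branch or jump instruction; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the delay-slot instruction that follows the branch at pc. */
static uint32_t delay_slot(uint32_t pc) { return pc + 4; }

/* Taken-branch target for beq/bne/bgez/...: the offset counts words. */
static uint32_t branch_target(uint32_t pc, uint16_t offset_field)
{
    int32_t offset = (int32_t)(int16_t)offset_field;  /* sign-extend */
    return delay_slot(pc) + (uint32_t)offset * 4u;
}

/* j/jal target: keep the top 4 bits of the delay-slot address. */
static uint32_t jump_target(uint32_t pc, uint32_t index_field)
{
    assert(index_field < (1u << 26));                 /* 26-bit field */
    return (delay_slot(pc) & 0xf0000000u) | (index_field << 2);
}

/* Value written to the link register by jal/jalr/bgezal/bltzal. */
static uint32_t link_address(uint32_t pc) { return pc + 8; }
```

A j in the final word of a 256 MB region therefore lands in the following region, because the upper four bits come from the delay slot's address.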
Exceptions

When an exception occurs, information about the exception is recorded in some of the coprocessor 0 registers and execution continues from a known hardwired address.
The following registers are updated on exception:

c0_cause: the BD, CE, and ExcCode fields are updated.
c0_context: the BadVPN field is updated in the same cases c0_vaddr is updated.
c0_vaddr: updated on some exceptions (see list).
c0_status: the lower six bits are shifted left by two bits, shifting in zeros for the bottom two bits. This disables interrupts and puts the processor in kernel mode.
c0_epc: set to suitable PC for restarting the instruction that failed.
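The c0_status update described above, and its reversal by RFE, treat the low six bits as a three-entry stack of (KU, IE) pairs. The C sketch below models both operations; it assumes, as on the R3000, that RFE copies the "old" pair down while leaving bits 4-5 themselves unchanged. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_STACK_MASK 0x3fu  /* KUo IEo KUp IEp KUc IEc */

/* Exception entry: push a new (kernel mode, interrupts off) pair. */
static uint32_t status_on_exception(uint32_t status)
{
    uint32_t stack = ((status & STATUS_STACK_MASK) << 2) & STATUS_STACK_MASK;
    return (status & ~STATUS_STACK_MASK) | stack;  /* KUc = IEc = 0 */
}

/* rfe: pop the stack; bits 4-5 are copied down but also kept in place. */
static uint32_t status_on_rfe(uint32_t status)
{
    uint32_t low = status & STATUS_STACK_MASK;
    uint32_t stack = (low & 0x30u) | (low >> 2);
    return (status & ~STATUS_STACK_MASK) | stack;
}
```

For example, an exception taken in user mode with interrupts enabled (low bits 0x03) leaves 0x0c, and status_on_rfe turns 0x0c back into 0x03; the higher status bits pass through untouched.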
Execution continues at a hardwired address, one of the following:
Address	Description
0x80000000	UTLB miss exception
0x80000080	Other exceptions
0xbfc00000	Processor reset
0xbfc00100	UTLB miss exception, if BEV is set in c0_status
0xbfc00180	Other exceptions, if BEV is set in c0_status
The exceptions are:
Code	Sets c0_vaddr?	Description
0	no	Interrupt (hardware or software)
1	yes	TLB protection fault ("modification request")
2	yes	TLB miss or UTLB miss on load or instruction fetch.
3	yes	TLB miss or UTLB miss on store.
4	yes	Address error on load or instruction fetch.
5	yes	Address error on store.
6	no	External bus error on instruction fetch
7	no	External bus error on data load or store
8	no	SYSCALL instruction
9	no	BREAK instruction
10	no	Reserved (illegal) instruction
11	no	Coprocessor unusable
12	no	Arithmetic overflow
An address error results from either use of an inadequately aligned pointer (an N-bit quantity must be aligned on an N-bit address boundary, unless the lwl/lwr/swl/swr instructions are used) or an attempt to access kernel memory from user mode.
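The two address-error conditions can be expressed as a small predicate. This C sketch assumes naturally aligned 1-, 2-, and 4-byte accesses and treats every address at or above 0x80000000 as kernel-only, following the segment map at the end of this page; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Would a size-byte access at addr raise an address error (code 4 or 5)?
 * Ignores lwl/lwr/swl/swr, which exist precisely to avoid the alignment rule. */
static bool address_error(uint32_t addr, unsigned size, bool user_mode)
{
    assert(size == 1 || size == 2 || size == 4);
    bool misaligned = (addr & (size - 1u)) != 0;
    bool kernel_only = addr >= 0x80000000u;
    return misaligned || (user_mode && kernel_only);
}
```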
A TLB entry is "matching" if its VPN field is the same as the page number portion of the virtual address being looked up, and either the G (global) bit is set or the PID field matches the PID field in c0_entryhi.

If no matching TLB entry is found, a TLB miss exception occurs, unless the address is in the user address range (0x00000000-0x7fffffff), in which case a UTLB miss exception occurs. If a matching entry is found but it is not marked valid (the V bit is clear), a TLB miss exception (never a UTLB miss exception) occurs. Finally, if the entry is valid but the dirty (D) bit is clear and the access is a write, a TLB protection fault occurs.
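The matching and fault rules above can be modeled as a search through a software copy of the 64-entry TLB. In this C sketch the struct layout, names, and the convention of returning an exception code from the table above (or EX_NONE, -1, for a successful translation) are all illustrative assumptions; it only handles addresses in TLB-mapped segments and does not model several entries matching at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64

struct tlb_entry {
    uint32_t hi;  /* c0_entryhi layout: VPN bits 12-31, PID bits 6-11 */
    uint32_t lo;  /* c0_entrylo layout: PFN bits 12-31, N/D/V/G bits 8-11 */
};

enum { EX_NONE = -1, EX_MOD = 1, EX_TLBL = 2, EX_TLBS = 3 };

struct tlb_result {
    int exccode;     /* EX_NONE, or the exception code raised */
    bool utlb;       /* true if the UTLB vector would be used */
    uint32_t paddr;  /* valid only when exccode == EX_NONE */
};

/* Translate vaddr (assumed to be in kuseg or kseg2) for address space pid. */
static struct tlb_result tlb_translate(const struct tlb_entry tlb[TLB_ENTRIES],
                                       uint32_t vaddr, uint32_t pid, bool write)
{
    struct tlb_result r = { EX_NONE, false, 0 };
    for (int i = 0; i < TLB_ENTRIES; i++) {
        uint32_t hi = tlb[i].hi, lo = tlb[i].lo;
        bool vpn_match = (hi & 0xfffff000u) == (vaddr & 0xfffff000u);
        bool pid_match = (lo & 0x100u) != 0 || ((hi >> 6) & 0x3fu) == pid;
        if (!vpn_match || !pid_match)
            continue;
        if ((lo & 0x200u) == 0)                /* V clear: never UTLB */
            r.exccode = write ? EX_TLBS : EX_TLBL;
        else if (write && (lo & 0x400u) == 0)  /* D clear on a store */
            r.exccode = EX_MOD;
        else
            r.paddr = (lo & 0xfffff000u) | (vaddr & 0xfffu);
        return r;
    }
    /* No matching entry: UTLB vector for kuseg addresses, general otherwise. */
    r.exccode = write ? EX_TLBS : EX_TLBL;
    r.utlb = vaddr < 0x80000000u;
    return r;
}
```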

A UTLB miss exception uses (potentially) different exception handling code from a TLB miss exception, but is otherwise the same. The purpose, in conjunction with the c0_context register, is to enable fast-path TLB refill handling. Note that the UTLB exception applies to user addresses, not user mode: if the miss address is below 0x80000000, a UTLB exception occurs whether the miss was generated in kernel or user mode.
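To see how c0_context supports fast refill, assume that software keeps a linear table of 4-byte PTEs for kuseg, indexed by virtual page number, aligned on a 2 MB boundary, with its base stored in PTEBase. (This is one possible design, not necessarily what OS/161 does.) Then the c0_context value produced by hardware on a kuseg miss is already the address of the faulting page's PTE. The C sketch below computes that value; the names are hypothetical, and it assumes BadVPN occupies bits 2-20 with bits 0-1 reading as zero.

```c
#include <assert.h>
#include <stdint.h>

#define CONTEXT_PTEBASE_MASK 0xffe00000u  /* bits 21-31, written by software */
#define VADDR_KUSEG_VPN_MASK 0x7ffff000u  /* bits 12-30 of c0_vaddr */

/* The c0_context value after a miss on vaddr, given the software-set PTEBase.
 * Bits 12-30 of vaddr move to bits 2-20, i.e. a shift right by 10. */
static uint32_t context_after_miss(uint32_t context, uint32_t vaddr)
{
    assert(vaddr < 0x80000000u);  /* this sketch models kuseg misses only */
    return (context & CONTEXT_PTEBASE_MASK) |
           ((vaddr & VADDR_KUSEG_VPN_MASK) >> 10);
}

/* Equivalent view: the address of 4-byte PTE number VPN in the table. */
static uint32_t pte_address(uint32_t ptebase, uint32_t vaddr)
{
    return ptebase + (vaddr >> 12) * 4u;
}
```

The two functions agree because 2^19 kuseg pages times 4 bytes is exactly 2 MB, so the PTE offset never carries into the PTEBase bits.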

Segments

The MIPS divides its address space into several regions that have hardwired properties. These are:
kseg2, TLB-mapped cacheable kernel space
kseg1, direct-mapped uncached kernel space
kseg0, direct-mapped cached kernel space
kuseg, TLB-mapped cacheable user space
Both direct-mapped segments map to the first 512 megabytes of the physical address space.
The top of kuseg is 0x80000000. The top of kseg0 is 0xa0000000, and the top of kseg1 is 0xc0000000.

The memory map thus looks like this:

Address	Segment	Special properties
0xffffffff	kseg2
0xc0000000
0xbfffffff	kseg1
0xbfc00180	Exception address if BEV set.
0xbfc00100	UTLB exception address if BEV set.
0xbfc00000	Execution begins here after processor reset.
0xa0000000
0x9fffffff	kseg0
0x80000080	Exception address if BEV not set.
0x80000000	UTLB exception address if BEV not set.
0x7fffffff	kuseg
0x00000000
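The segment boundaries in the map translate directly into a classifier, and for the two direct-mapped segments the physical address is just the offset from the segment base. A C sketch with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum segment { KUSEG, KSEG0, KSEG1, KSEG2 };

static enum segment segment_of(uint32_t vaddr)
{
    if (vaddr < 0x80000000u) return KUSEG;  /* TLB-mapped, cacheable, user */
    if (vaddr < 0xa0000000u) return KSEG0;  /* direct-mapped, cached */
    if (vaddr < 0xc0000000u) return KSEG1;  /* direct-mapped, uncached */
    return KSEG2;                           /* TLB-mapped, cacheable, kernel */
}

/* For kseg0/kseg1, store the physical address in *paddr and return true;
 * kuseg and kseg2 need the TLB, so return false. */
static bool direct_map(uint32_t vaddr, uint32_t *paddr)
{
    switch (segment_of(vaddr)) {
    case KSEG0: *paddr = vaddr - 0x80000000u; return true;
    case KSEG1: *paddr = vaddr - 0xa0000000u; return true;
    default:    return false;
    }
}
```

Both 0x80001000 (kseg0) and 0xa0001000 (kseg1) therefore reach physical address 0x1000; they differ only in whether the cache is used.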