- General instruction encoding sequence:
- LSB MSB
- 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
- Z C V NZ NC NV Opcode---------------- Destination------ Source1---------- Source2----------
- Flags:
- Z = last operation yielded a zero
- C = last operation had an unsigned carry
- V = last operation had a signed carry
- I = interrupt flag
- P = privileged instructions allowed
- X = paging in use
- But the processor is little endian so the first byte contains 6 bits of mask followed by the low two
- bits of opcode in its highest two bits...
- The first three bits skip the instruction if the flag is on, the second three skip if the flag is off.
- Many instructions have two forms, one that writes to the flags and one that doesn't. Some instructions
- never write to the flags at all.
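- The field layout above can be sketched as a small encoder/decoder. This is only an illustration: the helper names (encode, decode, is_skipped) are hypothetical, and it assumes that any single matching mask bit is enough to skip the instruction, which the text does not state outright.

```python
# Field positions taken from the bit diagram (bit 0 = LSB):
# bits 0-2 skip-if-set (Z, C, V), bits 3-5 skip-if-clear (NZ, NC, NV),
# bits 6-13 opcode, bits 14-19 dest, bits 20-25 source1, bits 26-31 source2.

def encode(opcode, dest, src1, src2, skip_if_set=0, skip_if_clear=0):
    """Pack one 32-bit instruction word."""
    return ((skip_if_set & 0x7)
            | (skip_if_clear & 0x7) << 3
            | (opcode & 0xFF) << 6
            | (dest & 0x3F) << 14
            | (src1 & 0x3F) << 20
            | (src2 & 0x3F) << 26)

def decode(word):
    """Split a 32-bit instruction word into its fields."""
    return {
        "skip_if_set": word & 0x7,
        "skip_if_clear": (word >> 3) & 0x7,
        "opcode": (word >> 6) & 0xFF,
        "dest": (word >> 14) & 0x3F,
        "src1": (word >> 20) & 0x3F,
        "src2": (word >> 26) & 0x3F,
    }

def is_skipped(word, z, c, v):
    """Assumption: the instruction is skipped if ANY mask bit matches."""
    fields = decode(word)
    flags = (z & 1) | (c & 1) << 1 | (v & 1) << 2
    return bool(fields["skip_if_set"] & flags
                or fields["skip_if_clear"] & ~flags & 0x7)

# add r5, r6, r7 (opcode 0x20), stored little endian
word = encode(0x20, 5, 6, 7)
print(word.to_bytes(4, "little").hex())
```

- Because the word is stored little endian, the first byte in memory holds the six mask bits plus the low two opcode bits, as described above.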
- Writing to IP without using a JMP instruction imposes a large stall penalty.
- There are 64 addressable registers, of which 4 have special meanings to the processor.
- Register 0 = all zeros.
- Register 1 = IP
- Register 2 = return address
- Register 3 = multiply/divide high address
- By convention, the following registers are reserved for their purpose
- Register 63 = stack pointer
- Register 62 = thread-local storage pointer
- Register 61 = frame pointer (if frame pointers are used)
- Special Opcodes
- 0x00 Fault This instruction is always invalid, raises an error if not flag masked
- 0x01 Syscall This instruction raises a system call
- Bit twiddling instructions
- 0x02 and r, r, r
- 0x03 andf r, r, r
- 0x04 or r, r, r
- 0x05 orf r, r, r
- 0x06 xor r, r, r
- 0x07 xorf r, r, r
- 0x08 nor r, r, r
- 0x09 norf r, r, r
- nop is made as and 0, 0, 0
- not is made as nor dest, src, 0
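- A minimal model of why these two aliases work, assuming writes to register 0 are simply discarded (the text only says register 0 is all zeros). The Registers class and the op_* helpers are hypothetical names used for illustration.

```python
MASK64 = (1 << 64) - 1

class Registers:
    """64 registers; register 0 always reads as zero and ignores writes."""

    def __init__(self):
        self._r = [0] * 64

    def read(self, n):
        return 0 if n == 0 else self._r[n]

    def write(self, n, value):
        if n != 0:
            self._r[n] = value & MASK64

def op_and(regs, dest, a, b):
    regs.write(dest, regs.read(a) & regs.read(b))

def op_nor(regs, dest, a, b):
    regs.write(dest, ~(regs.read(a) | regs.read(b)))

regs = Registers()
regs.write(5, 0x0F)
op_nor(regs, 6, 5, 0)   # not r6, r5: nor with zero inverts every bit
op_and(regs, 0, 0, 0)   # nop: the result lands in register 0 and is lost
print(hex(regs.read(6)))
```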
- Bit shifting
- 0x0A shl r, r, r
- 0x0B shlf r, r, r
- 0x0C shl r, r, i
- 0x0D shlf r, r, i
- 0x0E shr r, r, r
- 0x0F shrf r, r, r
- 0x10 shr r, r, i
- 0x11 shrf r, r, i
- 0x12 sar r, r, r
- 0x13 sarf r, r, r
- 0x14 sar r, r, i
- 0x15 sarf r, r, i
- 0x16 rol r, r, r
- 0x17 ror r, r, r
- 0x18 rol r, r, i There's no need for a ror r, r, i, as it can be encoded as rol r, r, 64 - i
- 0x1D swab r, r, i Register value conversion
- i = 0: alias for mov
- i = 1: reverse bits
- i = 2: swap endian for 4x 2 byte numbers
- i = 4: swap endian for 2x 4 byte numbers
- i = 8: swap endian for 1x 8 byte number
- i = 17: sign extend byte to 8 bytes
- i = 19: sign extend two bytes to 8 bytes
- i = 21: sign extend four bytes to 8 bytes
- i = 33: zero extend byte to 8 bytes
- i = 34: zero extend two bytes to 8 bytes
- i = 36: zero extend four bytes to 8 bytes
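- The rotate identity and the swab modes listed above can be checked with a short model. This is a sketch, not the hardware definition: the function names are hypothetical, and it assumes the "swap endian" modes reverse the bytes inside each lane independently.

```python
MASK64 = (1 << 64) - 1

def rol(x, n):
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64

def ror(x, n):
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64

def swab(x, i):
    x &= MASK64
    if i == 0:                      # alias for mov
        return x
    if i == 1:                      # reverse all 64 bits
        return int(format(x, "064b")[::-1], 2)
    if i in (2, 4, 8):              # byte swap within lanes of i bytes
        data = x.to_bytes(8, "little")
        lanes = [data[k:k + i][::-1] for k in range(0, 8, i)]
        return int.from_bytes(b"".join(lanes), "little")
    if i in (17, 19, 21):           # sign extend 1, 2 or 4 bytes
        bits = {17: 8, 19: 16, 21: 32}[i]
        low = x & ((1 << bits) - 1)
        if low >> (bits - 1):
            low -= 1 << bits
        return low & MASK64
    if i in (33, 34, 36):           # zero extend 1, 2 or 4 bytes
        bits = {33: 8, 34: 16, 36: 32}[i]
        return x & ((1 << bits) - 1)
    raise ValueError("swab mode not listed in the table")

x = 0x1122334455667788
# ror by i equals rol by 64 - i for every shift that fits the immediate
print(all(ror(x, i) == rol(x, 64 - i) for i in range(1, 64)))
print(hex(swab(x, 4)))
```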
- JMP instructions
- 0x1A jmp near The register range encodes a signed number between -2^20 and 2^20-4 where the bottom 2 bits must be zero
- 0x1B call near like jmp, but writes return address to r2
- 0x19 call indirect Jumps to address in source1, writes return address to dest (should be r2; r0 results in jmp indirect)
- Addition and subtraction
- 0x20 add r, r, r
- 0x21 addf r, r, r
- 0x22 add r, r, i Unsigned 0-63
- 0x23 addf r, r, i
- 0x26 adcf r, r, r Also adds carry bit; this instruction is unusual in that there is no non-f version
- 0x27 sbbf r, r, r Also subtracts carry bit
- 0x28 sub r, r, r
- 0x29 subf r, r, r
- 0x2A sub r, r, i Unsigned 0-63
- 0x2B subf r, r, i Unsigned 0-63
- 0x2E flags (dest, source1, i)
- i = 0: clear carry
- i = 1: set carry
- i = 2: flip carry
- i = 4: copy source1 to flags (does not touch flags other than Z C V)
- i = 5: copy flags to dest (ditto)
- i = 6: copy source1 to flags including privileged flags
- i = 7: copy flags to dest (including privileged flags)
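- The usual reason for adcf is multi-word arithmetic. The sketch below models a 128-bit addition built from one addf and one adcf; the Python function names are hypothetical and only the carry flag is modeled.

```python
MASK64 = (1 << 64) - 1

def addf(a, b):
    """64-bit add that also returns the unsigned carry out."""
    total = (a & MASK64) + (b & MASK64)
    return total & MASK64, total >> 64

def adcf(a, b, carry):
    """64-bit add that also adds the incoming carry bit."""
    total = (a & MASK64) + (b & MASK64) + (carry & 1)
    return total & MASK64, total >> 64

def add128(a_lo, a_hi, b_lo, b_hi):
    lo, carry = addf(a_lo, b_lo)         # low halves, produces carry
    hi, carry = adcf(a_hi, b_hi, carry)  # high halves, consumes carry
    return lo, hi, carry

print(add128(MASK64, 0, 1, 0))
```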
- Special long-form instructions for dealing with large constants. Add and sub take up 2 slots, where the first 6
- bits of the second slot are set to 1 bits in encoding, while the 6 bits in source2 slot provide the actual value.
- To reconstruct the constant, take the bottom 6 bits from the second 32 bit word, move them to the top 6 bits,
- then take the bottom six bits of the first word and move them into the second word.
- 0x24 add r, r, i Unsigned 0-2^32-1
- 0x25 addf r, r, i Unsigned 0-2^32-1
- 0x2C sub r, r, i Unsigned 0-2^32-1
- 0x2D subf r, r, i Unsigned 0-2^32-1
- loadc and jmp far occupy 3 slots each with the same basic idea; the second slot is repaired from source2 slot
- while the third slot is repaired from the source1 slot
- 0x1C jmp far Unsigned 0-2^64-1
- 0x1D jmp far Unsigned 0-2^64-1
- 0x1E loadc r, i Unsigned 0-2^64-1
- Memory access
- 0x30 load8 r, r, i Loads 8 byte value into register; integer i is small offset (0-63)
- 0x31 load8h r, r, i Continues load by handling alignment errors
- 0x32 load4 r, r, i Loads 4 byte value into register
- 0x33 load4h r, r, i Continues load by handling alignment errors
- 0x34 load2 r, r, i Loads 2 byte value into register
- 0x35 load2h r, r, i Continues load by handling alignment errors
- 0x36 load1 r, r, i Loads single byte into register
- 0x37 watch 0, r, i monitors aligned 16 byte region for writes; each core can monitor 1 address
- do not watch the highest possible 16 bytes of virtual address space; this
- is where the bogus watch is (in case of handling an interrupt)
- 0x38 sto8 r, r, i Writes 8 byte value from register to memory
- 0x39 sto8h r, r, i Continues store by handling alignment errors
- 0x3A sto4 r, r, i Writes 4 byte value from register to memory
- 0x3B sto4h r, r, i Continues store by handling alignment errors
- 0x3C sto2 r, r, i Writes 2 byte value from register to memory
- 0x3D sto2h r, r, i Continues store by handling alignment errors
- 0x3E sto1 r, r, i Stores one byte value from register to memory
- 0x3F storef r, r, i If watch is good, writes 8 byte value from register to memory, clears all flags
- ;The difference between the 0x3x and the 0x4x memory operations is the range of i; structures of 64 bytes
- ;or less may be accessed byte aligned with no additional instructions; larger structures can only be accessed
- ;at 8-byte aligned chunks without one additional instruction.
- 0x40 load8 r, r, i Loads 8 byte value into register; integer i is small offset (0-63)*8 + 64
- 0x41 load8h r, r, i Continues load by handling alignment errors
- 0x42 load4 r, r, i Loads 4 byte value into register
- 0x43 load4h r, r, i Continues load by handling alignment errors
- 0x44 load2 r, r, i Loads 2 byte value into register
- 0x45 load2h r, r, i Continues load by handling alignment errors
- 0x46 load1 r, r, i Loads single byte into register
- 0x47 watch 0, r, i monitors aligned 16 byte region for writes; each core can monitor 1 address
- do not watch the highest possible 16 bytes of virtual address space; this
- is where the bogus watch is (in case of handling an interrupt)
- 0x48 sto8 r, r, i Writes 8 byte value from register to memory
- 0x49 sto8h r, r, i Continues store by handling alignment errors
- 0x4A sto4 r, r, i Writes 4 byte value from register to memory
- 0x4B sto4h r, r, i Continues store by handling alignment errors
- 0x4C sto2 r, r, i Writes 2 byte value from register to memory
- 0x4D sto2h r, r, i Continues store by handling alignment errors
- 0x4E sto1 r, r, i Stores one byte value from register to memory
- 0x4F storef r, r, i If watch is good, writes 8 byte value from register to memory, clears all flags
- If watch is bad or was not watching this address, sets all flags
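- How an assembler might choose between the two load8 forms for a given byte offset can be sketched as follows. The function name is hypothetical, and the sketch assumes the 0x4x immediate is stored as (offset - 64) / 8, which is one reading of "(0-63)*8 + 64".

```python
def pick_load8_form(offset):
    """Return (opcode, immediate) for a load8 at this byte offset,
    or None if the offset needs an extra address-computing instruction."""
    if 0 <= offset <= 63:
        return 0x30, offset                 # short form: any byte offset
    if 64 <= offset <= 64 + 63 * 8 and offset % 8 == 0:
        return 0x40, (offset - 64) // 8     # long form: 8-byte steps only
    return None

for offset in (0, 63, 64, 72, 568, 65, 576):
    print(offset, pick_load8_form(offset))
```

- Under this reading the two forms together reach offsets 0 to 568, with byte granularity only inside the first 64 bytes.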
- watch and storef implement atomic primitives. An atomic lock inc [ra] is written as follows:
- watch ra
- load8 rs1, ra, #0
- addf rs1, rs1, #1
- flags rs2, r0, #5
- storef rs1, ra, #0
- z jmp near @-6 ; goes back to watch above
- flags r0, rs2, #4
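- The retry loop above behaves like a load-linked/store-conditional pair. A toy model, under several assumptions not stated in the text: a write by another core anywhere in the watched 16-byte region breaks the watch, a core's own plain stores do not, and a successful storef consumes the watch. Memory and atomic_inc are hypothetical names.

```python
MASK64 = (1 << 64) - 1

class Memory:
    def __init__(self):
        self.cells = {}     # address -> 8-byte value
        self.watches = {}   # core id -> watched region base, or None

    def watch(self, core, addr):
        self.watches[core] = addr & ~0xF    # aligned 16-byte region

    def load8(self, addr):
        return self.cells.get(addr, 0)

    def store8(self, core, addr, value):
        self.cells[addr] = value & MASK64
        region = addr & ~0xF
        for other, watched in self.watches.items():
            if other != core and watched == region:
                self.watches[other] = None  # someone else's watch goes bad

    def storef(self, core, addr, value):
        """Store only if this core's watch on the region is still good."""
        if self.watches.get(core) != (addr & ~0xF):
            return False
        self.store8(core, addr, value)
        self.watches[core] = None
        return True

def atomic_inc(mem, core, addr, before_store=None):
    """Mirror of the watch / load8 / addf / storef / retry sequence."""
    attempts = 0
    while True:
        attempts += 1
        mem.watch(core, addr)
        value = mem.load8(addr)
        if before_store:
            before_store(attempts)          # hook to simulate another core
        if mem.storef(core, addr, value + 1):
            return attempts
```

- Passing a before_store callback that writes to the same region from another core on the first attempt forces exactly one retry in this model.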
- Multiplication and Division:
- multiply sets no flags; division sets Z flag = whether or not divide overflow occurred
- (division by zero or largest negative number)
- 0x50 mul8 r, r, r Unsigned multiply 8 bytes; high output goes to r3
- 0x51 imul8 r, r, r Signed multiply 8 bytes; high output goes to r3
- 0x52 mul4 r, r, r Unsigned multiply 4 bytes
- 0x53 imul4 r, r, r Signed multiply 4 bytes
- 0x54 mul2 r, r, r Unsigned multiply 2 bytes
- 0x55 imul2 r, r, r Signed multiply 2 bytes
- 0x56 mul1 r, r, r Unsigned multiply 1 byte
- 0x57 imul1 r, r, r Signed multiply 1 byte
- 0x58 div8 r, r, r Unsigned divide 8 bytes; high input comes from r3, modulus goes to r3
- 0x59 idiv8 r, r, r Signed divide 8 bytes, high input comes from r3, modulus goes to r3
- 0x5A div4 r, r, r Unsigned divide 4 bytes, modulus goes to r3
- 0x5B idiv4 r, r, r Signed divide 4 bytes, modulus goes to r3
- 0x5C div2 r, r, r Unsigned divide 2 bytes, modulus goes to r3
- 0x5D idiv2 r, r, r Signed divide 2 bytes, modulus goes to r3
- 0x5E div1 r, r, r Unsigned divide 1 byte, modulus goes to r3
- 0x5F idiv1 r, r, r Signed divide 1 byte, modulus goes to r3
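- The role of r3 in the 8-byte forms can be modeled as below. The function names mirror the mnemonics but are only a sketch; in particular, treating a quotient that does not fit in 64 bits as an overflow for div8 is an assumption, since the text only names division by zero and the largest negative number.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def mul8(a, b):
    """Unsigned multiply: returns (dest, r3) = (low half, high half)."""
    product = (a & MASK64) * (b & MASK64)
    return product & MASK64, product >> 64

def imul8(a, b):
    """Signed multiply: returns (dest, r3) = (low half, high half)."""
    product = to_signed(a) * to_signed(b)
    return product & MASK64, (product >> 64) & MASK64

def div8(r3, low, divisor):
    """Unsigned divide of the 128-bit value r3:low.
    Returns (quotient, new r3 = remainder, overflow)."""
    divisor &= MASK64
    if divisor == 0:
        return 0, 0, True
    dividend = ((r3 & MASK64) << 64) | (low & MASK64)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:
        return 0, 0, True
    return quotient, remainder, False

print(mul8(MASK64, 2))
print(imul8(MASK64, 2))   # MASK64 is -1 when read as signed
print(div8(1, 0, 2))
```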
- Floating point operations (double): All operations touch flags as expected
- 0x60 fadd r, r, r
- 0x61 fsub r, r, r
- 0x62 fmul r, r, r
- 0x63 fdiv r, r, r
- 0x64 fmod r, r, r (modulus of negative divided by positive is negative)
- 0x65 f_op r, r, identifier
- i = 0: load constant
- r = r0: 0
- r = r1: 1
- r = r2: 2
- r = r3: -1
- r = r4: log2 e
- r = r5: log2 10
- r = r6: 1/log2 e
- r = r7: 1/log2 10
- r = r8: pi
- r = r9: e
- r = r10: golden ratio
- r = r11: sqrt(2)
- r = r12: the correct constant for implementing 4/5 rounding (hint: +0.5 is wrong)
- r = r30: +nan
- r = r31: inf
- r = r62: -nan
- r = r63: -inf
- i = 1: log2
- i = 2: 2^x
- i = 3: sin
- i = 4: cos
- i = 5: tan
- i = 6: invsin
- i = 7: invcos
- i = 8: invtan
- i = 9: sinh
- i = 10: cosh
- i = 11: tanh
- i = 12: 1/x
- i = 13: sqrt
- i = 14: floor
- i = 15: ceil
- i = 60: convert double to integer (floor rounding)
- i = 61: convert integer to double
- i = 62: convert double to single
- i = 63: convert single to double
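- Two of the notes above can be illustrated with ordinary doubles. The fmod sign rule matches C-style fmod (the result takes the sign of the dividend). For the rounding hint, the sketch assumes the intended constant is the largest double below 0.5; the document does not say which value it has in mind, so ROUND_CONST is a guess, and only non-negative inputs are considered.

```python
import math

# fmod: negative divided by positive gives a negative result
print(math.fmod(-7.0, 3.0))   # C-style remainder
print(-7.0 % 3.0)             # Python's % follows the divisor's sign instead

# Assumed rounding constant: 0.5 - 2**-54, the largest double below 0.5
ROUND_CONST = 0.5 - 2.0 ** -54

def round_naive(x):
    return math.floor(x + 0.5)

def round_half_up(x):
    return math.floor(x + ROUND_CONST)

just_below_half = 0.5 - 2.0 ** -54
print(round_naive(just_below_half), round_half_up(just_below_half))
```

- Adding 0.5 fails for the value just below 0.5 because the sum rounds up to exactly 1.0 before the floor is taken.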
- Privileged instructions
- 0x70 setmsr msr, r, #0
- 0x71 readmsr r, msr, #0
- 0x72 cli 0, 0, 0
- 0x73 sti 0, 0, 0
- 0x74 hlt 0, 0, 0
- 0x75 iret 0, 0, 0
- 0x76 read i, r, #size
- 0x77 read r, r, #size
- 0x78 write r, i, #size
- 0x79 write r, i, #size
- 0x7F readmagic r
- The following MSRs exist.
- 0 = address of page table
- 1 = address of ISR page table
- 2 = address of ISR
- 3 = interrupt reason
- 0 = hardware interrupt line
- 1 = page read fault
- 2 = page write fault
- 3 = page execute fault
- 4 = bus fault
- 5 = memory fault
- 6 = memory checksum error
- 7 = instruction fault
- 8 = syscall
- 4 = interrupt line or faulting instruction
- 5 = fault address
- 8 = saved address of page table
- 9 = saved IP
- 10 = saved flags register
- 12 = ISR register #1 (recommendation: ISR stack pointer)
- 13 = ISR register #2 (recommendation: user stack pointer)
- 14 = ISR register #3 (recommendation: pointer to process structure)
- 15 = ISR register #4 (spare)
- Instruction 0x7F (readmagic) is special. When in user mode, it doesn't fault but returns the real value of MSR #0.
- In kernel mode, it simply faults like executing an invalid instruction. This bizarreness is intentional
- and is designed so that virtualization of the CPU can be detected but nobody will use readmagic where
- they meant readmsr r, msr, #0.
- Calling convention:
- Stack frame is 16 byte aligned
- Register 63 is the stack pointer
- Registers 2, 4-31 and 61-62 are preserved registers.
- Function return value goes in register 32
- if return doesn't fit in eight bytes, register 32 points to where to write the return data
- Register 3 may be clobbered at call time by trampolines inserted by the linker; since the compiler
- knows where this can occur this does not conflict with its use as mul/div high half
- Register 33 is the trampoline binding register (for pointer to function with closure)
- Register 34 contains the this pointer for target call site
- The first 16 arguments are written to registers 35-50
- if an argument is larger than 8 bytes it uses more than one register and therefore counts
- as more than one argument here
- Any remaining arguments are pushed to the stack, last argument first
- Integers shorter than 8 bytes are zero extended or sign extended as appropriate
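- A sketch of how arguments might be assigned under these rules. It assumes 16 register slots starting at register 35, that an argument is never split between registers and the stack, and that once one argument spills all later ones do too; none of these details are spelled out above, and assign_arguments is a hypothetical helper.

```python
FIRST_ARG_REG = 35
NUM_ARG_SLOTS = 16

def assign_arguments(sizes):
    """sizes: byte size of each argument, in declaration order.
    Returns (registers per argument index, stack push order)."""
    in_regs = {}
    on_stack = []
    next_slot = 0
    spilled = False
    for index, size in enumerate(sizes):
        slots = max(1, (size + 7) // 8)     # 8 bytes per register
        if not spilled and next_slot + slots <= NUM_ARG_SLOTS:
            in_regs[index] = [FIRST_ARG_REG + next_slot + k
                              for k in range(slots)]
            next_slot += slots
        else:
            spilled = True
            on_stack.append(index)
    return in_regs, list(reversed(on_stack))  # last argument pushed first

print(assign_arguments([8, 16, 4]))
```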
- _chkstk takes its argument in r60, preserves all registers except r2 and r57-59 and returns nothing
- Typically, r2 will be saved in r56 when calling _chkstk
- _chkstk:
- add r57, r0, #4096
- add r58, r63, r0
- add r59, r63, r0
- sub r59, r59, r60
- sto8 r58, r58, #0
- sub r58, r58, r57 ; step down one page
- subf r0, r58, r59 ; compare against the new stack bottom (sets flags)
- nz nc jmp @-4
- sto8 r59, r59, #0
- jmp r2
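- The intent of _chkstk appears to be touching every page between the current stack pointer and the new stack bottom so that guard pages are hit in order. A model of the addresses it would write, assuming the loop steps down one 4096-byte page per iteration and runs while the probe address is still above the target (chkstk_probes is a hypothetical name):

```python
def chkstk_probes(sp, size, page=4096):
    """Return the addresses written, in order, for a frame of `size` bytes."""
    target = sp - size          # r59: the new stack bottom
    probe = sp                  # r58: walks down one page at a time
    touched = []
    while True:
        touched.append(probe)   # sto8 r58, r58, #0
        probe -= page
        if probe <= target:     # loop only continues while probe > target
            break
    touched.append(target)      # final store at the new stack bottom
    return touched

print([hex(a) for a in chkstk_probes(0x10000, 0x2800)])
```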
- Function types:
- A lightweight leaf function does not touch r63, retains its return address in r2 throughout, and catches nothing.
- We note that _chkstk is an example of a lightweight leaf function despite touching the stack area; but unless the
- platform defines a red zone we may not depend on anything being preserved there. A trampoline looks like a
- lightweight leaf function that executes a tail call.
- An unregistered middleweight function always has a frame pointer, is laid out in linear order so that disassembly
- can always find the function epilog, and never throws or catches exceptions itself. On most platforms, middleweight
- functions can only exist if they are dynamically generated because code pages can be paged out to source media and
- that media can fail to be accessible. (If the page file fails, that's probably unrecoverable.) Unregistered functions
- may not call _chkstk
- A heavyweight function needs to be registered so that the frame unwinder can find its epilog code and catch blocks.
- Loaded modules have their functions registered at link time.
- Function prolog always looks like this:
- 3 dd 0x00 ; Optional hotpatch slab
- function:
- and r0, r0, r0 ; Optional hotpatch nop hitpoint
- add r60, r0, #stackspace ; Allocate stack space
- add r56, r0, r2 ; If #stackspace > 4096 we must call _chkstk
- call _chkstk
- add r2, r0, r56
- sub r63, r63, r60
- ; If we are a varargs function, spill all ... registers here (16 - number of fixed arguments)
- ; Store additional registers to be preserved below largest stack space required for call
- ; we *must* use r3 as the scribble register for large offsets so the unwind code doesn't hate us
- add r3, r63, #large-offset
- sto8 r4, r3, #0 ; save call-clobbered registers
- sto8 r2, r63, #some-offset ; store return address second to last
- sto8 r61, r63, #some-offset+1 ; Store frame pointer last
- ;; If the function has a large number of locals, additional frame pointers may be established
- add r4, r61, #320
- ;; Do function body
- ; Epilog code; again r3 is the scribble register for large offsets
- load8 r61, r63, #some-offset+1 ; restore old frame pointer
- ; the unwind disassembler locks on to this instruction to find the
- ; start of the epilog; either variant of load8 may be used with any
- ; offset in range
- load8 r2, r63, #some-offset ; restore old return address
- ; Restore additional registers
- add r3, r63, #large-offset
- load8 r4, r3, #0
- ; Return
- jmp r2
- If the function calls a function that takes an excessive number of arguments, the saved registers may be accessed
- via r3 in a manner similar to how the stack is grown for allocating a buffer on the stack.
- This processor has the design of a complete address split so that no kernel-mode memory is mapped in user mode, and the opposite is also true.
- To ensure TLB consistency, the kernel must write to MSR #0 after editing page tables.
- Here follows a vaguely plausible interrupt service routine:
- _isr: ; Interrupt service routine
- ; Please note the ISR is at the bottom of the stack and can't really follow the calling conventions
- ; It would be registered in the unwinder so that the unwinder knows to give up
- setmsr 13, r63
- loadc r63, _isrfault
- setmsr 1, r63
- setmsr 15, r2
- readmsr r2, 14
- ; Save registers
- load8 r2, r2, #(ptable.registers - ptable)
- sto8 r3, r2, 3*8
- readmsr r3, 15
- sto8 r3, r2, 2*8 ; the interrupted r2 was parked in MSR 15
- readmsr r3, 10
- sto8 r3, r2, 0*8
- readmsr r3, 9
- sto8 r3, r2, 1*8
- sto8 r4, r2, 4*8
- sto8 r5, r2, 5*8
- ;...
- sto8 r63, r2, 63*8
- readmsr r63, 12
- readmsr r33, 3 ; Arguments for specific handler
- readmsr r34, 4
- readmsr r35, 5
- readmsr r36, 14
- subf r0, r33, 6 ; flag-setting form so the conditional jump below has something to test
- c jmp _isrbad ; Processor is too new
- load8 r37, _isrmastertable
- shl r33, r33, 3
- add r33, r33, r37
- call r2, r33 ; Dispatch to handlers
- _transition: ; interrupt return, also jumped to by syscall handler to change execution context
- ; so that system calls can block and the kernel can be preempted
- ; Clear any userspace watch as we almost certainly messed it up
- sub r2, r0, 16
- watch r2, #0
- ; Restore registers
- readmsr r2, 14 ; Might have changed
- load8 r2, r2, #(ptable.registers - ptable)
- load8 r63, r2, 63*8
- ;...
- load8 r4, r2, 4*8
- load8 r3, r2, 1*8
- setmsr 9, r3
- load8 r3, r2, 0
- setmsr 10, r3
- load8 r3, r2, 2*8
- setmsr 15, r3
- readmsr r2, 15
- iret