My Instruction Set

joshudson Aug 6th, 2019 971 Never
  1.  
  2.  
  3. General instruction encoding sequence:
  4.  
  5.    LSB                                                                                        MSB
  6.    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  7.    Z  C  V  NZ NC NV Opcode---------------- Destination------ Source1---------- Source2----------
  8.  
  9. Flags:
  10.     Z = last operation yielded a zero
  11.     C = last operation had an unsigned carry
  12.     V = last operation had a signed overflow (signed carry)
  13.     I = interrupt flag
  14.     P = privileged instructions allowed
  15.     X = paging in use
  16.  
  17. But the processor is little endian so the first byte contains 6 bits of mask followed by the low two
  18. bits of opcode in its highest two bits...
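
A minimal Python sketch of the layout above, assuming the field positions are exactly those of the bit diagram (mask in bits 0-5, opcode in bits 6-13, then three 6-bit register fields). The helper names encode and decode are hypothetical, not part of the instruction set.

```python
import struct

def encode(mask, opcode, dest, src1, src2):
    """Pack one 32-bit instruction word; bit 0 is the least significant bit."""
    assert 0 <= mask < 64 and 0 <= opcode < 256
    assert all(0 <= r < 64 for r in (dest, src1, src2))
    word = mask | (opcode << 6) | (dest << 14) | (src1 << 20) | (src2 << 26)
    return struct.pack("<I", word)  # little-endian byte order

def decode(raw):
    """Unpack four bytes into (mask, opcode, dest, src1, src2)."""
    (word,) = struct.unpack("<I", raw)
    return (word & 0x3F, (word >> 6) & 0xFF, (word >> 14) & 0x3F,
            (word >> 20) & 0x3F, (word >> 26) & 0x3F)

# addf r5, r6, #7 (opcode 0x23) with only the Z mask bit set
raw = encode(0b000001, 0x23, 5, 6, 7)
first_byte = raw[0]  # low 6 bits: mask; high 2 bits: low two bits of the opcode
```

Here first_byte carries the mask in its low six bits and the low two opcode bits in its top two bits, which is the consequence of little-endian storage described above.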
  19.  
  20. The first three bits skip the instruction if the flag is on, the second three skip if the flag is off.
  21. Many instructions have two forms, one that writes to the flags and one that doesn't. Some instructions
  22. never write to the flags at all.
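
The skip rule can be sketched in Python as follows. This assumes the mask bits are numbered as in the encoding diagram (bit 0 = Z, bit 1 = C, bit 2 = V, bit 3 = NZ, bit 4 = NC, bit 5 = NV); the function name is_skipped is hypothetical, and the assembler's prefix notation is not modeled.

```python
def is_skipped(mask, z, c, v):
    """Return True if an instruction with this 6-bit mask is skipped."""
    for bit, flag in enumerate((z, c, v)):
        if ((mask >> bit) & 1) and flag:            # Z/C/V bit: skip when the flag is on
            return True
        if ((mask >> (bit + 3)) & 1) and not flag:  # NZ/NC/NV bit: skip when the flag is off
            return True
    return False

unconditional = is_skipped(0b000000, z=True, c=True, v=True)
skip_on_zero = is_skipped(0b000001, z=True, c=False, v=False)
```

Under this reading a mask of all zeros never skips, and a mask that sets both bits for the same flag always skips.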
  23.  
  24. Writing to IP without using a JMP instruction imposes a large stall penalty.
  25.  
  26. There are 64 addressable registers, of which 4 have special meanings to the processor.
  27. Register 0 = all zeros.
  28. Register 1 = IP
  29. Register 2 = return address
  30. Register 3 = multiply/divide high half
  31.  
  32. By convention, the following registers are reserved for their purpose
  33. Register 63 = stack pointer
  34. Register 62 = thread-local storage pointer
  35. Register 61 = frame pointer (if frame pointers are used)
  36.  
  37. Special Opcodes
  38.  
  39. 0x00  Fault         This instruction is always invalid, raises an error if not flag masked
  40. 0x01  Syscall       This instruction raises a system call
  41.  
  42. Bit twiddling instructions
  43.  
  44. 0x02  and   r, r, r  
  45. 0x03  andf  r, r, r
  46. 0x04  or    r, r, r
  47. 0x05  orf   r, r, r
  48. 0x06  xor   r, r, r
  49. 0x07  xorf  r, r, r
  50. 0x08  nor   r, r, r
  51. 0x09  norf  r, r, r
  52.  
  53. nop is made as and 0, 0, 0
  54. not is made as nor dest, src, 0
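
A small Python check of why these two encodings work, assuming 64-bit registers and that writes to register 0 are discarded (it always reads as zero). The RegisterFile class is a hypothetical model for illustration only.

```python
MASK64 = (1 << 64) - 1

class RegisterFile:
    """Toy register file: register 0 always reads as zero."""
    def __init__(self):
        self.values = [0] * 64

    def read(self, n):
        return 0 if n == 0 else self.values[n]

    def write(self, n, value):
        if n != 0:                      # assumption: writes to r0 are dropped
            self.values[n] = value & MASK64

def op_and(regs, dest, src1, src2):
    regs.write(dest, regs.read(src1) & regs.read(src2))

def op_nor(regs, dest, src1, src2):
    regs.write(dest, ~(regs.read(src1) | regs.read(src2)) & MASK64)

regs = RegisterFile()
regs.write(5, 0x0123456789ABCDEF)
before = list(regs.values)
op_and(regs, 0, 0, 0)      # nop: nothing observable changes
op_nor(regs, 6, 5, 0)      # not r6, r5
```

After this runs, r6 holds the bitwise complement of r5 and the nop has left every register as it was.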
  55.  
  56. Bit shifting
  57.  
  58. 0x0A  shl   r, r, r
  59. 0x0B  shlf  r, r, r
  60. 0x0C  shl   r, r, i
  61. 0x0D  shlf  r, r, i
  62. 0x0E  shr   r, r, r
  63. 0x0F  shrf  r, r, r
  64. 0x10  shr   r, r, i
  65. 0x11  shrf  r, r, i
  66. 0x12  sar   r, r, r
  67. 0x13  sarf  r, r, r
  68. 0x14  sar   r, r, i
  69. 0x15  sarf  r, r, i
  70. 0x16  rol   r, r, r
  71. 0x17  ror   r, r, r
  72. 0x18  rol   r, r, i There's no need for an ror r, r, i, as it can be encoded as rol r, r, 64 - i
  73. 0x1D  swab  r, r, i Register value conversion
  74.     i = 0: alias for mov
  75.     i = 1: reverse bits
  76.     i = 2: swap endian for 4x 2 byte numbers
  77.     i = 4: swap endian for 2x 4 byte numbers
  78.     i = 8: swap endian for 1x 8 byte number
  79.     i = 17: sign extend byte to 8 bytes
  80.     i = 19: sign extend two bytes to 8 bytes
  81.     i = 21: sign extend four bytes to 8 bytes
  82.     i = 33: zero extend byte to 8 bytes
  83.     i = 34: zero extend two bytes to 8 bytes
  84.     i = 36: zero extend four bytes to 8 bytes
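
The rotate identity and the swab modes listed above can be sketched in Python. This is a model under assumptions: registers are 64 bits wide, "reverse bits" reverses all 64 bits, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def rol(x, n):
    n &= 63
    return ((x << n) | (x >> (64 - n))) & MASK64

def ror(x, n):
    n &= 63
    return ((x >> n) | (x << (64 - n))) & MASK64

def swab(x, i):
    """Model of swab r, r, i for the modes in the table above."""
    x &= MASK64
    if i == 0:                                   # alias for mov
        return x
    if i == 1:                                   # reverse all 64 bits
        return int(format(x, "064b")[::-1], 2)
    if i in (2, 4, 8):                           # byte-swap each i-byte lane
        raw = x.to_bytes(8, "little")
        lanes = [raw[k:k + i][::-1] for k in range(0, 8, i)]
        return int.from_bytes(b"".join(lanes), "little")
    if i in (17, 19, 21):                        # sign extend 1, 2 or 4 bytes
        bits = {17: 8, 19: 16, 21: 32}[i]
        low = x & ((1 << bits) - 1)
        if low >> (bits - 1):
            low -= 1 << bits
        return low & MASK64
    if i in (33, 34, 36):                        # zero extend 1, 2 or 4 bytes
        bits = {33: 8, 34: 16, 36: 32}[i]
        return x & ((1 << bits) - 1)
    raise ValueError("mode not listed in the table")

x = 0x1122334455667788
rotations_match = all(ror(x, i) == rol(x, 64 - i) for i in range(1, 64))
```

The check covers i from 1 to 63 only, since 64 - 0 does not fit the 0-63 immediate range and a rotate by zero needs no instruction.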
  85.  
  86. JMP instructions
  87.  
  88. 0x1A  jmp near          The register range encodes a signed number between -2^20 and 2^20-4 where the bottom 2 bits must be zero
  89. 0x1B  call near         like jmp, but writes return address to r2
  90. 0x19  call indirect     Jumps to address in source1, writes return address to dest (should be r2; r0 results in jmp indirect)
  91.  
  92. Addition and subtraction
  93.  
  94. 0x20  add   r, r, r
  95. 0x21  addf  r, r, r
  96. 0x22  add   r, r, i Unsigned 0-63
  97. 0x23  addf  r, r, i
  98. 0x26  adcf  r, r, r     Also adds carry bit; this instruction is unusual in that there is no non-f version
  99. 0x27  sbbf  r, r, r     Also subtracts carry bit
  100. 0x28  sub   r, r, r
  101. 0x29  subf  r, r, r
  102. 0x2A  sub   r, r, i     Unsigned 0-63
  103. 0x2B  subf  r, r, i     Unsigned 0-63
  104. 0x2E  flags (dest, source1, i)
  105.             i = 0: clear carry
  106.             i = 1: set carry
  107.             i = 2: flip carry
  108.             i = 4: copy source1 to flags (does not touch flags other than Z C V)
  109.             i = 5: copy flags to dest (ditto)
  110.             i = 6: copy source1 to flags including privileged flags
  111.             i = 7: copy flags to dest (including privileged flags)
  112.  
  113. Special long-form instructions for dealing with large constants. Add and sub take up 2 slots, where the first 6
  114. bits of the second slot are set to 1 bits in encoding, while the 6 bits in source2 slot provide the actual value.
  115. To reconstruct the constant, take the bottom 6 bits from the second 32 bit word, move them to the top 6 bits,
  116. then take the bottom six bits of the first word and move them into the second word.
  117.  
  118. 0x24  add   r, r, i     Unsigned 0-2^32-1
  119. 0x25  addf  r, r, i     Unsigned 0-2^32-1
  120. 0x2C  sub   r, r, i     Unsigned 0-2^32-1
  121. 0x2D  subf  r, r, i     Unsigned 0-2^32-1
  122.  
  123. loadc and jmp far occupy 3 slots each with the same basic idea; the second slot is repaired from source2 slot
  124. while the third slot is repaired from the source1 slot
  125. 0x1C  jmp far           Unsigned 0-2^64-1
  126. 0x1D  jmp far           Unsigned 0-2^64-1
  127. 0x1E  loadc r, i        Unsigned 0-2^64-1
  128.  
  129. Memory access
  130.  
  131. 0x30  load8  r, r, i    Loads 8 byte value into register; integer i is small offset (0-63)
  132. 0x31  load8h r, r, i    Continues load by handling alignment errors
  133. 0x32  load4  r, r, i    Loads 4 byte value into register
  134. 0x33  load4h r, r, i    Continues load by handling alignment errors
  135. 0x34  load2  r, r, i    Loads 2 byte value into register
  136. 0x35  load2h r, r, i    Continues load by handling alignment errors
  137. 0x36  load1  r, r, i    Loads single byte into register
  138. 0x37  watch  0, r, i    monitors aligned 16 byte region for writes; each core can monitor 1 address
  139.                         do not watch the highest possible 16 bytes of virtual address space; this
  140.             is where the bogus watch is (in case of handling an interrupt)
  141. 0x38  sto8   r, r, i    Writes 8 byte value from register to memory
  142. 0x39  sto8h  r, r, i    Continues store by handling alignment errors
  143. 0x3A  sto4   r, r, i    Writes 4 byte value from register to memory
  144. 0x3B  sto4h  r, r, i    Continues store by handling alignment errors
  145. 0x3C  sto2   r, r, i    Writes 2 byte value from register to memory
  146. 0x3D  sto2h  r, r, i    Continues store by handling alignment errors
  147. 0x3E  sto1   r, r, i    Stores one byte value from register to memory
  148. 0x3F  storef r, r, i    If watch is good, writes 8 byte value from register to memory, clears all flags
  149.  
  150. ;The difference between the 0x3x and the 0x4x memory operations is the range of i; structures of 64 bytes
  151. ;or less may be accessed byte aligned with no additional instructions; larger structures can only be accessed
  152. ;at 8-byte aligned chunks without one additional instruction.
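
A Python sketch of how an assembler might pick between the two load8 forms, assuming "(0-63)*8 + 64" means the 0x4x group reaches offsets 64, 72, ... up to 568. The function pick_load8 is hypothetical.

```python
def pick_load8(offset):
    """Return (opcode, i) for load8 at base+offset, or None if one instruction cannot reach it."""
    if 0 <= offset <= 63:
        return (0x30, offset)                  # byte-granular small offset
    scaled, remainder = divmod(offset - 64, 8)
    if offset >= 64 and remainder == 0 and scaled <= 63:
        return (0x40, scaled)                  # 8-byte-granular larger offset
    return None                                # needs an extra add to form the address

near = pick_load8(17)
far = pick_load8(72)
unaligned_far = pick_load8(65)
```

A None result corresponds to the "one additional instruction" case in the comment above.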
  153.  
  154. 0x40  load8  r, r, i    Loads 8 byte value into register; integer i is small offset (0-63)*8 + 64
  155. 0x41  load8h r, r, i    Continues load by handling alignment errors
  156. 0x42  load4  r, r, i    Loads 4 byte value into register
  157. 0x43  load4h r, r, i    Continues load by handling alignment errors
  158. 0x44  load2  r, r, i    Loads 2 byte value into register
  159. 0x45  load2h r, r, i    Continues load by handling alignment errors
  160. 0x46  load1  r, r, i    Loads single byte into register
  161. 0x47  watch  0, r, i    monitors aligned 16 byte region for writes; each core can monitor 1 address
  162.                         do not watch the highest possible 16 bytes of virtual address space; this
  163.             is where the bogus watch is (in case of handling an interrupt)
  164. 0x48  sto8   r, r, i    Writes 8 byte value from register to memory
  165. 0x49  sto8h  r, r, i    Continues store by handling alignment errors
  166. 0x4A  sto4   r, r, i    Writes 4 byte value from register to memory
  167. 0x4B  sto4h  r, r, i    Continues store by handling alignment errors
  168. 0x4C  sto2   r, r, i    Writes 2 byte value from register to memory
  169. 0x4D  sto2h  r, r, i    Continues store by handling alignment errors
  170. 0x4E  sto1   r, r, i    Stores one byte value from register to memory
  171. 0x4F  storef r, r, i    If watch is good, writes 8 byte value from register to memory, clears all flags
  172.                         If watch is bad or was not watching this address, sets all flags
  173.  
  174. watch and storef implement atomic primitives. An atomic increment of [ra] (lock inc [ra]) is as follows:
  175.  
  176.     watch   ra
  177.     load8   rs1, ra, #0
  178.     addf    rs1, rs1, #1
  179.     flags   rs2, r0, #5
  180.     storef  rs1, ra, #0
  181.       z jmp near @-6         ; goes back to watch above
  182.         flags   r0, rs2, #4
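
The retry loop above can be modeled in Python. This is a toy simulation under assumptions: any write to the watched 16-byte region by anyone invalidates the watch, and storef consumes the watch whether or not it succeeds. The Memory and Core classes are hypothetical, and the flag save and restore steps of the listing are omitted.

```python
MASK64 = (1 << 64) - 1

class Memory:
    """Shared memory; a write invalidates every watch on the same 16-byte region."""
    def __init__(self):
        self.cells = {}
        self.cores = []

    def write(self, addr, value):
        self.cells[addr] = value & MASK64
        for core in self.cores:
            if core.watched == (addr & ~0xF):
                core.valid = False

class Core:
    def __init__(self, memory):
        self.memory = memory
        self.watched = None
        self.valid = False
        memory.cores.append(self)

    def watch(self, addr):
        self.watched = addr & ~0xF
        self.valid = True

    def storef(self, addr, value):
        ok = self.valid and self.watched == (addr & ~0xF)
        if ok:
            self.memory.write(addr, value)
        self.valid = False
        return ok                      # the hardware reports this through the flags

def atomic_inc(core, addr, interfere=None):
    """Retry watch/load/storef until the store succeeds; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        core.watch(addr)
        value = core.memory.cells.get(addr, 0)
        if interfere is not None and attempts == 1:
            interfere()                # simulate another core writing in between
        if core.storef(addr, value + 1):
            return attempts

mem = Memory()
core = Core(mem)
mem.write(0x100, 5)
clean = atomic_inc(core, 0x100)
contended = atomic_inc(core, 0x100, interfere=lambda: mem.write(0x108, 99))
```

The contended call needs a second attempt because address 0x108 lies in the same aligned 16-byte region as 0x100.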
  183.  
  184. Multiplication and Division:
  185.  
  186. multiply sets no flags; division sets Z flag = whether or not divide overflow occurred
  187. (division by zero or largest negative number)
  188.  
  189. 0x50  mul8  r, r, r Unsigned multiply 8 bytes; high output goes to r3
  190. 0x51  imul8 r, r, r Signed multiply 8 bytes; high output goes to r3
  191. 0x52  mul4  r, r, r Unsigned multiply 4 bytes
  192. 0x53  imul4 r, r, r Signed multiply 4 bytes
  193. 0x54  mul2  r, r, r Unsigned multiply 2 bytes
  194. 0x55  imul2 r, r, r Signed multiply 2 bytes
  195. 0x56  mul1  r, r, r Unsigned multiply 1 byte
  196. 0x57  imul1 r, r, r Signed multiply 1 byte
  197. 0x58  div8  r, r, r Unsigned divide 8 bytes; high input comes from r3, modulus goes to r3
  198. 0x59  idiv8 r, r, r Signed divide 8 bytes, high input comes from r3, modulus goes to r3
  199. 0x5A  div4  r, r, r Unsigned divide 4 bytes, modulus goes to r3
  200. 0x5B  idiv4 r, r, r Signed divide 4 bytes, modulus goes to r3
  201. 0x5C  div2  r, r, r Unsigned divide 2 bytes, modulus goes to r3
  202. 0x5D  idiv2 r, r, r Signed divide 2 bytes, modulus goes to r3
  203. 0x5E  div1  r, r, r Unsigned divide 1 byte, modulus goes to r3
  204. 0x5F  idiv1 r, r, r Signed divide 1 byte, modulus goes to r3
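
A Python sketch of the unsigned 8-byte pair, showing how r3 carries the high half between mul8 and div8. Treating a quotient that does not fit in 64 bits as a divide overflow is an assumption (the text above names only division by zero and the largest negative number); the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def mul8(a, b):
    """Unsigned 64x64 multiply; returns (dest, r3) = (low half, high half)."""
    product = (a & MASK64) * (b & MASK64)
    return product & MASK64, product >> 64

def div8(r3_high, a, b):
    """Unsigned divide of the 128-bit value r3:a by b.

    Returns (quotient, modulus, overflow); quotient and modulus are None on overflow.
    """
    dividend = ((r3_high & MASK64) << 64) | (a & MASK64)
    if b == 0 or dividend // b > MASK64:   # second condition is an assumption
        return None, None, True
    return dividend // b, dividend % b, False

low, high = mul8(MASK64, 2)
quotient, modulus, overflow = div8(high, low, 2)
```

Feeding the two halves of a product back into div8 recovers the original factor, which is the main reason to keep the high half in a fixed register.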
  205.  
  206. Floating point operations (double):  All operations touch flags as expected
  207.  
  208. 0x60  fadd  r, r, r
  209. 0x61  fsub  r, r, r
  210. 0x62  fmul  r, r, r
  211. 0x63  fdiv  r, r, r
  212. 0x64  fmod  r, r, r (modulus of negative divided by positive is negative)
  213. 0x65  f_op  r, r, identifier
  214.     i = 0: load constant
  215.         r = r0: 0
  216.         r = r1: 1
  217.         r = r2: 2
  218.         r = r3: -1
  219.         r = r4: log2 e
  220.         r = r5: log2 10
  221.         r = r6: 1/log2 e
  222.         r = r7: 1/log2 10
  223.         r = r8: pi
  224.         r = r9: e
  225.         r = r10: golden ratio
  226.         r = r11: sqrt(2)
  227.         r = r12: the correct constant for implementing 4/5 rounding (hint: +0.5 is wrong)
  228.         r = r30: +nan
  229.         r = r31: inf
  230.         r = r62: -nan
  231.         r = r63: -inf
  232.     i = 1: log2
  233.     i = 2: 2^x
  234.     i = 3: sin
  235.     i = 4: cos
  236.     i = 5: tan
  237.     i = 6: invsin
  238.     i = 7: invcos
  239.     i = 8: invtan
  240.     i = 9: sinh
  241.     i = 10: cosh
  242.     i = 11: tanh
  243.     i = 12: 1/x
  244.     i = 13: sqrt
  245.     i = 14: floor
  246.     i = 15: ceil
  247.     i = 60: convert double to integer (floor rounding)
  248.     i = 61: convert integer to double
  249.     i = 62: convert double to single
  250.     i = 63: convert single to double
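
The hint on the r12 constant (that +0.5 is wrong for 4/5 rounding) can be demonstrated in Python, whose floats are IEEE doubles. The source does not say which value r12 holds; the largest double below 0.5 used here is one commonly used candidate and is an assumption.

```python
import math

below_half = 0.5 - 2.0 ** -54          # largest double below 0.5

# Adding 0.5 rounds the sum up to exactly 1.0, so the naive method returns 1.
naive = math.floor(below_half + 0.5)

# Assumed alternative constant: the largest double below 0.5.
candidate = below_half
better = math.floor(below_half + candidate)
half_case = math.floor(0.5 + candidate)
```

With the assumed constant, the value just below one half rounds down while exactly one half still rounds up.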
  251.  
  252. Privileged instructions
  253.  
  254. 0x70    setmsr  msr, r, #0
  255. 0x71    readmsr r, msr, #0
  256. 0x72    cli 0, 0, 0
  257. 0x73    sti 0, 0, 0
  258. 0x74    hlt 0, 0, 0
  259. 0x75    iret    0, 0, 0
  260. 0x76    read    i, r, #size
  261. 0x77    read    r, r, #size
  262. 0x78    write   r, i, #size
  263. 0x79    write   r, r, #size
  264. 0x7F    readmagic r
  265.  
  266. The following MSRs exist.
  267.     0 = address of page table
  268.     1 = address of ISR page table
  269.     2 = address of ISR
  270.     3 = interrupt reason
  271.         0 = hardware interrupt line
  272.         1 = page read fault
  273.         2 = page write fault
  274.         3 = page execute fault
  275.         4 = bus fault
  276.         5 = memory fault
  277.         6 = memory checksum error
  278.         7 = instruction fault
  279.         8 = syscall
  280.     4 = interrupt line or faulting instruction
  281.     5 = fault address
  282.     8 = saved address of page table
  283.     9 = saved IP
  284.     10 = saved flags register
  285.     12 = ISR register #1 (recommendation: ISR stack pointer)
  286.     13 = ISR register #2 (recommendation: user stack pointer)
  287.     14 = ISR register #3 (recommendation: pointer to process structure)
  288.     15 = ISR register #4 (spare)
  289.  
  290. Instruction 0x7F (readmagic) is special. When in user mode, it doesn't fault but returns the real value of MSR #0.
  291. In kernel mode, it simply faults like executing an invalid instruction. This bizarreness is intentional
  292. and is designed so that virtualization of the CPU can be detected but nobody will use readmagic where
  293. they meant readmsr r, msr, #0.
  294.  
  295.  
  296. Calling convention:
  297.     Stack frame is 16 byte aligned
  298.     Register 63 is the stack pointer
  299.     Registers 2, 4-31 and 61-62 are preserved registers.
  300.     Function return value goes in register 32
  301.         if return doesn't fit in eight bytes, register 32 points to where to write the return data
  302.     Register 3 may be clobbered at call time by trampolines inserted by the linker; since the compiler
  303.         knows where this can occur this does not conflict with its use as mul/div high half
  304.     Register 33 is the trampoline binding register (for pointer to function with closure)
  305.     Register 34 contains the this pointer for target call site
  306.     The first 16 arguments are written to registers 35-50
  307.         if an argument is larger than 8 bytes it uses more than one register and therefore counts
  308.         as more than one argument here
  309.     Any remaining arguments are pushed to the stack, last argument first
  310.     Integers shorter than 8 bytes are zero-extended or sign-extended as appropriate
  311.     _chkstk takes its argument in r60, preserves all registers except r2 and r57-59 and returns nothing
  312.     Typically, r2 will be saved in r56 when calling _chkstk
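
A Python sketch of the argument placement rule above. It assumes sixteen argument registers starting at r35, that an argument is never split between registers and the stack, and that once one argument goes to the stack all later ones do too; none of these details are stated in the text, and assign_arguments is a hypothetical helper.

```python
def assign_arguments(sizes):
    """Map argument sizes in bytes to lists of register numbers or to "stack"."""
    first_reg, reg_count = 35, 16
    last_reg = first_reg + reg_count - 1
    next_reg = first_reg
    placement = []
    for index, size in enumerate(sizes):
        slots = max(1, -(-size // 8))          # 8-byte slots, rounded up
        if next_reg + slots - 1 <= last_reg:
            placement.append((index, list(range(next_reg, next_reg + slots))))
            next_reg += slots
        else:
            next_reg = last_reg + 1            # assumption: later arguments also go to the stack
            placement.append((index, "stack"))
    return placement

small = assign_arguments([8, 16, 4])
many = assign_arguments([8] * 17)
```

The 16-byte argument in the first call takes two registers and so counts as two of the sixteen slots, as the convention describes.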
  313.  
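  314. _chkstk:
  315.     add r57, r0, #4096      ; page size (needs the long-form add)
  316.     add r58, r63, r0        ; r58 = probe pointer, starting at the stack pointer
  317.     add r59, r63, r0
  318.     sub r59, r59, r60       ; r59 = stack pointer after the allocation
  319.     sto8    r58, r58, #0    ; touch the current page
  320.     sub r58, r58, r57       ; step down one page
  321.     subf    r0, r58, r59    ; compare the probe pointer with the target
  322.   nz nc jmp @-4             ; loop back to the sto8 above
  323.     sto8    r59, r59, #0    ; touch the final page
  324.     jmp r2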
  325.  
  326. Function types:
  327.     A lightweight leaf function does not touch r63, retains its return address in r2 throughout, and catches nothing.
  328.     We note that _chkstk is an example of a lightweight leaf function despite touching the stack area; but unless the
  329.     platform defines a red zone we may not depend on anything being preserved there.  A trampoline looks like a
  330.     lightweight leaf function that executes a tail call.
  331.  
  332.     An unregistered middleweight function always has a frame pointer, is laid out in linear order so that disassembly
  333.     can always find the function epilog, and never throws or catches exceptions itself. On most platforms, middleweight
  334.     functions can only exist if they are dynamically generated because code pages can be paged out to source media and
  335.     that media can fail to be accessible. (If the page file fails, that's probably unrecoverable.) Unregistered functions
  336.     may not call _chkstk.
  337.  
  338.     A heavyweight function needs to be registered so that the frame unwinder can find its epilog code and catch blocks.
  339.     Loaded modules have their functions registered at link time.
  340.  
  341. Function prolog always looks like this:
  342.  
  343.     3 dd 0x00   ; Optional hotpatch slab
  344. function:
  345.     and r0, r0, r0          ; Optional hotpatch nop hitpoint
  346.     add r60, r0, #stackspace        ; Allocate stack space
  347.     add r56, r0, r2         ; If #stackspace > 4096 we must call _chkstk
  348.     call    _chkstk
  349.     add r2, r0, r56
  350.     sub r63, r63, r60
  351.     ; If we are a varargs function, spill all ... registers here (16 - number of fixed arguments)
  352.     ; Store additional registers to be preserved below largest stack space required for call
  353.     ; we *must* use r3 as the scribble register for large offsets so the unwind code doesn't hate us
  354.     add r3, r63, #large-offset
  355.     sto8    r4, r3, #0          ; save call-clobbered registers
  356.     sto8    r2, r63, #some-offset       ; store return address second to last
  357.     sto8    r61, r63, #some-offset+1    ; Store frame pointer last
  358.  
  359.     ;; If the function has a large number of locals, additional frame pointers may be established
  360.     add r4, r61, #320
  361.  
  362.     ;; Do function body
  363.  
  364.     ; Epilog code; again r3 is the scribble register for large offsets
  365.     load8   r61, r63, #some-offset+1    ; restore old frame pointer
  366.                         ; the unwind disassembler locks on to this instruction to find the
  367.                         ; start of the epilog; either variant of load8 may be used with any
  368.                         ; offset in range
  369.     load8   r2, r63, #some-offset       ; restore old return address
  370.     ; Restore additional registers
  371.     add r3, r63, #large-offset
  372.     load8   r4, r3, #0
  373.     ; Return
  374.     jmp r2
  375.  
  376. If the function calls a function that takes an excessive number of arguments, the saved registers may be accessed
  377. via r3 in a manner similar to how the stack is grown for allocating a buffer on the stack.
  378.  
  379. This processor uses a complete address split: no kernel-mode memory is mapped in user mode, and no user-mode memory is mapped in kernel mode.
  380. To ensure TLB consistency, the kernel must write to MSR #0 after editing page tables.
  381.  
  382. Here follows a vaguely plausible interrupt service routine:
  383.  
  384. _isr:   ; Interrupt service routine
  385.     ; Please note the ISR is at the bottom of the stack and can't really follow the calling conventions
  386.     ; It would be registered in the unwinder so that the unwinder knows to give up
  387.     setmsr  13, r63
  388.     loadc   r63, _isrfault
  389.     setmsr  1, r63
  390.     setmsr  15, r2
  391.     readmsr r2, 14
  392.  
  393.     ; Save registers
  394.     load8   r2, r2, #(ptable.registers - ptable)
  395.     sto8    r3, r2, 3*8
  396.     readmsr r3, 15
  397.     sto8    r3, r2, 2*8     ; saved r2 goes in slot 2
  398.     readmsr r3, 10
  399.     sto8    r3, r2, 0*8
  400.     readmsr r3, 9
  401.     sto8    r3, r2, 1*8
  402.     sto8    r4, r2, 4*8
  403.     sto8    r5, r2, 5*8
  404.     ;...
  405.     sto8    r63, r2, 63*8
  406.  
  407.     readmsr r63, 12
  408.     readmsr r33, 3          ; Arguments for specific handler
  409.     readmsr r34, 4
  410.     readmsr r35, 5
  411.     readmsr r36, 14
  412.     subf    r0, r33, 6
  413.  c  jmp _isrbad         ; Processor is too new
  414.     load8   r37, _isrmastertable
  415.     shl r33, r33, 3
  416.     add r33, r33, r37
  417.     call    r2, r33         ; Dispatch to handlers
  418.  
  419. _transition: ; interrupt return, also jumped to by syscall handler to change execution context
  420.              ; so that system calls can block and the kernel can be preempted
  421.     ; Clear any userspace watch as we almost certainly messed it up
  422.     sub r2, r0, 16
  423.     watch   r2, #0
  424.     ; Restore registers
  425.     readmsr r2, 14          ; Might have changed
  426.     load8   r2, r2, #(ptable.registers - ptable)
  427.     load8   r63, r2, 63*8
  428.     ;...
  429.     load8   r4, r2, 4*8
  430.     load8   r3, r2, 1*8
  431.     setmsr  9, r3
  432.     load8   r3, r2, 0
  433.     setmsr  10, r3
  434.     load8   r3, r2, 2*8
  435.     setmsr  15, r3
  436.     readmsr r2, 15
  437.     iret