joshudson

My Instruction Set

Aug 6th, 2019
  1.  
  2.  
  3. General instruction encoding sequence:
  4.  
  5. LSB MSB
  6. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  7. Z C V NZ NC NV Opcode---------------- Destination------ Source1---------- Source2----------
  8.  
  9. Flags:
  10. Z = last operation yielded a zero
  11. C = last operation had an unsigned carry
  12. V = last operation had a signed carry
  13. I = interrupt flag
  14. P = privileged instructions allowed
  15. X = paging in use
  16.  
  17. But the processor is little endian so the first byte contains 6 bits of mask followed by the low two
  18. bits of opcode in its highest two bits...
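
As a rough illustration of the layout above, the following Python sketch packs and unpacks one instruction word. The names `encode` and `decode` are made up for this example, and the field positions are read straight off the bit ruler: mask in bits 0-5, opcode in bits 6-13, then three 6-bit register fields.

```python
import struct

def encode(mask, opcode, dest, src1, src2):
    """Pack the fields into one 32-bit word; bit 0 is the least significant bit."""
    assert 0 <= mask < 64 and 0 <= opcode < 256
    assert all(0 <= r < 64 for r in (dest, src1, src2))
    return mask | (opcode << 6) | (dest << 14) | (src1 << 20) | (src2 << 26)

def decode(word):
    """Split a 32-bit word back into its fields."""
    return {
        "mask": word & 0x3F,
        "opcode": (word >> 6) & 0xFF,
        "dest": (word >> 14) & 0x3F,
        "src1": (word >> 20) & 0x3F,
        "src2": (word >> 26) & 0x3F,
    }

# addf r4, r5, #6 (opcode 0x23) with only the lowest mask bit set
word = encode(mask=0b000001, opcode=0x23, dest=4, src1=5, src2=6)

# Little endian: the first byte in memory is the low byte of the word, so it
# holds the 6 mask bits plus the low two opcode bits in its top two bits.
first_byte = struct.pack("<I", word)[0]
```

Here `first_byte` comes out as 0xC1: the mask value 1 in the low six bits and the low two opcode bits (both 1 for 0x23) in the top two.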
  19.  
  20. The first three bits skip the instruction if the flag is on, the second three skip if the flag is off.
  21. Many instructions have two forms, one that writes to the flags and one that doesn't. Some instructions
  22. never write to the flags at all.
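
The skip rule can be modelled in a few lines of Python. This is only a sketch of the sentence above taken literally; the function name `is_skipped` and the bit order (bit 0 = Z, bit 1 = C, bit 2 = V, then the three "flag off" bits) are assumptions based on the encoding ruler.

```python
def is_skipped(mask, z, c, v):
    """Return True if the 6-bit mask causes the instruction to be skipped.

    Bits 0-2: skip if Z / C / V is on.
    Bits 3-5: skip if Z / C / V is off.
    """
    for bit, flag in enumerate((z, c, v)):
        if ((mask >> bit) & 1) and flag:
            return True          # "skip if flag on" bit matched
        if ((mask >> (bit + 3)) & 1) and not flag:
            return True          # "skip if flag off" bit matched
    return False
```

Under this reading a mask of 0 always executes, and a mask with all six bits set never executes, whatever the flags are.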
  23.  
  24. Writing to IP without using a JMP instruction imposes a large stall penalty.
  25.  
  26. There are 64 addressable registers, of which 4 have special meanings to the processor.
  27. Register 0 = all zeros.
  28. Register 1 = IP
  29. Register 2 = return address
  30. Register 3 = multiply/divide high half
  31.  
  32. By convention, the following registers are reserved for their purpose
  33. Register 63 = stack pointer
  34. Register 62 = thread-local storage pointer
  35. Register 61 = frame pointer (if frame pointers are used)
  36.  
  37. Special Opcodes
  38.  
  39. 0x00 Fault This instruction is always invalid, raises an error if not flag masked
  40. 0x01 Syscall This instruction raises a system call
  41.  
  42. Bit twiddling instructions
  43.  
  44. 0x02 and r, r, r
  45. 0x03 andf r, r, r
  46. 0x04 or r, r, r
  47. 0x05 orf r, r, r
  48. 0x06 xor r, r, r
  49. 0x07 xorf r, r, r
  50. 0x08 nor r, r, r
  51. 0x09 norf r, r, r
  52.  
  53. nop is made as and 0, 0, 0
  54. not is made as nor dest, src, 0
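
A small Python model shows why these two aliases work. It assumes that register 0 always reads as zero and that writes to it are discarded; the second part is an assumption, since the text only says register 0 is all zeros. The class and function names are invented for the example.

```python
MASK64 = (1 << 64) - 1

class Regs:
    """Toy register file: 64 registers, r0 hard-wired to zero."""
    def __init__(self):
        self.r = [0] * 64

    def read(self, n):
        return 0 if n == 0 else self.r[n]

    def write(self, n, value):
        if n != 0:               # assumed: writes to r0 are dropped
            self.r[n] = value & MASK64

def op_and(regs, dest, a, b):
    regs.write(dest, regs.read(a) & regs.read(b))

def op_nor(regs, dest, a, b):
    regs.write(dest, ~(regs.read(a) | regs.read(b)))

regs = Regs()
regs.write(5, 0x0F)
op_nor(regs, 4, 5, 0)    # not r4, r5  ==  nor r4, r5, r0
op_and(regs, 0, 0, 0)    # nop         ==  and r0, r0, r0
```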
  55.  
  56. Bit shifting
  57.  
  58. 0x0A shl r, r, r
  59. 0x0B shlf r, r, r
  60. 0x0C shl r, r, i
  61. 0x0D shlf r, r, i
  62. 0x0E shr r, r, r
  63. 0x0F shrf r, r, r
  64. 0x10 shr r, r, i
  65. 0x11 shrf r, r, i
  66. 0x12 sar r, r, r
  67. 0x13 sarf r, r, r
  68. 0x14 sar r, r, i
  69. 0x15 sarf r, r, i
  70. 0x16 rol r, r, r
  71. 0x17 ror r, r, r
  72. 0x18 rol r, r, i There's no need for a ror r, r, i, as it can be encoded as rol r, r, 64 - i
  73. 0x1D swab r, r, i Register value conversion
  74. i = 0: alias for mov
  75. i = 1: reverse bits
  76. i = 2: swap endian for 4x 2 byte numbers
  77. i = 4: swap endian for 2x 4 byte numbers
  78. i = 8: swap endian for 1x 8 byte number
  79. i = 17: sign extend byte to 8 bytes
  80. i = 19: sign extend two bytes to 8 bytes
  81. i = 21: sign extend four bytes to 8 bytes
  82. i = 33: zero extend byte to 8 bytes
  83. i = 34: zero extend two bytes to 8 bytes
  84. i = 36: zero extend four bytes to 8 bytes
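
Two of the entries above are easy to check with a short Python model: the claim that a right rotate is a left rotate by 64 - i, and the endian-swap modes of swab (i = 2, 4, 8). The helper names are invented for this sketch, and registers are assumed to be 64 bits wide as the rest of the text implies.

```python
MASK64 = (1 << 64) - 1

def rol(x, n):
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64

def ror(x, n):
    """Rotate a 64-bit value right by n bits."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64

def swap_lanes(x, width):
    """Reverse the byte order inside each width-byte lane of a 64-bit value."""
    raw = x.to_bytes(8, "little")
    out = b"".join(raw[i:i + width][::-1] for i in range(0, 8, width))
    return int.from_bytes(out, "little")

x = 0x0102030405060708
# Every ror by 1..63 equals a rol by 64 - i, so no ror-immediate opcode is needed.
rotations_agree = all(ror(x, i) == rol(x, 64 - i) for i in range(1, 64))
```

A rotate by 0 needs no special case, since rol by 0 is already the identity and 64 would not fit in a 6-bit immediate anyway.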
  85.  
  86. JMP instructions
  87.  
  88. 0x1A jmp near The register range encodes a signed number between -2^20 and 2^20-4 where the bottom 2 bits must be zero
  89. 0x1B call near like jmp, but writes return address to r2
  90. 0x19 call indirect Jumps to address in source1, writes return address to dest (should be r2; r0 results in jmp indirect)
  91.  
  92. Addition and subtraction
  93.  
  94. 0x20 add r, r, r
  95. 0x21 addf r, r, r
  96. 0x22 add r, r, i Unsigned 0-63
  97. 0x23 addf r, r, i
  98. 0x26 adcf r, r, r Also adds carry bit; this instruction is unusual in that there is no non-f version
  99. 0x27 sbbf r, r, r Also subtracts carry bit
  100. 0x28 sub r, r, r
  101. 0x29 subf r, r, r
  102. 0x2A sub r, r, i Unsigned 0-63
  103. 0x2B subf r, r, i Unsigned 0-63
  104. 0x2E flags (dest, source1, i)
  105. i = 0: clear carry
  106. i = 1: set carry
  107. i = 2: flip carry
  108. i = 4: copy source1 to flags (does not touch flags other than Z C V)
  109. i = 5: copy flags to dest (ditto)
  110. i = 6: copy source1 to flags including privileged flags
  111. i = 7: copy flags to dest (including privileged flags)
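
The usual reason for an add-with-carry instruction is multi-word arithmetic. The Python sketch below models a 128-bit addition as addf on the low halves followed by adcf on the high halves. It assumes C is the unsigned carry out of bit 63, as the flag description says; the function names are illustrative only.

```python
MASK64 = (1 << 64) - 1

def addf(a, b):
    """64-bit add that also returns the carry flag."""
    total = a + b
    return total & MASK64, total >> 64

def adcf(a, b, carry):
    """64-bit add that also adds the incoming carry."""
    total = a + b + carry
    return total & MASK64, total >> 64

def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values held as (low, high) 64-bit halves."""
    lo, carry = addf(a_lo, b_lo)        # addf lo, a_lo, b_lo
    hi, carry = adcf(a_hi, b_hi, carry) # adcf hi, a_hi, b_hi
    return lo, hi, carry
```

For example, adding 1 to a value whose low half is all ones carries into the high half: `add128(MASK64, 0, 1, 0)` gives `(0, 1, 0)`.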
  112.  
  113. Special long-form instructions for dealing with large constants. Add and sub take up 2 slots, where the first 6
  114. bits of the second slot are set to 1 bits in encoding, while the 6 bits in source2 slot provide the actual value.
  115. To reconstruct the constant, take the bottom 6 bits from the second 32 bit word, move them to the top 6 bits,
  116. then take the bottom six bits of the first word and move them into the second word.
  117.  
  118. 0x24 add r, r, i Unsigned 0-2^32-1
  119. 0x25 addf r, r, i Unsigned 0-2^32-1
  120. 0x2C sub r, r, i Unsigned 0-2^32-1
  121. 0x2D subf r, r, i Unsigned 0-2^32-1
  122.  
  123. loadc and jmp far occupy 3 slots each with the same basic idea; the second slot is repaired from source2 slot
  124. while the third slot is repaired from the source1 slot
  125. 0x1C jmp far Unsigned 0-2^64-1
  126. 0x1D jmp far Unsigned 0-2^64-1
  127. 0x1E loadc r, i Unsigned 0-2^64-1
  128.  
  129. Memory access
  130.  
  131. 0x30 load8 r, r, i Loads 8 byte value into register; integer i is small offset (0-63)
  132. 0x31 load8h r, r, i Continues load by handling alignment errors
  133. 0x32 load4 r, r, i Loads 4 byte value into register
  134. 0x33 load4h r, r, i Continues load by handling alignment errors
  135. 0x34 load2 r, r, i Loads 2 byte value into register
  136. 0x35 load2h r, r, i Continues load by handling alignment errors
  137. 0x36 load1 r, r, i Loads single byte into register
  138. 0x37 watch 0, r, i monitors aligned 16 byte region for writes; each core can monitor 1 address
  139. do not watch the highest possible 16 bytes of virtual address space; this
  140. is where the bogus watch is (in case of handling an interrupt)
  141. 0x38 sto8 r, r, i Writes 8 byte value from register to memory
  142. 0x39 sto8h r, r, i Continues store by handling alignment errors
  143. 0x3A sto4 r, r, i Writes 4 byte value from register to memory
  144. 0x3B sto4h r, r, i Continues store by handling alignment errors
  145. 0x3C sto2 r, r, i Writes 2 byte value from register to memory
  146. 0x3D sto2h r, r, i Continues store by handling alignment errors
  147. 0x3E sto1 r, r, i Stores one byte value from register to memory
  148. 0x3F storef r, r, i If watch is good, writes 8 byte value from register to memory, clears all flags
  149.  
  150. ;The difference between the 0x3x and the 0x4x memory operations is the range of i; structures of 64 bytes
  151. ;or less may be accessed byte aligned with no additional instructions; larger structures can only be accessed
  152. ;at 8-byte aligned chunks without one additional instruction.
  153.  
  154. 0x40 load8 r, r, i Loads 8 byte value into register; integer i is small offset (0-63)*8 + 64
  155. 0x41 load8h r, r, i Continues load by handling alignment errors
  156. 0x42 load4 r, r, i Loads 4 byte value into register
  157. 0x43 load4h r, r, i Continues load by handling alignment errors
  158. 0x44 load2 r, r, i Loads 2 byte value into register
  159. 0x45 load2h r, r, i Continues load by handling alignment errors
  160. 0x46 load1 r, r, i Loads single byte into register
  161. 0x47 watch 0, r, i monitors aligned 16 byte region for writes; each core can monitor 1 address
  162. do not watch the highest possible 16 bytes of virtual address space; this
  163. is where the bogus watch is (in case of handling an interrupt)
  164. 0x48 sto8 r, r, i Writes 8 byte value from register to memory
  165. 0x49 sto8h r, r, i Continues store by handling alignment errors
  166. 0x4A sto4 r, r, i Writes 4 byte value from register to memory
  167. 0x4B sto4h r, r, i Continues store by handling alignment errors
  168. 0x4C sto2 r, r, i Writes 2 byte value from register to memory
  169. 0x4D sto2h r, r, i Continues store by handling alignment errors
  170. 0x4E sto1 r, r, i Stores one byte value from register to memory
  171. 0x4F storef r, r, i If watch is good, writes 8 byte value from register to memory, clears all flags
  172. If watch is bad or was not watching this address, sets all flags
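
To make the two offset ranges concrete, here is a hedged Python sketch of how an assembler might choose between the 0x3x and 0x4x forms for a given byte offset. It assumes the "(0-63)*8 + 64" formula given for 0x40 applies to every 0x4x instruction; the function name `pick_form` is invented.

```python
def pick_form(offset):
    """Return (opcode family, encoded i) for a byte offset, or None if the
    offset cannot be encoded directly and needs an extra instruction."""
    if 0 <= offset <= 63:
        return ("0x3x", offset)                # byte-granular, 0..63
    if 64 <= offset <= 63 * 8 + 64 and offset % 8 == 0:
        return ("0x4x", (offset - 64) // 8)    # 8-byte steps, 64..568
    return None
```

Under this assumption, offsets 0 to 63 use the short form at any alignment, offsets 64 to 568 use the second form when they are multiples of 8, and anything else needs the base address adjusted first.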
  173.  
  174. watch and storef implement atomic primitives. lock inc [ra] atomic is as follows:
  175.  
  176. watch ra
  177. load8 rs1, ra, #0
  178. addf rs1, rs1, #1
  179. flags rs2, r0, #5
  180. storef rs1, ra, #0
  181. z jmp near @-6 ; goes back to watch above
  182. flags r0, rs2, #4
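
The retry loop can be simulated in Python to see how the watch is broken by a competing write. Everything here is a toy model with invented names (`Memory`, `atomic_inc`, the `interfere` hook); it assumes a watch covers the aligned 16-byte region containing the address and that any store from another core to that region invalidates it.

```python
MASK64 = (1 << 64) - 1

class Memory:
    """Toy shared memory with one watch slot per core (hypothetical model)."""
    def __init__(self):
        self.cells = {}
        self.watches = {}    # core id -> watched 16-byte-aligned base address

    def watch(self, core, addr):
        self.watches[core] = addr & ~15

    def load8(self, addr):
        return self.cells.get(addr, 0)

    def store8(self, core, addr, value):
        self.cells[addr] = value & MASK64
        line = addr & ~15
        # A write breaks every other core's watch on the same region.
        broken = [c for c, base in self.watches.items() if base == line and c != core]
        for other in broken:
            del self.watches[other]

    def storef(self, core, addr, value):
        """Store only if this core's watch is still intact; report success."""
        if self.watches.get(core) != (addr & ~15):
            return False
        self.store8(core, addr, value)
        del self.watches[core]
        return True

def atomic_inc(mem, core, addr, interfere=None):
    """watch / load8 / add / storef, retried until storef succeeds."""
    attempts = 0
    while True:
        attempts += 1
        mem.watch(core, addr)
        value = (mem.load8(addr) + 1) & MASK64
        if interfere is not None:
            interfere()          # simulate another core writing in between
            interfere = None
        if mem.storef(core, addr, value):
            return value, attempts
```

With no interference the loop finishes in one attempt. If another core writes anywhere in the same 16-byte region between the watch and the storef, the first storef fails and the sequence is retried.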
  183.  
  184. Multiplication and Division:
  185.  
  186. multiply sets no flags; division sets Z flag = whether or not divide overflow occurred
  187. (division by zero or largest negative number)
  188.  
  189. 0x50 mul8 r, r, r Unsigned multiply 8 bytes; high output goes to r3
  190. 0x51 imul8 r, r, r Signed multiply 8 bytes; high output goes to r3
  191. 0x52 mul4 r, r, r Unsigned multiply 4 bytes
  192. 0x53 imul4 r, r, r Signed multiply 4 bytes
  193. 0x54 mul2 r, r, r Unsigned multiply 2 bytes
  194. 0x55 imul2 r, r, r Signed multiply 2 bytes
  195. 0x56 mul1 r, r, r Unsigned multiply 1 byte
  196. 0x57 imul1 r, r, r Signed multiply 1 byte
  197. 0x58 div8 r, r, r Unsigned divide 8 bytes; high input comes from r3, modulus goes to r3
  198. 0x59 idiv8 r, r, r Signed divide 8 bytes; high input comes from r3, modulus goes to r3
  199. 0x5A div4 r, r, r Unsigned divide 4 bytes, modulus goes to r3
  200. 0x5B idiv4 r, r, r Signed divide 4 bytes, modulus goes to r3
  201. 0x5C div2 r, r, r Unsigned divide 2 bytes, modulus goes to r3
  202. 0x5D idiv2 r, r, r Signed divide 2 bytes, modulus goes to r3
  203. 0x5E div1 r, r, r Unsigned divide 1 byte, modulus goes to r3
  204. 0x5F idiv1 r, r, r Signed divide 1 byte, modulus goes to r3
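
For the 8-byte multiplies, the split between the destination register and r3 can be sketched in Python as follows. The function names mirror the mnemonics but are only a model; it assumes the low 64 bits of the product go to the destination and the high 64 bits to r3, with two's complement used for the signed form.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret a 64-bit pattern as a two's complement integer."""
    return x - (1 << 64) if x >> 63 else x

def mul8(a, b):
    """Unsigned 64x64 multiply: returns (low half, high half for r3)."""
    product = a * b
    return product & MASK64, product >> 64

def imul8(a, b):
    """Signed 64x64 multiply: returns (low half, high half for r3)."""
    product = to_signed(a) * to_signed(b)
    return product & MASK64, (product >> 64) & MASK64
```

The low halves of the two forms are always identical; only the high half in r3 differs between the signed and unsigned interpretation.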
  205.  
  206. Floating point operations (double): All operations touch flags as expected
  207.  
  208. 0x60 fadd r, r, r
  209. 0x61 fsub r, r, r
  210. 0x62 fmul r, r, r
  211. 0x63 fdiv r, r, r
  212. 0x64 fmod r, r, r (modulus of negative divided by positive is negative)
  213. 0x65 f_op r, r, identifier
  214. i = 0: load constant
  215. r = r0: 0
  216. r = r1: 1
  217. r = r2: 2
  218. r = r3: -1
  219. r = r4: log2 e
  220. r = r5: log2 10
  221. r = r6: 1/log2 e
  222. r = r7: 1/log2 10
  223. r = r8: pi
  224. r = r9: e
  225. r = r10: golden ratio
  226. r = r11: sqrt(2)
  227. r = r12: the correct constant for implementing 4/5 rounding (hint: +0.5 is wrong)
  228. r = r30: +nan
  229. r = r31: inf
  230. r = r62: -nan
  231. r = r63: -inf
  232. i = 1: log2
  233. i = 2: 2^x
  234. i = 3: sin
  235. i = 4: cos
  236. i = 5: tan
  237. i = 6: invsin
  238. i = 7: invcos
  239. i = 8: invtan
  240. i = 9: sinh
  241. i = 10: cosh
  242. i = 11: tanh
  243. i = 12: 1/x
  244. i = 13: sqrt
  245. i = 14: floor
  246. i = 15: ceil
  247. i = 60: convert double to integer (floor rounding)
  248. i = 61: convert integer to double
  249. i = 62: convert double to single
  250. i = 63: convert single to double
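
Two remarks in the list above can be illustrated in Python. First, the sign rule stated for fmod matches C-style `fmod` rather than a floored modulus. Second, the hint that +0.5 is wrong for 4/5 rounding can be demonstrated directly. The text does not say what the correct constant is; the value used below (the largest double below 0.5) is one commonly cited candidate and is an assumption of this sketch.

```python
import math

# fmod keeps the sign of the dividend; Python's % keeps the sign of the divisor.
fmod_result = math.fmod(-7.0, 3.0)     # negative, as described for 0x64
floored_result = -7.0 % 3.0            # positive, a different convention

def round_naive(x):
    """Round half up by adding 0.5; this is wrong for some inputs."""
    return math.floor(x + 0.5)

BELOW_HALF = 0.49999999999999994       # largest double below 0.5 (assumed constant)

def round_candidate(x):
    """Round half up using the predecessor of 0.5 instead of 0.5."""
    return math.floor(x + BELOW_HALF)

# This input is below 0.5 and should round to 0, but x + 0.5 is not exactly
# representable and rounds up to 1.0 before the floor is taken.
tricky = 0.49999999999999994
```

With these definitions `round_naive(tricky)` returns 1, while `round_candidate(tricky)` returns 0 and `round_candidate(0.5)` still returns 1.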
  251.  
  252. Privileged instructions
  253.  
  254. 0x70 setmsr msr, r, #0
  255. 0x71 readmsr r, msr, #0
  256. 0x72 cli 0, 0, 0
  257. 0x73 sti 0, 0, 0
  258. 0x74 hlt 0, 0, 0
  259. 0x75 iret 0, 0, 0
  260. 0x76 read i, r, #size
  261. 0x77 read r, r, #size
  262. 0x78 write r, i, #size
  263. 0x79 write r, i, #size
  264. 0x7F readmagic r
  265.  
  266. The following MSRs exist.
  267. 0 = address of page table
  268. 1 = address of ISR page table
  269. 2 = address of ISR
  270. 3 = interrupt reason
  271. 0 = hardware interrupt line
  272. 1 = page read fault
  273. 2 = page write fault
  274. 3 = page execute fault
  275. 4 = bus fault
  276. 5 = memory fault
  277. 6 = memory checksum error
  278. 7 = instruction fault
  279. 8 = syscall
  280. 4 = interrupt line or faulting instruction
  281. 5 = fault address
  282. 8 = saved address of page table
  283. 9 = saved IP
  284. 10 = saved flags register
  285. 12 = ISR register #1 (recommendation: ISR stack pointer)
  286. 13 = ISR register #2 (recommendation: user stack pointer)
  287. 14 = ISR register #3 (recommendation: pointer to process structure)
  288. 15 = ISR register #4 (spare)
  289.  
  290. Instruction 0x6F is special. When in user mode, it doesn't fault but returns the real value of MSR #0.
  291. In kernel mode, it simply faults like executing an invalid instruction. This bizarreness is intentional
  292. and is designed so that virtualization of the CPU can be detected but nobody will use readmagic where
  293. they meant readmsr r, msr, #0.
  294.  
  295.  
  296. Calling convention:
  297. Stack frame is 16 byte aligned
  298. Register 63 is the stack pointer
  299. Registers 2, 4-31 and 61-62 are preserved registers.
  300. Function return value goes in register 32
  301. if return doesn't fit in eight bytes, register 32 points to where to write the return data
  302. Register 3 may be clobbered at call time by trampolines inserted by the linker; since the compiler
  303. knows where this can occur this does not conflict with its use as mul/div high half
  304. Register 33 is the trampoline binding register (for pointer to function with closure)
  305. Register 34 contains the this pointer for target call site
  306. The first 16 arguments are written to registers 35-51
  307. if an argument is larger than 8 bytes it uses more than one register and therefore counts
  308. as more than one argument here
  309. Any remaining arguments are pushed to the stack, last argument first
  310. Integers shorter than 8 bytes are zero extended or sign extended as appropriate
  311. _chkstk takes its argument in r60, preserves all registers except r2 and r57-59 and returns nothing
  312. Typically, r2 will be saved in r56 when calling _chkstk
  313.  
  314. _chkstk:
  315. add r57, r0, #4096
  316. add r58, r63, r0
  317. add r59, r63, r0
  318. sub r59, r59, r60
  319. sto8 r58, r58, #0
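  320. sub r58, r58, r57 ; step the probe pointer down one page
  321. subf r0, r58, r59 ; compare probe pointer with the new stack top
  322. nz nc jmp @-4 ; keep probing while still above it
  323. sto8 r59, r59, #0 ; touch the final address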
  324. jmp r2
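
The intent of _chkstk, as far as it can be read from the listing, is to touch the stack one 4096-byte page at a time from the current stack pointer down to the new one, so that guard pages are hit in order. The Python sketch below models only the sequence of probed addresses; the function name and the exact loop condition are assumptions.

```python
PAGE = 4096

def chkstk_probes(sp, size):
    """Return the addresses a page-by-page stack probe would touch.

    sp   : current stack pointer (r63)
    size : requested allocation in bytes (r60)
    """
    limit = sp - size            # new stack top (r59)
    probe = sp                   # probe pointer (r58)
    touched = []
    while True:
        touched.append(probe)    # sto8 r58, r58, #0
        probe -= PAGE
        if probe <= limit:       # loop only while the probe is above the limit
            break
    touched.append(limit)        # final store at the new stack top
    return touched
```

For `sp = 0x10000` and `size = 10000` this touches 0x10000, 0xF000, 0xE000 and finally 0xD8F0, so no two consecutive probes are more than one page apart.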
  325.  
  326. Function types:
  327. A lightweight leaf function does not touch r63, retains its return address in r2 throughout, and catches nothing.
  328. We note that _chkstk is an example of a lightweight leaf function despite touching the stack area; but unless the
  329. platform defines a red zone we may not depend on anything being preserved there. A trampoline looks like a
  330. lightweight leaf function that executes a tail call.
  331.  
  332. An unregistered middleweight function always has a frame pointer, is laid out in linear order so that disassembly
  333. can always find the function epilog, and never throws or catches exceptions itself. On most platforms, middleweight
  334. functions can only exist if they are dynamically generated because code pages can be paged out to source media and
  335. that media can fail to be accessible. (If the page file fails, that's probably unrecoverable.) Unregistered functions
  336. may not call _chkstk.
  337.  
  338. A heavyweight function needs to be registered so that the frame unwinder can find its epilog code and catch blocks.
  339. Loaded modules have their functions registered at link time.
  340.  
  341. Function prolog always looks like this:
  342.  
  343. 3 dd 0x00 ; Optional hotpatch slab
  344. function:
  345. and r0, r0, r0 ; Optional hotpatch nop hitpoint
  346. add r60, r0, #stackspace ; Allocate stack space
  347. add r56, r0, r2 ; If #stackspace > 4096 we must call _chkstk
  348. call _chkstk
  349. add r2, r0, r56
  350. sub r63, r63, r60
  351. ; If we are a varargs function, spill all ... registers here (16 - number of fixed arguments)
  352. ; Store additional registers to be preserved below largest stack space required for call
  353. ; we *must* use r3 as the scribble register for large offsets so the unwind code doesn't hate us
  354. add r3, r63, #large-offset
  355. sto8 r4, r3, #0 ; save preserved registers that this function uses
  356. sto8 r2, r63, #some-offset ; store return address second to last
  357. sto8 r61, r63, #some-offset+1 ; Store frame pointer last
  358.  
  359. ;; If the function has a large number of locals, additional frame pointers may be established
  360. add r4, r61, #320
  361.  
  362. ;; Do function body
  363.  
  364. ; Epilog code; again r3 is the scribble register for large offsets
  365. load8 r61, r63, #some-offset+1 ; restore old frame pointer
  366. ; the unwind disassembler locks on to this instruction to find the
  367. ; start of the epilog; either variant of load8 may be used with any
  368. ; offset in range
  369. load8 r2, r63, #some-offset ; restore old return address
  370. ; Restore additional registers
  371. add r3, r63, #large-offset
  372. load8 r4, r3, #0
  373. ; Return
  374. jmp r2
  375.  
  376. If the function calls a function that takes an excessive number of arguments, the saved registers may be accessed
  377. via r3 in a manner similar to how the stack is grown for allocating a buffer on the stack.
  378.  
  379. This processor uses a complete address-space split: no kernel-mode memory is mapped in user mode, and no user-mode memory is mapped in kernel mode.
  380. To ensure TLB consistency, the kernel must write to MSR #0 after editing page tables.
  381.  
  382. Here follows a vaguely plausible interrupt service routine:
  383.  
  384. _isr: ; Interrupt service routine
  385. ; Please note the ISR is at the bottom of the stack and can't really follow the calling conventions
  386. ; It would be registered in the unwinder so that the unwinder knows to give up
  387. setmsr 13, r63
  388. loadc r63, _isrfault
  389. setmsr 1, r63
  390. setmsr 15, r2
  391. readmsr r2, 14
  392.  
  393. ; Save registers
  394. load8 r2, r2, #(ptable.registers - ptable)
  395. sto8 r3, r2, 3*8
  396. readmsr r3, 15
  397. sto8 r3, r2, 2*8
  398. readmsr r3, 10
  399. sto8 r3, r2, 0*8
  400. readmsr r3, 9
  401. sto8 r3, r2, 1*8
  402. sto8 r4, r2, 4*8
  403. sto8 r5, r2, 5*8
  404. ;...
  405. sto8 r63, r2, 63*8
  406.  
  407. readmsr r63, 12
  408. readmsr r33, 3 ; Arguments for specific handler
  409. readmsr r34, 4
  410. readmsr r35, 5
  411. readmsr r36, 14
  412. subf r0, r33, 6
  413. c jmp _isrbad ; Processor is too new
  414. load8 r37, _isrmastertable
  415. shl r33, r33, 3
  416. add r33, r33, r37
  417. call r2, r33 ; Dispatch to handlers
  418.  
  419. _transition: ; interrupt return, also jumped to by syscall handler to change execution context
  420. ; so that system calls can block and the kernel can be preempted
  421. ; Clear any userspace watch as we almost certainly messed it up
  422. sub r2, r0, 16
  423. watch r2, #0
  424. ; Restore registers
  425. readmsr r2, 14 ; Might have changed
  426. load8 r2, r2, #(ptable.registers - ptable)
  427. load8 r63, r2, 63*8
  428. ;...
  429. load8 r4, r2, 4*8
  430. load8 r3, r2, 1*8
  431. setmsr 9, r3
  432. load8 r3, r2, 0*8
  433. setmsr 10, r3
  434. load8 r3, r2, 2*8
  435. setmsr 15, r3
  436. readmsr r2, 15
  437. iret