- Analysis of M-Machine
- =====================
- Skullcode version: 1408337099853
- (possibly a UNIX timestamp in milliseconds, i.e. 2014-08-18 04:44:59.853 UTC).
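The timestamp reading of the version number can be checked with a few lines of Python. This is only a sketch: treating the number as milliseconds since the UNIX epoch is the guess made above, and the name `version_to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

SKULLCODE_VERSION = 1408337099853  # version number quoted in these notes


def version_to_utc(ms: int) -> datetime:
    """Interpret an integer as milliseconds since the UNIX epoch (an assumption)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer milliseconds keep the arithmetic exact (no float rounding).
    return epoch + timedelta(milliseconds=ms)


print(version_to_utc(SKULLCODE_VERSION).isoformat(timespec="milliseconds"))
# prints 2014-08-18T04:44:59.853+00:00
```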
- VM is a 32-bit CISC Von Neumann architecture with separate user/system rings and unrecoverable faults.
- Endianness appears to be platform-dependent (due to the use of JS typed arrays). Despite appearances, the design is much closer to a typical VM design than a practical machine.
- Rings
- -----
- The VM supports two rings, Ring 1 (User) and Ring 0 (System). The system boots in Ring 0 if the initial lock register value is 0; otherwise it boots in Ring 1. Switching between rings uses the SET\_RING instruction. Entering Ring 0 from Ring 1 requires a 128-bit key to be specified, or the VM will trap with an E\_PERM fault. Instructions specifying certain opcodes or utilizing the Register addressing mode may only be executed in Ring 0, and will trigger an E\_PERM fault if executed in Ring 1.
- Faults
- ------
- Faults cause the VM to hang ('dead' state) and bubble up to the VM management code,
- which translates them to console messages. In addition to the reasons listed, faults
- can be triggered manually via the TRAP instruction.
- [number]: [name] - [message] - [description]
- 0: E_NONE - "OK."
- 1: E_PERM - "Permission Required." - Triggered on attempt to execute protected instruction in Ring 1.
- 2: E_NOMEM - "Out of memory." - Triggered when 'grow()' fails. 'grow()' appears to be unused.
- 3: E_INVALID - "Invalid operation." - Triggered when a malformed instruction is encountered.
- 4: E_ACCESS - "Invalid memory access." - Unused, there is no memory protection.
- 5: E_ALIGN - "Invalid access alignment." - Unused, unaligned accesses simply ignore the low address bits (see 'Source/Destination Encoding').
- 6: E_UNKNOWN - "Unknown error." - Unused.
- 7: E_ENV - "Host environment exception." - Unused.
- 8: E_HALT - "Halted." - Triggered by executing HALT.
- 9: E_NOSUPPORT - "Operation not supported." - Triggered on attempt to execute certain instructions.
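For tooling that needs to turn fault numbers back into names, the table above maps naturally onto an enum. The sketch below only transcribes the list; the `Fault` class, `FAULT_MESSAGES` and `describe_fault` are hypothetical names, and the output format is not the one the VM management code uses (that format is not known here).

```python
from enum import IntEnum


class Fault(IntEnum):
    """Fault numbers as listed in the table above."""
    E_NONE = 0
    E_PERM = 1
    E_NOMEM = 2
    E_INVALID = 3
    E_ACCESS = 4
    E_ALIGN = 5
    E_UNKNOWN = 6
    E_ENV = 7
    E_HALT = 8
    E_NOSUPPORT = 9


FAULT_MESSAGES = {
    Fault.E_NONE: "OK.",
    Fault.E_PERM: "Permission Required.",
    Fault.E_NOMEM: "Out of memory.",
    Fault.E_INVALID: "Invalid operation.",
    Fault.E_ACCESS: "Invalid memory access.",
    Fault.E_ALIGN: "Invalid access alignment.",
    Fault.E_UNKNOWN: "Unknown error.",
    Fault.E_ENV: "Host environment exception.",
    Fault.E_HALT: "Halted.",
    Fault.E_NOSUPPORT: "Operation not supported.",
}


def describe_fault(code: int) -> str:
    """Return 'NAME: message' for a fault number (illustrative format only)."""
    fault = Fault(code)  # raises ValueError for numbers outside the table
    return f"{fault.name}: {FAULT_MESSAGES[fault]}"


print(describe_fault(2))  # prints E_NOMEM: Out of memory.
```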
- Registers
- ---------
- The VM supports 7 'normal' registers and 7 'special' registers.
- Only rParam can be directly read in Ring 1, though the registers 1-6 can be used as base registers for indexed addressing.
- Note that 0x00 and 0x08 are not valid registers and will trigger E_INVALID or unspecified (instruction-dependent) behaviour if accessed.
- Real:
- 0x01: rHeap
- 0x02: rParam - Index address for indexed/complex addressing, possibly intended as scratch memory/parameter storage area.
- 0x03: rText
- 0x04: rEntity
- 0x05: rCode - Base address for branches
- 0x06: rCall - Base address for the call stack
- 0x07: *flags register*
- Normal:
- 0x09: body (?)
- 0x0A: spine (?)
- 0x0B: free (?)
- 0x0C: seed (?)
- 0x0D: rSig
- 0x0E: rVirt
- 0x0F: (op - rCode) (?)
- Flags register is read only (E_INVALID on write?), with the following format:
- fedcba9876543210 fedcba9876543210
- ?............... .............P?? // | cmp
- P - 1 if executing in Ring 1, else 0
- . - Read as zero, writes ignored.
- todo: work out cmp
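Reading the bit diagram literally, the ring bit sits at bit 2 and the three '?' positions (bits 31, 1 and 0) belong to cmp. A minimal sketch of splitting a flags value under that reading follows; the constant and function names are invented for illustration, and the meaning of the individual cmp bits is still unknown.

```python
RING_BIT = 0x00000004  # 'P' in the diagram (bit 2)
CMP_MASK = 0x80000003  # the three '?' positions (bits 31, 1, 0); assumed to be cmp


def decode_flags(flags: int) -> dict:
    """Split a 32-bit flags value into the ring number and the raw cmp bits."""
    flags &= 0xFFFFFFFF
    return {
        "ring": 1 if flags & RING_BIT else 0,
        "cmp": flags & CMP_MASK,
    }


print(decode_flags(0x80000006))
```

All other bits read as zero according to the diagram, so they are simply dropped here.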
- Call Stack
- ----------
- rCall stores the base address of a call stack. Position within the stack is stored in the *ret* internal register.
- Function calls are implemented via two instructions, VISIT and RET.
- VISIT pushes rParam (before adjustment) then PC, leading to the following call stack layout:
- 0x00: [rParam]
- 0x04: [ PC ]
- RET simply pops PC and rParam.
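The push/pop order can be illustrated with a small model. This is a sketch under assumptions: a Python list stands in for the memory at rCall (with the list length playing the role of *ret*), the pushed PC is taken to be whatever return address the caller supplies, and the class and method names are hypothetical.

```python
class CallStackSketch:
    """Toy model of VISIT/RET as described above (not the VM's actual code)."""

    def __init__(self, r_param: int, r_code: int):
        self.r_param = r_param
        self.r_code = r_code
        self.pc = 0
        self.stack = []  # stands in for memory at rCall

    def visit(self, src: int, adj: int, return_pc: int) -> None:
        self.stack.append(self.r_param)  # rParam before adjustment
        self.stack.append(return_pc)     # then PC
        self.r_param += adj * 4          # adj is multiplied by 4
        self.pc = self.r_code + src      # branch target is rCode + src

    def ret(self) -> None:
        self.pc = self.stack.pop()       # pop PC first...
        self.r_param = self.stack.pop()  # ...then rParam


vm = CallStackSketch(r_param=0x100, r_code=0x1000)
vm.visit(src=0x20, adj=2, return_pc=0x1008)
print(hex(vm.pc), hex(vm.r_param))  # prints 0x1020 0x108
vm.ret()
print(hex(vm.pc), hex(vm.r_param))  # prints 0x1008 0x100
```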
- Floating Point
- --------------
- The VM includes native support for floating point numbers, both single-precision and double-precision. Most arithmetic operations support floating point operations, but conversion between floating point and integer representations is not automatic (i.e. one cannot set *src* to a float and *dst* to an int, or vice versa).
- There does not appear to be an obvious way to convert between floating point and integer representations; it is possible this was the intended purpose of the CONV_INT32 opcode.
- Source/Destination Encoding
- ---------------------------
- Instructions marked with an (S) or (D) encode a source or destination in the instruction.
- The VM supports Immediate, Register, Indexed and Complex address modes, depending on the selected flags. These flags are shared between source and destination encodings. The immediate flag applies only to source operands. Source and destination operands cannot both use Complex addressing simultaneously; if Complex is used for the source operand Indexed must be used for the destination operand, and vice versa.
- The instruction source can be decoded by reading the following list and picking the first option that applies:
- * If the **Immediate** flag is set AND the **Size** field is 2 AND *imm* is non-zero, use Register addressing with the register specified by *imm* (Note 1).
- * If the **Immediate** flag is set, use Immediate addressing (Note 2).
- * If the **inDexed** flag is set OR the **Base** field is 0, use Indexed addressing with offset given by *imm*.
- * Use Complex addressing with offset given by *imm*.
- Note 1: If the instruction is executed in Ring 1, fail with E\_PERM. If an invalid register number is given, fail with E\_INVALID. If a register number of 0 is given for the destination, the value is discarded.
- Note 2: If the chosen size is 8-bit, the operand is encoded as *imm*, otherwise it will be encoded following the instruction.
- The instruction destination can be decoded by reading the following list and picking the first option that applies:
- * If the **inDexed** flag is set AND the **Base** field is 0, use Register addressing with the register specified by *slot* (Note 1).
- * If the **inDexed** flag is set, use Complex addressing with offset given by *slot*.
- * Use Indexed addressing with offset given by *slot*.
- Note 1: If the source operand is a floating point number, the value written to the register is undefined. If the source operand is longer than 32-bit, only the first 32 bits will be stored in the register.
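The two "first option that applies" lists translate directly into a pair of functions. The sketch below follows the rules exactly as written above; the function names and the `(mode, value)` return shape are made up for illustration, and ring or register-validity checks are left out.

```python
def decode_source(immediate: bool, indexed: bool, base: int, size: int, imm: int):
    """Pick the source addressing mode; returns (mode, register/offset or None)."""
    if immediate and size == 2 and imm != 0:
        return ("register", imm)
    if immediate:
        return ("immediate", None)  # operand is in imm or follows the instruction
    if indexed or base == 0:
        return ("indexed", imm)
    return ("complex", imm)


def decode_destination(indexed: bool, base: int, slot: int):
    """Pick the destination addressing mode; returns (mode, register/offset)."""
    if indexed and base == 0:
        return ("register", slot)
    if indexed:
        return ("complex", slot)
    return ("indexed", slot)


# Immediate=1, inDexed=1, Base=0, Size=2, imm=5 -> Register source and destination
print(decode_source(True, True, 0, 2, 5), decode_destination(True, 0, 1))
# prints ('register', 5) ('register', 1)
```

Because the flags are shared, calling both functions with the same `indexed` and `base` values shows which source/destination pairs a given encoding can express.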
- Possible combinations of source and destination addressing modes and their encodings have been tabulated below:
- | Source | Destination | Immediate | Base | inDexed | Size | imm |
- |------------------|-------------|-----------|------|---------|------|------|
- | Immediate | Register | 1 | 0 | 1 | != 2 | X |
- | Immediate (Zero) | Register | 1 | 0 | 1 | 2 | 0 |
- | Immediate | Indexed | 1 | X | 0 | != 2 | X |
- | Immediate (Zero) | Indexed | 1 | X | 0 | 2 | 0 |
- | Immediate | Complex | 1 | != 0 | 1 | != 2 | X |
- | Immediate (Zero) | Complex | 1 | != 0 | 1 | 2 | 0 |
- | Register | Register | 1 | 0 | 1 | 2 | != 0 |
- | Register | Indexed | 1 | X | 0 | 2 | != 0 |
- | Register | Complex | 1 | != 0 | 1 | 2 | != 0 |
- | Indexed | Register | 0 | 0 | 1 | X | X |
- | Indexed | Indexed | 0 | 0 | 0 | X | X |
- | Indexed | Complex | 0 | != 0 | 1 | X | X |
- | Complex | Register | - | - | - | - | - |
- | Complex | Indexed | 0 | != 0 | 0 | X | X |
- | Complex | Complex | - | - | - | - | - |
- Note: '!= n' indicates that the address mode applies when the field is NOT equal to n. 'X' indicates that the value has no effect on the choice of addressing mode. A row of '-' indicates that the listed combination of address modes cannot be encoded and hence is invalid.
- Operand size is encoded in the **Size** field, as follows:
- 0: 8-bit (byte) load
- 1: 16-bit (word) load
- 2: 32-bit (dword) load
- 3: 64-bit (qword) load
- 4: 128-bit (dqword) load
- 5: 32-bit Float (single) load
- 6: 64-bit Float (double) load
- 7: *Same as 4*
- 8-bit and 16-bit loads will be zero-extended if necessary. 8-bit loads are byte-aligned; 16-bit loads are word-aligned (i.e. LSB is ignored); 32-bit, 64-bit, 128-bit and Float (single) loads are dword-aligned (i.e. the 1st and 2nd LSB are ignored); and Float (double) loads are qword-aligned (i.e. the 1st through 3rd LSB are ignored).
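The size codes and alignment rules can be summarized as a lookup table plus a mask. This is a sketch of the rules as stated above, with hypothetical names (`SIZES`, `align_address`); the widths are simply the bit counts from the list divided by eight.

```python
# size code -> (name, width in bytes, alignment in bytes), per the list above
SIZES = {
    0: ("byte", 1, 1),
    1: ("word", 2, 2),
    2: ("dword", 4, 4),
    3: ("qword", 8, 4),    # 64-bit loads are only dword-aligned
    4: ("dqword", 16, 4),  # 128-bit loads are only dword-aligned
    5: ("single", 4, 4),
    6: ("double", 8, 8),
    7: ("dqword", 16, 4),  # same as 4
}


def align_address(addr: int, size: int) -> int:
    """Clear the low address bits that the VM ignores for this operand size."""
    _, _, alignment = SIZES[size]
    return addr & ~(alignment - 1)


print(hex(align_address(0x1007, 1)))  # prints 0x1006
print(hex(align_address(0x1007, 6)))  # prints 0x1000
```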
- Complex addressing utilizes the **Base** field, which encodes a register (or zero) as follows:
- 0: *See below*
- 1-6: Corresponding register in register table
- 7: *Set base to zero*
- For source encoding, a base of 0 will decode to Indexed addressing. For destination encoding, a base of 0 will decode to Register addressing.
- Address Modes
- -------------
- **Register addressing** reads from or writes to a VM register directly. Register addressing is only supported in Ring 0, and will trigger an E_PERM fault if attempted in Ring 1.
- Example:
- MOV rParam, rHeap // Load the value of rParam into rHeap
- **Immediate addressing** is only valid for source operands, and encodes the operand value directly within the instruction. If the operand is 8-bit, it is encoded in the instruction *imm* field, otherwise it is encoded in the instruction dwords immediately following the current instruction. Padding is required for 16-bit and Float (double) sizes:
- * If the chosen size is 'double' and the next instruction offset is not 8-byte aligned, an additional 4 bytes of padding will be encoded BEFORE the operand to ensure the operand has 8-byte alignment.
- * If the chosen size is 16-bit, padding will be added AFTER the operand to ensure the PC has 4-byte alignment.
- Example:
- TRAP.BYTE =0x2 // Trigger an E_NOMEM fault
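The padding rules determine where an immediate operand lives and where the next instruction starts. The sketch below works that out under two assumptions that are not stated explicitly above: instruction words are 4 bytes long, and `pc` is the address of the current instruction word. The names are invented for illustration.

```python
# size code -> operand width in bytes (from the Size field list)
OPERAND_BYTES = {0: 1, 1: 2, 2: 4, 3: 8, 4: 16, 5: 4, 6: 8, 7: 16}


def immediate_layout(pc: int, size: int):
    """Return (operand address or None, address of the next instruction)."""
    next_word = pc + 4  # assumes a 4-byte instruction word
    if size == 0:
        return None, next_word  # 8-bit operands live in the imm field
    operand_at = next_word
    if size == 6 and operand_at % 8 != 0:
        operand_at += 4  # padding BEFORE a double to reach 8-byte alignment
    end = operand_at + OPERAND_BYTES[size]
    next_pc = (end + 3) & ~3  # padding AFTER a 16-bit operand restores 4-byte alignment
    return operand_at, next_pc


print(immediate_layout(0x100, 1))  # prints (260, 264)
print(immediate_layout(0x100, 6))  # prints (264, 272)
```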
- **Indexed addressing** addresses a location in memory by adding an 8-bit offset encoded as *imm* (for source) or *slot* (for destination) to the value of rParam.
- Example:
- MOV.DWORD [rParam+0x4], rText // Copy the 32-bit DWORD from the memory address specified by rParam+0x4 to rText
- **Complex addressing** (or more accurately, Based Memory Indirect Indexed Plus Displacement addressing) addresses a location in memory by first performing an indexed addressed load as described above to obtain an address, then adding the value of a base register to that address to get the target address.
- Example:
- MOV.BYTE [[rParam+0x8]+rHeap], [rParam+0xC] // Load the DWORD at rParam+0x8 and add rHeap to get the target address,
- // then load the data at the target address and copy it to rParam+0xC.
- This addressing mode allows register-relative addresses to be stored relative to rParam, allowing (for example) a 'heap' base address to be loaded into rHeap and pointers to data on the heap to be stored in the function-specific 'rParam' address.
- Registers 1-6 may be used as base registers; a **Base** value of 7 selects a base of zero.
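The two memory addressing modes differ only in the extra pointer load. The following sketch computes effective addresses over a `bytearray`; it assumes little-endian storage (the notes above say endianness is platform-dependent), assumes the pointer load is dword-aligned like any other 32-bit load, and uses hypothetical function names.

```python
import struct


def indexed_address(r_param: int, offset: int) -> int:
    """Indexed: rParam plus the 8-bit offset from imm or slot."""
    return (r_param + offset) & 0xFFFFFFFF


def complex_address(memory: bytearray, r_param: int, offset: int, base_value: int) -> int:
    """Complex: load a dword pointer via indexed addressing, then add the base register."""
    pointer_at = indexed_address(r_param, offset) & ~3  # dword-aligned load (assumed)
    (pointer,) = struct.unpack_from("<I", memory, pointer_at)  # little-endian assumed
    return (pointer + base_value) & 0xFFFFFFFF


memory = bytearray(64)
struct.pack_into("<I", memory, 0x18, 0x4)  # pointer stored at rParam+0x8
# [[rParam+0x8]+rHeap] with rParam=0x10 and rHeap=0x20
print(hex(complex_address(memory, 0x10, 0x8, 0x20)))  # prints 0x24
```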
- Instructions
- ------------
- General instruction format (LSB 0 numbering, names are provisional):
- fedcba9876543210 fedcba9876543210
- [ imm  ][ slot ] I[B]D[S][opcode]
- imm: Immediate value, see 'Source/Destination Encoding'
- slot: Destination register or offset, see 'Source/Destination Encoding' (used by setdst)
- I: 'Immediate' flag, see 'Source/Destination Encoding'
- B: 'Base' field, see 'Source/Destination Encoding'
- D: 'inDexed' flag, see 'Source/Destination Encoding'
- S: 'Size' field, see 'Source/Destination Encoding'
- opcode: Instruction opcode, see table.
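Taking the diagram at face value (8-bit *imm* and *slot*, 1-bit I and D, 3-bit B and S, 8-bit opcode), the fields of a 32-bit instruction word can be pulled apart with shifts and masks. This is a sketch of that reading only; the function name and dictionary keys are made up for illustration.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the provisional fields shown above."""
    return {
        "opcode": word & 0xFF,             # bits 7-0
        "size": (word >> 8) & 0x7,         # bits 10-8  (S)
        "indexed": (word >> 11) & 0x1,     # bit 11     (D)
        "base": (word >> 12) & 0x7,        # bits 14-12 (B)
        "immediate": (word >> 15) & 0x1,   # bit 15     (I)
        "slot": (word >> 16) & 0xFF,       # bits 23-16
        "imm": (word >> 24) & 0xFF,        # bits 31-24
    }


# A word with opcode 0x01 (TRAP), Size 0, Immediate set and imm = 0x02
print(decode_fields(0x02008001))
```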
- Opcodes are as follows (names are provisional, descriptions/tags are incomplete):
- 0x00 - HALT (P)
- Trap VM with error code E_HALT
- 0x01 - TRAP (S)
- Trap VM with error code in *src*
- 0x02 - WAIT (S)
- ?
- 0x03 - FINGER (SD)
- ?
- 0x04 - FINGER_RESET
- Clear finger state?
- 0x05 - ENHANCE (D)
- ?
- 0x06 - PUT16 (D)
- Write 0x00000000 0x00000000 0x00000000 0x00000016 to *dst*?
- 0x07 - ENHANCE_LOCK (P)
- ENHANCE and store result in lock registers
- 0x08 - SET_RING (S)
- Set the RING bit to LSB of *src*. If moving from Ring 1 to Ring 0, VM will trap with E_PERM unless "ENHANCE *src*" == *lock*?
- 0x09 - MOV (SD)
- Move data from *src* to *dst*
- 0x0A - COMPARE (S)
- ?
- 0x0B - B_*cond* (SU)
- Jump to rCode + *src* if condition is true.
- Format:
- fedcba9876543210 fedcba9876543210
- [ imm  ][cond].. I[B]D[S][opcode]
- cond - condition to check.
- Conditions are as follows:
- 0x0: ALWAYS
- 0x1: ? (cmp bit 0x2 set)
- 0x2: ? (cmp bit 0x80000000 set)
- 0x3: ? (cmp bit 0x80000000 OR 0x2 set)
- 0x4: ? (cmp bit 0x80000000 AND 0x2 clear)
- 0x5: ? (cmp bit 0x80000000 clear)
- 0x6: ? (cmp bit 0x2 clear)
- 0x7: ? (cmp bit 0x1 set)
- 0x8: ? (cmp bit 0x1 OR 0x2 set)
- 0x9: ? (cmp bit 0x1 AND 0x2 clear)
- 0xA: ? (cmp bit 0x1 clear)
- All other conditions trap with E_INVALID.
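The condition table can be expressed as predicates over the raw cmp value. This sketch reads "X AND Y clear" as "both bits clear" (the complement of the matching "OR ... set" entry), which is an interpretation rather than something confirmed above; the names are hypothetical and the meaning of each cmp bit remains unknown.

```python
HI, B1, B0 = 0x80000000, 0x2, 0x1  # the three cmp bits referenced in the table

CONDITIONS = {
    0x0: lambda c: True,
    0x1: lambda c: bool(c & B1),
    0x2: lambda c: bool(c & HI),
    0x3: lambda c: bool(c & (HI | B1)),
    0x4: lambda c: not (c & (HI | B1)),  # both clear (assumed reading)
    0x5: lambda c: not (c & HI),
    0x6: lambda c: not (c & B1),
    0x7: lambda c: bool(c & B0),
    0x8: lambda c: bool(c & (B0 | B1)),
    0x9: lambda c: not (c & (B0 | B1)),  # both clear (assumed reading)
    0xA: lambda c: not (c & B0),
}


def branch_taken(cond: int, cmp: int) -> bool:
    """Evaluate a branch condition; unknown conditions stand in for E_INVALID."""
    if cond not in CONDITIONS:
        raise ValueError("E_INVALID")
    return CONDITIONS[cond](cmp)


print(branch_taken(0x3, 0x2))         # prints True
print(branch_taken(0x4, 0x80000000))  # prints False
```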
- 0x0C - *unsupported*
- 0x0D - ADD (S) ???
- todo: see 'addition'
- 0x0E - SUB (S) ???
- todo: see 'subtraction'
- 0x0F - NOT (S) ???
- todo: see 'bitNOT'
- 0x10 - XOR (S) ???
- todo: see 'bitXOR'
- 0x11 - AND (S) ???
- todo: see 'bitAND'
- 0x12 - OR (S) ???
- todo: see 'bitOR'
- 0x13 - SHL ???
- todo: see 'bitSHL'
- 0x14 - SHRX ???
- todo: see 'bitSHRX'
- 0x15 - SHR ???
- todo: see 'bitSHR'
- 0x16 - SPINL ???
- todo: see 'bitSpinL'
- 0x17 - SPINR ???
- todo: see 'bitSpinR'
- 0x18 - SIGN_EXTEND (S) ???
- todo: see 'signExtend'
- 0x19 - NEGATE (S) ???
- todo: see 'negate'
- 0x1A - MULTIPLY (S) ???
- todo: see 'multiplication'
- 0x1B - DIVIDE (S) ???
- todo: see 'division'
- 0x1C - MODULO (S) ???
- todo: see 'modulous'
- 0x1D - DIVIDE_UNSIGNED (S) ???
- todo: see 'divisionUnsigned'
- 0x1E - MODULO_UNSIGNED (S) ???
- todo: see 'modulousUnsigned'
- 0x1F - VISIT (SU)
- Add *adj* to rParam and call function at rCode + *src* (refer to **Call Stack**).
- Format:
- fedcba9876543210 fedcba9876543210
- [ imm  ][ adj  ] I[B]D[S][opcode]
- adj - value to add to rParam (will be multiplied by 4).
- 0x20 - RET
- Return to caller (refer to **Call Stack**).
- 0x21 - *unsupported*
- 0x22 - *unsupported*
- 0x23 - CONV_INT32 (S)
- Appears to be unimplemented (*src* is decoded but no operation is performed).
- 0x24 - *unsupported*
- 0x25 - *unsupported*
- 0x26 - *unsupported*
- 0x27 - *unsupported*
- 0x28 - *unsupported*
- 0x29 - GET_TIME (D)
- Get current time as a floating point number (equiv. to JavaScript Date.now(), i.e. milliseconds since the UNIX epoch). Requires floating-point destination.
- 0x2A - CLOCK (D)
- Get time since last call to CLOCK in seconds as an integer. Requires integer destination.
- Notes: Some opcodes have labels. These labels are to be interpreted as follows:
- P - Protected Instruction (execution in Ring 1 will trap E_PERM)
- S - Uses a source operand (see 'Source/Destination Encoding')
- D - Uses a destination operand (see 'Source/Destination Encoding')
- U - Uses instruction-specific format
- *unsupported* opcodes will trap with E\_NOSUPPORT. Unlisted opcodes will trap with E\_INVALID.
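The split between listed, unsupported and unlisted opcodes can be captured in a small classifier. The sketch below is built only from the opcode table above (0x00 through 0x2A); the names are invented, and other faults an instruction might raise (E_PERM, malformed operands) are ignored.

```python
# Opcodes marked *unsupported* in the table above
UNSUPPORTED = {0x0C, 0x21, 0x22, 0x24, 0x25, 0x26, 0x27, 0x28}
LAST_LISTED = 0x2A  # CLOCK is the highest opcode in the table


def classify_opcode(opcode: int) -> str:
    """Return the fault an opcode would raise on dispatch, or 'OK' if it is listed."""
    if opcode in UNSUPPORTED:
        return "E_NOSUPPORT"
    if 0x00 <= opcode <= LAST_LISTED:
        return "OK"
    return "E_INVALID"


print(classify_opcode(0x0C))  # prints E_NOSUPPORT
print(classify_opcode(0x2B))  # prints E_INVALID
```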
- VM internal names:
- cute = insn
- op = PC
- ring = current mode
- cmp = flags (excluding ring)
- v0, v1, v2, v3 = working registers?
- f0 = floating point working register?
- lock0, lock1, lock2, lock3 = 'Lock' registers for entering ring0?
- delta = time of previous call to CLOCK