CHIP8/SCHIP - Emulation Guide (Updated March 2024)

THIS DOCUMENT GOES INTO DETAIL ABOUT IMPLEMENTING PRACTICALLY ALL
CHIP-8 AND SUPERCHIP OPCODES, WITH PERSONALIZED NOTES FROM EXPERIENCE.

First things first, let's review the arrays and data types and their
initialization. Please bear in mind that every one of them is an unsigned
integer, and over/under flow can and *will* occur.

	'I' is your index register.
		It has a size of 2 bytes (but effectively only needs 12 bits) (up to 0xFFFF)
		Initialized at value: 0

	'PC' is your program counter.
		It has a size of 2 bytes (but effectively only needs 12 bits) (up to 0xFFFF)
		Initialized at value: 512 (0x0200)

	'SP' is your stack index pointer.
		Its type is irrelevant, but at least 1 byte is needed. It's tied to the STACK array mentioned below.
		Initialized at value: 0

	'STACK' represents your routine stack array.
		It is made up of 16 levels (STACK[0x0]..STACK[0xF])
		Each slot has a size of 2 bytes (up to 0xFFFF)
		Initialized at value: 0

	'V' represents your V registers array.
		There are a total of 16 registers (V[0x0]..V[0xF])
		Each register has a size of 1 byte (up to 0xFF)
		Initialized at value: 0

	'MEM' represents your memory array.
		There are a total of 4096 slots (MEM[0x000]..MEM[0xFFF])
			-- on XO-CHIP, it has 65,536 slots (MEM[0x0000]..MEM[0xFFFF])
		Each slot has a size of 1 byte (up to 0xFF)
		Initialized at value: 0

	'KEY' represents your keypad array.
		There are a total of 16 keys (KEY[0x0]..KEY[0xF])
		Each key has a boolean value (either 1 or 0)
		Initialized at value: false
			-- Note that it's also valid to implement the keys as a single bitwise value as well, so long as you adjust your code accordingly.

	'RPL' represents your persistent registers array. They are unique for each rom.
		There are a total of 16 registers (RPL[0x0]..RPL[0xF])
			-- the original hardware only allowed use of the first 8 registers, but modern implementations and roms may require all 16. Their use is extremely rare.
		Each register has a size of 1 byte (up to 0xFF)
		Initialized at value: 0

	'VRAM' represents your display pixel array.
		There are a total of 128 pixels horizontally, and 64 vertically, for a total of 8192.
		It should be noted that the system always begins by only allowing the top-left quadrant of the total VRAM area to be seen. This can be changed by instructions.
		The exact implementation of this depends on your preference and ability:
			A) could be a 2D boolean array, indexed as VRAM[x][y]: VRAM[0][0]..VRAM[127][63]
			B) could be a 1D boolean array: VRAM[0]..VRAM[8191]
			C) or it could be a 1D byte array: VRAM[0]..VRAM[1023]
			D) or even a 2D byte array: VRAM[0][0]..VRAM[15][63]
				-- I will only be providing instruction examples for method A.
		Initialized at value: 0

	'DELAY_TIMER' is what the name implies. Counts down once per frame if the value is not 0.
		It has a size of 1 byte (up to 0xFF)
		Initialized at value: 0

	'SOUND_TIMER' is what the name implies. Counts down once per frame if the value is not 0. The buzzer also sounds for as long as the timer value is not 0.
		It has a size of 1 byte (up to 0xFF)
		Initialized at value: 0
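If you're writing this in C, the whole list above could live in one struct. Here's a minimal sketch under my own naming (`Chip8`, `chip8_reset`, and the extra `hires` flag for the visible area are not from anywhere official), using method A for the VRAM:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the machine state described above. */
typedef struct {
    uint16_t I;              /* index register, only 12 bits matter */
    uint16_t pc;             /* program counter */
    uint8_t  sp;             /* stack index pointer */
    uint16_t stack[16];      /* 16 subroutine levels */
    uint8_t  V[16];          /* V0..VF */
    uint8_t  mem[4096];      /* 65536 on XO-CHIP */
    bool     keys[16];       /* keypad state */
    uint8_t  rpl[16];        /* persistent registers */
    bool     vram[128][64];  /* method A: vram[x][y] */
    bool     hires;          /* false: only the top-left 64x32 is visible */
    uint8_t  delay_timer;
    uint8_t  sound_timer;
} Chip8;

/* Apply the initialization values listed above. */
static void chip8_reset(Chip8 *c)
{
    memset(c, 0, sizeof *c);   /* everything starts at 0 / false ... */
    c->pc = 0x200;             /* ... except the program counter */
}
```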

Having said that, let's establish some platform basics. The system has a screen refresh rate of 60 Hz, and its two timers count down in tandem, by 1 per frame, while their value is non-zero.
This does not, however, imply that the "cpu" of the system runs at the same rate. Some people prefer to have a separate timing system to finely tune how many instructions they run per second (IPS). This approach, while valid, is also more complicated to pull off. If you're aiming for simplicity, what I would recommend is to run a fixed number of instructions per frame (IPF). This means that you only need to bother with a single timer implementation to control your 60 Hz loop and nothing more.

The typical chip-8 emulation speed is around 540-660 IPS, and for super-chip it's around 1800 IPS. Since IPS = IPF * 60, divide your target IPS by 60 and pick whichever value is closest or feels best for you: that works out to roughly 9-11 IPF for chip-8 and 30 IPF for super-chip. Do keep in mind though that these speeds concern old roms for these two platforms. There exist newer chip-8 and super-chip roms that may require an IPF in the hundreds, or more, to run at proper speed, thus your IPF should be a configurable variable.
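In C, the fixed-IPF approach could look roughly like this. It's a sketch: `Machine`, `step` and `run_frame` are names I made up, `step` is a placeholder that only counts instructions, and I chose to decrement the timers after the batch of instructions (doing it before is just as valid):

```c
#include <stdint.h>

/* Hypothetical frame-level state; only what the loop needs. */
typedef struct {
    uint8_t  delay_timer;
    uint8_t  sound_timer;
    unsigned ipf;            /* instructions per frame, user-configurable */
    unsigned long executed;  /* stand-in for real work, for illustration */
} Machine;

/* Placeholder for fetch/decode/execute of a single opcode. */
static void step(Machine *m)
{
    m->executed++;
}

/* Called once per 60 Hz tick by whatever timer drives your main loop. */
static void run_frame(Machine *m)
{
    for (unsigned i = 0; i < m->ipf; i++)
        step(m);

    /* Both timers tick down once per frame while non-zero. */
    if (m->delay_timer > 0) m->delay_timer--;
    if (m->sound_timer > 0) m->sound_timer--;
    /* The buzzer should be audible whenever sound_timer != 0. */
}
```

At `ipf = 11` that's 660 IPS; bump it to 30 for the classic super-chip speed.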

Now that you're a bit more informed on the topic of emulation speed, back to the details. Starting off with the timers, as mentioned, both count down at the same rate as the screen refreshes. The sound timer specifically though will produce a (usually) square wave tone when its value is non-zero. Given that some roms may have obnoxiously long buzz sequences, I'd recommend that the tone you pick isn't ear-piercing, nor too loud.

As is usually the start to any emulation journey, you'll first need to have some data at the ready for you to play around with, and this means loading a rom into memory. The implementation for that part will be up to you, but here's the general guideline on how to arrange things:

MEM[0]..MEM[79] = FONT DATA (chip-8)
MEM[80]..MEM[239] = BIG FONT DATA (super-chip)
MEM[512]..onwards = ROM DATA

If you followed the notes on initialization values for the rest earlier on, then that's about it for this segment. Different extensions such as Chip-8 HiRes or Chip-8X have different init values for certain variables, or extra data to tango with. I will not be detailing these here.
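A minimal C sketch of that layout, assuming 4 KiB of memory. `load_rom` and `CHIP8_FONT` are my names, the font is the commonly used 4x5 hex set, the big font is left out for brevity, and refusing roms that don't fit below 0x1000 is my own choice rather than anything the platform mandates:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE  4096
#define ROM_START 0x200

/* The standard 4x5 hex digit font, 5 bytes per character (0..F). */
static const uint8_t CHIP8_FONT[80] = {
    0xF0,0x90,0x90,0x90,0xF0, 0x20,0x60,0x20,0x20,0x70, /* 0 1 */
    0xF0,0x10,0xF0,0x80,0xF0, 0xF0,0x10,0xF0,0x10,0xF0, /* 2 3 */
    0x90,0x90,0xF0,0x10,0x10, 0xF0,0x80,0xF0,0x10,0xF0, /* 4 5 */
    0xF0,0x80,0xF0,0x90,0xF0, 0xF0,0x10,0x20,0x40,0x40, /* 6 7 */
    0xF0,0x90,0xF0,0x90,0xF0, 0xF0,0x90,0xF0,0x10,0xF0, /* 8 9 */
    0xF0,0x90,0xF0,0x90,0x90, 0xE0,0x90,0xE0,0x90,0xE0, /* A B */
    0xF0,0x80,0x80,0x80,0xF0, 0xE0,0x90,0x90,0x90,0xE0, /* C D */
    0xF0,0x80,0xF0,0x80,0xF0, 0xF0,0x80,0xF0,0x80,0x80, /* E F */
};

/* Clear memory, place the font at 0 and the rom at 0x200.
   The 160-byte big font would be copied to mem[80] the same way. */
static bool load_rom(uint8_t mem[MEM_SIZE], const uint8_t *rom, size_t size)
{
    if (size > MEM_SIZE - ROM_START)
        return false;                      /* rom would not fit */
    memset(mem, 0, MEM_SIZE);
    memcpy(mem, CHIP8_FONT, sizeof CHIP8_FONT);
    memcpy(mem + ROM_START, rom, size);
    return true;
}
```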


///////////////////////////////////////////////////////////////////////////
// This next segment will go over some basic implementations for the     //
// instructions themselves, extra notes about them and any applicable    //
// quirks, and generally spoilers. You have been warned.                 //
///////////////////////////////////////////////////////////////////////////

As described previously, we want to be running multiple instructions per frame, so bear in mind that the following process (which ideally should be a function of its own for ease of use) will be running many times each frame.

The very first thing we must do is, of course, to fetch some data from memory and assemble our instruction (also known as opcode). In this system, instructions are always 2 bytes long -- but make no mistake, that does not imply they will always be aligned in memory to start from an even index. A rom can perfectly well leave the program counter at an odd address, and the next opcode is then assembled starting from there. Anyway, here's how we'll start:

OPCODE = (MEM[PC] << 8) | MEM[PC+1]

Simple enough! I'd like to take a moment to note here that most instructions will bump the program counter (PC) up by 2 at their end. Rather than risk making some mistake in this process, it's easier and prudent to play it safe and simply increment the PC as the very next step:

PC += 2
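The same two steps in C. The `fetch` helper is hypothetical, and wrapping the address with `& 0xFFF` is an assumption on my part so that a PC of 0xFFF can't read past a 4 KiB array:

```c
#include <stdint.h>

/* Assemble the big-endian opcode at PC, then advance PC by 2.
   The 0xFFF wrap assumes a 4 KiB memory array. */
static uint16_t fetch(const uint8_t mem[4096], uint16_t *pc)
{
    uint16_t opcode = (uint16_t)((mem[*pc & 0xFFF] << 8) | mem[(*pc + 1) & 0xFFF]);
    *pc += 2;
    return opcode;
}
```

For example, with MEM[0x201] = 0xA2 and MEM[0x202] = 0x2A, calling it at PC = 0x201 yields 0xA22A and leaves PC at 0x203, odd address and all.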

There we go. Before we dive deeper into the instructions themselves, I like to set up some commonly used variables ahead of time. It's *technically* wasteful when not all of them will come into use, but I like the clarity of avoiding macros or redefining the needed bits in every single instruction:

NNN = OPCODE & 0x0FFF // our 12 bit JUMP address
NN  = OPCODE & 0x00FF // the lowest byte (also seen as KK in other guides)

Now for the individual nibbles.

P  = (OPCODE & 0xF000) >> 12 // 1st nibble - most significant
X  = (OPCODE & 0x0F00) >>  8 // 2nd nibble - also known as X in opcodes
Y  = (OPCODE & 0x00F0) >>  4 // 3rd nibble - also known as Y in opcodes
N  =  OPCODE & 0x000F        // 4th nibble - least significant

And yes, there are more efficient ways to arrange these if you want. You can figure them out if you think about them, but easier understanding is what I'm going for.
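If you'd rather keep the fields together, a small struct does the job in C. `Fields` and `decode` are made-up names; this is just one way to lay it out:

```c
#include <stdint.h>

/* The commonly used opcode fields, pulled out once per instruction. */
typedef struct {
    uint16_t nnn;          /* 12-bit address */
    uint8_t  nn;           /* lowest byte */
    uint8_t  p, x, y, n;   /* the four nibbles, most significant first */
} Fields;

static Fields decode(uint16_t opcode)
{
    Fields f;
    f.nnn = opcode & 0x0FFF;
    f.nn  = opcode & 0x00FF;
    f.p   = (opcode & 0xF000) >> 12;
    f.x   = (opcode & 0x0F00) >> 8;
    f.y   = (opcode & 0x00F0) >> 4;
    f.n   = opcode & 0x000F;
    return f;
}
```

For instance, decode(0xD12F) gives p = 0xD, x = 1, y = 2, n = 0xF, nn = 0x2F and nnn = 0x12F.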

Now we can tackle the instructions themselves. There's a switch/case tree at the bottom of this doc that showcases an example structure for the opcode matching process. If you have a different approach you're always welcome to experiment, just make sure not to leave any holes. It's bad practice in general to allow any kind of potential mismatching.

Let's get to bashing the opcode into actual code.


.... 00CN - scroll display N lines down (SUPER-CHIP)
	Its purpose is to scroll the VRAM a certain amount of rows downwards, depending on the N value. An N value of 0 is invalid, and thus you should either throw an error for incorrect instruction, or handle as a no-op. The rows that go off-bounds do not get wrapped around and are discarded.
		// The provided example tackles the aforementioned method A

		for(y = 63; y >= N; y--)
			for(x = 0; x < 128; x++)
				VRAM[x][y] = VRAM[x][y-N]
		for(y = 0; y < N; y++)
			for(x = 0; x < 128; x++)
				VRAM[x][y] = 0
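A C version of the same loop, as a sketch. The `scroll_down` name and the `W`/`H` macros are mine, `vram` is method A (`vram[x][y]`), and treating N = 0 as a no-op is one of the two options mentioned above:

```c
#include <stdbool.h>

#define W 128
#define H 64

/* 00CN on a method-A buffer. Rows pushed past the bottom are lost,
   and the N rows opened up at the top are cleared. */
static void scroll_down(bool vram[W][H], int n)
{
    if (n == 0)
        return;                          /* 00C0: treated as a no-op here */
    for (int y = H - 1; y >= n; y--)     /* walk bottom-up so nothing is overwritten early */
        for (int x = 0; x < W; x++)
            vram[x][y] = vram[x][y - n];
    for (int y = 0; y < n; y++)
        for (int x = 0; x < W; x++)
            vram[x][y] = false;
}
```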


.... 00E0 - clear the screen
	Its purpose is to clear out the VRAM, so everything must go back to the default initialization value.
		// The provided example tackles the aforementioned method A

		for(y = 0; y < 64; y++)
			for(x = 0; x < 128; x++)
				VRAM[x][y] = 0


.... 00EE - return from subroutine
	Its purpose is to return to the address stored in the last STACK entry, as denoted by the stack index pointer (SP). Do note that there's potential for OOB access here.
		if SP
			PC = STACK[--SP]
		else
			QUIT_WITH_MSG: EXIT FROM EMPTY STACK


.... 00FB - scroll display 4 pixels to the right (SUPER-CHIP)
	Its purpose is to scroll the VRAM 4 columns to the right. The columns that go off-bounds do not get wrapped around and are discarded.
		// The provided example tackles the aforementioned method A

		for(y = 0; y < 64; y++)
			for(x = 127; x >= 4; x--)
				VRAM[x][y] = VRAM[x-4][y]
			for(x = 0; x < 4; x++)
				VRAM[x][y] = 0


.... 00FC - scroll display 4 pixels to the left (SUPER-CHIP)
	Its purpose is to scroll the VRAM 4 columns to the left. The columns that go off-bounds do not get wrapped around and are discarded.
		// The provided example tackles the aforementioned method A

		for(y = 0; y < 64; y++)
			for(x = 0; x < 124; x++)
				VRAM[x][y] = VRAM[x+4][y]
			for(x = 124; x < 128; x++)
				VRAM[x][y] = 0
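Both horizontal scrolls in C, again on a method A buffer with made-up names (`scroll_right4`, `scroll_left4`). The thing to watch in the left scroll is that the copy loop must stop at column 123, since it reads from x + 4:

```c
#include <stdbool.h>

#define W 128
#define H 64

/* 00FB: scroll 4 columns right; the 4 leftmost columns become blank. */
static void scroll_right4(bool vram[W][H])
{
    for (int y = 0; y < H; y++) {
        for (int x = W - 1; x >= 4; x--)
            vram[x][y] = vram[x - 4][y];
        for (int x = 0; x < 4; x++)
            vram[x][y] = false;
    }
}

/* 00FC: scroll 4 columns left; the source index x + 4 must stay below W. */
static void scroll_left4(bool vram[W][H])
{
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W - 4; x++)
            vram[x][y] = vram[x + 4][y];
        for (int x = W - 4; x < W; x++)
            vram[x][y] = false;
    }
}
```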


.... 00FD - stop signal (SUPER-CHIP)
	Much like its name implies, it's a signal to stop. What this actually does is, well, up to you. I like to stop further instruction fetching.
		QUIT_WITH_MSG: RECEIVED STOP SIGNAL


.... 00FE - disable extended screen mode (SUPER-CHIP - will run at 64x32)
	Its purpose is to change the resolution of your display. Typically this means limiting your visibility to the top left quadrant of the VRAM.


.... 00FF - enable extended screen mode (SUPER-CHIP - will run at 128x64)
	Its purpose is to change the resolution of your display. Typically this means extending the visibility to the entirety of the VRAM.


.... 0NNN - ML routines
	This category is for anything outside of the aforementioned instructions. If it's a 0x0000, then you've just hit blank memory: either the rom is malformed, or you did something wrong somewhere. Anything else in this range is a machine-language routine, and those should either be no-op'd or stop emulation, as you can't emulate them without emulating the actual computer that chip-8/super-chip used to run on.
		if NNN
			QUIT_WITH_MSG: MACHINE CODE <OPCODE> NOT SUPPORTED
		else
			QUIT_WITH_MSG: CALL TO 0x0000 DETECTED


.... 1NNN - jump to address
	Its purpose is to set the program counter to address NNN. Fairly simple. In many roms, it's used as a method of "stopping" by jumping to itself. You may wish to prevent needless execution by catching such a scenario.
		if PC - 2 == NNN
			QUIT_WITH_MSG: ADDRESS JUMP LOOP DETECTED
				// ^^ lots of games initiate one to signal they're done
		else
			PC = NNN


.... 2NNN - call subroutine
	Its purpose is to store the current program counter in the STACK slot denoted by the stack index pointer (SP), then jump to the address NNN. Do note that there's potential for OOB access here.
		if SP >= 16
			QUIT_WITH_MSG: CALL STACK OVERFLOW
		else
			STACK[SP++] = PC
			PC = NNN
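Since 2NNN and 00EE are two halves of the same mechanism, here they are together in C. The `Cpu` struct, the function names and the `EMU_*` return codes are hypothetical; returning an error code instead of quitting directly just leaves the reporting to the caller. Note that `pc` is expected to already point past the 2NNN instruction, because we bumped it right after fetching:

```c
#include <stdint.h>

/* Hypothetical return codes so the caller decides how to report errors. */
enum { EMU_OK = 0, EMU_STACK_OVERFLOW, EMU_STACK_UNDERFLOW };

typedef struct {
    uint16_t pc;
    uint8_t  sp;
    uint16_t stack[16];
} Cpu;

/* 2NNN: push the (already advanced) pc, then jump. */
static int call_sub(Cpu *c, uint16_t nnn)
{
    if (c->sp >= 16)
        return EMU_STACK_OVERFLOW;
    c->stack[c->sp++] = c->pc;
    c->pc = nnn;
    return EMU_OK;
}

/* 00EE: pop the most recent return address. */
static int ret_sub(Cpu *c)
{
    if (c->sp == 0)
        return EMU_STACK_UNDERFLOW;
    c->pc = c->stack[--c->sp];
    return EMU_OK;
}
```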


.... 3XNN - skip next instruction if VX == NN
		if V[X] == NN
			PC += 2


.... 4XNN - skip next instruction if VX != NN
		if V[X] != NN
			PC += 2


.... 5XY0 - skip next instruction if VX == VY
		if V[X] == V[Y]
			PC += 2


.... 6XNN - set VX = NN
		V[X] = NN


.... 7XNN - set VX = VX + NN
	If you don't have a single-byte type, you'll need to mask like in the commented snippet.
		V[X] += NN
		// V[X] &= 0xFF

.... 8XYN - Arithmetic Instructions. This note is not an instruction itself. I just want to interject and clarify that the order of operations seen here for instructions 8xy4 through 8xyE is important. Remember, either V[X] or V[Y] could, actually, be a V[0xF] underneath.

.... 8XY0 - set VX = VY
		V[X]  = V[Y]


.... 8XY1 - set VX = VX | VY
		V[X] |= V[Y]


.... 8XY2 - set VX = VX & VY
		V[X] &= V[Y]


.... 8XY3 - set VX = VX ^ VY
		V[X] ^= V[Y]


.... 8XY4 - set VX = VX + VY, VF = carry
		SUM   = V[X] + V[Y] (size of at least 2 bytes)
		V[X]  = SUM & 0xFF
		V[15] = SUM >> 8


.... 8XY5 - set VX = VX - VY, VF = !borrow
		FLAG  = V[X] >= V[Y]
		V[X]  = V[X] - V[Y]
		// V[X] &= 0xFF
		V[15] = FLAG


.... 8XY7 - set VX = VY - VX, VF = !borrow
		FLAG  = V[Y] >= V[X]
		V[X]  = V[Y] - V[X]
		// V[X] &= 0xFF
		V[15] = FLAG
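Here's how the order of operations plays out in C for the three flag-setting arithmetic instructions. The function names are mine. Each one computes its result and flag first, writes V[X], and writes VF last, so if X happens to be 0xF the flag is what survives:

```c
#include <stdint.h>

/* 8XY4: VF = carry out of bit 7. */
static void op_8xy4(uint8_t V[16], int x, int y)
{
    unsigned sum = V[x] + V[y];
    V[x]  = (uint8_t)sum;          /* keep the low byte */
    V[15] = (uint8_t)(sum >> 8);   /* written last */
}

/* 8XY5: VF = 1 when no borrow occurs (VX >= VY). */
static void op_8xy5(uint8_t V[16], int x, int y)
{
    uint8_t flag = V[x] >= V[y];
    V[x]  = (uint8_t)(V[x] - V[y]);  /* wraps modulo 256 */
    V[15] = flag;
}

/* 8XY7: VX = VY - VX, VF = 1 when VY >= VX. */
static void op_8xy7(uint8_t V[16], int x, int y)
{
    uint8_t flag = V[y] >= V[x];
    V[x]  = (uint8_t)(V[y] - V[x]);
    V[15] = flag;
}
```

For example, 200 + 100 leaves 44 in V[X] with VF = 1, and 5 - 10 through 8XY5 leaves 251 with VF = 0.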


.... 8XY6 - set VX = VX >> 1, VF = carry
	This instruction has a discrepancy due to confused documentation. The original (chip-8) method is to shift VY into VX, whereas the alternative (super-chip) is to shift VX itself. The SHIFTQUIRK variable in this case is TRUE if we want the latter behavior.
		if SHIFTQUIRK  Y = X
		FLAG  = V[Y] & 1
		V[X]  = (V[Y] >> 1) & 0xFF
		V[15] = FLAG


.... 8XYE - set VX = VX << 1, VF = carry
	This instruction has a discrepancy due to confused documentation. The original (chip-8) method is to shift VY into VX, whereas the alternative (super-chip) is to shift VX itself. The SHIFTQUIRK variable in this case is TRUE if we want the latter behavior.
		if SHIFTQUIRK  Y = X
		FLAG  = V[Y] >> 7
		V[X]  = V[Y] << 1
		// V[X] &= 0xFF
		V[15] = FLAG
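The two shifts in C, with the quirk passed in as a plain `bool` (the parameter and function names are hypothetical). Storing into a `uint8_t` takes care of the masking that the pseudocode leaves commented out:

```c
#include <stdbool.h>
#include <stdint.h>

/* 8XY6: with shift_quirk (super-chip) VX shifts itself, otherwise VY is
   shifted into VX (chip-8). VF receives the bit shifted out. */
static void op_8xy6(uint8_t V[16], int x, int y, bool shift_quirk)
{
    if (shift_quirk) y = x;
    uint8_t flag = V[y] & 1;
    V[x]  = V[y] >> 1;
    V[15] = flag;
}

/* 8XYE: same idea, shifting left; the uint8_t store drops bit 8. */
static void op_8xye(uint8_t V[16], int x, int y, bool shift_quirk)
{
    if (shift_quirk) y = x;
    uint8_t flag = V[y] >> 7;
    V[x]  = (uint8_t)(V[y] << 1);
    V[15] = flag;
}
```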


.... 9XY0 - skip next instruction if VX != VY
		if V[X] != V[Y]
			PC += 2


.... ANNN - set I = NNN
		I = NNN


.... BNNN - jump to NNN + V0 (or V[X])
	This instruction has a discrepancy due to confused documentation. The original (chip-8) method is to add V[0] to the address NNN for the jump, whereas the alternative (super-chip) is to add V[X] instead. The JUMPQUIRK variable in this case is TRUE if we want the latter behavior.

		if JUMPQUIRK
			PC = NNN + V[X]
		else
			PC = NNN + V[0]


.... CXNN - set VX = RND & NN
		V[X] = RND(256) & NN


.... DXYN - draw sprite
		// TBD


.... EX9E - skip next instruction if key VX is held
	The system polls the key denoted at V[X] to check if it's held down. Since the system only had 4 hardware lines for the keyboard, all bits past the 4th of the V[X] value are ignored, thus the masking.
		if KEY[V[X] & 0xF] == 1
			PC += 2


.... EXA1 - skip next instruction if key VX is not held
	The system polls the key denoted at V[X] to check if it's held down. Since the system only had 4 hardware lines for the keyboard, all bits past the 4th of the V[X] value are ignored, thus the masking.
		if KEY[V[X] & 0xF] == 0
			PC += 2


.... FX07 - set VX = delaytimer
		V[X] = DELAY_TIMER


.... FX0A - wait for key press and release, set VX = key
	This instruction waits for a key to be pressed and subsequently released. To accomplish this, you need to be able to compare the current key state with that of the last frame. Example code that'd take place along with your timer decrements:
		//    for(z = 0; z < 16; z++)
		//        CACHED_KEY[z]    = CUR_KEY_STATE[z]
		//        CUR_KEY_STATE[z] = iskeyheld(z)

	You can also cheat a little by checking only for a key release. It will work fine:
		for(z = 0; z < 16; z++)
			if CACHED_KEY[z] & !CUR_KEY_STATE[z]
				V[X] = z
				return // < terminate early if we got a match

		PC -= 2 // if the loop didn't detect a key release, we must backtrack
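A C sketch of both halves: the per-frame snapshot and the release check. `Keys`, `keys_update` and `op_fx0a` are my names, and `held_now` stands in for whatever your frontend reports (the `iskeyheld` above):

```c
#include <stdbool.h>
#include <stdint.h>

/* Key snapshots; refresh them once per frame alongside the timers. */
typedef struct {
    bool cached[16];   /* state during the previous frame */
    bool current[16];  /* state during this frame */
} Keys;

static void keys_update(Keys *k, const bool held_now[16])
{
    for (int z = 0; z < 16; z++) {
        k->cached[z]  = k->current[z];
        k->current[z] = held_now[z];
    }
}

/* FX0A: if some key went from held to released, store it in VX.
   Otherwise rewind pc so the same instruction runs again. */
static void op_fx0a(const Keys *k, uint8_t V[16], int x, uint16_t *pc)
{
    for (int z = 0; z < 16; z++) {
        if (k->cached[z] && !k->current[z]) {
            V[x] = (uint8_t)z;
            return;
        }
    }
    *pc -= 2;
}
```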


.... FX15 - set delaytimer = VX
		DELAY_TIMER = V[X]


.... FX18 - set soundtimer = VX
		SOUND_TIMER = V[X]


.... FX1E - set I = I + VX
		I += V[X]
	You may have seen other guides suggesting to set VF according to whether the I register overflowed past 0xFFF. This is incorrect. The only rom that makes use of this is called Spacefight 2091, a super-chip game. No overflow occurs, it just wants VF to be set to 0 because the game is buggy. Do not implement this behavior, look for the patched rom instead.


.... FX29 - point I to 5-byte-tall numeric sprite for value in VX
	The system used a jump table originally, and would mask the value of V[X] to ensure it doesn't go OOB.
		I = (V[X] & 0xF) * 5


.... FX30 - point I to 10-byte-tall numeric sprite for value in VX (SUPER-CHIP)
	The system used a jump table originally, and would mask the value of V[X] to ensure it doesn't go OOB.
		I = ((V[X] & 0xF) * 10) + 80
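Both font lookups as tiny C helpers (the names are mine). Each digit is 5 or 10 bytes long, and the big font starts at 80, matching the memory layout from earlier:

```c
#include <stdint.h>

/* FX29: small digits are 5 bytes each, starting at address 0. */
static uint16_t small_digit_addr(uint8_t vx)
{
    return (uint16_t)((vx & 0xF) * 5);
}

/* FX30: big digits are 10 bytes each, starting at address 80. */
static uint16_t big_digit_addr(uint8_t vx)
{
    return (uint16_t)((vx & 0xF) * 10 + 80);
}
```

So digit F lands at 75 in the small font and at 230 in the big one, the latter ending right at byte 239.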


.... FX33 - store BCD of VX in memory at I, I+1 and I+2
	The instruction merely separates the hundreds, tens, and ones digits of V[X] and stores them into memory in sequence.
		MEM[I]   =  V[X] / 100
		MEM[I+1] = (V[X] /  10) % 10
		MEM[I+2] =  V[X] %  10
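The same in C, as a sketch. `op_fx33` is a hypothetical name, and wrapping the address with `& 0xFFF` is my own assumption to keep a 4 KiB array in bounds:

```c
#include <stdint.h>

/* FX33: hundreds, tens and ones of VX into MEM[I..I+2]. */
static void op_fx33(uint8_t mem[4096], uint16_t I, uint8_t vx)
{
    mem[I         & 0xFFF] = vx / 100;
    mem[(I + 1)   & 0xFFF] = (vx / 10) % 10;
    mem[(I + 2)   & 0xFFF] = vx % 10;
}
```

For V[X] = 254 this writes 2, 5 and 4; for 7 it writes 0, 0 and 7.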


.... FX55 - save V0..VX in memory at I..I+X
	This instruction has a discrepancy due to confused documentation. The original (chip-8) method increments the I register for each loop iteration, whereas the alternative (super-chip) does not. The LOADSTOREQUIRK variable in this case is TRUE if we want the latter behavior.
		for(n = 0; n <= X; n++)
			MEM[I+n] = V[n]
		if !LOADSTOREQUIRK
			I += X + 1


.... FX65 - load V0..VX from memory at I..I+X
	This instruction has a discrepancy due to confused documentation. The original (chip-8) method increments the I register for each loop iteration, whereas the alternative (super-chip) does not. The LOADSTOREQUIRK variable in this case is TRUE if we want the latter behavior.
		for(n = 0; n <= X; n++)
			V[n] = MEM[I+n]
		if !LOADSTOREQUIRK
			I += X + 1
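Both instructions in C, with the quirk as a `bool` parameter. Function and parameter names are hypothetical, and the `& 0xFFF` address wrap is my own assumption for a 4 KiB memory:

```c
#include <stdbool.h>
#include <stdint.h>

/* FX55: store V0..VX at I..I+X. Without the quirk (chip-8), I ends up
   advanced past the stored bytes; with it (super-chip), I is left alone. */
static void op_fx55(uint8_t mem[4096], const uint8_t V[16], uint16_t *I,
                    int x, bool loadstore_quirk)
{
    for (int n = 0; n <= x; n++)
        mem[(*I + n) & 0xFFF] = V[n];
    if (!loadstore_quirk)
        *I += x + 1;
}

/* FX65: the mirror image, loading V0..VX from I..I+X. */
static void op_fx65(const uint8_t mem[4096], uint8_t V[16], uint16_t *I,
                    int x, bool loadstore_quirk)
{
    for (int n = 0; n <= x; n++)
        V[n] = mem[(*I + n) & 0xFFF];
    if (!loadstore_quirk)
        *I += x + 1;
}
```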


.... FX75 - save V0..VX (X<8) in the RPL flags (SUPER-CHIP)
	This instruction, in the original super-chip implementation, was limited to 8 RPL registers. If it was called with an X of 8 or larger, it was capped to 7. You *probably* don't have to worry about emulating that.
		for(n = 0; n <= X; n++)
			RPL[n] = V[n]


.... FX85 - load V0..VX (X<8) from the RPL flags (SUPER-CHIP)
	This instruction, in the original super-chip implementation, was limited to 8 RPL registers. If it was called with an X of 8 or larger, it was capped to 7. You *probably* don't have to worry about emulating that.
		for(n = 0; n <= X; n++)
			V[n] = RPL[n]


///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////

In this part I'll show you a structure for matching the opcode to the appropriate instructions. I'll try to keep it simple, but first let's review the variables we defined earlier so you don't have to scroll all the way back up. Context is important at all times!

NNN = OPCODE & 0x0FFF // our 12 bit JUMP address
NN  = OPCODE & 0x00FF // the lowest byte (also seen as KK in other guides)

P  = (OPCODE & 0xF000) >> 12 // 1st nibble - most significant
X  = (OPCODE & 0x0F00) >>  8 // 2nd nibble - also known as X in opcodes
Y  = (OPCODE & 0x00F0) >>  4 // 3rd nibble - also known as Y in opcodes
N  =  OPCODE & 0x000F        // 4th nibble - least significant

You will want to break out after executing any opcode. Don't forget it.


CASE P
    0x0 :
        CASE (OPCODE & 0x0FF0) >> 4
            0x0C :
                CASE N
                    0x0 : INVALID
                    DEF : 00CN (SUPER-CHIP)
            0x0E :
                CASE N
                    0x0 : 00E0
                    0xE : 00EE
                    DEF : INVALID
            0x0F :
                CASE N
                    0xB : 00FB (SUPER-CHIP)
                    0xC : 00FC (SUPER-CHIP)
                    0xD : 00FD (SUPER-CHIP)
                    0xE : 00FE (SUPER-CHIP)
                    0xF : 00FF (SUPER-CHIP)
                    DEF : INVALID
            DEF : 0NNN
    0x1 : 1NNN
    0x2 : 2NNN
    0x3 : 3XNN
    0x4 : 4XNN
    0x5 : 5XY0 // if N > 0 that's invalid
    0x6 : 6XNN
    0x7 : 7XNN
    0x8 :
        CASE N
            0x0 : 8XY0
            0x1 : 8XY1
            0x2 : 8XY2
            0x3 : 8XY3
            0x4 : 8XY4
            0x5 : 8XY5
            0x6 : 8XY6
            0x7 : 8XY7
            0xE : 8XYE
            DEF : INVALID
    0x9 : 9XY0 // if N > 0 that's invalid
    0xA : ANNN
    0xB : BNNN
    0xC : CXNN
    0xD : DXYN
    0xE :
        CASE NN
            0x9E : EX9E
            0xA1 : EXA1
            DEF  : INVALID
    0xF :
        CASE NN
            0x07 : FX07
            0x0A : FX0A
            0x15 : FX15
            0x18 : FX18
            0x1E : FX1E
            0x29 : FX29
            0x30 : FX30 (SUPER-CHIP)
            0x33 : FX33
            0x55 : FX55
            0x65 : FX65
            0x75 : FX75 (SUPER-CHIP)
            0x85 : FX85 (SUPER-CHIP)
            DEF  : INVALID
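For reference, here's the matching tree as a C function that just names the matched pattern, which is handy for a disassembler or a debug log. It's a sketch with a made-up name (`opcode_name`). For the 0x0 group it shifts the masked bits down by 4 so the 0x0C/0x0E/0x0F cases actually match, and sends every other 0x0 opcode to 0NNN as described in that section:

```c
#include <stdint.h>

/* Return the pattern an opcode matches, or "INVALID" for the holes. */
static const char *opcode_name(uint16_t op)
{
    uint8_t n = op & 0xF, nn = op & 0xFF;
    switch (op >> 12) {
    case 0x0:
        switch ((op & 0x0FF0) >> 4) {
        case 0x0C: return n ? "00CN" : "INVALID";
        case 0x0E: return n == 0x0 ? "00E0" : n == 0xE ? "00EE" : "INVALID";
        case 0x0F:
            switch (n) {
            case 0xB: return "00FB";
            case 0xC: return "00FC";
            case 0xD: return "00FD";
            case 0xE: return "00FE";
            case 0xF: return "00FF";
            default:  return "INVALID";
            }
        default: return "0NNN";        /* machine code, or blank memory */
        }
    case 0x1: return "1NNN";
    case 0x2: return "2NNN";
    case 0x3: return "3XNN";
    case 0x4: return "4XNN";
    case 0x5: return n ? "INVALID" : "5XY0";
    case 0x6: return "6XNN";
    case 0x7: return "7XNN";
    case 0x8:
        switch (n) {
        case 0x0: return "8XY0";
        case 0x1: return "8XY1";
        case 0x2: return "8XY2";
        case 0x3: return "8XY3";
        case 0x4: return "8XY4";
        case 0x5: return "8XY5";
        case 0x6: return "8XY6";
        case 0x7: return "8XY7";
        case 0xE: return "8XYE";
        default:  return "INVALID";
        }
    case 0x9: return n ? "INVALID" : "9XY0";
    case 0xA: return "ANNN";
    case 0xB: return "BNNN";
    case 0xC: return "CXNN";
    case 0xD: return "DXYN";
    case 0xE: return nn == 0x9E ? "EX9E" : nn == 0xA1 ? "EXA1" : "INVALID";
    default:  /* 0xF */
        switch (nn) {
        case 0x07: return "FX07";
        case 0x0A: return "FX0A";
        case 0x15: return "FX15";
        case 0x18: return "FX18";
        case 0x1E: return "FX1E";
        case 0x29: return "FX29";
        case 0x30: return "FX30";
        case 0x33: return "FX33";
        case 0x55: return "FX55";
        case 0x65: return "FX65";
        case 0x75: return "FX75";
        case 0x85: return "FX85";
        default:   return "INVALID";
        }
    }
}
```

In an actual interpreter you'd run the instruction's code instead of returning a string, but the matching stays the same.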