Untitled

CSCE 212H: Introduction to Computer Architecture
================================================

## 2018-01-18
* Number representation
  * Unsigned: represented in traditional binary
    * On n-bit architecture, \(2^n\) total
    * \(Umax=2^n-1\) (minus one because of zero)
    * All operations are technically modulo \(2^n\)
    * Ints are not integers!
  * Signed magnitude: represents signed numbers by using first bit as sign
    * Represents \(-(2^{n-1}-1) \textrm{ to } 2^{n-1}-1\) with a duplicate 0
    * Does not work well with arithmetic
  * Two's Complement: alternate to signed magnitude good for arithmetic
    * Operation and representation
    * Represents negatives by flipping all bits and adding 1
    * First bit is sign bit - 1 represents negative
    * Can be passed through arithmetic with no further modification
    * Ex.  24 = 0..011000 -> 1..100111 (1's Comp) -> 1..101000 (2's Comp) = -24
    * Ex. -24 = 1..101000 -> 0..010111 (1's Comp) -> 0..011000 (2's Comp) =  24
    * Ex. 6 + (-5) = 0..0110 + 1..1011 = 0..0001 = 1
    * Ex. (-7) + 3 = 1..1001 + 0..0011 = 1..1100 = -4
    * Alternative method: flip all bits to the left of the rightmost 1
    * On n-bit architecture, \(Tmax=2^(n-1)-1, Tmin=-2^(n-1)\)
    * Special case \(Tmin = -2^{63} = 10..0b: 2sComp(Tmin) = 2sComp(10..0) = 10..0 = Tmin\)
    * So 2sComp(Tmin) = Tmin or -Tmin = Tmin (because the magnitude of Tmin is greater than Tmax)
  * Floats (32 bits)
    * IEEE 754 Standard
      * (Sign (1b)) (   Exponent (8b)   ) (          Fraction (23b)          )
      * Exponent sign is part of 8-bits by adding a bias to the field value mod 2^31
      * Exponent base is 2 not 10
   * Floats are not reals!
   * Doubles are just double-precision floats so 64-bits
* Funky arithmetic
  * Example 1: Is \(x^2 >= 0\) always?
    * For unsigned ints, yes because they are unsigned
    * For signed ints, no because arithmetic is mod n which can be negative
    * For floats, yes
  * Example 2: Is \((x+y)+z = x+(y+z)\) always?
    * For ints (signed and unsigned), yes because modular arithmetic is associative
    * For floats, no because float addition varies with order of magnitude
      * \((1e20 + -1e20) + 3.14 \rightarrow 3.14\)
      * \(1e20 + (-1e20 + 3.14) \rightarrow 0\)
* Computer arithmetic
  * Cannot generate random values - always follows strict mathematical rules
  * Not all usual mathematical properties hold for computer data types
  * Int operations follow "ring" properties
  * Floating point operations follow "ordering" properties (not field though)
* GR#1 **Ints are not Integers, Floats are not Reals**
* GR#2 **You have to Assembly language**
  * Probably won't be used after this course because compilers are better than you
  * Textbook uses variant of x86 called y86-64

## 2018-01-23
* GR#3 Memory matters
  * Not infinite
  * Must be allocated and managed
  * Pointer errors are very hard to debug
* GR#4 **There's more to performance than asymptotic complexity**
  * Ex. row major order iteration of 2D array is much faster than column major
    * Fortran is column major, but most other languages are row major
* More integer representation details
  * Unsigned: \
              \(\begin{align}
                & X[n] & * & X[n-1] & * & ... & * & X[2] & * & X[1] & * & X[0] \\
                & 2^n & * & 2^{n-1} & * & ... & * & 2^2 & * & 2^1 & * & 2^0
                \end{align}\)
  * Signed:
  * In 2's complement, positive values are the same as their unsigned counterparts
  * The magnitude of 2's complement can be found by doing 2's complement operation
  * 2's complement value: \(VT(x) = -X[w-1] * 2^{w-1} + \sum_{i=0}^{w-2}{X[i] * 2^{i}}\)
* GR#5 **Computers do more than execute programs**
* Bits vs. Byte
 * No intrinsic reason that a byte must be 8 bits but it was decided by Fred Brook
 * Used to be 6 bits per byte (because it was enough to store a char)
 * All modern computers have Byte Addressable Memory (each byte accessed individually)
 * Word is usually 4 bytes (grouped by address last two digits), short usually 2 bytes
  * By grouping into words and shorts, some memory can address an entire chunk of 4 bytes
 * Half byte is a nibble (each nibble is a hex digit)

## 2018-01-25
* Little Endian vs Big Endian
  * For some value 0xFF4292A1,
    * FF is MSB=Most Significant Byte, A1 is LSB=Least Significant Byte
    * Little Endian starts with LSB in memory
    * Big Endian starts with MSB in memory
* Fractional binary numbers
  * Ex: \(1011.101b = 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 + 1*2^{-1} + 0*2^{-2} + 1*2^{-3} \\
   = 8 + 2 + 1 + \frac{1}{2} + \frac{1}{8} = 11 + 0.5 + 0.125 = 11.625\)
  * Numbers of form \(0.111111...\) are just below 1 (like $0.999...$ is just below 1 in decimal)
    * Represented \(1.0 - \varepsilon\)
* Floating point representation
  * \((-1)^s M 2^E\) where $s$ is sign bit, $E$ is exponent of 2, $M$ is fraction field
  * Single Precision: 32 bits = (1 s) + (8 E) + (23 M)
  * Double Precision: 64 bits = (1 s) + (11 E) + (52 M)
  * Extended Precision: 80 bits (Intel only) = (1 s) + (15 E) + (63 or 64 M)
  * "Normalized" Values
    * If E=00..0 or E=11..1, value is "special"
    * Exponent coded as *biased* value \(E=Exp-Bias\)

## 2018-01-30
* Examples
  *  357 = 0000000101100101 (16-bit 2's complement)
  * -357 = 1111111010011011 (16-bit 2's complement)
  * 357.0 = 0 10000111 011001000..0 (IEEE 754 standard float)
    * Sign = 0 because positive
    * Exp = 8 + Bias = 8 + 127 = 135
    * Frac = After decimal of 1.00100101 (with trailing 0s)
* Floating point special values
  * Denormalized values
    * When Exp = 000..0
    * Exponent value: E = 1 - Bias (instead of E = 0 - Bias as would be normal for Exp=0)
    * 0 before decimal point, not 1 like usual
    * Cases:
      * Exp=000..0, Frac=000..0
        * Represents zero (positive and negative are distinct)
      * Exp=000..0, Frac $\neq$ 000..0
        * Numbers closest to zero, equispaced
    * Smallest normalized value (Exp = 000..01)
      * E = Exp - Bias = 1 - 127 = -126
      * Value = \(1-2^{126}\)
      * Cannot be normalized
  * Infinity
    * When Exp=111..1, frac = 000..0
      * represents $\pm\infty$
      * 1.0/0.0 = -1.0/-0.0 = $+\infty$, 1.0/-0.0 = $-\infty$
  * Not a Number (NaN)
    * When Exp=111..1, frac $\neq$ 000..0
* More Examples
  * Gap size in large floats:
    * \[\begin{align}
        & 0 \ 11111110 \ 111..11 \\
      - & 0 \ 11111110 \ 111..10 \\
      \hline
        & 0 \ 11111110 \ 000..01 \\
      = & 0.000..01 * 2^{127} \textrm{ (Must be normalized)} \\
      = & 1.000..00 * 2^{127-23} = 1.0 * 104 \\
      \approx & 2*10^{31} \textrm{ (HUGE!)}
      \end{align}\]
* Float Rounding
  * Basic idea:
    * First computer exact result
    * Make fit into desired precision
      * Possibly overflow if exponent is too large
      * Possibly round to fit in Frac
* Float Multiplication
  * Exact result:
    * Sign bit \(s_1 \oplus s_2\)
    * Significand \(M_1 \times M_2\)
    * Exponent \(E_1 + E_2\)
  * Then fix result to fit correctly in float
* Float Addition
  * Must first get binary points aligned:
    * \((-1)^{s_1}\cdot M_1\cdot 2^{E_1} + (-1)^{s_2}\cdot M_2\cdot 2^{E_2}\)
    * Then fix correclty to float
* Mathematical properties of FP addition
  * Closed under addition?
    * Yes but may produce infinity or NaN
  * Commutative?
    * Yes
  * Associative?
    No: (in case of numbers of different magnitudes as seen earlier)
  * 0 is Additive Identity?
    * Yes
  * Every element has additive inverse?
    * Almost: everything except infinities and NaNs
  * Monotonicity (\(a \geq b \Rightarrow a+c \geq b+c\))
    * Almost: except for infinities and NaNs

## 2018-02-01
* Mathematical properties of FP multiplication
  * Compare to Commutative ring
    * Closed under multiplication?
      * Yes (but may generate infinities and NaNs)
    * Commutative?
      * Yes
    * Associative?
      * No: possibility of overflow and inexactness of rounding (as in addition)
    * 1 is Multiplicative Identity?
      * Yes
    * Multiplication distributive over addition?
      * No: possibility of overflow and inexactness of rounding (as in associativity)
  * Monotonicity?
    * Almost: all but infinities and NaNs
* Floating point specifications in C
  * C guarantees two levels of precision
    * float: 32 bits (1 + 8 + 25)
    * double: 64 bits (1 + 11 + 52)
    * Some machines offer a long double
  * Conversions/Casting
    * double/float $\rightarrow$ int
      * truncates fractional part (like rounding toward zero): \(\lfloor frac \rfloor\)
      * Not defined when out of range or NaN (usually sets to TMin)
    * int $\rightarrow$ double
      * Exact conversion as long as int has $\leq$ 53 bit word size
* FP Puzzles
  * Assume `int x = _; float f = _; double d = _;` and neither d nor f is NaN.
  * `x == (int) (float) x`?
    * Not true for large numbers because large x cannot be represented in float without loss of precision.
  * `x == (int) (double) x`?
    * True because 32-bit int largest number can be represented in $\leq$ 53 bits without loss of precision.
  * `f == (float) (double) f`?
    * True because conversion from float to double can be done without loss of precision and conversion back should always work because number is within range of float.
  * `d == (double) (float) d`?
    * Not true because large doubles would lose precision when converted to float.
  * `f == -(-f)`?
    * True because negation only affects the sign bit which can be easily reverted with another negation.
  * `2/3 == 2/3.0`?
    * Not true because `2/3` does integer division where `2/3.0` does floating point division.
  * `d < 0.0` $\rightarrow$ `d*2 < 0.0`?
    * True because multiplying by 2 does not affect sign bit.
  * `d > f` $\rightarrow$ `-f < -d`?
    * In comparing float to double, the float is promoted to double so this should always apply.
  * `d * d >= 0.0`?
    * True because sign bit is XORed so the resulting sign will always be positive.
  * `(d+f)-d == f`?
    * Not true because FP multiplication is not associative.
* C Primer - Highlights
  * Preprocessor
    * `#include <X.h>` or `#include "X.h"` includes header X.h
    * `#define X Y` replaces all instances of Y with X
  * Command line arguments
  * Arrays and structures
  * Pointers and dynamic memory
    * malloc
    * free
* Intel x86 Processors
  * Dominate laptop/desktop/server industry
  * Backwards compatible back to 8086 (1978)
  * Complex instruction set computer (CISC)
    * Many different instructions with many different formats
      * Only a few are regularly used
    * Hard to match performance of Reduced Instruction Set Computers (RISC)
      * But Intel has done just that.
* Intel's Competitor: Advanced Micro Devices (AMD)
  * Historically follows behind Intel: a little bit slower, a lot cheaper
  * Built Opteron: tough competitor to Pentium 4
  * Developed x86_64 successfully after Intel failed with its Itanium
  * Most processors support 64-bit mode these days, but many programs operate in only 32 bits.
* Definitions
  * Architecture (also Instruction Set Architecture - ISA):
    * Only details of processor design that an assembly code writer must understand
* Assembly/Machine Code View
  * CPU:
    * Program Counter (PC)
    * Registers
    * Condition Codes (what each instruction does)
    * Arithmetic Logic Unit (ALU)
  * Memory:
    * Code
    * Data
    * Stack
  * Bus connection:
    * Addresses
    * Data
    * Instructions
## 2018-02-06
* Turning C code into object code
  * C program (x.c) -> (Compiler `gcc -S`)
  * Asm program (GAS syntax) (x.s) -> (Assembler `gcc` or `as`)
  * Object program (x.o) -> (Linker `gcc` or `ld`) <~ Static Libraries (\*.a)
  * Executable program (x) <~ Dynamic Libraries (\*.so)
* Example of C -> Asm
  * C:
    ```C
    long plus(long x, long y);

    void sumstore(long x, long y, long *dest) {
      long t = plus(x, y);
      *dest = t; //*
    }
    ```
  * Assembly:
    ```
    sumstore:
      pushq   %rbx
      movq    %rdx, %rbx
      call    plus
      movq    %rax, (%rbx)
      popq    %rbx
      ret
    ```
  * Machine Code:
    ```
    0x0400595:
      0x53
      0x48
      ...
      0x5B
      0xC3
    ```
    * Missing everything except links between functions and things (provided by linker file)
* Assembly Characteristics:
  * Data Types
    * "Integer" data of 1, 2, 4, or 8 bytes
      * Data values
      * Addresses (untyped pointers)
    * Floating point data of 4, 8, or 10 bytes
    * Code: byte sequences encoding series of instructions
    * No aggregate types such as arrays or structures
      * Just continuously allocated bytes in memory
  * Operations
    * Perform arithmetic operations (ALU)
    * Move data between memory locations and registers
    * Conditional and unconditional branching
* Disassembler:
  * `objdump -d <executable>` converts machine code to more readable form - assembly
  * Also possible using `gdb`
  * **Note: Reverse engineering is forbidden by Microsoft End User License Agreement**
* x86-64 Integer Registers
  * 8-byte examples
    * %rax
    * %rbx
    * %rcx
    * %rdx
    * %rsi
    * %rdi
    * %rsp -> stack pointer
    * %rbp -> "frame" pointer
    * %r8
    * ...
    * %r15
  * %rax (8-byte) -> %eax (4-byte) -> %ax (2-byte) -> %ah/%al (1-byte)
* Moving Data
  * `movq` moves quad word (4 words = 8 bytes)
  * Operand Types
    * Constant integer data
      * Prefixed with $ (i.e. $0x400, $-533)
      * Encoded with 1, 2, or 4 bytes
    * Register
      * %rax and others seen previously
      * %rsp and sometimes %rbp reserved
      * Others sometimes have special purposes
    * Memory Address
      * Simplest example: (%rax) gives address stored in %rax
  * Details of movq (source -> dest):
    \[
    movq
    \begin{cases}
      Immediate \quad
      \begin{cases}
        Register \ \\
        Memory \\
      \end{cases} \\
      Register \quad\quad
      \begin{cases}
        Register \\
        Memory \quad\quad \\
      \end{cases} \\
      Memory
    \end{cases}
    \]
## 2018-02-08
* `leaq`
  * load effective address quad
  * equivalent to `p = &x[i];`
* Cool optimization:
  * C code:
    ```
    long m12(long x) {
      return x*12;
    }
    ```
  * Assembly:
    ```
    leaq (%rdi,%rdi,2), %rax  # t <- x + x*2 (equivalent to t = 3*x)
    salq $2, %rax             # return t << 2 (equivalent to return 4*t)
    ```
* Control flow
  * Temporary data `%rax`
  * Location of runtime stack `%rsp`
  * Location of current code control point `%rip`
  * Status of recent tests:
    * CF: carry flag - carry out from addition is set
    * ZF: zero flag - if t == 0
    * SF: sign flag (signed) - if t < 0
    * OF: overflow flag (signed) - sign bit is not what it should be (i.e. adding two positive numbers gives negative)
  * Control flags set by ALU ops but not by `leaq`
  * Jumps
    | jX  | Condition     | Description          |
    |-----|---------------|----------------------|
    | jmp | 1             | Unconditional        |
    | je  | ZF            | Equal / Zero         |
    | jne | ~ZF           | Not Equal / Not Zero |
    | js  | SF            | Negative             |
    | jns | ~SF           | Nonnegative          |
    | jg  | ~(SF^OF)\&~ZF | > (signed)           |
    | jge | ~(SF^OF)      | >= (signed)          |
    | jl  | (SF^OF)       | < (signed)           |
    | jle | (SF^OF)|ZF    | <= (signed)          |
    | ja  | ~CF\&~ZF      | Above (unsigned)     |
    | jb  | CF            | Below (unsigned)     |
  * Bad cases for conditional move
    * Expensive computations - `val = Test(x) ? Hard1(x) : Hard2(x);`
    * Risky computations - `val = p ? *p : 0;`
    * Computations with side effects - `val = x > 0 ? x*=7 : x+=3;`
## 2018-02-13
* Jump Tables (switch statement)
```
.section    .rodata
    .align
.L4:
    .quad   .L2
    .quad   .L1
    .quad   .L3
    .quad   .L5
```
* Stack
  * %rsp points to top of stack
  * Stack is 'upside down' because stack bottom is at highest address and stack grows down with %rsp at lowest address element
  * Heap is usually below stack
* Parameters to procedures
  * First 6 parameters are stored in %rdi, %rsi, %rdx, %rcx, %r8, %r9
  * Remaining parameters are pushed onto the stack if necessary with arg 7 on top of stack
  * Return of function is stored in %rax
  * Other registers should be pushed and popped unless it is known that the register can be clobbered without issue
* Recursive procedures
  * Any language supporting recursion must implement a stack
## 2018-02-15
* Parameters pneumonic
  * `rdi` - Dolly's
  * `rsi` - Special
  * `rdx` - Dress
  * `rcx` - Costs
  * `r8`  - $89
  * `r9`  - ^
* For stack parameters you do not pop them off, just address `(%rbp)` for param 7 and `8(%rbp)` for param 8 if param 7 is 8 bytes long
* Array allocation
  * `T A[l]` is array length l of elements type T named A
  * In memeory, this is just a strip of memory with `l*sizeof(T)` bytes which can be accessed using `A[2]` or just `*(A + 2*sizeof(T))`.

## 2018-02-20
* Arrays
  * C:
  ```C
  #define ZLEN 5
  typedef zip_dig char[];

  void zincr(zip_dig z) {
    size_t i;
    for (i = 0; i < ZLEN; i++) {
      z[i]++;
    }
  }
  ```
  * ASM:
  ```asm
    # %rdi = z
    movl  $0,  %eax           #   i = 0
    jmp   .L3                 #   goto middle
  .L4:                        # loop:
    addl  $1,  (%rdi,%rax,4)  #   z[i]++
    addq  $1,  %rax           #   i++
  .L3:                        # middle:
    cmpq  $4,  %rax           #   test i == 4
    jbe   .L4                 #   if <=, goto loop
    rep; ret
  ```
## 2018-02-22
* Nothing to see here.

## 2018-03-05
* Hardware Control Language (HCL)
  * Similar to Hardware Description Language (HDL)
    * Most prevalently Verilog and VHDL
  * Conventions
    * Single bits are low caps, word-level structures are caps
* Decoders
  * \(n \times 2^n\) Decoder takes $n$ inputs and has $2^n$ outputs where output i is 1 and all others are 0 when the inputs represent i in binary
  * Used to do correct action based on opcode section of instruction
* Multiplexers (MUX)
  * Has $n$ selector inputs and $2^n$ data inputs with 1 output where the output is equal to the $i^{th}$ data input when the selector inputs represent i
  * Used in selecting which register to use
  * HCL 1-bit multiplexer: `bool out = (s && a) || (!s && b);`
* HCL Word-Level Examples
  * Minimum of 3
    ```VCL
    int Min3 = [
      A < B && A < C : A;
      B < A && B < C : B;
      1              : C;
    ];
    ```
  * 4-Way Multiplexer
    ```VCL
    int Out4 = [
      !s1 && !s0 : D0;
      !s1        : D1;
      !s2        : D2;
      1          : D3;
    ];
    ```
* Arithmetic Logic Unit (ALU)
  * Uses a multiplexer to output the appropriate operator on inputs
  * Sets flags OF, ZF, CF, SF in the process
* Flip Flops
  * RS Flip Flop
    * Two NOR gates with back flow to opposite gate
    * Usually has a clock input to prevent race conditions
    * input R,S: 01 -> set to 1, 10 -> reset to 0, 00 -> no action (for reading)
    * 2 outputs Q, Q' (or Q+, Q-) where Q is value of
* Latching
  * Transparent 1-Bit Latch
    <img src="img/transparent-1-bit-latch.png" alt>
* Random Access Memory (RAM)
  * From CPU to Memory, there are three lines:
    * 1-way CPU to Memory: control bus (read or write, a few other things)
    * 1-way CPU to Memory: address bus (may be chunked)
    * 2-way: data bus carries values to and from memory
* HCL Overview
  * Data Types
    * bool : Boolean
      * a, b, c, ...
    * int : words (usually thinking 64 bit words)
      * A, B, C, ...
  * Operations
    * Logic operations: `&&, ||, !`
    * Word Comparisons: `==, !=, <, <=, >, >=`
    * Set Membership: `A in { B, C, D }`
  * Word Expressions:
    * Case Expressions:
      `int Z = [ (case 1) : (value); (case 2) : (value); 1 : (default case); ];`

## 2018-03-08
* Y86-64 Instruction Set
  * Instructions
    * halt: 00
    * nop: 10
    * cmovXX rA, rB : 2(fn) (rA) (rB)
    * irmovq V, rB : 30 (F) (rB) (--V--)   // Value into register
    * rmmovq rA, D(rB) : 40 (rA) (rB) (--D--)  // Reg to Mem
    * mrmovq D(rA) rB : 50  (rA) (rB) (--D--)  // Mem to reg
    * OPq rA rB : 6(fn) rA rB
    * jXX Dest : 7(fn) (Dest)
    * call Dest : 80 (Dest)
    * ret : 90
    * pushq rA : A0 (rA) (F)
    * popq rA : B0 (rA) (F)
  * Functions for OPq
    * addq : 60
    * subq : 61
    * andq : 62
    * xorq : 63
  * Jumps for jXX
    * jmp : 70
    * jle : 71
    * jl : 72
    * je : 73
    * jne : 74
    * jqe : 75
    * jg : 76
* Sequential Hardware Structure
  * Fetch: get value pointed by program counter (PC) (aka instruction pointer, IP) and move to instruction register (IR); increment PC
  * Decode: Figure out what the command is
  * Execute: Execute the command (in the case of ALU stuff)
  * Memory: Move store memory stuff
  * Write Back: write to registers
  * PC: Start back from top
* Pipelining is when multiple stages are occuring simultaneously (why the process is split into 5 subprocedures)
* Instruction example
  * For instruction `OPq rA, rB : 6(fn) (rA) (rB)`
    * Fetch: read two bytes
    * Decode: read operand registers
    * Execute: perform ALU operation, set condition codes
    * Memory: Do nothing
    * Write Back: Update register
    * PC Update: Increment PC by 2 (2 because the OPq has an extra byte of registers)
  * Exactly what happens in `OPq rA, rB : 6(fn) (rA) (rB)`
    * Fetch
      * \(\textrm{icode:ifun} \leftarrow M_1[PC]\) &emsp; // Read instruction byte
      * \(rA:rB \leftarrow M_1[PC+1]\)             &emsp; // Read register byte
      * \(valP \leftarrow PC+2\)                   &emsp; // Computer next PC
    * Decode
      * \(valA \leftarrow R[rA]\)                  &emsp; // Read operand A
      * \(valB \leftarrow R[rB]\)                  &emsp; // Read operand B
    * Execute
      * \(valE \leftarrow valB\textrm{ OP }valA\)  &emsp; // Perform ALU operation
      * \(\textrm{set CC}\)                        &emsp; // Set condition code register
    * Memory
                                                   &emsp; // NOP
    * Write Back
      * \(R[rB] \leftarrow valE\)                  &emsp; // Write back result
    * PC Update
      * \(PC \leftarrow valP\)                     &emsp; // Update PC
  * Notes on other opcodes
    * For ops like rmmovq which have large data parts (i.e. D is 8 bytes), use \(M_8\) instead of \(M_1\)
    * For stack operations, one register represents no register (represented F above and usually 0xF in 15-register system)
    * In popq Decode:
      * \(valA \leftarrow R[\%rsp]\) &emsp; // valA is used to put value at old %rsp into rA
      * \(valb \leftarrow R[\%rsp]\) &emsp; // valB is used to calculate %rsp + 8 for new %rsp
    * cmovXX moves rA into rB only when the condition is true
      * If condition is false, 0xF is used as destination

## 2018-03-22
* Y86-64 Instruction Mapping
  * `0x1000: call 0x12300`
    * Fetch: \(icode:ifun \leftarrow M_1[PC] \\ valC \leftarrow M_8[PC+1] \\ valP \leftarrow PC + 9\)
    * Decode: \(valA \leftarrow R[rsp] \\ valB \leftarrow R[rsp]\)
    * Execute: \(valE \leftarrow valB + (-8)\)
    * Memory: \(M_8[valE] \leftarrow valP\)
    * Write Back: \(R[rsp] \leftarrow valE\)
    * PC Update: \(PC \leftarrow valC\)
  * `ret` (0x90)
    * Fetch: \(icode:ifun \leftarrow M_1[PC]\)
    * Decode: \(valA \leftarrow R[rsp] \\ valB \leftarrow R[rsp]\)
    * Execute: \(valE \leftarrow valB + 8\)
    * Memory: \(valM \leftarrow M_8[valA]\)
    * Write Back: \(R[rsp] \leftarrow valE\)
    * PC Update: \(PC \leftarrow valM\)

## 2018-03-29
* (Insert stuff here)

## 2018-04-03
* iaddq V, rB
  * Fetch: \(icode:ifun \leftarrow M_1[PC] \\ rA:rB \leftarrow M_1[PC+1] \\ valC \leftarrow M_8[PC+2] \\ valP \leftarrow PC + 10\)
  * Decode: \(valB \leftarrow R[rB]\)
  * Execute: \(valE \leftarrow valB + valC\)
  * Memory:
  * Write Back: \(R[rB] \leftarrow valE\)
  * PC Update: \(PC \leftarrow valP\)

## 2018-04-05
* HW Convert to assembly:
  ```c
  void swappair(long a[], n) {
    // Assumes n is length of a and n is even
    int i;
    long swap;
    for (i = 0; i < n; i = i + 2) {
      swap = a[i];
      a[i] = a[i+1];
      a[i+1] = swap;
    }
  }
  ```

## 2018-04-24
* N/A

## 2018-04-26
* #TODO