Temporal Locality : keep the most recently accessed data items closer to the processor.
Spatial Locality : move blocks consisting of contiguous words closer to the processor.

Hit Rate : the fraction of memory accesses found in a level of the memory hierarchy.
Hit Time : time to access that level, which consists of the time to access the block + the time to determine hit/miss.
Miss Rate : the fraction of memory accesses not found in a level of the memory hierarchy (1 - Hit Rate).
Miss Penalty : time to replace a block in that level with the corresponding block from a lower level.

Register <-> Memory : managed by the compiler
Cache <-> Main Memory : managed by the cache controller hardware
Main Memory <-> Disk : managed by the operating system (virtual memory) ; virtual-to-physical address mapping assisted by the hardware (TLB) ; by the programmer (files)

Direct Mapped Cache : block location is determined by the address.
Index - use the low-order address bits
Tag - the high-order address bits
Valid Bit - 1 if the block is present, 0 otherwise. Initially 0.
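
As a rough sketch of how the address splits into tag, index, and offset (the 4 KiB cache size and 16-byte block size here are assumptions for illustration, not from the notes):

    # Sketch: splitting a byte address for a hypothetical direct-mapped cache.
    # Assumed parameters: 4 KiB cache, 16-byte blocks (256 lines).
    CACHE_SIZE = 4096                        # bytes
    BLOCK_SIZE = 16                          # bytes
    NUM_BLOCKS = CACHE_SIZE // BLOCK_SIZE    # 256 lines

    OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(16) = 4
    INDEX_BITS = NUM_BLOCKS.bit_length() - 1    # log2(256) = 8

    def split_address(addr: int):
        """Return (tag, index, offset) for a direct-mapped lookup."""
        offset = addr & (BLOCK_SIZE - 1)                   # low-order bits
        index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)   # selects the cache line
        tag = addr >> (OFFSET_BITS + INDEX_BITS)           # identifies which block is stored there
        return tag, index, offset

    print(split_address(0x12345))   # (18, 52, 5) = tag 0x12, index 0x34, offset 0x5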

Sources of Cache Misses
Compulsory : first access to a block (first reference). Solution: increase block size (increases miss penalty; very large blocks could increase miss rate).
Capacity : the cache cannot contain all the blocks accessed by the program. Solution: increase cache size (may increase access time).
Conflict (collision) : multiple memory locations mapped to the same cache location.
Solution : increase cache size or increase associativity (both may increase access time).

Write Policies
Write-Through : on a data-write hit, update the block in cache and also update memory.
This makes writes take longer; the solution is a write buffer that holds data waiting to be written to memory so the CPU can continue immediately (it only stalls on a write if the write buffer is already full).
Write-Back : on a data-write hit, only update the block in cache, but keep track of whether each block is dirty. When a dirty block is replaced, write it back to memory. (A minimal sketch of this bookkeeping follows the Write Miss policies below.)

Write Miss :
No Write Allocate : write directly to the write buffer. (Used with Write-Through.)
Write Allocate : check for a hit and then write; pipeline writes via a delayed write buffer to cache. (Used with Write-Back.)
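
A minimal sketch of the write-back + write-allocate bookkeeping described above (a toy single cache line per index; names and block data handling are illustrative, not from the notes):

    # Sketch: write-back + write-allocate dirty-bit bookkeeping (illustrative toy).
    class Line:
        def __init__(self):
            self.valid = False
            self.dirty = False
            self.tag = None

    writebacks = []          # records (tag, index) pairs flushed to memory

    def write(cache, tag, index):
        line = cache[index]
        if not (line.valid and line.tag == tag):      # write miss -> allocate
            if line.valid and line.dirty:             # victim block is dirty...
                writebacks.append((line.tag, index))  # ...flush it to memory first
            line.valid, line.dirty, line.tag = True, False, tag
        line.dirty = True    # with write-back, a write only marks the cached block dirty

    cache = [Line() for _ in range(4)]
    write(cache, tag=7, index=2)    # miss, allocate
    write(cache, tag=7, index=2)    # hit, memory untouched
    write(cache, tag=9, index=2)    # conflict miss -> dirty victim written back
    print(writebacks)               # [(7, 2)]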

Loading Policies
Blocking : the requested word is only sent to the processor after the whole block has been loaded into the cache.
Non-Blocking - Early Restart : fetch the words in normal order, but as soon as the requested word of the block arrives, send it to the processor and let the processor continue execution.
Non-Blocking - Critical Word First : request the missed word first from memory and send it to the processor as soon as it arrives; let the processor continue execution while filling in the rest of the words in the block.

CPU Time = IC * CPI_stall * CC, where the stall component comes from memory-stall cycles = accesses per program * miss rate * miss penalty
Average Memory Access Time (AMAT) = Hit Time + Miss Rate * Miss Penalty
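
A quick worked AMAT example (the 1-cycle hit time, 5% miss rate, and 100-cycle miss penalty are made-up numbers for illustration):

    # Worked AMAT example with assumed numbers (not from the notes).
    hit_time = 1        # cycles
    miss_rate = 0.05
    miss_penalty = 100  # cycles

    amat = hit_time + miss_rate * miss_penalty
    print(amat)   # 1 + 0.05 * 100 = 6.0 cycles per access on average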

Replacement Policy

Direct Mapped : no choice.
Set Associative : prefer a non-valid entry, if there is one; otherwise choose among the entries in the set.
Least Recently Used (LRU) : choose the entry unused for the longest time. Only manageable up to about 4-way.
Random : approximately the same performance as LRU for high associativity.
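
One common way to model LRU is a recency-ordered list per set; a minimal sketch for a single 4-way set (the set size and access sequence are made up):

    # Sketch: LRU victim selection within one set (illustrative).
    from collections import OrderedDict

    WAYS = 4
    lru = OrderedDict()           # key = tag, ordered oldest -> newest

    def access(tag):
        if tag in lru:            # hit: move to most-recently-used position
            lru.move_to_end(tag)
            return "hit"
        if len(lru) == WAYS:      # set full: evict the least recently used tag
            victim, _ = lru.popitem(last=False)
            print("evict", victim)
        lru[tag] = True           # install as most recently used
        return "miss"

    for t in [1, 2, 3, 4, 1, 5]:  # the access to 5 evicts tag 2 (oldest unused)
        access(t)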

Out-of-order CPUs can execute instructions during a cache miss: the pending store stays in the load/store unit (dependent instructions wait in reservation stations) and independent instructions continue.

Victim Cache : when replacing a block from the cache, keep it temporarily in the victim buffer so that on a subsequent cache miss, the contents of the buffer are checked for the desired data before accessing the lower-level memory. (A small, fully associative cache; particularly effective for small direct-mapped caches.)
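
A rough sketch of the miss path with a victim buffer (the 4-entry size and the search-by-membership are illustrative assumptions):

    # Sketch: consulting a small fully associative victim buffer on a miss.
    from collections import deque

    VICTIM_ENTRIES = 4
    victim = deque(maxlen=VICTIM_ENTRIES)   # holds (tag, index) of evicted blocks

    def on_evict(tag, index):
        victim.append((tag, index))         # oldest victim falls out automatically

    def on_miss(tag, index):
        if (tag, index) in victim:          # hit in the victim buffer:
            victim.remove((tag, index))     # swap the block back into the cache
            return "victim hit (no lower-level access)"
        return "go to lower-level memory"

    on_evict(7, 2)
    print(on_miss(7, 2))   # victim hit (no lower-level access)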

Multilevel Caches :
Primary Cache (L1) : focus on minimal hit time.
Level-2 Cache (L2) : focus on a low miss rate to avoid main memory accesses; its hit time has less overall impact.
Results : the L1 cache is usually smaller than a single cache would be, and the L1 block size is smaller than the L2 block size.

t_access = t_hit,L1 + p_miss,L1 * t_penalty,L1
t_penalty,L1 = t_hit,L2 + p_miss,L2 * t_penalty,L2
t_access = t_hit,L1 + p_miss,L1 * (t_hit,L2 + p_miss,L2 * t_penalty,L2)
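
Plugging made-up numbers into these formulas (1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 20% L2 local miss rate, 100-cycle memory penalty; all assumptions for illustration):

    # Worked two-level access time example with assumed numbers (not from the notes).
    t_hit_l1, p_miss_l1 = 1, 0.05      # cycles; L1 miss rate
    t_hit_l2, p_miss_l2 = 10, 0.20     # cycles; L2 local miss rate
    t_penalty_l2 = 100                 # cycles to main memory

    t_penalty_l1 = t_hit_l2 + p_miss_l2 * t_penalty_l2   # 10 + 0.2 * 100 = 30
    t_access = t_hit_l1 + p_miss_l1 * t_penalty_l1       # 1 + 0.05 * 30 = 2.5
    print(t_access)   # 2.5 cycles per access on average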

Improving Cache Performance :
1. Reduce the hit time : smaller cache, direct-mapped cache, smaller blocks.
2. Reduce the miss rate : bigger cache, larger blocks, increased associativity, victim cache.
3. Reduce the miss penalty : smaller blocks; use a write buffer to hold dirty blocks being replaced, so a read doesn't have to wait for the write to complete.
Check the write buffer (and/or victim cache) on a read miss; for large blocks, fetch the critical word first; use multiple cache levels; use a faster backing store / improved memory bandwidth (wider buses, memory interleaving, DDR SDRAMs).

Virtual Memory :
Use main memory as a "cache" for secondary memory (disk).
Programs share main memory, but each is compiled into its own address space - a virtual address space.
The CPU and OS translate virtual addresses to physical addresses.
- A VM "block" is called a page
- A VM "miss" is called a page fault

Page Tables : to reduce the page fault rate, prefer least-recently-used replacement. A reference bit (aka use bit) in the PTE is set to 1 on an access to the page (used recently) and periodically cleared to 0 by the OS (has not been used recently).
Write whole blocks at once: use write-back (only write to the "cache", i.e. main memory, on a hit, and keep track of dirty pages), with a dirty bit in the PTE set when the page is written.
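
A minimal sketch of the translation step itself (4 KiB pages assumed; the flat page-table layout and PTE fields are illustrative):

    # Sketch: virtual-to-physical translation with a flat page table
    # (4 KiB pages assumed; PTE fields are illustrative).
    PAGE_SIZE = 4096
    OFFSET_BITS = 12                 # log2(4096)

    page_table = {                   # VPN -> (valid, PPN)
        0x42: (True, 0xA1),
    }

    def translate(vaddr: int) -> int:
        vpn = vaddr >> OFFSET_BITS          # virtual page number
        offset = vaddr & (PAGE_SIZE - 1)    # page offset passes through unchanged
        valid, ppn = page_table.get(vpn, (False, None))
        if not valid:
            raise RuntimeError("page fault")  # OS would fetch the page from disk
        return (ppn << OFFSET_BITS) | offset

    print(hex(translate(0x42ABC)))   # VPN 0x42 -> PPN 0xA1 => 0xa1abc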

Page Table Dimensioning
First, get the page offset: log2(page size in bytes) bits.
Then, calculate the Physical Page Number (PPN) size by subtracting the page offset bits from the total number of bits in the physical address.
Now you can calculate the Page Table Entry (PTE) size by adding the valid bit, protection bits, etc. to the calculated PPN size.
The last piece of information needed is the number of entries in the page table: subtract the page offset bits from the virtual address bits to get the Virtual Page Number (VPN) bits; the table then has 2^(VPN bits) entries, and its total size is number of entries * PTE size.
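
A worked sizing example following these steps (32-bit virtual addresses, 30-bit physical addresses, 4 KiB pages, and 2 extra PTE bits are all assumed values for illustration):

    # Worked page-table sizing example with assumed parameters (not from the notes).
    import math

    VIRT_BITS = 32                  # virtual address width
    PHYS_BITS = 30                  # physical address width
    PAGE_SIZE = 4096                # bytes
    EXTRA_PTE_BITS = 2              # e.g. valid bit + dirty bit (assumed)

    offset_bits = int(math.log2(PAGE_SIZE))   # 12
    ppn_bits = PHYS_BITS - offset_bits        # 30 - 12 = 18
    pte_bits = ppn_bits + EXTRA_PTE_BITS      # 18 + 2 = 20
    vpn_bits = VIRT_BITS - offset_bits        # 32 - 12 = 20
    num_entries = 2 ** vpn_bits               # 2^20 = 1,048,576 entries

    table_bits = num_entries * pte_bits
    print(table_bits / 8 / 2**20, "MiB")      # 20 Mbit -> 2.5 MiB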