Falkervisor Q&A

brownie - Windows snapshotting user-mode based hypervisor fuzzer thingy (Intel VT-x)
	- Debugger for Windows for snapshotting register and memory state of an application
	- Load up snapshot in an Intel VT-x VM.
	- Modify memory and then execute

brownie v2 - Windows ..... fuzzer thingy (AMD SVM)
falkervisor - Cross-OS os-level fuzzer
	- Code coverage
	- Memory coverage (absolute and relative)
	- Debugging/single stepping
	- Networking

- Snapshots
	- PXE boot falkervisor in snapshot mode
	- Load up boot sector on disk and boot under falkervisor
	- Monitor hardware breakpoints
	- If 0x133713371337 is present in a debug register upon a hardware breakpoint
	  take a snapshot of register state and memory, and ship over the network.
	- Once the snapshot is done, the hypervisor resumes execution
	- Feel free to take more snapshots

- Execution
	- PXE boot falkervisor in fuzz mode
	- falkervisor will request snapshot images over the network
		- falkervisor fuzz module will also pull whatever else it needs

	10:
	- fuzzer must change memory to change input
	- the vm is launched!
	- vm ends execution on one of three conditions
		- timeout
		- i/o access (disk, display, context switch)
		- fault (memory violation, div by zero, etc)

		- #UD undefined opcode
		- #PF page fault
			- page faults occur A LOT under normal conditions
			  NtPageFault - page fault handler
			              - path for only 'true' page faults
		- #GP general protection
			- on 64-bit systems, you get a #GP on non-canonical memory accesses
			- 4141414141414141 - non-canonical
			  mov rax, 0x4141414141414141
			  mov rdx, [rax] <- #gp

	- if the vm ended on a fault, the register state and input file are reported
	  over the network

	- DIFFERENTIAL RESTORE!!! :)
		- Only restore pages in the VM with the dirty flag set.
															vvvv
		- [512 GB pages] -> [1GB pages] -> [2MB pages] -> [4k pages]
			^ accessed       ^ A            ^A             ^A  ^D
		- 10-100x speedup

	- GOTO 10
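
Rough sketch of the differential restore idea (Python toy model, not falkervisor code). GuestMemory, write() and differential_restore() are made-up names. The dirty list stands in for the D bits on 4k pages. The real walk also skips whole subtrees whose accessed bit is clear, which this flat list does not model.

```python
PAGE_SIZE = 4096

class GuestMemory:
    """Toy guest memory: a list of 4 KiB pages plus one dirty flag per page."""

    def __init__(self, snapshot_pages):
        self.snapshot = [bytes(p) for p in snapshot_pages]    # pristine copy
        self.pages = [bytearray(p) for p in snapshot_pages]   # live guest memory
        self.dirty = [False] * len(snapshot_pages)

    def write(self, addr, data):
        # Stand-in for the hardware setting the dirty bit on a guest write.
        for i, b in enumerate(data):
            page, off = divmod(addr + i, PAGE_SIZE)
            self.pages[page][off] = b
            self.dirty[page] = True

    def differential_restore(self):
        """Copy back only the dirty pages; return how many were restored."""
        restored = 0
        for n, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self.pages[n][:] = self.snapshot[n]
                self.dirty[n] = False
                restored += 1
        return restored
```

A fuzz case that touches 2 pages out of thousands only pays for 2 page copies on reset, which is where the speedup comes from.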

- Code coverage
	- falkervisor uses interrupt/timer based code coverage
	- LBR is an Intel and AMD feature that basically records the last branches
	  taken. Hence 'last branch recording'
		- LBR on intel stores the last 8(?) branches taken
		- LBR on AMD only stores the last branch taken
		  br_from and br_to. [0x1000] -> [0x2000]
	- IBS instruction based sampling. AMD only
		- Give it a number of 'ticks'. It counts down these ticks, and then fires
		  an interrupt after this counter hits zero.
		- Mainly for performance monitoring. Gives information on stalls, cache
		  hits and misses, branch mispredictions, etc.
		- IBS for free tells you the physical and virtual addresses for RIP
		- IBS also tells you whether the instruction was a load or a store (or neither)
		  as well as the physical AND virtual address of the load/store if it was one

	- Storage of code coverage
		- Initial falkervisor used bswap(br_to) ^ br_from
		  00001337, 00002335
		  37132335 <- code coverage 'hash'
		- Later falkervisor uses falkhash to properly hash (br_to, br_from)
		  falkhash is a 128-bit AES based hashing algorithm which is super duper fast
		  https://github.com/gamozolabs/falkhash
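
The early bswap/xor 'hash' above, worked in Python. This assumes 32-bit values as in the example; bswap32 and coverage_key are illustrative names, and this is not the falkhash algorithm.

```python
def bswap32(x):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")

def coverage_key(br_from, br_to):
    """Early-style coverage key: bswap(br_to) ^ br_from."""
    return bswap32(br_to) ^ (br_from & 0xFFFFFFFF)

# br_to = 0x00001337, br_from = 0x00002335, as in the notes
print(hex(coverage_key(0x00002335, 0x00001337)))  # 0x37132335
```

The byte swap moves the low (most variable) bytes of br_to to the top so they do not cancel against the low bytes of br_from, but different branch pairs can still collide under xor.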

	- Use of code coverage
		- Each basic block has a counter associated with the number of times we've seen it
		- Early falkervisor
			- Sorted table of basic blocks based on frequency
			- Pick one of the least common 64 inputs, and use it as the base for the
			  next generation of mutation

		- Later falkervisor, sorted database was ditched!
			- Randomly select n (~16) inputs from the code coverage database.
			- Out of the 16 inputs, pick the least common one
			- Use this input as the base of the next fuzz case
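
Minimal sketch of the later selection scheme (Python). The database layout is an assumption: a dict mapping an input id to how often its coverage entry has been seen. pick_base is a made-up name.

```python
import random

def pick_base(db, n=16, rng=random):
    """Sample up to n inputs at random and return the least common one.

    db maps input id -> hit count (assumed layout, for illustration only).
    """
    candidates = rng.sample(list(db), min(n, len(db)))
    return min(candidates, key=lambda input_id: db[input_id])
```

No sorting is needed: each pick costs n lookups regardless of database size, and rare inputs still win most of the samples they land in.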

- Crash coverage
	- Store each unique crash that I get
		- Initial falkervisor used unique PC
		- Later falkervisor used unique (PC, faulting address)
		- Later falkervisor used up to 10 unique faulting addresses for each PC
			- Bug shows up as null deref [0, 16KB) but then later in time shows up
			  as a non-null deref.
			- 0x0, 0x10, 0x20, ... 0x3414141414141

		- Current falkervisor stores 10 of each of the 5 groups of crashes
		  Classify the bug as one of five types of crashes.
			- Null deref        - #PF [0, 16KB)
			- Negative deref    - #PF [-16KB, 0)
			- Normal deref      - #PF any other address
			- 'ascii' deref     - #GP (non-canon memory access)
			- None of the above - !(#GP || #PF)
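
The five groups above as a small classifier (Python sketch). Vector names are passed as strings, and classify_crash / is_canonical are hypothetical helpers. is_canonical assumes 48-bit virtual addresses (4-level paging) and is only there to show why an 'ascii' pointer ends up as #GP rather than #PF.

```python
NULL_WINDOW = 16 * 1024
MASK64 = (1 << 64) - 1

def is_canonical(addr):
    """With 48-bit virtual addresses, bits 63..47 must all be equal."""
    top = (addr & MASK64) >> 47
    return top == 0 or top == 0x1FFFF

def classify_crash(vector, fault_addr=None):
    """Map an exception vector (and #PF address) to one of five groups."""
    if vector == "#GP":
        return "ascii"                      # non-canonical memory access
    if vector == "#PF":
        addr = fault_addr & MASK64
        if addr < NULL_WINDOW:
            return "null"                   # [0, 16KB)
        if addr >= (1 << 64) - NULL_WINDOW:
            return "negative"               # [-16KB, 0) as unsigned
        return "normal"
    return "other"                          # neither #GP nor #PF
```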

	- Similar to code coverage, randomly pick a crashing input to mutate with.

- Picking how to pick input base
	-  5% Original input file
	-  5% Corpus of input files (thousands if not millions of input files)
	- 80% Code coverage inputs
	- 10% Crash coverage inputs

	- Weights!
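
The weights above as a weighted random pick (Python sketch; the source names and pick_source are made up for illustration).

```python
import random

SOURCES = ["original", "corpus", "code_coverage", "crash_coverage"]
WEIGHTS = [5, 5, 80, 10]   # percentages from the list above

def pick_source(rng=random):
    """Choose where the next base input comes from, according to WEIGHTS."""
    return rng.choices(SOURCES, weights=WEIGHTS, k=1)[0]
```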

- How do I usually mutate
	- Corpus of inputs. Randomly pick data from the corpus, and inject it randomly
	  in the base input file. Splicing inputs.
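
Splicing sketch (Python). Assumes 'inject' means insert rather than overwrite, and that corpus entries are non-empty byte strings. splice is a made-up name.

```python
import random

def splice(base, corpus, rng=random):
    """Insert a random slice of a random corpus entry at a random offset."""
    donor = rng.choice(corpus)
    start = rng.randrange(len(donor))
    length = rng.randint(1, len(donor) - start)
    chunk = donor[start:start + length]
    pos = rng.randrange(len(base) + 1)
    return base[:pos] + chunk + base[pos:]
```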

- Minimization
	- Randomly delete parts/move parts/merge parts/change size of input file.
	- If it crashes in the same way as before, store this as the new 'minimal' input
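
Minimization loop sketch (Python). Only the 'randomly delete parts' step is shown; same_crash is a caller-supplied callback that would rerun the input and compare the crash, and is an assumption here.

```python
import random

def minimize(data, same_crash, rounds=1000, rng=random):
    """Randomly delete chunks, keeping any result that still crashes the same way."""
    best = data
    for _ in range(rounds):
        if len(best) <= 1:
            break
        start = rng.randrange(len(best))
        length = rng.randint(1, len(best) - start)
        candidate = best[:start] + best[start + length:]
        if candidate and same_crash(candidate):
            best = candidate            # new 'minimal' input
    return best
```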

- branch 'solving'
	- Look for compare instructions that have input file data present in a register.
	  cmp rax, rbx - rbx = 0x4141414141414141 <-- present in the input file
	      ^ equal, not equal, off by one (above and below), etc
	  cmovcc
	  jcc
	  cmp al, bl - al 0x41
	- memcmp solver
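
Sketch of the compare-patching idea (Python). Assumes the operand was copied from the input unmodified and in little-endian order; solve_compare is a hypothetical helper. It only handles the 'equal' case, so off-by-one variants would call it again with wanted - 1 and wanted + 1.

```python
def solve_compare(data, seen, wanted, size):
    """Return candidate inputs where each occurrence of `seen` becomes `wanted`.

    seen   - operand value observed at the cmp that matches input bytes
    wanted - the other operand of the cmp
    size   - operand width in bytes (1 for cmp al, bl; 8 for cmp rax, rbx)
    """
    needle = seen.to_bytes(size, "little")
    patch = wanted.to_bytes(size, "little")
    candidates = []
    start = data.find(needle)
    while start != -1:
        candidates.append(data[:start] + patch + data[start + size:])
        start = data.find(needle, start + 1)
    return candidates
```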

- Comparing results from different snapshots
	- I have 8 NUMA nodes which each get their own snapshot
	- Stack corruption bug that would show up in thousands of ways.
	- crashing input hash '1337'
	- broadcast to all other nodes to run '1337' through.
		node 0 - lib+32
		node 1 - lib+32
		node 2 - lib+24
		node 3 - ~~~

	- '3333'
		node 0 - lib+64
		node 1 - lib+32
		node 2 - ~~~
		node 3 - lib+8

		3333 and 1337 have lib+32 in common!
		[lib+32, lib+24, lib+64, lib+8, ~~~] -> [1337, 3333]

	- Faults of this concept
		- What if lib+32 is __stack_check(), DebugBreak(), RaiseException(), memcpy()
		- Workarounds to this fault, blacklist
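
Grouping sketch for the cross-node comparison, with the blacklist workaround (Python). Assumptions: '~~~' means no usable crash location and is passed as None; two inputs are treated as the same bug if they share any non-blacklisted location. group_crashes is a made-up name.

```python
def group_crashes(results, blacklist=frozenset()):
    """results maps input hash -> list of crash locations, one per node.

    Returns a list of (locations, inputs) set pairs, one per suspected bug.
    """
    groups = []
    for input_id, locations in results.items():
        locs = {l for l in locations if l is not None and l not in blacklist}
        merged_locs, merged_inputs = set(locs), {input_id}
        remaining = []
        for g_locs, g_inputs in groups:
            if g_locs & locs:                 # shares a location: same bucket
                merged_locs |= g_locs
                merged_inputs |= g_inputs
            else:
                remaining.append((g_locs, g_inputs))
        remaining.append((merged_locs, merged_inputs))
        groups = remaining
    return groups

results = {
    "1337": ["lib+32", "lib+32", "lib+24", None],
    "3333": ["lib+64", "lib+32", None, "lib+8"],
}
```

With no blacklist, 1337 and 3333 end up in one group via lib+32. If lib+32 is blacklisted (say it turns out to be memcpy()), they stay separate.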

	- Function flow [theory]
		- Let's say I have 2 crashes
		- minimize down the 2 crashes
		- Run each input through with full single stepping
		- What branches are taken, and what calls are made.
		- crash 1 - A() -> B() -> C() (crashes)
		- crash 2 - A() -> C() (crashes)
		- Make a set of all unique functions, and compare the sets.
		- 90% of functions match, assume the bugs are the same
		- A -> B -> C -> memcpy
		- A -> D -> F -> memcpy 2 different bugs
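
Set-comparison sketch for the function flow theory (Python). How '90% of functions match' is measured is not specified in these notes; this uses |A & B| / |A | B| as one possible choice. same_bug is a made-up name.

```python
def same_bug(funcs_a, funcs_b, threshold=0.9):
    """Guess whether two crashes are the same bug from their function sets."""
    a, b = set(funcs_a), set(funcs_b)
    if not a or not b:
        return False
    overlap = len(a & b) / len(a | b)   # shared functions / all functions seen
    return overlap >= threshold

# Both end in memcpy, but the paths differ: treated as 2 different bugs.
print(same_bug(["A", "B", "C", "memcpy"], ["A", "D", "F", "memcpy"]))  # False
```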

- Logic bugs
	- Special tracing/breakpoints. LoadLibrary()
	- Put a breakpoint on LoadLibrary(), and see if user input is present in
	  the file name.
	- MmProbeAndLockPages(), the address is a user address, but access mode is KernelMode
	- Little things like this. Usually going to have to be manually implemented as
	  plugins/modules.

- Memory coverage (relative and absolute)
	- IBS I get free load/store decodes
	- I can track what memory is being written and read from
		- I can track what blocks are making these accesses

	- relative memory coverage by using stack/heap awareness.
	- mov rax, [0x100+0x100] <- heap address is 0x100
	- mov rax, [0x200+0] <- heap address is 0x200
	- 0x100 belongs to allocation @ 0x100 of 0x20 length.
	- so this access is 0x0 bytes relative to the allocation

	- what if this address faults, and is out of bounds of our heap info, how can we tell?
		- page heap. make it so it would fault, and now we have the crash we wanted

	- how does this evolve?
	- mov rax, [rbx] (100 times rbx = +0x10, 10 times = +0x20, 1 time = +0x1000)
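
Sketch of turning an absolute address into a relative one (Python). Assumes the fuzzer already has a sorted list of live allocations as (base, length) pairs; relative_access is a made-up name.

```python
import bisect

def relative_access(allocs, addr):
    """Return (allocation base, offset) for addr, or None if out of bounds.

    allocs is a sorted list of (base, length) tuples.
    """
    # Find the last allocation whose base is <= addr.
    i = bisect.bisect_right(allocs, (addr, float("inf"))) - 1
    if i < 0:
        return None
    base, length = allocs[i]
    if addr < base + length:
        return (base, addr - base)
    return None

allocs = [(0x100, 0x20), (0x200, 0x40)]
print(relative_access(allocs, 0x100))  # (256, 0): 0x0 bytes into the allocation at 0x100
```

Counting accesses per (block, offset) instead of per absolute address gives the kind of histogram in the last bullet: a rare +0x1000 offset stands out regardless of where the heap put the buffer.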

- Stack walking
	- 'kb' or 'bt' in windbg/gdb
		- With code coverage, store the stack walk that caused us to get here