SELF-PROPAGATING [HEAP MEMORY] CRAWLER in x86-64 Linux Assembly
Over the past couple days I've been playing with x86-64 Linux Assembly and after the obligatory "Hello World!" program and some C compilation and disassembly, I wanted to do something a bit more challenging and fun. In the spirit of "you don't own a system until you control it", and with the idea of possible future usability, I decided on a little program that would copy itself to (heap) memory,
direct execution there, and loop from there, writing itself below itself in higher addresses, then direct execution there, and so on.
I found lots of different things on the internet, but not too many x86-64 examples, and even less that wasn't either real simple stuff or too massive to follow for a newbie. So, maybe this helps someone else with a more hacky exercise. ;)
*** DISCLAIMER ***
This is just a little training/learning exercise or proof of concept to see what I could do and if I could abuse/hack the heap this way that I thought would be of interest to others. You should only run this in the debugger. If you just let this run - especially if you remove the debug interrupt (INT 3) - you probably shouldn't be doing this exercise… I also realize this code is probably neither smart or optimal to anyone who really knows assembly. I sometimes had to research and try out different instructions, and I am happy enough it works! Finally, if you learn assembly in a proper school, your teacher would probably consider this to be an example of what NOT to do.
WHAT YOU NEED
Everything you need should already be on your Linux x86-64 system (especially if you write code):
- nasm
- ld
- objdump
- gdb
OK. So, I didn't want to use the stack and instead use a memory section that was read/write, and then copy/run from there. That means I need a variable in the .bss section. Mostly for me to find my way through memory, I am writing the contents of a string there defined in the .data section (which is read only), but in hindsight, in the finished code that's probably not even necessary. This is to get the address of the heap, and where we will start copying.
The following registers are used for:
- RBX: "instruction copy pointer", i.e. keeps track of where to copy from
- RDI: "heap instruction pointer", i.e. keeps track of where copied instructions start
- RDX: "heap copy pointer", i.e. keeps track of where the next byte needs to be copied
- AL is used to copy the bytes
RCX is used in two different ways - just a feature, no particular reason - first in _start to copy the string into the heap, within the loop to count down the number of bytes to copy.
After figuring out where the instruction pointer is (RIP)+8 (we want to start copying at mov rdi, rdx), we copy the bytes one by one into the heap until the loop counter (RCX) is 0. The code then jumps to the first copied instruction on the heap (RDI), right after we set the instruction copy pointer to the start of the new code as well.
Here is the .asm code:
****************************
section .data
string1 db 'ACABACAB',10,0
section .bss
memry_block resq 1
section .text
global _start
_start:
int 3
int 3 ; pause for debugger
lea rdi, [rel string1] ; load address for our 'ACAB' into RDI
lea rdx, [rel memry_block] ; load address of our heap variable into RDX
mov rcx, [rdi] ; move the 'ACAB' into RCX
mov [rdx], rcx ; move the 'ACAB' into the heap
; 'ACAB' are just to start up - will copy itself... ;)
xor rcx,rcx ; blank out RCX
_writeheap:
lea rbx, [$+8] ; copy instruction pointer
mov rdi, rdx ; copy address of heap into heap instruction pointer
mov rcx,0x2e ; countdown for 46 bytes copied (starting with the instruction above)
_looper:
mov al, [rbx] ; move itself byte by byte to AL
mov [rdx], al ; move our opcode from AL to heap
inc rbx ; set the "ip" one ahead
inc rdx ; set the heap pointer one ahead
dec rcx ; decrease counter by 1
cmp rcx,0x0 ; check whether counter is 0 yet
jg _looper ; jump back to copier loop
mov rbx, rdi ; set the copy instruction pointer to start of copied code
int 3 ; debug halt
jmp rdi ; jump to heap newly written code
nop
nop
nop
nop
int 3
; exit from the application here
xor rdi,rdi
push 0x3c
pop rax
syscall
***********************************
The system exit syscall is never reached. The NOPs are just for padding and are also never reached. There is an INT 3 right before the jump to copied code so the debugger halts and you can inspect registers, inspect the bytes written, and print a list of instructions. There are also a couple at the start, so you can do a 'disas' in gdb and see initial state. Obviously, don't remove the INT 3 inside the loop...
I had the hardest time figuring out how to do a "lea rbx,[$]" from the heap to get the instruction pointer, once I was there, because it kept referring back to the original read-only code in the .text block we get in the setup before the loop. The rest of the code still worked: executing going to the heap, copying the code into the next block, redirecting instruction there, and so on, but it would copy again and again the _original_ code, not the last copy.
I then realized that I didn't actually have to use the 'lea' instruction at all, as long as I set RBX to RDI before the jump but after the copy loop ends. That means that I needed to add a +8 byte offset (lea rbx, [$+8] ) so that the copying starts at mov rdi, rdx. Now, execution literally jumps from the .text segment to the top of the heap, copies itself below itself, jumps there, again and again, and therefore crawl all the way till the end of memory or some sort of memory fault is encountered. (I never got one, as I quit out of gbd before that). That's pretty cool… Well, at least I think so. :)
You really have to run this to see what it does. Run the following commands (assuming you saved the above as crawler.asm). objdump gives us the opcodes and instructions (AT&T):
-----
root@:/mydev/assembly# nasm -f elf64 -o crawler.o crawler.asm
root@:/mydev/assembly# ld -o crawler crawler.o
root@:/mydev/assembly# objdump -D crawler
crawler: file format elf64-x86-64
Disassembly of section .text:
00000000004000b0 <_start>:
4000b0: cd 03 int $0x3
4000b2: cd 03 int $0x3
4000b4: 48 8d 3d 55 00 20 00 lea 0x200055(%rip),%rdi # 600110 <string1>
4000bb: 48 8d 15 5a 00 20 00 lea 0x20005a(%rip),%rdx # 60011c <memry_block>
4000c2: 48 8b 0f mov (%rdi),%rcx
4000c5: 48 89 0a mov %rcx,(%rdx)
4000c8: 48 31 c9 xor %rcx,%rcx
00000000004000cb <_writeheap>:
4000cb: 48 8d 1c 25 d3 00 40 lea 0x4000d3,%rbx
4000d2: 00
4000d3: 48 89 d7 mov %rdx,%rdi
4000d6: 48 b9 2e 00 00 00 00 mov $0x2e,%rcx
4000dd: 00 00 00
00000000004000e0 <_looper>:
4000e0: 8a 03 mov (%rbx),%al
4000e2: 88 02 mov %al,(%rdx)
4000e4: 48 ff c3 inc %rbx
4000e7: 48 ff c2 inc %rdx
4000ea: 48 ff c9 dec %rcx
4000ed: 48 81 f9 00 00 00 00 cmp $0x0,%rcx
4000f4: 7f ea jg 4000e0 <_looper>
4000f6: 48 89 fb mov %rdi,%rbx
4000f9: cd 03 int $0x3
4000fb: ff e7 jmpq *%rdi
4000fd: 90 nop
4000fe: 90 nop
4000ff: 90 nop
400100: 90 nop
400101: cd 03 int $0x3
400103: 48 31 ff xor %rdi,%rdi
400106: 68 3c 00 00 00 pushq $0x3c
40010b: 58 pop %rax
40010c: 0f 05 syscall
Disassembly of section .data:
0000000000600110 <string1>:
600110: 41 rex.B
600111: 43 rex.XB
600112: 41 rex.B
600113: 42 rex.X
600114: 41 rex.B
600115: 43 rex.XB
600116: 41 rex.B
600117: 42 0a 00 rex.X or (%rax),%al
Disassembly of section .bss:
000000000060011c <memry_block>:
------
Now run it in gdb. I set the disassembly-flavor to intel to match the format used in the assembly code, that's just a little easier. I then run the program and continue past the first INT 3 (for some reason I needed two for them to actual work at the start). If you want, you can check memory at the start of the program, of course. Here, we'll continue to the INT 3 in the first loop, while we're still in the original code in .text.
--------
root@:/mydev/assembly# gdb -q crawler
Reading symbols from /mydev/assembly/crawler...(no debugging symbols found)...done.
(gdb) run
Starting program: /mydev/assembly/crawler
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004000b4 in _start ()
(gdb) set disassembly-flavor intel
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004000fb in _looper ()
(gdb) disas
Dump of assembler code for function _looper:
0x00000000004000e0 <+0>: mov al,BYTE PTR [rbx]
0x00000000004000e2 <+2>: mov BYTE PTR [rdx],al
0x00000000004000e4 <+4>: inc rbx
0x00000000004000e7 <+7>: inc rdx
0x00000000004000ea <+10>: dec rcx
0x00000000004000ed <+13>: cmp rcx,0x0
0x00000000004000f4 <+20>: jg 0x4000e0 <_looper>
0x00000000004000f6 <+22>: mov rbx,rdi
0x00000000004000f9 <+25>: int 0x3
=> 0x00000000004000fb <+27>: jmp rdi
0x00000000004000fd <+29>: nop
0x00000000004000fe <+30>: nop
0x00000000004000ff <+31>: nop
0x0000000000400100 <+32>: nop
0x0000000000400101 <+33>: int 0x3
0x0000000000400103 <+35>: xor rdi,rdi
0x0000000000400106 <+38>: push 0x3c
0x000000000040010b <+43>: pop rax
0x000000000040010c <+44>: syscall
End of assembler dump.
----------
We're just about to jump to the heap (jump rid). All of our code should now be written there, and we can check with x/48xb $rdi (which should be the start of the code):
---------
(gdb) x/48xb $rdi
0x60011c <memry_block>: 0x48 0x89 0xd7 0x48 0xb9 0x2e 0x00 0x00
0x600124 <memry_block+8>: 0x00 0x00 0x00 0x00 0x00 0x8a 0x03 0x88
0x60012c: 0x02 0x48 0xff 0xc3 0x48 0xff 0xc2 0x48
0x600134: 0xff 0xc9 0x48 0x81 0xf9 0x00 0x00 0x00
0x60013c: 0x00 0x7f 0xea 0x48 0x89 0xfb 0xcd 0x03
0x600144: 0xff 0xe7 0x90 0x90 0x90 0x90 0x00 0x00
----------
Those are our opcode bytes. Here's a fragment of the objdump for easier comparison:
---------
4000d3: 48 89 d7 mov %rdx,%rdi
4000d6: 48 b9 2e 00 00 00 00 mov $0x2e,%rcx
…..
4000f4: 7f ea jg 4000e0 <_looper>
4000f6: 48 89 fb mov %rdi,%rbx
4000f9: cd 03 int $0x3
4000fb: ff e7 jmpq *%rdi
4000fd: 90 nop
……
---------
If you run this the first time, you'll want to 'next i/n i' for the next couple of instructions to see execution jump to the heap, check the registers, see what changes, and watch the next couple of instructions get executed. Here we'll just continue, so we're right before the second jump. (Note that gdb has no idea anymore what the function might be called ;) ):
--------
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000600144 in ?? ()
(gdb) x/16i $rip-24
0x60012c: add cl,BYTE PTR [rax-0x1]
0x60012f: ret
0x600130: inc rdx
0x600133: dec rcx
0x600136: cmp rcx,0x0
0x60013d: jg 0x600129
0x60013f: mov rbx,rdi
0x600142: int 0x3
=> 0x600144: jmp rdi
0x600146: nop
0x600147: nop
0x600148: nop
0x600149: nop
0x60014a: mov rdi,rdx
0x60014d: movabs rcx,0x2e
0x600157: mov al,BYTE PTR [rbx]
(gdb) i r
rax 0x90 144
rbx 0x60014a 6291786
rcx 0x0 0
rdx 0x600178 6291832
rsi 0x0 0
rdi 0x60014a 6291786
rbp 0x0 0x0
rsp 0x7fffffffe6e0 0x7fffffffe6e0
…
rip 0x600144 0x600144
-----------------
You should be able to see from the memory addresses that we're now executing on the heap. You can also see the new copy of itself starting at 0x60014a. RAX (AL) still contains the last 0x90 (NOP). In the previous instruction RBX has been set to the start of the copy code, RCX (loop counter) is 0, and RDX points to where the next copy will be written. In the jump we start the instruction that sets RDI to RDX, so we can repeat this process.
If we press 'c' a couple of times, we see that we're moving down in memory (note the snip and jump in memory address):
---------------
...
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000006002b4 in ?? ()
(gdb) x/128i 0x60011c
0x60011c <memry_block>: mov rdi,rdx
0x60011f <memry_block+3>: movabs rcx,0x2e
0x600129: mov al,BYTE PTR [rbx]
0x60012b: mov BYTE PTR [rdx],al
0x60012d: inc rbx
0x600130: inc rdx
0x600133: dec rcx
0x600136: cmp rcx,0x0
0x60013d: jg 0x600129
0x60013f: mov rbx,rdi
0x600142: int 0x3
0x600144: jmp rdi
0x600146: nop
0x600147: nop
0x600148: nop
0x600149: nop
0x60014a: mov rdi,rdx
0x60014d: movabs rcx,0x2e
0x600157: mov al,BYTE PTR [rbx]
0x600159: mov BYTE PTR [rdx],al
0x60015b: inc rbx
0x60015e: inc rdx
0x600161: dec rcx
0x600164: cmp rcx,0x0
0x60016b: jg 0x600157
0x60016d: mov rbx,rdi
0x600170: int 0x3
0x600172: jmp rdi
0x600174: nop
0x600175: nop
0x600176: nop
….
0x60028a: nop
0x60028b: nop
0x60028c: mov rdi,rdx
0x60028f: movabs rcx,0x2e
0x600299: mov al,BYTE PTR [rbx]
0x60029b: mov BYTE PTR [rdx],al
0x60029d: inc rbx
0x6002a0: inc rdx
0x6002a3: dec rcx
0x6002a6: cmp rcx,0x0
0x6002ad: jg 0x600299
0x6002af: mov rbx,rdi
0x6002b2: int 0x3
=> 0x6002b4: jmp rdi
0x6002b6: nop
0x6002b7: nop
0x6002b8: nop
0x6002b9: nop
0x6002ba: mov rdi,rdx
0x6002bd: movabs rcx,0x2e
0x6002c7: mov al,BYTE PTR [rbx]
0x6002c9: mov BYTE PTR [rdx],al
0x6002cb: inc rbx
0x6002ce: inc rdx
…
--------------
By now it should be clear that if you remove the INT 3 from the loop, this will just keep copying itself all the way through memory until….??? (not tested).
I enjoyed the hell out of this and was a lot more fun and instructive than reproducing a program that calculates some numbers from input, so hope this helps someone else starting out.