tari

decomp bomb notes

Aug 9th, 2010
90
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 8.97 KB | None | 0 0
  1.  
  2. Header
  3. GZIP magic numbers
  4. | Compression method (deflate)
  5. | | Flags (FNAME,FCOMMENT)
  6. | | | Timestamp (unix epoch)
  7. | | | | XFL. Unused for us.
  8. | | | | | Creating OS (FF=unknown)
  9. | | | | | |
  10. v v v v v v
  11. 1F 8B | 08 | 18 | 00 00 00 00 | 00 | FF
  12.  
  13. More header
  14. Original file name (zero-terminated, see FNAME)
  15. | File comment, zero-terminated (see FCOMMENT)
  16. | |
  17. v v
  18. "bomb\0" | "comment\0"
  19.  
  20. Deflated stream begins here
  21. Uncompressed, final block (woop, this one's binary)
  22. | Length of literal data block [LEN] (1 byte)
  23. | | One's complement of LEN [NLEN]
  24. | | | LEN bytes of garbage
  25. | | | |
  26. v v v v
  27. 100 | 01 00 | FE FF | 00 |
  28. ----+-------+-------+----+
  29. 010100FEFF00
  30.  
  31.  
  32. First compressed block
  33. Final block made with fixed Huffman codes (binary again)
  34. | Fixed huffman for 285 or 'length code 258'
  35. | | Distance code, 1 byte back
  36. | | | More like the previous two
  37. | | | | End of block
  38. | | | | |
  39. v v v v v
  40. 101 | 11000101 | 0 | ... | 0000000
  41.  
  42. File tail:
  43. CRC32 of uncompressed data [python -c"print hex(int(\"`cksum $(FILE)`\".split()[0]))"]
  44. | |
  45. v v
  46. CRC32 | FILESIZE
  47.  
  48. So, a somewhat minimal archive:
  49. 1F8B Magic numbers for a block
  50. 08 Compressed with DEFLATE
  51. 08 Contains original file name only
  52. 00000000 Timestamp
  53. 00 No extended flags
  54. 03 Created on NTFS
  55. "LittleBoy" Original file name
  56. ==Now is the deflated stream==
  57. 01 Terminal block, it's uncompressed (elements are packed into bytes
  58. from LSB to MSB and type 0 aligns the data to a byte boundary)
  59. 0100 LEN: Size of block (little-endian)
  60. FEFF ~LEN, integrity check apparently
  61. 00 Single byte of data
  62. ==Now the tail on the stream==
  63. 48E23EFB CRC32 of the block (little-endian)
  64. 01000000 Uncompressed size of block (little-endian)
  65.  
  66. Now for something a little more interesting:
  67. 1F8B08080000000000034100 Member header as above, name is "A"
  68. ==Time for a Huffman block, given as the bitfields to stream==
  69. 1 Final block in the stream
  70. 01 Static Huffman
  71. 00110000 -> literal 0
  72. 11000101 Literal length 285 -> 258
  73. 00000 Literal distance 0 -> 1
  74. 0000000 Literal 256- end of block
  75. ==Checksum/tail==
  76. 9bbdd589 0x89d5bd9b
  77. 03010000 0x0103
  78.  
  79. ..so we can pack that into a deflate stream, as RFC 1951 says:
  80. "print out the compressed data as a sequence of bytes, starting with the
  81. first byte at the right margin and proceeding to the left with the
  82. most-significant bit of each byte on the left as usual, (such that) one
  83. would be able to parse the result from right to left, with fixed-width
  84. elements in the correct MSB-to-LSB order and Huffman codes in bit-reversed
  85. order (i.e., with the first bit of the code in relative LSB position)."
  86.  
  87. So, here's the stream packed RTL, then split into bytes and made into the byte
  88. stream.
  89. +----------------------------------------------------------------------+
  90. | /--------+- Huffman codes are packed MSB-first |
  91. | | | |
  92. | v v |
  93. | 0000000 | 00000 | 10100011 | 00001100 | 01 | 1 |
  94. | \ / \ /\ /\ / |
  95. | \___/ \_______/ \_______/ \__________/ |
  96. | | | | | |
  97. | | \__________|________ | |
  98. | | _______|________\___/ |
  99. | \______________/_______|________|_____________ |
  100. | | | | \ |
  101. | | | | | |
  102. | v v v v |
  103. | 01100011 | 00011000 | 00000101 | 0000000 |
  104. +----------------------------------------------------------------------+
  105. That works out to be 63 18 05 00 with the MSB of the last byte unused. Throw
  106. in the CRC and final size, then..
  107.  
  108. ==Block tail==
  109. 9 b b d d 5 8 9 0 3 0 1 0 0 0 0
  110. 1001 1011 1011 1101 1101 0101 1000 1001 0000 0011 0000 0001 0000 0000 0000 0000
  111. The GZIP block tail is always byte-aligned, so this is pretty easy.
  112.  
  113. So the final byte stream works out to be...
  114. 63180500 9bbdd58903010000
  115.  
  116. =====A proper bomb=====
  117. Now that we've handmade a working gzip file, let's make it explode. The simple
  118. way to do that would be simply repeating the 258-repeat sequence used above,
  119. but that's not very good, since it yields only a 2064:13 compression ratio. To
  120. reach our target of 16 TB, that would require a gzip file weighing in > 100 GB.
  121. While maybe not exactly suspicious, it's far from useful. Looking back at RFC
  122. 1951, we can encode our own Huffman tree in the stream to hopefully improve the
  123. situation.
  124.  
  125. ==Constructing a tree==
  126. We use a total of four tokens in the tree (parenthesized values are the actual
  127. value contained- recall that the byte value and backreference length alphabets
  128. are joined into the range 0-285):
  129. literal 0 (0), end-of-block (256), length code 258 (285), distance code 1
  130. We'll call those 0, S (stop), L (length), and B (backreference), respectively
  131. for some clarity. Our stream will look something like this:
  132.  
  133. 0LBLB....LBLBLBS
  134.  
  135. So knowing that, it's pretty easy to generate Huffman codes for each of these.
  136. We know that symbols 0 and S will only appear once in the stream each (at the
  137. beginning and end, respectively), while L and B will have equal frequency
  138. with the exact value approaching infinity as the archive becomes larger. With
  139. that in mind, we can generate Huffman tree to give optimal coding. Simply by
  140. reasoning through it, L and B must be very short codes, while 0 and S could be
  141. nearly anything, since they act only as constant additions to the stream.
  142.  
  143. Symbol | Appearances | /\
  144. ------ | ----------- | 0 1
  145. 0 | 1 | / \
  146. L | n | L /\
  147. B | n | 0 1
  148. S | 1 | / \
  149. | B /\
  150. | 0 1
  151. | / \
  152. | 0 S
  153.  
  154. In this way, each LB pair requires only 3 bits (compared with 13 with static
  155. Huffman encoding), more than quadrupling the compression factor, which now
  156. asymptotically approaches 2064:3 as we make the final stream longer.
  157.  
  158. blahblahblah
  159.  
  160.  
  161. =====Rzip bomb=====
  162. Rzip is a nice easy compression scheme, well-suited to encoding heavy
  163. redundancy across long intervals. We don't care so much here about long
  164. intervals, but the rzip utility then runs its output through bzip2, which will
  165. hopefully yield similar results to a pure bzip2 bomb without the manual
  166. construction complexity of bzip2.
  167.  
  168. ==Constructing the rzip==
  169. The rzip format is pretty easy. There are two types of blocks: literal and
  170. match. A literal block just encodes some data to be streamed out with no
  171. change. A match block refers back to data appearing earlier in the plaintext,
  172. up to 2^32 bytes back. The end-of-stream marker is a literal block of length
  173. 0. A CRC32 of the plaintext is tacked on the end, then rzip feeds its output
  174. into bzip2 for further compression. The skeleton of such a bomb would be as
  175. follows:
  176.  
  177. +----------------------------------------------------+
  178. | Type of block. 0=literal, 1=match. |
  179. | | Size of literal (1 byte) |
  180. | | | Literal data |
  181. | | | | |
  182. | v v v |
  183. | 00 | 01 00 | 00 | |
  184. +----+-------+----+----------------------------------+
  185. | Match block |
  186. | | Size of block (65535) |
  187. | | | Offset to copy from (back 1 byte) |
  188. | | | | |
  189. | v v v |
  190. | 01 | FF FF | 01 00 00 00 | *match block repeats* |
  191. +----+-------+-------------+-------------------------+
  192. | Zero-length literal is end-of-stream |
  193. | | CRC32 of uncompressed data |
  194. | | | |
  195. | v v |
  196. | 00 | 00 00 | CRC32 | |
  197. +----+-------+-------+-------------------------------+
Add Comment
Please, Sign In to add comment