Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- Header
- GZIP magic numbers
- | Compression method (deflate)
- | | Flags (FNAME,FCOMMENT)
- | | | Timestamp (unix epoch)
- | | | | XFL. Unused for us.
- | | | | | Creating OS (FF=unknown)
- | | | | | |
- v v v v v v
- 1F 8B | 08 | 18 | 00 00 00 00 | 00 | FF
- More header
- Original file name (zero-terminated, see FNAME)
- | File comment, zero-terminated (see FCOMMENT)
- | |
- v v
- "bomb\0" | "comment\0"
- Deflated stream begins here
- Uncompressed, final block (woop, this one's binary)
- | Length of literal data block [LEN] (1 byte)
- | | One's complement of LEN [NLEN]
- | | | LEN bytes of garbage
- | | | |
- v v v v
- 100 | 01 00 | FE FF | 00 |
- ----+-------+-------+----+
- 010100FEFF00
- First compressed block
- Final block made with fixed Huffman codes (binary again)
- | Fixed huffman for 285 or 'length code 258'
- | | Distance code, 1 byte back
- | | | More like the previous two
- | | | | End of block
- | | | | |
- v v v v v
- 101 | 11000101 | 0 | ... | 0000000
- File tail:
- CRC32 of uncompressed data [python -c"print hex(int(\"`cksum $(FILE)`\".split()[0]))"]
- | |
- v v
- CRC32 | FILESIZE
- So, a somewhat minimal archive:
- 1F8B Magic numbers for a block
- 08 Compressed with DEFLATE
- 08 Contains original file name only
- 00000000 Timestamp
- 00 No extended flags
- 03 Created on NTFS
- "LittleBoy" Original file name
- ==Now is the deflated stream==
- 01 Terminal block, it's uncompressed (elements are packed into bytes
- from LSB to MSB and type 0 aligns the data to a byte boundary)
- 0100 LEN: Size of block (little-endian)
- FEFF ~LEN, integrity check apparently
- 00 Single byte of data
- ==Now the tail on the stream==
- 48E23EFB CRC32 of the block (little-endian)
- 01000000 Uncompressed size of block (little-endian)
- Now for something a little more interesting:
- 1F8B08080000000000034100 Member header as above, name is "A"
- ==Time for a Huffman block, given as the bitfields to stream==
- 1 Final block in the stream
- 01 Static Huffman
- 00110000 -> literal 0
- 11000101 Literal length 285 -> 258
- 00000 Literal distance 0 -> 1
- 0000000 Literal 256- end of block
- ==Checksum/tail==
- 9bbdd589 0x89d5bd9b
- 03010000 0x0103
- ..so we can pack that into a deflate stream, as RFC 1951 says:
- "print out the compressed data as a sequence of bytes, starting with the
- first byte at the right margin and proceeding to the left with the
- most-significant bit of each byte on the left as usual, (such that) one
- would be able to parse the result from right to left, with fixed-width
- elements in the correct MSB-to-LSB order and Huffman codes in bit-reversed
- order (i.e., with the first bit of the code in relative LSB position)."
- So, here's the stream packed RTL, then split into bytes and made into the byte
- stream.
- +----------------------------------------------------------------------+
- | /--------+- Huffman codes are packed MSB-first |
- | | | |
- | v v |
- | 0000000 | 00000 | 10100011 | 00001100 | 01 | 1 |
- | \ / \ /\ /\ / |
- | \___/ \_______/ \_______/ \__________/ |
- | | | | | |
- | | \__________|________ | |
- | | _______|________\___/ |
- | \______________/_______|________|_____________ |
- | | | | \ |
- | | | | | |
- | v v v v |
- | 01100011 | 00011000 | 00000101 | 0000000 |
- +----------------------------------------------------------------------+
- That works out to be 63 18 05 00 with the MSB of the last byte unused. Throw
- in the CRC and final size, then..
- ==Block tail==
- 9 b b d d 5 8 9 0 3 0 1 0 0 0 0
- 1001 1011 1011 1101 1101 0101 1000 1001 0000 0011 0000 0001 0000 0000 0000 0000
- The GZIP block tail is always byte-aligned, so this is pretty easy.
- So the final byte stream works out to be...
- 63180500 9bbdd58903010000
- =====A proper bomb=====
- Now that we've handmade a working gzip file, let's make it explode. The simple
- way to do that would be simply repeating the 258-repeat sequence used above,
- but that's not very good, since it yields only a 2064:13 compression ratio. To
- reach our target of 16 TB, that would require a gzip file weighing in > 100 GB.
- While maybe not exactly suspicious, it's far from useful. Looking back at RFC
- 1951, we can encode our own Huffman tree in the stream to hopefully improve the
- situation.
- ==Constructing a tree==
- We use a total of four tokens in the tree (parenthesized values are the actual
- value contained- recall that the byte value and backreference length alphabets
- are joined into the range 0-285):
- literal 0 (0), end-of-block (256), length code 258 (285), distance code 1
- We'll call those 0, S (stop), L (length), and B (backreference), respectively
- for some clarity. Our stream will look something like this:
- 0LBLB....LBLBLBS
- So knowing that, it's pretty easy to generate Huffman codes for each of these.
- We know that symbols 0 and S will only appear once in the stream each (at the
- beginning and end, respectively), while L and B will have equal frequency
- with the exact value approaching infinity as the archive becomes larger. With
- that in mind, we can generate Huffman tree to give optimal coding. Simply by
- reasoning through it, L and B must be very short codes, while 0 and S could be
- nearly anything, since they act only as constant additions to the stream.
- Symbol | Appearances | /\
- ------ | ----------- | 0 1
- 0 | 1 | / \
- L | n | L /\
- B | n | 0 1
- S | 1 | / \
- | B /\
- | 0 1
- | / \
- | 0 S
- In this way, each LB pair requires only 3 bits (compared with 13 with static
- Huffman encoding), more than quadrupling the compression factor, which now
- asymptotically approaches 2064:3 as we make the final stream longer.
- blahblahblah
- =====Rzip bomb=====
- Rzip is a nice easy compression scheme, well-suited to encoding heavy
- redundancy across long intervals. We don't care so much here about long
- intervals, but the rzip utility then runs its output through bzip2, which will
- hopefully yield similar results to a pure bzip2 bomb without the manual
- construction complexity of bzip2.
- ==Constructing the rzip==
- The rzip format is pretty easy. There are two types of blocks: literal and
- match. A literal block just encodes some data to be streamed out with no
- change. A match block refers back to data appearing earlier in the plaintext,
- up to 2^32 bytes back. The end-of-stream marker is a literal block of length
- 0. A CRC32 of the plaintext is tacked on the end, then rzip feeds its output
- into bzip2 for further compression. The skeleton of such a bomb would be as
- follows:
- +----------------------------------------------------+
- | Type of block. 0=literal, 1=match. |
- | | Size of literal (1 byte) |
- | | | Literal data |
- | | | | |
- | v v v |
- | 00 | 01 00 | 00 | |
- +----+-------+----+----------------------------------+
- | Match block |
- | | Size of block (65535) |
- | | | Offset to copy from (back 1 byte) |
- | | | | |
- | v v v |
- | 01 | FF FF | 01 00 00 00 | *match block repeats* |
- +----+-------+-------------+-------------------------+
- | Zero-length literal is end-of-stream |
- | | CRC32 of uncompressed data |
- | | | |
- | v v |
- | 00 | 00 00 | CRC32 | |
- +----+-------+-------+-------------------------------+
Add Comment
Please, Sign In to add comment