sewer56lol

Some PRS Documentation

Jul 12th, 2018
729
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. /*
  2.     || A Brief History
  3.  
  4.     It's 2018 and PRS; a rather commonly used non-supertrivial compression format has either
  5.     never seen a piece of documentation or the documentation was lost through time.
  6.  
  7.     This has happened despite the first decompressor appearing around in the early 2000s (Nemesis' prsdec),
  8.     which (did not?) appear to have a release of the source code and also shipped an anti-compressing
  9.     compressor (basically an encoder that produced format compatible files, albeit larger as it did not attempt compression).
  10.  
  11.     A few years later, one named fuzziqer has also produced a compressor and decompressor combo, this time with a source release
  12.     and a compressor which did actual compression. While the source did not ship with a license, and the author stated that
  13.     using the source freely was fine as long as he would have been credited for the compression and decompression code,
  14.     the source was not very well documented - as to truely describe the format and was not trivial to follow.
  15.  
  16.     Aaaand well... FraGag ported fuzziqer's utility to .NET and that's about it... nothing...
  17. */ 
  18.  
  19. /*
  20.     || Why?
  21.  
  22.     Well, after a good while, the speed of the .NET implementation of PRS Compression/Decompression
  23.     has started to bother me when using existing modding tools for Sonic Heroes and Sonic Adventure,
  24.     two of SEGA published titles which have used this compression scheme employed since the era of the
  25.     failed SEGA Saturn.
  26.  
  27.     Quite frankly, I thought that they could do with just a tiiny bit of optimization and/or perhaps
  28.     simply more options for compression intensity - but the moment I started researching... well.. above.
  29.  
  30.     No documentation, only one piece of not yet fully optimized, uncommented source code and a port of it...
  31.     and not much more - which surprised me especially given how much these libraries have been used over the
  32.     overall course of time.
  33.  
  34.     I originally only planned to write a PRS Compressor/Decompressor in D as a small challenge, and to learn
  35.     a bit of the programming language by writing my first program with it... However after I've seen the whole situation,
  36.     and began to see possible optimizations as I dug through the non-trivial to understand source code - I thought
  37.     it would just be straight up better to document the format entirely, while providing a new, better implementation
  38.     for the communities.
  39. */
  40.  
  41. /*
  42.     || PRS Specifics: Summary
  43.  
  44.     PRS is a LZ77 based compression algorithm with some RLE shenanigans whereby the file is split into blocks
  45.     consisting of control bytes + data.
  46.        
  47.     (ctrlByte, byte[]) (ctrlByte, byte[]) (ctrlByte, byte[])
  48.  
  49.     The control bytes contain variable length codes stored as bits (max length: 2 bits) which tell the decoder how
  50.     to treat the individual bytes of data up till the next control byte. They are read in a right to left order,
  51.     i.e. with `controlByte & 1` and bitshifted right in order to acquire the next entry.
  52.  
  53.     With that said, each block has an undefined data length in that there are no headers in any of the blocks in question.
  54.     As variable length codes are added onto the control bytes from right to left, the control byte is written,
  55.     following by the data related to it once the individual control byte is full.
  56.  
  57.     The variable length codes themselves consist of 3 modes, a direct byte copy, short copy and two variants of long copy as is
  58.     to be seen below using run length encoding (size & offset). The dictionary from which the copying is performed is actually
  59.     the last N bytes of data, hence this is also an algorithm using a sliding window for compression.
  60. */
  61.  
  62. /*
  63.     || PRS Specifics: Control Byte Variable Length Codes
  64.  
  65.     The following is a listing of the various variable length codes packed into the individual control bytes:
  66.  
  67.     Mode 0 (Direct Byte)           : 1
  68.     Mode 1 (Short Copy)            : 00XX           (XX stores the size)
  69.     Mode 2 (Long Copy, Small Size) : 01
  70.     Mode 3 (Long Copy, Large Size) : 01             (This is not a typo, you will learn why later)
  71.  
  72.     The individual variable length codes also contain some additional information appended as regular bytes,
  73.     these vary on the individual mode used.
  74. */
  75.  
  76. /*
  77.     || PRS Specifics: Decoding a control byte example.
  78.  
  79.     As specified before, the bits in the control bytes are read from right to left order, therefore using the table
  80.     of variable length codes above we can decode a control byte as such:
  81.  
  82.     Example control byte (binary) | 0101 1001
  83.  
  84.     1    - Direct Byte
  85.     00XX - Short Copy
  86.     01   - Long Copy
  87.     0.   - Short or Long Copy (the first bit of the next control byte would decide)
  88.    
  89.     In the case of the last opcode, the actual next control byte would be the next byte in a valid PRS file,
  90.     assuming the byte array/file pointer has been correctly moved after each successive short/long
  91.     and direct byte copy.
  92. */
  93.  
  94. /*
  95.     || PRS Mode 0: Direct Byte (Opcode 1)
  96.    
  97.     A single byte is to be read from the data stream/array at the current pointer position.
  98.     That's it.
  99.  
  100.     Decoding (Simplified):
  101.         if (bit == 1)
  102.             read and append byte to file stream
  103.        
  104.     Encoding (Simplified):
  105.         Append Control Byte Variable Length Code: `1` and a byte to the file stream.
  106. */
  107.  
  108. /*
  109.     || PRS Mode 1: Short Copy (Opcode 00)
  110.        
  111.     The length of the copy is in the range of 2-5 inclusive (2, 3, 4, 5) and stored directly inside the
  112.     opcode read from the control byte (albeit as a 0-3 value, as the minimum copy length implicitly is 2,
  113.     thus the value stored is `actual length - 2`).
  114.  
  115.     The actual offset for where to copy from is appended as a singular byte and is stored as 256 - positive offset,
  116.     i.e. assuming the offset is `-5`, it would be stored as 256 - (-5*-1) = 251.
  117.  
  118.     Quick ways to convert to `256 - positive offset` format include `(offset & 0xFF)` and simply adding 256 to the number
  119.     `offset + 256`.
  120.  
  121.     || || Example :
  122.  
  123.     Opcode  : 00XX
  124.     Could be: 0010
  125.  
  126.     Which yields a size of 10 (binary) which is 2 (decimal).
  127.     Offset the value back by 2 to get the actual length; 2 + 2 = 4.
  128.  
  129.     Then to obtain the offset, you would just read the next byte from
  130.     the stream/file/byte array.
  131. */
  132.  
  133. /*
  134.     || PRS Mode 2: Long Copy, Small Size (Opcode 01)
  135.    
  136.     An offset and size combination is written directly as 2 bytes
  137.     to the array/file/stream with the offset taking 13 bits and size 3 bits.
  138.  
  139.     The maximum offset is thus 0x1FFF and size 0-7 bytes (albeit actually 2-9, as the
  140.     2 offset is applied here like before).
  141.  
  142.     The actual 2 bytes are packed as such:
  143.     XXXX XXXX XXXX XYYY
  144.  
  145.     Where the X represent the offset and Y represent the size.
  146.  
  147.     The two bytes are written with the rightmost 8 bits written first and the
  148.     leftmost 8 bits written last, thus is in Big Endian order.
  149.  
  150.     ----------
  151.    
  152.     If the size is greater than 9, then you fall back to Mode 3 for compression
  153.     as a greater size could not fit the 3 bits.
  154.  
  155.     In this mode, you cannot also encode a size of 2, as taking away 2 and writing it
  156.     would cause 000 to be written to the size part of the short - which is reserved
  157.     for falling back to PRS Mode 3 (See below).
  158.  
  159.     || || Example :
  160.    
  161.     Offset of 2100 and Size of 5:
  162.  
  163.         2100 (decimal) = 0000 ‭1100 0001 1100‬
  164.         3    (decimal) = 0000 0000 0000 ‭0011‬                  // 5 - 2, offset applied
  165.  
  166.     Packed in the XXXX XXXX XXXX XYYY format:
  167.         0‭110 0000 1110 0000‬           (2100 left shifted by 3)
  168.         0000 0000 0000 ‭0011‬           (3)
  169.  
  170.     Final result:
  171.         0‭110 0000 1110 0011
  172.  
  173.     First written byte would equal 1110 0011 and the second written byte would equal 0‭110 0000.
  174. */
  175.        
  176. /*
  177.     || PRS Mode 3: Long Copy, Large Size
  178.    
  179.     A fallback of PRS Mode 2 where the size is greater than 9.
  180.  
  181.     Used by the decoder when the size in the first (right to left) 3 bits of mode 2
  182.     reads 0 (i.e. would be 2 (decimal) after incrementing).
  183.  
  184.     An additional byte is appended onto the stream/file/byte array which contains
  185.     the length of the match (size) minus 1.
  186. */
  187.  
  188. /*
  189.     The file is terminated with the variable length code 01 (long copy) and bytes 00 00
  190.     (no offset, no size).
  191.  
  192.     Nothing more needs to be explained that the reader could not come into conclusion with
  193.     themselves or figure out from the source code.
  194. */
RAW Paste Data

Adblocker detected! Please consider disabling it...

We've detected AdBlock Plus or some other adblocking software preventing Pastebin.com from fully loading.

We don't have any obnoxious sound, or popup ads, we actively block these annoying types of ads!

Please add Pastebin.com to your ad blocker whitelist or disable your adblocking software.

×