- The ebx files are obviously weakly compressed, whereas they used to be uncompressed throughout bf3 and in the bf4 alpha. Compare the keyword sections:
- levellistreport.ebx:
- alpha:
- DataContainer.Asset.$.Name.LevelReportingAsset.array.member.BuiltLevels
- beta:
- DataContainer.Asset.$.Name.LevelReporting....array.member.Built&..s.
- In detail with the dots replaced with the actual hex:
- LevelReporting 1B00F103 array
- => 1B00F103 equals Asset\x00
- It's just simple substitution stuff, so try the common weak compressions: lzo or snappy, or maybe zlib on very low compression (or rather not lol, very unlikely)
- Offset of Asset: 0x78 (0x78 refers to 78 in hex, i.e. 120 decimal)
- Offset of where Asset would start after LevelReporting (before 1b): 0x93
- Difference: 0x1b
- dataversion.ebx:
- alpha:
- DataContainer.Asset.$.Name.VersionData.disclaimer.Version.DateTime.BranchId.GameName
- beta:
- DataContainer.Asset.$.Name.Version"...disclaimer...:...eTime.BranchId.Game:..]
- In detail:
- Version 2200B400 disclaimer
- => 2200B400 can mean either:
- 2200b4 (which somehow equals Data) and a nullbyte
- or 2200b400 on its own (which somehow equals Data\x00).
- Note that levellistreport required 4 bytes too but already got the nullbyte from the Asset\x00 string.
- Offset of Data: 0x3f
- Offset of where Data would start after Version: 0x61
- Difference: 0x22
- So it might be 2200B400 which means:
- move back 0x22 bytes and grab 5 bytes
- or it could be 2200B4 which means:
- move back 0x22 bytes and grab 4 bytes (with the final byte given directly)
- snappy compression algorithm says:
- Copies are references back into previous decompressed data, telling
- the decompressor to reuse data it has previously decoded.
- They encode two values: The _offset_, saying how many bytes back
- from the current position to read, and the _length_, how many bytes
- to copy.
- =>looks promising
- snappy also mentions (zlib, LZO, LZF, FastLZ, and QuickLZ), might be worth checking out the various LZs
- LZO says:
- LZO is a block compression algorithm - it compresses and decompresses
- a block of data. Block size must be the same for compression
- and decompression.
- LZO compresses a block of data into matches (a sliding dictionary)
- and runs of non-matching literals.
- This is basically the entire documentation regarding how it actually works, which is probably(?) similar to snappy (I don't understand the documentation).
- Ignore the compressed parts for a moment and consider the header (the very first few bytes in the file).
- The payload is compressed and has a header of a few bytes placed in front of it.
- When compression is involved the header is pretty much guaranteed to contain both the decompressed size and the compressed size.
- Compression header (i.e. before the actual compressed payload):
- 4 bytes: 0000 01d0, decompressed size?
- 2 bytes: 09 70??
- 2 bytes: 015b, size of everything after it till EOF (end-of-file) for small files
- 2 bytes: f102?
- 0970 was the same in a couple of files, check for constancy of 0970 @offset 4:
- import os
- #go through all files in the ebx folder
- for dir0, dirs, ff in os.walk("ebx"):
-     for fnames in ff:
-         fname=dir0+"\\"+fnames
-         f=open(fname,"rb")
-         f.read(4)
-         if f.read(2)!="\x09\x70":
-             print fname
- NO HIT => CONSTANT FOR ALL BETA EBX
- The sum of the 2 integers after the ebx magic (ced1b20f) is the (decompressed) file size. Usually the ints
- are untouched as the files are only weakly compressed. The sum of them is equal to the first 4 bytes in the file, confirming that this indeed is the decompressed size.
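- A quick sketch to verify that on one small file (the two ints sit close enough to the start that they are stored uncompressed here; the filename is just an example):
- from struct import unpack
- f=open("levellistreport.ebx","rb")
- decompressedSize=unpack(">I",f.read(4))[0] #first 4 bytes of the compression header, big endian
- data=f.read() #rest of the file
- magicPos=data.find("\xce\xd1\xb2\x0f") #locate the ebx magic
- int1,int2=unpack("<II",data[magicPos+4:magicPos+12]) #the two little endian ints right after the magic
- print decompressedSize==int1+int2 #True when the ints are untouched by the compression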
- Compression header in big endian (big endian is when a hex number is written in a normal way from left to right, little endian inverts the order of the bytes):
- 4 bytes: decompressed size
- 2 bytes: 0970
- 2 bytes: 015b, size of everything after it till EOF (for small files)
- 2 bytes: f102, possibly part of the compressed payload
- What happens when a file is too large for a 2 byte size?
- Two possibilities:
- 1) Some varint stuff (with pairs of 2 bytes?), so when the first bit is 1, then read two more bytes. Rather unlikely, never seen any varints working with pairs of 2 bytes before.
- 2) Compressed in small blocks with max ffff bytes, one block after another. Could be that 0970 is the start of one package which would also align the start of the first section to a multiple of 4.
- materialgrid contains 0970 eight times, spaced apart by the block size => option 2.
- This is very similar to the fb2 zlib format (or maybe it is even zlib with low compression).
- The last two bytes are really part of the payload and not part of the header. Some files only have one byte there before the ebx magic.
- => The file consists of several blocks, with no global metadata.
- The blocks are set to have a size of 0x010000 when decompressed, except for the last one which is usually smaller.
- Compressed block (big endian):
- 4 bytes: decompressed size (0x10000 or less)
- 2 bytes: 0970
- 2 bytes: compressed size
- compressed payload
- Decompress each block and glue the decompressed parts together to obtain the file.
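- A rough sketch that just walks the block headers of one file to check this layout (the filename is an example; it assumes every block is a 0970 block with a nonzero compressed size):
- from struct import unpack
- f=open("materialgrid.ebx","rb")
- f.seek(0,2)
- fileSize=f.tell()
- f.seek(0)
- while f.tell()<fileSize:
-     decompressedSize,constant,compressedSize=unpack(">IHH",f.read(8))
-     print hex(decompressedSize),hex(constant),hex(compressedSize)
-     f.seek(compressedSize,1) #skip the compressed payload to reach the next block header
- f.close()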
- Maybe it is zlib at weak compression, try compressing a string at the various compression levels (from 0 to 9) to get an idea what it looks like:
- import zlib
- from binascii import hexlify
- string="adgfasdfavasdfasdf00000000"
- for i in xrange(10):
-     hexlify(zlib.compress(string,i))
- '7801011a00e5ff6164676661736466617661736466617364663030303030303030858708c4'
- '78014b4c494f4b2c4e494b2c0393409601140000858708c4'
- '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
- '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
- '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
- '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
- '789c4b4c494f4b2c4e494b2c0393406c000500858708c4'
- '78da4b4c494f4b2c4e494b2c0393406c000500858708c4'
- '78da4b4c494f4b2c4e494b2c0393406c000500858708c4'
- '78da4b4c494f4b2c4e494b2c0393406c000500858708c4'
- Nope, zlib starts with 78 (usually 78da because default compression is set pretty high). You may want to connect 78da with zlib in your mind. It's used in many archives.
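- Side note on why 78 keeps showing up: the zlib header has a built-in check (RFC 1950), so a quick test can tell whether two bytes could even be the start of a zlib stream:
- def looksLikeZlib(twoBytes):
-     #compression method must be 8 (deflate) and the two bytes, read as one big endian number, must be divisible by 31
-     cmf,flg=ord(twoBytes[0]),ord(twoBytes[1])
-     return (cmf&0xf)==8 and (cmf*256+flg)%31==0
- print looksLikeZlib("\x78\xda") #True
- print looksLikeZlib("\x09\x70") #False, so the block constant is not a zlib header either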
- Try to decompress anyway, run through all possible slices of a small file and see if it can decompress. If it cannot then the file is probably not zlib.
- import zlib
- f=open("levellistreport.ebx","rb")
- size=355 #size of the file
- for i in xrange(size):
-     for j in xrange(size):
-         f.seek(i)
-         data=f.read(j) #when j is greater than the number of the remaining bytes in the file,
-                        #it doesn't cause an error but just gives back everything till the end of the file
-         try:
-             data2=zlib.decompress(data) #try to decompress it (usually it will complain about the format being invalid)
-             if len(data2)!=0: #make sure that there's actually something there when decompressed
-                 print i,j,len(data2)
-         except: continue
- No output at all, so disregard zlib.
- snappy (has only one compression level):
- Grabbed the libraries from http://www.lfd.uci.edu/~gohlke/pythonlibs/
- Take the script from before and replace the "zlib" with "snappy" (Python is simple), yielding
- 15 3 1
- 22 5 2
- 62 3 1
- 216 7 3
- 256 4 2
- 319 35 32
- 347 3 1
- So it only gives back small random segments out of it with a size of max 32 bytes. No snappy.
- lzo:
- Exactly the same script as before but with lzo instead. Always fails. No lzo.
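- For reference, a module-agnostic version of that brute force (assuming the python-snappy and python-lzo packages, which both expose a decompress function):
- import zlib,snappy,lzo
- def bruteForce(fname,module):
-     data=open(fname,"rb").read()
-     for i in xrange(len(data)):
-         for j in xrange(i+1,len(data)+1):
-             try:
-                 out=module.decompress(data[i:j])
-                 if len(out)!=0: print module.__name__,i,j-i,len(out)
-             except: continue
- for module in (zlib,snappy,lzo):
-     bruteForce("levellistreport.ebx",module)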
- wiki mentions some lossless algorithms
- Run-length encoding (RLE) – a simple scheme that provides good compression of data containing lots of runs of the same value.
- Lempel-Ziv 1978 (LZ78), Lempel-Ziv-Welch (LZW) – used by GIF images and compress among many other applications
- DEFLATE – used by gzip, ZIP (since version 2.0), and as part of the compression process of Portable Network Graphics (PNG), Point-to-Point Protocol (PPP), HTTP, SSH
- bzip2 – using the Burrows–Wheeler transform, this provides slower but higher compression than DEFLATE
- Lempel–Ziv–Markov chain algorithm (LZMA) – used by 7zip, xz, and other programs; higher compression than bzip2 as well as much faster decompression.
- Lempel–Ziv–Oberhumer (LZO) – designed for compression/decompression speed at the expense of compression ratios
- Statistical Lempel Ziv – a combination of statistical method and dictionary-based method; better compression ratio than using single method.
- Run-length encoding (RLE) => nope, too simple
- Lempel-Ziv 1978 (LZ78), Lempel-Ziv-Welch (LZW) => probably LZ77; LZ78 and LZW are different again
- DEFLATE – Each block is preceded by a 3-bit header. => nope, there is no 3 bit header.
- bzip2 => unlikely (high compression)
- Lempel–Ziv–Markov chain algorithm (LZMA) => unlikely (even higher compression)
- Lempel–Ziv–Oberhumer (LZO) => nope, just tried it
- Statistical Lempel Ziv => very novel and definitely uncommon; nope
- wiki on LZ77: In the implementation used for many games by Electronic Arts,[4] the size in bytes of a length-distance pair can be specified inside
- the first byte of the length-distance pair itself; depending on if the first byte begins with a 0, 10, 110, or 111 (when read in big-endian bit orientation),
- the length of the entire length-distance pair can be 1 to 4 bytes large.
- [4]: http://wiki.niotso.org/QFS_compression (Niotso is a semi-collaborative effort to re-implement the engine used in The Sims Online.)
- Googling ea and LZ77 got me here http://www.vgleaks.com/world-exclusive-durangos-move-engines/
- "The Xbox One (Durango) GPU includes a number of fixed-function accelerators. Move engines are one of them.
- Xbox One (Durango) hardware has four move engines for fast direct memory access (DMA)
- This accelerators are truly fixed-function, in the sense that their algorithms are embedded in hardware.
- They can usually be considered black boxes with no intermediate results that are visible to software.
- When used for their designed purpose, however, they can offload work from the rest of the system and obtain useful results at minimal cost."
- The Xbox One has one move engine for encoding and one for decoding LZ77.
- So, some LZ77 variant was probably chosen in preparation for the Xbox One. The Xbox finally gets rid of that proprietary XMA audio codec implemented via hardware
- that made it impossible for a long time for anyone to decode Xbox audio (until some russians managed to get a hold of some code IIRC), but now it has an
- LZ77 variant that is done via hardware and will most likely never be documented anywhere. meh.
- Apply the info from niotso to the levellistreport.ebx:
- Recall that the string is: DataContainer.Asset.$.Name.LevelReporting....array.member.Built&..s.
- Detail:
- LevelReporting 1B00F103 array
- => 1B00F103 equals Asset\x00
- 1b00f103 is a 4 bytes opcode.
- in binary: 00011011 00000000 11110001 00000011
- niotso says for the individual bits in a 4byte opcode:
- 110ORRPP OOOOOOOO OOOOOOOO RRRRRRRR
- Note the O in the first byte. Looks like some attempt at obfuscation to me (would make more sense to have it on the right in the first bit close to the other Os).
- O: Offset, move backwards by this amount of bytes and start copying a certain number of bytes following that position.
- R: Length, how many bytes to copy. If the length is larger than the offset, start at the offset again and copy the same values again.
- P: The engine needs a way to know if what it sees is data (that may happen to look similar to an opcode) and what is opcode. So this value
- tells the distance to the next opcode, with everything in between being ordinary uncompressed data. Proceed this distance.
- This can only be a value up to 3, which is far too small. Therefore it's possible to add just one more byte after the opcode to increase that distance.
- As the offset 1b is in fact on the left in the code, it does not match the niotso 4byte opcode which requires the offset in the middle.
- Maybe it is a 3byte opcode with an extra byte. Will investigate this later.
- What's more, the offset is the very first thing to appear, so the engine must have some idea how many bytes to read.
- I'm not sure if the niotso format is used. It's more likely that it is indeed some custom format for the Xbox One. Still, most of the info here applies in either case.
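- For comparison, the quoted 4byte opcode taken at face value (the actual QFS spec may add small constants to these fields, so treat the values as approximate):
- def decodeNiotso4Byte(b0,b1,b2,b3):
-     #110ORRPP OOOOOOOO OOOOOOOO RRRRRRRR, read literally
-     assert b0>>5==0b110 #the top bits identify the opcode class
-     offset =((b0>>4)&1)<<16 | b1<<8 | b2
-     length =((b0>>2)&3)<<8 | b3
-     proceed=b0&3
-     return offset,length,proceed
- #1b00f103 starts with 0x1b = 00011011, which lacks the 110 prefix entirely,
- #another hint that the beta format does not follow this layout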
- Well, a custom LZ77 it is then. Not awfully surprising. Initially I thought the binary XML used
- by the game was some already existing format and spent hours to find nothing (of course).
- So let's not waste any time and get going.
- It might be a good idea to consider the distance between one opcode and the next, i.e. the distance between the positions after LevelReporting and after Built.
- Offset after LevelReporting: 0x93
- Offset after Built: 0xA9
- Difference: 0x16, could be up to 4 bytes lower because I'm not sure if it starts counting before or after the opcode. The value may also be constantly shifted by a small value.
- It's still not conclusive.
- Note that the 1b in 1b00f103 is probably part of a 2 byte sequence, 1b00, written in little endian (i.e. 001b in big endian).
- Anyway, go to the start of the file after the header. There are two bytes before the ebx magic, f102, while other times just one byte is enough.
- This indicates a varint, so the first byte has a bit to indicate that the number has reached its end or if the next byte is part of the number too.
- There are many other ebx files having two bytes before the magic and the first byte being f1, f2, etc.
- As a rough estimate then, when the first half of the first byte is f, then another byte follows.
- One half of a byte is 4 bits, so only the 4 remaining bits actually contain information about the number.
- Anyway, check this theory by adjusting the script. As the header structure is already known the script could read every block and not just the first one.
- However, the first block is known to contain the ebx magic which can be used as a landmark, so it's not necessary yet to implement that.
- import os
- from struct import unpack,pack #convert a sequence of bytes into ints or floats
- #create some sets, they can contain every element only once; perfect for this kind of analysis
- oneset=set()
- twoset1=set()
- twoset2=set()
- #go through all files in the ebx folder
- for dir0, dirs, ff in os.walk("ebx"):
-     for fnames in ff:
-         fname=dir0+"\\"+fnames
-         f=open(fname,"rb")
-         #grab the header
-         decompressedSize=unpack(">I",f.read(4)) #read big endian unsigned int
-         constant=f.read(2) #0970
-         compressedSize=unpack(">H",f.read(2)) #read big endian unsigned half int
-         #check the bytes after the header, using the ebx magic as a reference
-         ebxMagic="\xce\xd1\xb2\x0f"
-         sample = f.read(10) #read 10 bytes, even the smallest file is several times larger than that
-         #now find the position of the magic in it and then analyze the bytes before it
-         magicPos=sample.find(ebxMagic)
-         if magicPos==-1: asdf #could not find the magic at all (undefined name, crashes on purpose so I can investigate)
-         elif magicPos>2: fdsa #more than 2 bytes before the magic
-         if len(sample)<2: tooshorttoanalyze
-         if magicPos==1:
-             oneset.add(ord(sample[0])) #this set will contain all possible bytes that appear when there is only one byte before the magic
-         if magicPos==2:
-             twoset1.add(ord(sample[0]))
-             twoset2.add(ord(sample[1]))
- print oneset
- print twoset1
- print twoset2
- results in:
- set([224, 192, 162, 161, 209, 194, 144, 177, 128])
- set([240, 241, 242, 243, 244])
- set([2, 4, 40, 41, 42, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 10])
- Later on it might be useful to collect the specific combinations of the 2 bytes that are possible. For now however, there are some useful results already:
- There are always either one or two bytes before the magic.
- When there are two bytes before the magic, the first byte can be 240 to 244, which is 0xf0 to 0xf4.
- When there is only one byte the values range from 0x80 to 0xe0.
- Now comes the time to directly compare between small alpha and beta files that remain unchanged.
- As a rough indicator for that, use the two ints after the magic and make sure they are the same. As they appear so early in the file and the compression
- can only look behind to copy stuff from there, and not into the future, these ints should always be written out.
- levellistreport.ebx and many others fail the test.
- import os
- from struct import unpack,pack
- #go through all files in the ebx folder
- for dir0, dirs, ff in os.walk("ebx"):
-     for fnames in ff:
-         fname=dir0+"\\"+fnames
-         f=open(fname,"rb")
-         #grab the header
-         decompressedSize=unpack(">I",f.read(4))
-         constant=f.read(2) #0970
-         compressedSize=unpack(">H",f.read(2))
-         #check the bytes after the header, using the ebx magic as a reference
-         ebxMagic="\xce\xd1\xb2\x0f"
-         sample = f.read(10)
-         magicPos=sample.find(ebxMagic)
-         f.seek(-10+magicPos,1) #move back to where the ebx magic starts, then grab 12 bytes and compare to alpha
-         betabytes=f.read(12)
-         try: f2=open("D:/hexing/bf4 alpha dump/bundles/"+dir0+"/"+fnames,"rb")
-         except: continue #some files do not exist in the alpha
-         alphabytes=f2.read(12) #not compressed, so no header or other trouble
-         if alphabytes==betabytes:
-             print fname
- which gives back a whopping two files that satisfy the condition:
- ebx\sound\mixers\impairedhearing_soundstate_mixer.ebx
- ebx\sound\mixers\mandown_soundstate_mixer.ebx
- Both of these still differ directly after those 12 bytes, so ignore them.
- Keep working with levellistreport.ebx instead, trying to match the metadata:
- from my ebx script (these 11 ints appear directly after the ebx magic, they are little endian):
- class Header:
-     def __init__(self,varList): ##all 4byte unsigned integers
-         self.absStringOffset = varList[0] ## absolute offset for string section start
-         self.lenStringToEOF = varList[1] ## length from string section start to EOF
-         self.numGUID = varList[2] ## number of external GUIDs
-         self.null = varList[3] ## 00000000
-         self.numInstanceRepeater = varList[4]
-         self.numComplex = varList[5] ## number of complex entries
-         self.numField = varList[6] ## number of field entries
-         self.lenName = varList[7] ## length of name section including padding
-         self.lenString = varList[8] ## length of string section including padding
-         self.numArrayRepeater = varList[9]
-         self.lenPayload = varList[10] ## length of normal payload section; the start of the array payload section is absStringOffset+lenString+lenPayload
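- A minimal sketch of reading those 11 ints with the class above from a decompressed alpha file (the filename is just an example):
- from struct import unpack
- f=open("levellistreport_alpha.ebx","rb") #an uncompressed alpha ebx
- assert f.read(4)=="\xce\xd1\xb2\x0f" #ebx magic
- header=Header(unpack("<11I",f.read(44))) #11 little endian uints right after the magic
- print hex(header.absStringOffset), hex(header.lenStringToEOF)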
- alpha (from here on, all nums in hex even without 0x prefix):
- self.absStringOffset = 180
- self.lenStringToEOF = 50
- self.numGUID = 1
- self.null = null, lol
- self.numInstanceRepeater = 1
- self.numComplex = 4
- self.numField = 5
- self.lenName = 50
- self.lenString = 10
- self.numArrayRepeater = 2
- self.lenPayload = 30
- beta (all nums in hex; with some guessing involved):
- self.absStringOffset = 190
- self.lenStringToEOF = 40
- self.numGUID = 2
- self.null = null, maybe? It's compressed; it seems that it was removed
- self.numInstanceRepeater = 1
- self.numComplex = 4
- self.numField = 5
- self.lenName = 50
- self.lenString = 10
- self.numArrayRepeater = maybe 6? unlikely
- self.lenPayload = 20
- The null part is the first occurrence of compression I think.
- The file starts with f102, then reads 10 bytes, and then compresses the nulls with
- 01020073? Or the null was removed from the header altogether. Skim through some larger files
- as larger files mean larger values in the metadata so they aren't compressed as easily.
- Consider the beta materialgrid:
- CED1B20F 70250200 10D20400 BF000000
- CD05CD05 10002000 5B007006 30000000
- A9280000 A012010063
- self.absStringOffset = 22570
- self.lenStringToEOF = 4d210
- self.numGUID = bf
- self.null = null, maybe?
- self.numInstanceRepeater = 210?
- self.numComplex = 75b?
- self.numField = 306?
- self.lenName = ?
- self.lenString = ??
- self.numArrayRepeater = ???
- self.lenPayload = ????
- Just make sure that the header indeed remains the same, and figure out if the null entry is still there or not.
- The first two entries were already checked. Move on to numGUID.
- Get the length of the guid section. It is made up of guid pairs with a length of 20 each.
- The guids are random bytes and thus should be almost impossible to compress.
- The size is roughly 17f5, starting @32 and ending before the section containing the
- string keywords.
- 17f5/20 = bf, good
- The keyword section (which I called Names for some reason) starts @1827 and has a size of about 3bd
- The next sections in the file are:
- fieldDescriptors #10 (i.e. sixteen) bytes long, the 9th byte is pretty much always null
- complexDescriptors #look the same as fieldDescriptors, but that byte is not null
- instanceRepeaters #consist of three ints, the first int used to always be null
- arrayRepeaters #look the same as instanceRepeaters, but the first int is not null
- These characteristics can be used to identify the length of the sections and thereby the number of entries.
- fieldDescriptors and complexDescriptors: the section for both is 500 bytes
- instanceRepeaters: about d80 bytes; d80/c = 120
- arrayRepeaters: about a788 bytes; a788/c = df6
- The length of the string section is about 30 bytes compressed.
- The length of the non-array payload after that section is c080.
- Can't figure out anything, return to a simpler file.
- Analyze the keyword sections of various files to get an idea of how the offset works.
- More files = more accurate results.
- levellayerinclusion:
- alpha:
- DataContainer.Asset.$.Name.SubWorldInclusion.array.member.Criteria.WorldPartInclusion.SubWorldInclusionCriterion.Options.WorldPartInclusionCriterion
- beta:
- DataContainer.Asset.$.Name.SubWorldInclusion.array.member.Criteria.%.FPart)..;..-..on.Options6
- in detail:
- WorldPartInclusion.SubWorldInclusionCriterion
- vs
- 250046 Part 29000D 3B0003 2D00AF on
- moving 25 backwards gives back the offset of WorldInclusion
- 46 means to copy 5 bytes, then read Part as uncompressed data and then read
- another opcode.
- So 46 => copy 5 and proceed by 4?
- With the 29 byte afterwards I end up at the ldInclusion, two bytes too early.
- At least 2d00af refers back to Inclusion. But it should be SubWorldInclusion... meh
- uiawardsoverlaylogic:
- alpha:
- DataContainer.GameDataContainer.$.DataBusPeer.Flags.
- beta:
- DataContainer.Game.. $....BusPeer.Flags&.`
- in detail:
- GameDataContainer.$.DataBusPeer.Flags.
- vs
- Game 120020 $ 00 1000D1 BusPeer.Flags 260060
- 120020: move back by 12 and copy 0e bytes, then proceed by 2 bytes.
- 1000d1: move back by 10 and copy 4, then proceed 0d.
- Ah, so it does it sequentially.
- It first converts Game 120020 to GameDataContainer.
- then when it reaches the next code it uses this replacement already,
- so when it moves back by 10 bytes it actually arrives at the Data from
- GameDataContainer, not the normal DataContainer. This is then copied again, etc.
- It only makes sense, it would be odd if it copied opcodes too. Silly me.
- 20 in binary: 00100000, copy 0e, proceed 2
- d1 in binary: 11010001, copy 4, proceed 0d
- Manually analyze a whole lot more, I'll only jot down the results; long keyword sections are ideal for this:
- Using sound/master.ebx. Made a copy of the compressed file. After every step I figured out, I replaced
- that part with the uncompressed data so the subsequent compressed parts make sense.
- 0F0057, move 0f, copy 5, proceed 5
- 370057, move 37, copy 0b, proceed 5
- Well isn't that interesting. The same 57 is used for two different things.
- This might be an indicator that there is some obfuscation. E.g. it could be implemented
- that the program adds the current offset in the file to some number to obtain 57
- when compressing.
- So I would need to subtract that offset before analyzing the number. That of course
- requires an even greater number of samples, with their offset documented.
- Let's ignore that for a moment and go on.
- 100040, move 10, copy 0b, proceed 4
- 630091, move 63, copy 4, proceed 9
- 0D00F205, move 0d, copy 5, proceed 14
- Note that the keywords are ordered slightly differently, which hopefully explains how
- the same code meant two different things above. The words all sound the same here,
- so it's hard to recognize a different order. Go on anyway, keeping that in mind.
- 220002, move 22, copy 6, proceed 0 (the next opcode is directly after this one)
- 7C00F306, move 7c, copy 6, proceed 15
- 6C0002, move 6c, copy 7, proceed 0. How is that even possible? Is this some sort of ruse?
- The first two bytes (the move distance) seem pretty reliable, so from here on I won't type out how far it moves.
- 1D0002, copy 6, proceed 0
- 0C0062, copy 6, proceed 6
- 120001, copy 6, proceed 0
- 550004, copy 4, proceed 0
- 6E0006, copy 9 or 8, proceed 0
- C80000, copy 0a or 9, proceed 0
- 8100F101, copy 4, proceed 10
- Not conclusive at all.
- However, it seems that that rule about f0 to f4 is true even later in the file.
- So identifying compressed parts is not that difficult. Look out for 1 byte with a value
- (move back by this amount), then one byte is almost always null because the last time the
- string appeared is usually closer. At least, when a file has just ff bytes this always works.
- As there's not much point in looking at compressed parts later in the file anyway
- unless either the previous parts have been decompressed or I can directly compare against
- an alpha file (even then, the keyword section comes pretty early and is the most useful),
- just assume it is null. When the first half of the final byte is f, then make sure
- to look one byte ahead, if that byte is extremely low, these 4 bytes form
- one compressed unit. If the first half is not f it's a bit harder.
- Just make sure to look back to see if the position to copy from actually makes sense.
- Take another look at levellistreport.ebx with the goal of fully decompressing it:
- decompressed size: 1d0
- Starts with F102, then some bytes.
- The first guid D6076D4B4DF8DD11BE32C64EACA26B06 is the same in the alpha.
- In particular, the file guid is very similar to the instance guid, so
- parts are compressed the second time.
- Likewise, the first guid of the second guid pair, A4E429350D405687DE5E6EFF3347F7ED
- is similar to the second part of the pair.
- I can't really figure that part out yet; moving on to the keyword section which
- shouldn't be too hard to decompress.
- 1B00F103, copy 6, proceed 12
- 260013, copy 5, proceed 0
- It's the end of the keyword section, so proceeding 0 might seem fine at first glance.
- However, every string has to end with a nullbyte which has not appeared yet.
- Furthermore, the keyword section is padded with nullbytes to a multiple of 10 (sixteen). So
- while it may be possible that the section needs no padding by chance (or only 2-3 bytes),
- it's very likely that the next opcode fills in the nulls.
- Indeed, as the next section starts with 81b5 (the first two bytes of the hash of DataContainer)
- there is some padding here.
- 8E0040, copy some nulls, proceed 4
- In the alpha (and in bf3) the first fieldDescriptor
- is just the hash and lots of nulls: 81B50200000000000000000000000000
- In the beta, knowing that the next fieldDescriptor starts with the hash 82D8827C,
- the entire entry reads: 81B5 C60030000008050092000000
- This is fundamentally different. Even assuming that C60030 is an opcode,
- there is definitely a non-null byte later on. It would have been too easy anyway.
- So, with every descriptor starting with a hash (which is hard to compress) and every
- descriptor having a fixed 16 bytes size (hopefully), try to count the entries to get
- an idea of the final size to check if the ebx file header remains correct.
- Never mind, that's impossible.
- Fix the string later on in the file, LevelListReport:
- CF0042 should equal Level; copy 5, proceed 4
- F90020 should equal Report; copy 6,
- current minimum distance between CF0042 and Level: 8f, so there are 40 bytes missing
- current distance between F90020 and Report (assuming that Level was replaced by now):
- b9, so there are 40 bytes missing here too. So there truly are 40 bytes missing.
- This means that the difference between the keywords and the strings section
- is a multiple of 10. Hopefully that means that the padding remains there as it used to.
- The keyword section should be 50 bytes long then.
- That makes 8E00400000 become 9 nulls, so basically 8E0040 is 7 nulls.
- So from CF0042, move backwards until Level is reached. That distance is actually cf.
- Now just add the size of the keywords before that: 41 bytes
- => 110 bytes for keywords + descriptors/repeaters.
- Assume that keywords need 50 bytes (they are padded at the end).
- That leaves c0 for the descriptors/repeaters
- So what's the size of all metadata:
- Meta size (as given by the header):
- 190
- Meta size (by summing up the parts):
- 30? for the header itself
- +60 for the guids (2 external guid pairs, one file guid pair, each 20 bytes)
- +50 for the keywords
- +c0 for the descriptors/repeaters
- = 1a0
- I suspect either the header lost 10 bytes or one half of a guid pair has been dropped.
- Moving on to the payload itself. The payload always starts with a 10 byte guid, and more
- guids appear later on in the file. These correspond to instances in the xml file.
- What's interesting to note is that the guid section at the top contains (among others)
- the guid of the primary instance. This guid must appear once in the payload, written out
- exactly like at the top. In the case of levellistreport.ebx, there is only one instance in
- total. Which means that the guid that must appear is known (from the alpha):
- D7076D4B4DFEDD11A232C64E4C926B06
- It's interesting to note that it is still written out for the most part at the bottom,
- while it is somewhere at the top (in compressed form). This indicates that the window
- was too small so the compressor did not see the guid at the top anymore.
- Or that indeed this is the half of a guid pair that was dropped.
- Got to properly decompress before dealing with that.
- Assume that the offset in the file is indeed relevant. Have a script sort files by their very
- first proceed-number, right before the ebx magic. Then manually analyze them. Would be good
- to find files with the number just varying by 1 and with the number to copy varying by 1 too.
- Of course that still requires a lot of understanding of the header which I don't really have.
- Will try anyway.
- import os
- from struct import unpack,pack #convert a sequence of bytes into ints or floats
- from binascii import hexlify #converts several bytes into a string of their hex representation,
-                              #e.g. "\x00\xab"=>"00ab"; and similarly "doc"=>"646f63"
- #utility function, by default Python gives back an error when trying to create a file in a nonexistent folder
- #this creates the folder and then the file; requires another function for long pathnames
- def open2(path,mode="rb"):
-     if mode=="wb":
-         #create folders if necessary and return the file handle
-         #first of all, create one folder level manually because makedirs might fail
-         path=path.replace("/","\\")
-         pathParts=path.split("\\")
-         manualPart="\\".join(pathParts[:2])
-         if not os.path.isdir(manualPart):
-             os.makedirs(manualPart)
-         #now handle the rest, including extra long path names
-         folderPath=lp(os.path.dirname(path))
-         if not os.path.isdir(folderPath): os.makedirs(folderPath)
-     return open(lp(path),mode)
- def lp(path): #long pathnames
-     if len(path)<=247 or path=="" or path[:4]=='\\\\?\\': return path
-     return unicode('\\\\?\\' + os.path.normpath(path))
- #go through all files in the ebx folder
- for dir0, dirs, ff in os.walk("ebx"):
-     for fnames in ff:
-         fname=dir0+"\\"+fnames
-         f=open(fname,"rb")
-         #grab the header
-         decompressedSize=unpack(">I",f.read(4)) #read big endian unsigned int
-         constant=f.read(2) #0970
-         compressedSize=unpack(">H",f.read(2)) #read big endian unsigned half int
-         #check the bytes after the header, using the ebx magic as a reference
-         ebxMagic="\xce\xd1\xb2\x0f"
-         sample = f.read(10)
-         magicPos=sample.find(ebxMagic)
-         f.seek(-10,1) #move back to the start of the number I want, then grab it to order the files
-         proceedNum=hexlify(f.read(magicPos))
-         f2=open2("D:/hexing/bf4 beta dump sorted/bundles/"+proceedNum+"/"+fnames,"wb")
-         f.seek(0)
-         f2.write(f.read()) #copy the entire file to f2
-         f2.close()
-         #this overwrites if files in different beta folders happen
-         #to have the same name, but there should be enough files anyway
- Beautiful: http://i.imgur.com/8ntBZPv.jpg
- Well, that was easy:
- type 80:
- proceed 8 bytes in all 6 files of that type.
- In those files the metadata has the same size as the payload, so the
- payload integer is compressed already.
- Because I know from the decompressed size exactly that this is the case,
- and it's extremely unlikely that after one compressed integer comes another
- number that happens to look the same (so the same number would be copied twice),
- I can assume that the number of bytes to copy is indeed exactly 4.
- There are two different opcodes after proceeding 8 (absolute offset: 11; @11):
- 040020, copy 4, proceed 2
- 040051, copy 4, proceed 5
- type 90:
- There is just one file; as you might imagine this proceeds 9 bytes.
- Which is really odd. The last digit/byte of the integer happens to be
- the same (remember that it's little endian so the last byte is on the left) and the
- compressor saw an option to optimize this.
- @12: 040001, copy 4?, proceed 3
- type a1:
- proceed 0a bytes.
- type a2:
- same wtf
- type b1:
- proceed 0b
- Can't really compare like I'd like to if I don't know the header structure;
- on the other hand, how do I figure out the header structure if it is compressed?
- Well, simple, try to get the header structure by looking at the longer types,
- e.g. f429 probably proceeds past the first 20-30 bytes:
- 12gflechette_bpb.ebx header structure:
- absStringOffset = 4bytes
- lenStringToEOF = 4bytes
- numGUID = 4bytes
- 2bytes, numInstanceRepeater?
- 2bytes, numComplex?
- 2bytes, numField?
- 2bytes, 19
- 2bytes, 58
- 2bytes, size of keyword section?
- 4bytes, 40
- 4bytes, 2a
- 4bytes, 20f0, slightly smaller than lenStringToEOF (2320)
- Try another file to fill in the gaps, bd_buildingskyscrapermatteyellow_top_01.ebx
- has a corresponding alpha file too in contrast to 12gflechette_bpb:
- alpha (all fields are all 4 bytes long):
- self.absStringOffset = 1f40
- self.lenStringToEOF = 510
- self.numGUID = 1
- self.null = null
- self.numInstanceRepeater = 8
- self.numComplex = 36
- self.numField = c8
- self.lenName = e30
- self.lenString = 70
- self.numArrayRepeater = 8
- self.lenPayload = 430
- beta (2 bytes or 4 bytes length):
- self.absStringOffset = 1e40
- self.lenStringToEOF = 430
- self.numGUID = 2
- 2bytes, 08, probably numInstanceRepeater
- 2bytes, 03, no idea
- 2bytes, 08, probably numArrayRepeater
- 2bytes, 34, probably numComplex
- 2bytes, c2, probably numField
- 2bytes, de0, probably size of keyword section
- 4bytes, 40, probably size of string section
- 4bytes, 07, no idea
- 4bytes, 380, probably size of payload (without arrays)
- and the guids are directly after this header
- in particular, the first half of the first guid pair (the file guid) is still there, unchanged
- the second pair is replaced with nulls or something, which is then of course compressed
- This is too confusing as it is, have the script cut off the compression header so it is easier
- to measure the right distances. While I'm at it, replace the ebx magic with the decompressed size.
- That way I can calculate the payload size even when compressed.
- import os
- from struct import unpack,pack #convert a sequence of bytes into ints or floats
- from binascii import hexlify #converts several bytes into a string of their hex representation,
-                              #e.g. "\x00\xab"=>"00ab"; and similarly "doc"=>"646f63"
- #utility function, by default Python gives back an error when trying to create a file in a nonexistent folder
- #this creates the folder and then the file; requires another function for long pathnames
- def open2(path,mode="rb"):
-     if mode=="wb":
-         #create folders if necessary and return the file handle
-         #first of all, create one folder level manually because makedirs might fail
-         path=path.replace("/","\\")
-         pathParts=path.split("\\")
-         manualPart="\\".join(pathParts[:2])
-         if not os.path.isdir(manualPart):
-             os.makedirs(manualPart)
-         #now handle the rest, including extra long path names
-         folderPath=lp(os.path.dirname(path))
-         if not os.path.isdir(folderPath): os.makedirs(folderPath)
-     return open(lp(path),mode)
- def lp(path): #long pathnames
-     if len(path)<=247 or path=="" or path[:4]=='\\\\?\\': return path
-     return unicode('\\\\?\\' + os.path.normpath(path))
- #go through all files in the ebx folder
- for dir0, dirs, ff in os.walk("ebx"):
-     for fnames in ff:
-         fname=dir0+"\\"+fnames
-         f=open(fname,"rb")
-         #grab the header
-         #totally forgot about indexing, need to take element 0; peculiarity of the unpack library.
-         decompressedSize=unpack(">I",f.read(4))[0] #read big endian unsigned int
-         constant=f.read(2) #0970
-         compressedSize=unpack(">H",f.read(2))[0] #read big endian unsigned half int
-         #check the bytes after the header, using the ebx magic as a reference
-         ebxMagic="\xce\xd1\xb2\x0f"
-         sample = f.read(10)
-         magicPos=sample.find(ebxMagic)
-         f.seek(-10,1) #move back to the start of the number I want, then grab it to order the files
-         proceedNum=hexlify(f.read(magicPos))
-         f2=open2("D:/hexing/bf4 beta dump sorted/bundles/"+proceedNum+"/"+fnames,"wb")
-         f.seek(4,1) #don't go back to the start, instead move 4 bytes too (past the ebx magic)
-         f2.write(pack("I",decompressedSize)) #write as little endian so it is easier to read when next to the other LE stuff
-         f2.write(f.read()) #copy the entire file to f2
-         f2.close()
-         #this overwrites if files in different beta folders happen
-         #to have the same name, but there should be enough files anyway
- As a reference, a valid ebx header of a f429 file (without the compression header):
- CED1B20F 000F0000 20230000 01000000
- 0400 0200 0400 1900 5800 8005 40000000
- 2A000000 F0200000
- Give them some letters:
- a        b        c        d
- e    f    g    h    i    j    k
- l        m
- a) ebx magic
- b) meta size
- c) payload size (meta + payload = file size)
- d) number of external guid pairs (each pair is 20 bytes in total;
- and there is one internal guid for the file itself)
- e) numInstanceRepeater
- f) ?
- g) numArrayRepeater
- h) numComplex
- i) numField
- j) keyword section size
- k) string section size
- l) ?
- m) payload size without arrays
- => 8 bytes less compared to the bf3 ebx header
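- A minimal sketch of reading this new header from already decompressed data, using the letters above (f and l remain unknown, the other field names are my guesses):
- from struct import unpack
- def readBetaEbxHeader(f): #f must provide decompressed data positioned at the ebx magic
-     assert f.read(4)=="\xce\xd1\xb2\x0f" #a) ebx magic
-     vals=unpack("<3I6H3I",f.read(36)) #everything after the magic is little endian
-     metaSize, payloadSize, numGUID = vals[0:3] #b) c) d)
-     numInstanceRepeater, unknown1, numArrayRepeater, numComplex, numField, lenName = vals[3:9] #e) f) g) h) i) j)
-     lenString, unknown2, lenPayload = vals[9:12] #k) l) m)
-     return metaSize, payloadSize, numGUID, lenName, lenString, lenPayload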
- (a1 type) fontcollection_zh_fontmapcollectionwin32:
- proceed 0a
- 70030000 70020000 0001 030021 00010200F314
- 030021 must be a substitute for something that's 4 bytes or longer.
- It refers to three bytes in the past. So at least one byte must be copied twice.
- I.e. 030021 refers to 000001 + at least 00 (and would then continue 0001, 000001, 000001)
- 70030000 70020000 00010000 0100 (,0001,000001,...) 0001020200F314
- Ignore the first three parts, so the next entry is the number of ext guids
- 0100 (,0001,000001,...) 0001020200F314
- the number of ext guids is obviously very small
- so 01000001 does not work, because it is a gigantic number (16777217 in decimal)
- it must be 010000, then another null is added from the byte afterward
- 01000000 01 0200F314
- Therefore:
- a1: proceed 0a
- 030021: move 3, copy 5, proceed 1
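- As a toy illustration of such an overlapping copy (offset 3, copy 5), re-reading bytes that were written moments ago:
- out=bytearray(b"\x00\x00\x01") #the already decompressed data ends in 000001
- start=len(out)-3 #move back 3 bytes
- for i in xrange(5): #copy 5 bytes, re-reading bytes appended within this very loop
-     out.append(out[start+i])
- #out is now 00 00 01 00 00 01 00 00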
- (a2 type) commanderkit:
- proceed 0a too
- fuck, fuck, fuck
- There are 11576 files with a2, and just 125 with a1.
- So it's at least not seeded by the filename or crap like that.
- A01C0000 E01B0000 C000 010011 02 0200F213
- 010011, move 1, copy 4 or more, proceed 1
- This file has no external guids so decompressed it is
- A01C0000 E01B0000 C0000000 00000000 02 0200F213
- i.e. 010011, move 1, copy 6, proceed 1
- So basically these 11576 files are all files with no external guid.
- c0 type:
- proceed 0c
- lav25_mesh:
- 070011, copy 4, proceed 1
- layer9_homebase_ch:
- 0700F31A, copy 4?, proceed?
- no idea
- proceeds:
- f011: proceed 20
- f012: proceed 21
- f015: proceed 24
- f016: proceed 25
- f004: proceed 13
- by that logic, f000: proceed 0f
- or more generally:
- f0xy: proceed f + xy
- and any values below that can be written directly as 80, e0, etc. to proceed 8, e
- yup, indeed varint.
- f429: 38
- f42a: 39
- So the second half of the first byte is not relevant for the proceed, it must have
- another purpose!
- Oh man. This looks promising.
- When an entry says 0100F23C, it means:
- move 1, copy f(2) with some function f yet to be determined, proceed f+3c=4b
- yup, works correctly along several entries.
- Still got to figure out the function. And why the very first opcode doesn't always have 0 there.
- The most reasonable guess is f(x)=x+4, because an opcode takes at least 3 bytes, so copying fewer than 4 bytes wouldn't be worth it.
- The first opcode probably contains the bytes to copy for the next opcode.
- Err... I had
- a1, 030021: move 3, copy 5, proceed 1
- a2, 010011, move 1, copy 6, proceed 1
- So the 1 in a1 means copy 5, the 2 in a2 means copy 6.
- So it was the typical "these elements are placed together so they belong together" syndrome; fool me twice :(
- Yup, have tested a small sample and it works. That fully explains it, well almost. The question
- is what happens when the first byte is ff. I assume it then extends over three bytes or something like that.
- It's not important yet to figure out the details, just make sure the script will
- give an error when that happens so I can look into it.
- Write a script to decompress all the blocks of each file. Need to see what happens at ff.
- Also tidy up the script a lot. Decompress the files until an error occurs. Don't create
- the decompressed files yet; only when an error occurs, create a debug file to get the
- decompressed data until the error.
- Script with some improvements made due to errors (that will be stated shortly):
- import os
- from struct import unpack,pack #convert a sequence of bytes into ints or floats
- from binascii import hexlify #converts several bytes into a string of their hex representation,
- from cStringIO import StringIO #create something that has the same functions as a file but is in memory only
- def readNum(f): #when byte is ff, read one more byte until not ff, add all
-     total=0
-     while 1:
-         byte=ord(f.read(1))
-         total+=byte
-         if byte!=0xff: return total
- #go through all files in the ebx folder
- for dir0, dirs, ff in os.walk("ebx"):
-     for fname in ff:
-         f=open(dir0+"\\"+fname,"rb")
-         #cheap way to get the end-of-file, i.e. the size of the file
-         f.seek(0,2)
-         EOF=f.tell()
-         f.seek(0)
-         decompressedStream=StringIO() #write the decompressed data into memory only
-         while f.tell()<EOF:
-             #grab the header of a compressed block
-             decompressedSize, constant, compressedSize = unpack(">IHH",f.read(8)) #const 0970
-             blockOffset=f.tell()
-             if constant!=0x970: print "Block header constant is not 970, the script may or may not work correctly."
-             #go from one opcode to the next and write the decompressed data into the stream until the block is done
-             while f.tell()-blockOffset<compressedSize:
-                 #read the length byte as a number, then split it in two numbers (from 0 to f each) with bitmasking and shifting
-                 lengthByte=ord(f.read(1)) #e.g. 9e
-                 proceedSize=lengthByte>>4 #=> 09
-                 copySize =lengthByte&0xf #=> 0e
-                 #the revised version to deal with larger numbers; the original one is in the comment below
-                 if proceedSize==0xf:
-                     proceedSize+=readNum(f)
-                 ## #add the next byte to the proceedSize if the half-byte is f.
-                 ## #Raise some errors if something new pops up so I can investigate.
-                 ## if proceedSize==0xf:
-                 ##     nextByte=ord(f.read(1))
-                 ##     if nextByte==0xff: #the byte behaved normally when reaching a number larger than 128, so now check ff
-                 ##         print dir0, fname, f.tell()
-                 ##         f2=open("debug","wb")
-                 ##         f2.write(decompressedStream.getvalue())
-                 ##         f2.close()
-                 ##         proceedSizeIsFF
-                 ##     else: proceedSize+=nextByte
-                 #### if copySize==0xf:
-                 ####     print dir0, fname, f.tell()
-                 ####     f2=open("debug","wb")
-                 ####     f2.write(decompressedStream.getvalue())
-                 ####     f2.close()
-                 ####     copySizeIsF
-                 decompressedStream.write(f.read(proceedSize))
-                 pos0=decompressedStream.tell()
-                 #this check was added later on, the very last bytes in the block are not compressed so there is no offset to read
-                 if f.tell()-blockOffset==compressedSize:
-                     break
-                 offset=unpack("H",f.read(2))[0]
-                 #the revised version to deal with larger numbers; the original one is in the comment below
-                 if copySize==0xf:
-                     copySize+=readNum(f)
-                 ## #might be a varint. Not sure if this case happens even once
-                 ## if copySize==0xf:
-                 ##     print "#########"
-                 ##     copySummand=ord(f.read(1))
-                 ####     print copySummand, dir0, fname, f.tell()
-                 ##     if copySummand>>7: #what happens if the first bit is set
-                 ##         print dir0, fname, f.tell()
-                 ##         f2=open("debug","wb")
-                 ##         f2.write(decompressedStream.getvalue())
-                 ##         f2.close()
-                 ##         asdfasdf
-                 ##     copySize+=copySummand
-                 copySize+=4
-                 decompressedStream.seek(-offset,1) #go back to copy the data
-                 #make several copies if necessary
-                 if offset<copySize:
-                     times=copySize/offset
-                     rest=copySize%offset
-                     copy=decompressedStream.read(copySize)
-                     decompressedStream.seek(pos0)
-                     for i in xrange(times): decompressedStream.write(copy)
-                     decompressedStream.write(copy[:rest])
-                 else:
-                     copy=decompressedStream.read(copySize)
-                     decompressedStream.seek(pos0)
-                     decompressedStream.write(copy)
-         f.close()
- Errors encountered:
- ebx crowsweaponhudlogic.ebx 373
- NameError: name 'copySizeIsF' is not defined
- i.e. at offset 373 (decimal) in that file in the main ebx folder, an entry has copysize f.
- So what happens in that case?
- the most recent bytes:
- FieldAccessType.FieldAccessType_Source.FieldAccessType_
- the problematic expression:
- 6f Target 2E0004 And
- which the alpha files say becomes \x00FieldAccessType_SourceAnd (note that the Target here belongs to
- the FieldAccessType_Target once the 6f is removed; opcodes are not decompressed after all).
- Therefore, 17 bytes are needed to produce \x00FieldAccessType_Source
- And the info is to proceed 6 bytes (past Target), and copy f (whatever that means) with an
- offset of 2e.
- Alright, assume the same model as before, with copy = f + 4 = 13.
- So it seems that it has to a single bye is placed directly behind the offset which is
- then added to the number of bytes to copy.
- Therefore it copies f+4+4 = 17 bytes.
- So the proceed length is directly extended after the length byte, whereas the
- copy length is extended at the end, even after the offset.
- The next question is how the lengths are extended in detail. Either the extended bytes
- are some sort of varint (so the first bit specifies whether to read one more byte or not)
- or they just go from 0 to ff, with ff meaning that the next byte will be read too.
- Another error occurred towards the end of the file.
- The very last few bytes in that file were not compressed and there was no offset given afterwards.
- Which means half a byte was wasted because it specified the number of bytes to copy,
- while nothing was copied at all.
- Anyway, don't try to read an offset then.
- Though I wonder what happens if the last few bytes in a file are indeed compressed.
- Next error (in another file, so apparently one file was fully decompressed already):
- The second byte of the proceed size had its first bit set (could have been a varint).
- However, when investigating, the file behaved normally. So it should
- behave normally at least until it reaches ff (thus not a varint).
- Next error:
- uiawardsoverlaylogic.ebx has its extra copy byte with its first bit set @8796 (decimal).
- Can't make any sense of it directly though because it's deep in the data.
- However, the copy byte is set to exactly ff, followed by fb. If it was a varint,
- the number would actually be way longer than the entire file in its
- alpha version. Therefore read the number fffb as ff + fb (with ff also indicating that
- the next byte, fb, is to be read). And apply that system to the proceed size too.
- As a function:
- def readNum(f): #when byte is ff, read one more byte until not ff, add all
-     total=0
-     while 1:
-         byte=ord(f.read(1))
-         total+=byte
-         if byte!=0xff: return total
- And with that, the script can handle thousands of files.
- One file remains:
- ebx\ui\static\sharedicons.ebx
- Part of the file gives me back a warning about the block header not being 970.
- This file is also the largest beta ebx file there is, being 347 kb.
- The header of the second block: 00010000 00710000 3878
- It's in some odd section containing random ascii letters. Skipping 7100 does not
- get me to the start of the next section, however skipping 10000 does. So 00710000
- says that this block is not compressed (the 3878 is part of the payload already).
- Summarized, then:
- As of the beta of Battlefield 4, the ebx files (containing binary XML) are compressed with an LZ77 algorithm.
- A compressed file consists of several blocks, with no global metadata.
- The blocks are set to have a size of 0x010000 when decompressed, except for the last one which is usually smaller.
- Structure of a compressed block (big endian):
- 4 bytes: decompressed size (0x10000 or less)
- 2 bytes: compression type (0970 for LZ77, 0071 for uncompressed data)
- 2 bytes: compressed size (0000 for uncompressed data) of the payload (i.e. without the header)
- compressed payload
- Decompress each block and glue the decompressed parts together to obtain the file.
- The compression is an LZ77 variant. It requires 3 parameters:
- Copy offset: Move backwards by this amount of bytes and start copying a certain number of bytes following that position.
- Copy length: How many bytes to copy. If the length is larger than the offset, start at the offset again and copy the same values again.
- Proceed length: The number of bytes that were not compressed and can be read directly.
- Note that the offset is defined in regards to the already decompressed data which e.g. does not contain any compression metadata.
- The three values are split up, however: the copy length and proceed length are
- stated together in a single byte before an uncompressed section, while the relevant offset
- is given after that uncompressed section:
- Use the proceed length to read the uncompressed data, at which point you arrive at the start of the offset value.
- Read this value, then move to the offset and copy a number of bytes (given by copy length)
- to the decompressed data. Afterwards, the next copy and proceed length are given and the process starts anew.
- The offset has a constant size of 2 bytes, in little endian.
- The two lengths share the same byte. The first half of the byte belongs to the proceed length,
- whereas the second half belongs to the copy length.
- When the half-byte of the proceed length is f, then the length is extended by another byte,
- which is placed directly after the byte that contains both lengths. The value of that byte
- is added to the value of the proceed length (i.e. f). However, if the extra byte is ff, one more
- byte is read (and so on) and all values are added together.
- The copy length can be extended in the same manner. However, the possible extra bytes are
- located at the end, right after the offset.
- Additionally, a constant value of 4 is added to obtain the actual copy length.
- Finally, it is possible that a file ends without specifying an offset (as the last few bytes
- in the file were not compressed). The proceed length is not affected by that (and the copy
- length is of no relevance).
- As an example, consider the length byte B2:
- Proceed length: B
- Copy length: 2 + 4 = 6
- Another example, F23C:
- Proceed length: F + 3C = 4B
- Copy length: 2 + 4 = 6
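- The same split as a tiny sketch (extension handling left out):
- def splitLengthByte(b):
-     proceed=b>>4 #upper half: proceed length; f means extension bytes follow
-     copy=(b&0xf)+4 #lower half plus the constant 4: copy length
-     return proceed,copy
- print splitLengthByte(0xB2) #(11, 6), i.e. proceed B, copy 6
- print splitLengthByte(0xF2) #proceed part is F, so the real proceed is F plus the following byte(s)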
- A full example (the whitespace is there to separate hex from ascii; it doesn't count):
- 0000001a 0970 0018 80 minimap. 0800 51 ature 0a00 40 mize
- Header:
- Decompressed size 1a
- LZ77 compression (due to 0970)
- Compressed size 18
- Payload:
- Compressed stream: 80 minimap. 0800 51 ature 0a00 40 mize
- Decompressed stream: *empty*
- The decompression is sequential, start with the left part:
- 80 minimap. 0800
- Read 8 uncompressed bytes into the decompressed stream.
- Decompressed stream: minimap.
- Move back by 8 bytes in the decompressed stream (to the start)
- and copy 4 bytes (mini) to the decompressed stream.
- Compressed stream: 51 ature 0a00 40 mize
- Decompressed stream: minimap.mini
- Perform the same step again:
- 51 ature 0a00
- Read 5 uncompressed bytes into the decompressed stream.
- Decompressed stream: minimap.miniature
- Move back by 0a bytes in the decompressed stream
- and copy 5 bytes (.mini) to the decompressed stream.
- Compressed stream: 40 mize
- Decompressed stream: minimap.miniature.mini
- Read 4 uncompressed bytes into the decompressed stream (with no offset specified).
- Decompressed stream: minimap.miniature.minimize
- Clean up the script. Have it create a new folder with all decompressed ebx
- to investigate the changes to the ebx format:
- import os
- from struct import unpack,pack
- from cStringIO import StringIO
- def open2(path,mode="rb"): #when used to write, create folders too
-     if mode=="wb":
-         #create folders if necessary and return the file handle
-         #first of all, create one folder level manually because makedirs might fail
-         path=path.replace("/","\\")
-         pathParts=path.split("\\")
-         manualPart="\\".join(pathParts[:2])
-         if not os.path.isdir(manualPart):
-             os.makedirs(manualPart)
-         #now handle the rest, including extra long path names
-         folderPath=lp(os.path.dirname(path))
-         if not os.path.isdir(folderPath): os.makedirs(folderPath)
-     return open(lp(path),mode)
- def lp(path): #long pathnames
-     if len(path)<=247 or path=="" or path[:4]=='\\\\?\\': return path
-     return unicode('\\\\?\\' + os.path.normpath(path))
- def readNum(f): #when byte is ff, read one more byte until not ff, add all
-     total=0
-     while 1:
-         byte=ord(f.read(1))
-         total+=byte
-         if byte!=0xff: return total
- def decompressLZ77(f,fileSize=None):
-     #takes a file handle, gives back a decompressed string
-     #allow file size to be specified to work from within archives
-     #if file size not specified, get it now (will only work correctly on single files)
-     if fileSize==None:
-         f.seek(0,2)
-         fileSize=f.tell()
-         f.seek(0)
-     fileOffset=f.tell() #0 for single files, much greater for archives
-     #write the decompressed data into memory only, eventually return it
-     decompressedStream=StringIO()
-     #go through each block, filling the decompressed stream with data
-     while f.tell()-fileOffset<fileSize:
-         #grab the header of a compressed block
-         decompressedSize, compressionType, compressedSize = unpack(">IHH",f.read(8))
-         if compressionType==0x71:
-             decompressedStream.write(f.read(decompressedSize))
-             continue
-         elif compressionType!=0x970: print "Unknown compression type: "+str(compressionType)
-         #from here on, LZ77
-         #go from one opcode to the next and write the decompressed data into the stream until the block is done
-         blockOffset=f.tell()
-         while f.tell()-blockOffset<compressedSize:
-             #retrieve the two sizes from a single byte
-             lengthByte=ord(f.read(1)) #e.g. 9e
-             proceedSize=lengthByte>>4 #=> 09
-             copySize =lengthByte&0xf #=> 0e
-             if proceedSize==0xf: proceedSize+=readNum(f)
-             #add the uncompressed data to the stream
-             decompressedStream.write(f.read(proceedSize))
-             #it's possible that the very last bytes in the block are not compressed
-             #so there is no offset to read; handle this case
-             if f.tell()-blockOffset==compressedSize: break
-             pos0=decompressedStream.tell() #data will be written to the end of the stream, so take note of it
-             offset=unpack("H",f.read(2))[0]
-             if copySize==0xf: copySize+=readNum(f)
-             copySize+=4
-             decompressedStream.seek(-offset,1) #go back to copy the data
-             #make several copies if necessary
-             if offset<copySize:
-                 times=copySize/offset
-                 rest=copySize%offset
-                 copy=decompressedStream.read(offset) #either read offset or copySize; read() will yield the same
-                 decompressedStream.seek(pos0)
-                 for i in xrange(times): decompressedStream.write(copy)
-                 decompressedStream.write(copy[:rest])
-             else:
-                 copy=decompressedStream.read(copySize)
-                 decompressedStream.seek(pos0)
-                 decompressedStream.write(copy)
-     return decompressedStream.getvalue()
- #go through all files in the ebx folder
- for dir0, dirs, ff in os.walk(r"D:\hexing\bf4 beta dump\bundles\ebx"):
-     for fname in ff:
-         fullPath=dir0+"\\"+fname
-         writePath=fullPath.replace(r"bf4 beta dump\bundles\ebx","bf4 decompressed ebx")
-         f=open(dir0+"\\"+fname,"rb")
-         data=decompressLZ77(f)
-         f.close()
-         f2=open2(writePath,"wb")
-         f2.write(data)
-         f2.close()
- The files certainly look more tolerable now:
- Before: http://i.imgur.com/RwjMdgi.png
- After: http://i.imgur.com/xHkGjOD.jpg
- Supplement:
- While dealing with the new patched cas-enabled sbtoc I've stumbled upon two more compression types.
- Type 0070 is almost the same as type 0071, but for 0070 the compressed size equals the decompressed size
- whereas the compressed size is zero for type 0071.
- Another type is 0000, which only occurs when decompressed and compressed size are null too.
- Basically there are 8 nullbytes. In this case, return an empty string.
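- A rough sketch of how the block type dispatch could cover all four types seen so far (decompressLZ77Block is a made-up name standing in for the per-block LZ77 loop from the script above):
- from struct import unpack
- def readBlock(f,decompressedStream):
-     decompressedSize,compressionType,compressedSize=unpack(">IHH",f.read(8))
-     if compressionType==0x0000:
-         pass #eight nullbytes in total, the block contributes an empty string
-     elif compressionType in (0x0070,0x0071):
-         #raw payload; for 0070 compressedSize equals decompressedSize, for 0071 it is zero
-         decompressedStream.write(f.read(decompressedSize))
-     elif compressionType==0x0970:
-         decompressLZ77Block(f,decompressedStream,compressedSize) #hypothetical helper: the LZ77 loop from above
-     else:
-         print "Unknown compression type: "+hex(compressionType)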