  1. The ebx files are obviously weakly compressed, whereas they used to be uncompressed throughout bf3 and in the bf4 alpha. Compare the keyword sections:
  2.  
  3. levellistreport.ebx:
  4. alpha:
  5. DataContainer.Asset.$.Name.LevelReportingAsset.array.member.BuiltLevels
  6. beta:
  7. DataContainer.Asset.$.Name.LevelReporting....array.member.Built&..s.
  8.  
  9. In detail with the dots replaced with the actual hex:
  10. LevelReporting 1B00F103 array
  11. => 1B00F103 equals Asset\x00
  12.  
  13. It's just simple substitution stuff, try the common weak compressions. lzo or snappy, or maybe zlib on very low compression (or rather not lol, very unlikely)
  14.  
  15. Offset of Asset: 0x78 (0x78 refers to 78 in hex, i.e. 120 decimal)
  16. Offset of where Asset would start after LevelReporting (before 1b): 0x93
  17. Difference: 0x1b
  18.  
  19.  
  20. dataversion.ebx:
  21. alpha:
  22. DataContainer.Asset.$.Name.VersionData.disclaimer.Version.DateTime.BranchId.GameName
  23. beta:
  24. DataContainer.Asset.$.Name.Version"...disclaimer...:...eTime.BranchId.Game:..]
  25.  
  26. In detail:
  27. Version 2200B400 disclaimer
  28. => 2200B400 can mean either:
  29. 2200b4 (which somehow equals Data) and a nullbyte
  30. or 2200b400 on its own (which somehow equals Data\x00).
  31.  
  32. Note that levellistreport required 4 bytes too but already got the nullbyte from the Asset\x00 string.
  33.  
  34.  
  35. Offset of Data: 0x3f
  36. Offset of where Data would start after Version: 0x61
  37. Difference: 0x22
  38.  
  39. So it might be 2200B400 which means:
  40. move back 0x22 bytes and grab 5 bytes
  41. or it could be 2200B4 which means:
  42. move back 0x22 bytes and grab 4 bytes (with the final byte given directly)
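Either way, the basic operation is a back-reference into data that has already been decompressed. A minimal sketch of just that operation (a generic helper of my own, not yet specific to this format):

def copyBack(buf,offset,length):
    start=len(buf)-offset
    for i in xrange(length): #byte by byte, so length>offset simply repeats the data
        buf.append(buf[start+i])

buf=bytearray("abcd")
copyBack(buf,4,6)
print buf #abcdabcdab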
  43.  
  44.  
  45.  
  46. snappy compression algorithm says:
  47. Copies are references back into previous decompressed data, telling
  48. the decompressor to reuse data it has previously decoded.
  49. They encode two values: The _offset_, saying how many bytes back
  50. from the current position to read, and the _length_, how many bytes
  51. to copy.
  52.  
  53. =>looks promising
  54.  
  55. snappy also mentions (zlib, LZO, LZF, FastLZ, and QuickLZ), might be worth checking out the various LZs
  56.  
  57. LZO says:
  58. LZO is a block compression algorithm - it compresses and decompresses
  59. a block of data. Block size must be the same for compression
  60. and decompression.
  61.  
  62. LZO compresses a block of data into matches (a sliding dictionary)
  63. and runs of non-matching literals.
  64.  
  65. This is basically the entire documentation regarding how it actually works, which is probably(?) similar to snappy (I don't understand the documentation).
  66.  
  67.  
  68.  
  69. Ignore the compressed parts for a moment and consider the header (the very first few bytes in the file).
  70. The payload is compressed and has a header of a few bytes placed in front of it.
  71. When compression is involved the header is pretty much guaranteed to contain both the decompressed size and the compressed size.
  72.  
  73. Compression header (i.e. before the actual compressed payload):
  74. 4 bytes: 0000 01d0, decompressed size?
  75. 2 bytes: 09 70??
  76. 2 bytes: 015b, size of everything after it till EOF (end-of-file) for small files
  77. 2 bytes: f102?
  78.  
  79. 0970 was the same in a couple of files, check for constancy of 0970 @offset 4:
  80. import os
  81.  
  82. #go through all files in the ebx folder
  83. for dir0, dirs, ff in os.walk("ebx"):
  84.     for fnames in ff:
  85.         fname=dir0+"\\"+fnames
  86.         f=open(fname,"rb")
  87.         f.read(4) #skip the first 4 bytes (the suspected decompressed size)
  88.         if f.read(2)!="\x09\x70":
  89.             print fname
  90.  
  91. NO HIT => CONSTANT FOR ALL BETA EBX
  92.  
  93. The sum of the 2 integers after the ebx magic (ced1b20f) is the (decompressed) file size. Usually the ints
  94. are untouched as the files are only weakly compressed. The sum of them is equal to the first 4 bytes in the file, confirming that this indeed is the decompressed size.
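A quick check of that claim (assuming the header layout found above and that those two ints happen to be written out uncompressed, which is usually the case; the path is just an example from my dump):

from struct import unpack
f=open("ebx/levellistreport.ebx","rb")
decompressedSize=unpack(">I",f.read(4))[0]
data=f.read()
pos=data.find("\xce\xd1\xb2\x0f") #the ebx magic
int1,int2=unpack("<II",data[pos+4:pos+12]) #the two little endian ints after the magic
print hex(int1+int2),hex(decompressedSize) #should print the same value twice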
  95.  
  96.  
  97. Compression header in big endian (big endian is when a hex number is written in a normal way from left to right, little endian inverts the order of the bytes):
  98. 4 bytes: decompressed size
  99. 2 bytes: 0970
  100. 2 bytes: 015b, size of everything after it till EOF (for small files)
  101. 2 bytes: f102, possibly part of the compressed payload
  102.  
  103.  
  104.  
  105. What happens when a file is too large for the 2-byte size field?
  106. Two possibilities:
  107. 1) Some varint stuff (with pairs of 2 bytes?), so when the first bit is 1, then read two more bytes. Rather unlikely, never seen any varints working with pairs of 2 bytes before.
  108. 2) Compressed in small blocks with max ffff bytes, one block after another. Could be that 0970 is the start of one package which would also align the start of the first section to a multiple of 4.
  109.  
  110. materialgrid contains 0970 eight times, spaced apart by the block size => option 2.
  111. This is very similar to the fb2 zlib format (or maybe it is even zlib with low compression).
  112.  
  113. The last two bytes are really part of the payload and not part of the header. Some files only have one byte there before the ebx magic.
  114.  
  115. => The file consists of several blocks, with no global metadata.
  116. The blocks are set to have a size of 0x010000 when decompressed, except for the last one which is usually smaller.
  117.  
  118. Compressed block (big endian):
  119. 4 bytes: decompressed size (0x10000 or less)
  120. 2 bytes: 0970
  121. 2 bytes: compressed size
  122. compressed payload
  123.  
  124. Decompress each block and glue the decompressed parts together to obtain the file.
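As a structural sketch, walking the blocks of a file would then look like this (the payload decompression itself is still unknown at this point; the path is just an example):

from struct import unpack
f=open("ebx/materialgrid.ebx","rb")
f.seek(0,2); EOF=f.tell(); f.seek(0)
while f.tell()<EOF:
    decompressedSize,constant,compressedSize=unpack(">IHH",f.read(8))
    print hex(decompressedSize),hex(constant),hex(compressedSize)
    f.read(compressedSize) #skip the payload for now; decompress it here once the algorithm is known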
  125.  
  126.  
  127.  
  128.  
  129.  
  130.  
  131. Maybe it is zlib at weak compression, try compressing a string at the various compression levels (from 0 to 9) to get an idea of what it looks like:
  132. import zlib
  133. from binascii import hexlify
  134. string="adgfasdfavasdfasdf00000000"
  135. for i in xrange(10):
  136.     hexlify(zlib.compress(string,i))
  137.  
  138. '7801011a00e5ff6164676661736466617661736466617364663030303030303030858708c4'
  139. '78014b4c494f4b2c4e494b2c0393409601140000858708c4'
  140. '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
  141. '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
  142. '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
  143. '785e4b4c494f4b2c4e494b2c0393406c000500858708c4'
  144. '789c4b4c494f4b2c4e494b2c0393406c000500858708c4'
  145. '78da4b4c494f4b2c4e494b2c0393406c000500858708c4'
  146. '78da4b4c494f4b2c4e494b2c0393406c000500858708c4'
  147. '78da4b4c494f4b2c4e494b2c0393406c000500858708c4'
  148.  
  149. Nope, zlib starts with 78 (usually 78da because default compression is set pretty high). You may want to connect 78da with zlib in your mind. It's used in many archives.
  150.  
  151. Try to decompress anyway, run through all possible slices of a small file and see if it can decompress. If it cannot then the file is probably not zlib.
  152.  
  153. import zlib
  154. f=open("levellistreport.ebx","rb")
  155. size=355 #size of the file
  156. for i in xrange(size):
  157.     for j in xrange(size):
  158.         f.seek(i)
  159.         data=f.read(j) #when j is greater than the number of the remaining bytes in the file,
  160.         #it doesn't cause an error but just gives back everything till the end of the file
  161.         try:
  162.             data2=zlib.decompress(data) #try to decompress it (usually it will complain about the format being invalid)
  163.             if len(data2)!=0: #make sure that there's actually something there when decompressed
  164.                 print i,j,len(data2)
  165.         except: continue
  166.  
  167. No output at all, so disregard zlib.
  168.  
  169. snappy (has only one compression level):
  170. Grabbed the libraries from http://www.lfd.uci.edu/~gohlke/pythonlibs/
  171.  
  172. Take the script from before and replace the "zlib" with "snappy" (Python is simple), yielding
  173.  
  174. 15 3 1
  175. 22 5 2
  176. 62 3 1
  177. 216 7 3
  178. 256 4 2
  179. 319 35 32
  180. 347 3 1
  181.  
  182.  
  183. So it only gives back small random segments out of it with a size of max 32 bytes. No snappy.
  184.  
  185. lzo:
  186. Exactly the same script as before but with lzo instead. Always fails. No lzo.
  187.  
  188.  
  189.  
  190.  
  191.  
  192. wiki mentions some lossless algorithms
  193.  
  194. Run-length encoding (RLE) – a simple scheme that provides good compression of data containing lots of runs of the same value.
  195. Lempel-Ziv 1978 (LZ78), Lempel-Ziv-Welch (LZW) – used by GIF images and compress among many other applications
  196. DEFLATE – used by gzip, ZIP (since version 2.0), and as part of the compression process of Portable Network Graphics (PNG), Point-to-Point Protocol (PPP), HTTP, SSH
  197. bzip2 – using the Burrows–Wheeler transform, this provides slower but higher compression than DEFLATE
  198. Lempel–Ziv–Markov chain algorithm (LZMA) – used by 7zip, xz, and other programs; higher compression than bzip2 as well as much faster decompression.
  199. Lempel–Ziv–Oberhumer (LZO) – designed for compression/decompression speed at the expense of compression ratios
  200. Statistical Lempel Ziv – a combination of statistical method and dictionary-based method; better compression ratio than using single method.
  201.  
  202. Run-length encoding (RLE) => nope, too simple
  203. Lempel-Ziv 1978 (LZ78), Lempel-Ziv-Welch (LZW) => probably LZ77; LZ78 and LZW are different again
  204. DEFLATE – Each block is preceded by a 3-bit header. => nope, there is no 3 bit header.
  205. bzip2 => unlikely (high compression)
  206. Lempel–Ziv–Markov chain algorithm (LZMA) => unlikely (even higher compression)
  207. Lempel–Ziv–Oberhumer (LZO) => nope, just tried it
  208. Statistical Lempel Ziv => very novel and definitely uncommon; nope
  209.  
  210.  
  211.  
  212. wiki on LZ77: In the implementation used for many games by Electronic Arts,[4] the size in bytes of a length-distance pair can be specified inside
  213. the first byte of the length-distance pair itself; depending on if the first byte begins with a 0, 10, 110, or 111 (when read in big-endian bit orientation),
  214. the length of the entire length-distance pair can be 1 to 4 bytes large.
  215.  
  216. [4]: http://wiki.niotso.org/QFS_compression (Niotso is a semi-collaborative effort to re-implement the engine used in The Sims Online.)
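The four prefix classes from that description can be told apart by shifting the first byte; a small sketch (the mapping to opcode sizes 2/3/4/1 bytes is my reading of the niotso page, not something verified against these files):

def opcodeSize(b0): #classify the first byte of a QFS length-distance pair
    if b0>>7==0b0:   return 2 #prefix 0
    if b0>>6==0b10:  return 3 #prefix 10
    if b0>>5==0b110: return 4 #prefix 110
    return 1                  #prefix 111, literal/control codes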
  217.  
  218.  
  219. Googling ea and LZ77 got me here http://www.vgleaks.com/world-exclusive-durangos-move-engines/
  220.  
  221. "The Xbox One (Durango) GPU includes a number of fixed-function accelerators. Move engines are one of them.
  222.  
  223. Xbox One (Durango) hardware has four move engines for fast direct memory access (DMA)
  224.  
  225. This accelerators are truly fixed-function, in the sense that their algorithms are embedded in hardware.
  226. They can usually be considered black boxes with no intermediate results that are visible to software.
  227. When used for their designed purpose, however, they can offload work from the rest of the system and obtain useful results at minimal cost."
  228.  
  229. The Xbox One has one move engine for encoding and one for decoding LZ77.
  230.  
  231. So, some LZ77 variant was probably chosen in preparation for the Xbox One. The Xbox finally gets rid of that proprietary XMA audio codec implemented via hardware
  232. that made it impossible for a long time for anyone to decode Xbox audio (until some russians managed to get a hold of some code IIRC), but now it has an
  233. LZ77 variant that is done via hardware and will most likely never be documented anywhere. meh.
  234.  
  235.  
  236.  
  237. Apply the info from niotso to the levellistreport.ebx:
  238. Recall that the string is: DataContainer.Asset.$.Name.LevelReporting....array.member.Built&..s.
  239.  
  240. Detail:
  241. LevelReporting 1B00F103 array
  242. => 1B00F103 equals Asset\x00
  243.  
  244. 1b00f103 is a 4 bytes opcode.
  245. in binary: 00011011 00000000 11110001 00000011
  246.  
  247. niotso says for the individual bits in a 4byte opcode:
  248. 110ORRPP OOOOOOOO OOOOOOOO RRRRRRRR
  249.  
  250. Note the O in the first byte. Looks like some attempt at obfuscation to me (it would make more sense to have it on the right, close to the other Os).
  251.  
  252. O: Offset, move backwards by this amount of bytes and start copying a certain number of bytes following that position.
  253. R: Length, how many bytes to copy. If the length is larger than the offset, start at the offset again and copy the same values again.
  254. P: The engine needs a way to know which bytes are data (they may happen to look similar to an opcode) and which are opcode. So this value
  255. tells the distance to the next opcode, with everything in between being ordinary uncompressed data. Proceed this distance.
  256. This can only be a value up to 3, which is far too small. Therefore it's possible to add just one more byte after the opcode to increase that distance.
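As code, that 4-byte opcode would unpack like this (my own transcription of the niotso bit layout 110ORRPP OOOOOOOO OOOOOOOO RRRRRRRR, just to make it concrete):

def parse4ByteOpcode(b0,b1,b2,b3):
    O=((b0&0x10)<<12) | (b1<<8) | b2 #17 bits of offset
    R=((b0&0x0c)<<6) | b3            #10 bits of copy length
    P=b0&0x03                        #2 bits of proceed distance
    return O,R,P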
  257.  
  258. As the offset 1b is in fact on the left in the code, it does not match the niotso 4byte opcode which requires the offset in the middle.
  259. Maybe it is a 3byte opcode with an extra byte. Will investigate this later.
  260.  
  261. What's more, the offset is the very first thing to appear, so the engine must have some idea how many bytes to read.
  262. I'm not sure if the niotso format is used. It's more likely that it is indeed some custom format for the Xbox One. Still, most of the info here applies in either case.
  263.  
  264. Well, a custom LZ77 it is then. Not awfully surprising. Initially I thought the binary XML used
  265. by the game was some already existing format and spent hours to find nothing (of course).
  266.  
  267. So let's not waste any time and get going.
  268.  
  269. It might be a good idea to consider the distance between one opcode and the next, i.e. the distance between the positions after LevelReporting and after Built.
  270.  
  271. Offset after LevelReporting: 0x93
  272. Offset after Built: 0xA9
  273. Difference: 0x16, could be up to 4 bytes lower because I'm not sure if it starts counting before or after the opcode. The value may also be constantly shifted by a small value.
  274.  
  275. It's still not conclusive.
  276.  
  277. Note that the 1b in 1b00f103 is probably part of a 2 byte sequence, 1b00, written in little endian (i.e. 001b in big endian).
  278.  
  279.  
  280.  
  281. Anyway, go to the start of the file after the header. There are two bytes before the ebx magic, f102, while other times just one byte is enough.
  282. This indicates a varint, so the first byte has a bit indicating whether the number ends there or whether the next byte is part of it too.
  283.  
  284. There are many other ebx files having two bytes before the magic and the first byte being f1, f2, etc.
  285. As a rough estimate then, when the first half of the first byte is f, then another byte follows.
  286. One half of a byte is 4 bits, so only the 4 remaining bits actually contain information about the number.
  287.  
  288. Anyway, check this theory by adjusting the script. As the header structure is already known the script could read every block and not just the first one.
  289. However, the first block is known to contain the ebx magic which can be used as a landmark, so it's not necessary yet to implement that.
  290.  
  291. import os
  292. from struct import unpack,pack #convert a sequence of bytes into ints or floats
  293.  
  294. #create some sets, they can contain every element only once; perfect for this kind of analysis
  295. oneset=set()
  296. twoset1=set()
  297. twoset2=set()
  298.  
  299. #go through all files in the ebx folder
  300. for dir0, dirs, ff in os.walk("ebx"):
  301.     for fnames in ff:
  302.         fname=dir0+"\\"+fnames
  303.         f=open(fname,"rb")
  304.
  305.         #grab the header
  306.         decompressedSize=unpack(">I",f.read(4)) #read big endian unsigned int
  307.         constant=f.read(2) #0970
  308.         compressedSize=unpack(">H",f.read(2)) #read big endian unsigned half int
  309.
  310.         #check the bytes after the header, using the ebx magic as a reference
  311.         ebxMagic="\xce\xd1\xb2\x0f"
  312.
  313.         sample = f.read(10) #read 10 bytes, even the smallest file is several times larger than that
  314.         #now find the position of the magic in it and then analyze the bytes before it
  315.
  316.         magicPos=sample.find(ebxMagic)
  317.         if magicPos==-1: asdf #undefined name to force an error: could not find the magic at all
  318.         elif magicPos>2: fdsa #more than 2 bytes before the magic
  319.
  320.         if len(sample)<2: tooshorttoanalyze #same trick, the sample is too short to analyze
  321.
  322.         if magicPos==1:
  323.             oneset.add(ord(sample[0])) #this set will contain all possible bytes that appear when there is only one byte before the magic
  324.         if magicPos==2:
  325.             twoset1.add(ord(sample[0]))
  326.             twoset2.add(ord(sample[1]))
  327.  
  328. print oneset
  329. print twoset1
  330. print twoset2
  331.  
  332. results in:
  333. set([224, 192, 162, 161, 209, 194, 144, 177, 128])
  334. set([240, 241, 242, 243, 244])
  335. set([2, 4, 40, 41, 42, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 10])
  336.  
  337.  
  338. Later on it might be useful to collect the specific combinations of the 2 bytes that are possible. For now however, there are some useful results already:
  339. There are always either one or two bytes before the magic.
  340. When there are two bytes before the magic, the first byte can be 240 to 244, which is 0xf0 to 0xf4.
  341. When there is only one byte the values range from 0x80 to 0xe0.
  342.  
  343.  
  344.  
  345. Now comes the time to directly compare between small alpha and beta files that remain unchanged.
  346. As a rough indicator for that, use the two ints after the magic and make sure they are the same. As they appear so early in the file and the compression
  347. can only look behind to copy stuff from there, and not into the future, these ints should always be written out.
  348.  
  349. levellistreport.ebx and many others fail the test.
  350. import os
  351. from struct import unpack,pack
  352.  
  353. #go through all files in the ebx folder
  354. for dir0, dirs, ff in os.walk("ebx"):
  355.     for fnames in ff:
  356.         fname=dir0+"\\"+fnames
  357.         f=open(fname,"rb")
  358.
  359.         #grab the header
  360.         decompressedSize=unpack(">I",f.read(4))
  361.         constant=f.read(2) #0970
  362.         compressedSize=unpack(">H",f.read(2))
  363.
  364.         #check the bytes after the header, using the ebx magic as a reference
  365.         ebxMagic="\xce\xd1\xb2\x0f"
  366.         sample = f.read(10)
  367.         magicPos=sample.find(ebxMagic)
  368.
  369.         f.seek(-10+magicPos,1) #move back to where the ebx magic starts, then grab 12 bytes and compare to alpha
  370.         betabytes=f.read(12)
  371.
  372.         try: f2=open("D:/hexing/bf4 alpha dump/bundles/"+dir0+"/"+fnames,"rb")
  373.         except: continue #some files do not exist in the alpha
  374.
  375.         alphabytes=f2.read(12) #not compressed, so no header or other trouble
  376.
  377.         if alphabytes==betabytes:
  378.             print fname
  379.  
  380. which gives back a whopping two files that satisfy the condition:
  381. ebx\sound\mixers\impairedhearing_soundstate_mixer.ebx
  382. ebx\sound\mixers\mandown_soundstate_mixer.ebx
  383.  
  384.  
  385. Both of these still differ directly after those 12 bytes, so ignore them.
  386.  
  387. Keep working with levellistreport.ebx instead, trying to match the metadata:
  388. from my ebx script (these 11 ints appear directly after the ebx magic, they are little endian):
  389. class Header:
  390.     def __init__(self,varList): ##all 4byte unsigned integers
  391.         self.absStringOffset = varList[0] ## absolute offset for string section start
  392.         self.lenStringToEOF = varList[1] ## length from string section start to EOF
  393.         self.numGUID = varList[2] ## number of external GUIDs
  394.         self.null = varList[3] ## 00000000
  395.         self.numInstanceRepeater = varList[4]
  396.         self.numComplex = varList[5] ## number of complex entries
  397.         self.numField = varList[6] ## number of field entries
  398.         self.lenName = varList[7] ## length of name section including padding
  399.         self.lenString = varList[8] ## length of string section including padding
  400.         self.numArrayRepeater = varList[9]
  401.         self.lenPayload = varList[10] ## length of normal payload section; the start of the array payload section is absStringOffset+lenString+lenPayload
  402.  
  403.  
  404. alpha (from here on, all nums in hex even without 0x prefix):
  405. self.absStringOffset = 180
  406. self.lenStringToEOF = 50
  407. self.numGUID = 1
  408. self.null = null, lol
  409. self.numInstanceRepeater = 1
  410. self.numComplex = 4
  411. self.numField = 5
  412. self.lenName = 50
  413. self.lenString = 10
  414. self.numArrayRepeater = 2
  415. self.lenPayload = 30
  416.  
  417. beta (all nums in hex; with some guessing involved):
  418. self.absStringOffset = 190
  419. self.lenStringToEOF = 40
  420. self.numGUID = 2
  421. self.null = null, maybe? It's compressed; it seems that it was removed
  422. self.numInstanceRepeater = 1
  423. self.numComplex = 4
  424. self.numField = 5
  425. self.lenName = 50
  426. self.lenString = 10
  427. self.numArrayRepeater = maybe 6? unlikely
  428. self.lenPayload = 20
  429.  
  430. The null part is the first occurrence of compression I think.
  431. The file starts with f102, then reads 10 bytes, and then compresses the nulls with
  432. 01020073? Or the null was removed from the header altogether. Skim through some larger files
  433. as larger files mean larger values in the metadata so they aren't compressed as easily.
  434.  
  435. Consider the beta materialgrid:
  436. CED1B20F 70250200 10D20400 BF000000
  437. CD05CD05 10002000 5B007006 30000000
  438. A9280000 A012010063
  439.  
  440. self.absStringOffset = 22570
  441. self.lenStringToEOF = 4d210
  442. self.numGUID = bf
  443. self.null = null, maybe?
  444. self.numInstanceRepeater = 210?
  445. self.numComplex = 75b?
  446. self.numField = 306?
  447. self.lenName = ?
  448. self.lenString = ??
  449. self.numArrayRepeater = ???
  450. self.lenPayload = ????
  451.  
  452.  
  453. Just make sure that the header indeed remains the same, and figure out if the null entry is still there or not.
  454. The first two entries were already checked. Move on to numGUID.
  455. Get the length of the guid section. It is made up of guid pairs with a length of 20 each.
  456. The guids are random bytes and thus should be almost impossible to compress.
  457.  
  458. The size is roughly 17f5, starting @32 and ending before the section containing the
  459. string keywords.
  460.  
  461. 17f5/20 = bf, good
  462.  
  463. The keyword section (which I called Names for some reason) starts @1827 and has a size of about 3bd
  464.  
  465. The next sections in the file are:
  466. fieldDescriptors #10 (i.e. sixteen) bytes long, the 9th byte is pretty much always null
  467. complexDescriptors #look the same as fieldDescriptors, but the byte is not null
  468. instanceRepeaters #consist of three ints, the first int used to be always null
  469. arrayRepeaters #look the same as instanceRepeaters, but the first int is not null
  470. These characteristics can be used to identify the length of the sections and thereby the number of entries.
  471.  
  472. fieldDescriptors and complexDescriptors: the section for both is 500 bytes
  473. instanceRepeaters: about d80 bytes; d80/c = 120
  474. arrayRepeaters: about a788 bytes; a788/c = df6
  475.  
  476. The length of the string section is about 30 bytes compressed.
  477. The length of the non-array payload after that section c080.
  478.  
  479. Can't figure out anything, return to a simpler file.
  480.  
  481.  
  482.  
  483. Analyze the keyword sections of various files to get an idea of how the offset works.
  484. More files = more accurate results.
  485. levellayerinclusion:
  486. alpha:
  487. DataContainer.Asset.$.Name.SubWorldInclusion.array.member.Criteria.WorldPartInclusion.SubWorldInclusionCriterion.Options.WorldPartInclusionCriterion
  488. beta:
  489. DataContainer.Asset.$.Name.SubWorldInclusion.array.member.Criteria.%.FPart)..;..-..on.Options6
  490.  
  491. in detail:
  492. WorldPartInclusion.SubWorldInclusionCriterion
  493. vs
  494. 250046 Part 29000D 3B0003 2D00AF on
  495.  
  496. moving 25 backwards gives back the offset of WorldInclusion
  497. 46 means to copy 5 bytes, then read Part as uncompressed data and then read
  498. another opcode.
  499. So 46 => copy 5 and proceed by 4?
  500.  
  501. With the 29 byte afterwards I end up at the ldInclusion, two bytes too early.
  502.  
  503.  
  504. At least 2d00af refers back to Inclusion. But it should be SubWorldInclusion... meh
  505.  
  506.  
  507. uiawardsoverlaylogic:
  508. alpha:
  509. DataContainer.GameDataContainer.$.DataBusPeer.Flags.
  510. beta:
  511. DataContainer.Game.. $....BusPeer.Flags&.`
  512.  
  513. in detail:
  514. GameDataContainer.$.DataBusPeer.Flags.
  515. vs
  516. Game 120020 $ 00 1000D1 BusPeer.Flags 260060
  517.  
  518. 120020: move back by 12 and copy 0e bytes, then proceed by 2 bytes.
  519. 1000d1: move back by 10 and copy 4, then proceed 0d.
  520.  
  521. Ah, so it does it sequentially.
  522. It first converts Game 120020 to GameDataContainer.
  523. then when it reaches the next code it uses this replacement already,
  524. so when it moves back by 10 bytes it actually arrives at the Data from
  525. GameDataContainer, not the normal DataContainer. This is then copied again, etc.
  526.  
  527. It only makes sense, it would be odd if it copied opcodes too. Silly me.
  528.  
  529. 20 in binary: 00100000, copy 0e, proceed 2
  530. d1 in binary: 11010001, copy 4, proceed 0d
  531.  
  532.  
  533. Manually analyze a whole lot more, I'll only jot down the results; long keyword sections are ideal for this:
  534. using sound/master.ebx. Made a copy of the compressed file. After every step I figured out, I replaced
  535. the compressed part with the uncompressed data so the subsequent compressed parts make sense.
  536.  
  537. 0F0057, move 0f, copy 5, proceed 5
  538. 370057, move 37, copy 0b, proceed 5
  539.  
  540. Well isn't that interesting. The same 57 is used for two different things.
  541. This might be an indicator that there is some obfuscation. E.g. it could be implemented
  542. that the program adds the current offset in the file to some number to obtain 57
  543. when compressing.
  544. So I would need to subtract that offset before analyzing the number. That of course
  545. requires an even greater number of samples, with their offset documented.
  546. Let's ignore that for a moment and go on.
  547.  
  548. 100040, move 10, copy 0b, proceed 4
  549. 630091, move 63, copy 4, proceed 9
  550. 0D00F205, move 0d, copy 5, proceed 14
  551.  
  552. Note that the keywords are ordered slightly differently, which hopefully explains how
  553. the same code meant two different things above. The words all sound the same here,
  554. so it's hard to recognize a different order. Go on anyway, keeping that in mind.
  555.  
  556. 220002, move 22, copy 6, proceed 0 (the next opcode is directly after this one)
  557. 7C00F306, move 7c, copy 6, proceed 15
  558. 6C0002, move 6c, copy 7, proceed 0. How is that even possible? Is this some sort of ruse?
  559.  
  560. The first two bytes seem pretty reliable, don't type out how far it moves.
  561. 1D0002, copy 6, proceed 0
  562. 0C0062, copy 6, proceed 6
  563. 120001, copy 6, proceed 0
  564. 550004, copy 4, proceed 0
  565. 6E0006, copy 9 or 8, proceed 0
  566. C80000, copy 0a or 9, proceed 0
  567. 8100F101, copy 4, proceed 10
  568.  
  569.  
  570. Not conclusive at all.
  571.  
  572. However, it seems that that rule about f0 to f4 is true even later in the file.
  573. So identifying compressed parts is not that difficult. Look out for 1 byte with a value
  574. (move back by this amount), then one byte is almost always null because the last time the
  575. string appeared is usually closer. At least, when a file has just ff bytes this always works.
  576. As there's not much point in looking at compressed parts later in the file anyway
  577. unless either the previous parts have been decompressed or I can directly compare against
  578. an alpha file (even then, the keyword section comes pretty early and is the most useful),
  579. just assume it is null. When the first half of the final byte is f, then make sure
  580. to look one byte ahead, if that byte is extremely low, these 4 bytes form
  581. one compressed unit. If the first half is not f it's a bit harder.
  582. Just make sure to look back to see if the position to copy from actually makes sense.
  583.  
  584.  
  585.  
  586.  
  587.  
  588. Take another look at levellistreport.ebx with the goal of fully decompressing it:
  589. decompressed size: 1d0
  590.  
  591. Starts with F102, then some bytes.
  592.  
  593.  
  594. The first guid D6076D4B4DF8DD11BE32C64EACA26B06 is the same in the alpha.
  595. In particular, the file guid is very similar to the instance guid, so
  596. parts of its second occurrence are compressed.
  597.  
  598. Likewise, the first guid of the second guid pair, A4E429350D405687DE5E6EFF3347F7ED
  599. is similar to the second part of the pair.
  600.  
  601. I can't really figure that part out yet; moving on to the keyword section which
  602. shouldn't be too hard to decompress.
  603.  
  604. 1B00F103, copy 6, proceed 12
  605. 260013, copy 5, proceed 0
  606.  
  607. It's the end of the keyword section, so it might seem odd that it proceeds 0.
  608. However, every string has to end with a nullbyte which has not appeared yet.
  609. Furthermore, the keyword section is padded with nullbytes to a multiple of 10 (sixteen). So
  610. while it may be possible that the section needs no padding by chance (or only 2-3 bytes),
  611. it's very likely that the next opcode fills in the nulls.
  612. Indeed, as the next section starts with 81b5 (the first two bytes of the hash of DataContainer)
  613. there is some padding here.
  614.  
  615. 8E0040, copy some nulls, proceed 4
  616.  
  617. In the alpha (and in bf3) the first fieldDescriptor
  618. is just the hash and lots of nulls: 81B50200000000000000000000000000
  619.  
  620. In the beta, knowing that the next fieldDescriptor starts with the hash 82D8827C,
  621. the entire entry reads: 81B5 C60030000008050092000000
  622.  
  623. This is fundamentally different. Even assuming that C60030 is an opcode,
  624. there is definitely a non-null byte later on. It would have been too easy anyway.
  625.  
  626.  
  627. So, with every descriptor starting with a hash (which is hard to compress) and every
  628. descriptor having a fixed 16 bytes size (hopefully), try to count the entries to get
  629. an idea of the final size to check if the ebx file header remains correct.
  630.  
  631. Never mind, that's impossible.
  632.  
  633.  
  634. Fix the string later on, LevelListReport
  635.  
  636. CF0042 should equal Level; copy 5, proceed 4
  637. F90020 should equal Report; copy 6,
  638.  
  639. current minimum distance between CF0042 and Level: 8f, so there are 40 bytes missing
  640. current distance between F90020 and Report (assuming that Level was replaced by now):
  641. b9, so there are 40 bytes missing here too. So there truly are 40 bytes missing.
  642.  
  643. This means that the difference between the keywords and the strings section
  644. is a multiple of 10. Hopefully that means that the padding remains there as it used to.
  645.  
  646. The keyword section should be 50 bytes long then.
  647. That makes 8E00400000 become 9 nulls, so basically 8E0040 is 7 nulls.
  648.  
  649. So from CF0042, move backwards until Level is reached. That distance is actually cf.
  650. Now just add the size of the keywords before that: 41 bytes
  651. => 110 bytes for keywords + descriptors/repeaters.
  652. Assume that keywords need 50 bytes (they are padded at the end).
  653. That leaves c0 for the descriptors/repeaters
  654.  
  655. So what's the size of all metadata:
  656. Meta size (as given by the header):
  657. 190
  658. Meta size (by summing up the parts):
  659. 30? for the header itself
  660. +60 for the guids (2 external guid pairs, one file guid pair, each 20 bytes)
  661. +50 for the keywords
  662. +c0 for the descriptors/repeaters
  663. = 1a0
  664.  
  665. I suspect either the header lost 10 bytes or one half of a guid pair has been dropped.
  666.  
  667. Moving on to the payload itself. The payload always starts with a 10 byte guid, and more
  668. guids appear later on in the file. These correspond to instances in the xml file.
  669. What's interesting to note is that the guid section at the top contains (among others)
  670. the guid of the primary instance. This guid must appear once in the payload, written out
  671. exactly like at the top. In the case of levellistreport.ebx, there is only one instance in
  672. total. Which means that the guid that must appear is known (from the alpha):
  673. D7076D4B4DFEDD11A232C64E4C926B06
  674.  
  675. It's interesting to note that it is still written out for the most part at the bottom,
  676. while it is somewhere at the top (in compressed form). This indicates that the window
  677. was too small so the compressor did not see the guid at the top anymore.
  678. Or that indeed this is the half of a guid pair that was dropped.
  679.  
  680.  
  681. Got to properly decompress before dealing with that.
  682.  
  683. Assume that the offset in the file is indeed relevant. Have a script sort files by their very
  684. first proceed-number, right before the ebx magic. Then manually analyze them. Would be good
  685. to find files with the number just varying by 1 and with the number to copy varying by 1 too.
  686. Of course that still requires a lot of understanding of the header, which I don't really have.
  687. Will try anyway.
  688. import os
  689. from struct import unpack,pack #convert a sequence of bytes into ints or floats
  690. from binascii import hexlify #converts several bytes into a string of their hex representation,
  691. #e.g. "\x00\xab"=>"00ab": and similarly "doc"=>"646f63"
  692.  
  693. #utility function, by default Python gives back an error when trying to create a file in a nonexistent folder
  694. #this creates the folder and then the file; requires another function for long pathnames
  695. def open2(path,mode="rb"):
  696.     if mode=="wb":
  697.         #create folders if necessary and return the file handle
  698.
  699.         #first of all, create one folder level manually because makedirs might fail
  700.         path=path.replace("/","\\")
  701.         pathParts=path.split("\\")
  702.         manualPart="\\".join(pathParts[:2])
  703.         if not os.path.isdir(manualPart):
  704.             os.makedirs(manualPart)
  705.
  706.         #now handle the rest, including extra long path names
  707.         folderPath=lp(os.path.dirname(path))
  708.         if not os.path.isdir(folderPath): os.makedirs(folderPath)
  709.     return open(lp(path),mode)
  710. def lp(path): #long pathnames
  711.     if len(path)<=247 or path=="" or path[:4]=='\\\\?\\': return path
  712.     return unicode('\\\\?\\' + os.path.normpath(path))
  713.
  714. #go through all files in the ebx folder
  715. for dir0, dirs, ff in os.walk("ebx"):
  716.     for fnames in ff:
  717.         fname=dir0+"\\"+fnames
  718.         f=open(fname,"rb")
  719.
  720.         #grab the header
  721.         decompressedSize=unpack(">I",f.read(4)) #read big endian unsigned int
  722.         constant=f.read(2) #0970
  723.         compressedSize=unpack(">H",f.read(2)) #read big endian unsigned half int
  724.
  725.         #check the bytes after the header, using the ebx magic as a reference
  726.         ebxMagic="\xce\xd1\xb2\x0f"
  727.         sample = f.read(10)
  728.         magicPos=sample.find(ebxMagic)
  729.
  730.         f.seek(-10,1) #move back to the start of the number I want, then grab it to order the files
  731.         proceedNum=hexlify(f.read(magicPos))
  732.
  733.         f2=open2("D:/hexing/bf4 beta dump sorted/bundles/"+proceedNum+"/"+fnames,"wb")
  734.         f.seek(0)
  735.         f2.write(f.read()) #copy the entire file to f2
  736.         f2.close()
  737.         #this overwrites if files in different beta folders happen
  738.         #to have the same name, but there should be enough files anyway
  739.  
  740.  
  741. Beautiful: http://i.imgur.com/8ntBZPv.jpg
  742.  
  743.  
  744.  
  745.  
  746. Well, that was easy:
  747. type 80:
  748. proceed 8 bytes in all 6 files of that type.
  749. In those files the metadata has the same size as the payload, so the
  750. payload integer is compressed already.
  751.  
  752. Because the decompressed size tells me exactly that this is the case,
  753. and it's extremely unlikely that after one compressed integer comes another
  754. number that happens to look the same (so the same number would be copied twice),
  755. I can assume that the number of bytes to copy is indeed exactly 4.
  756.  
  757. There are two different opcodes after proceeding 8 (absolute offset: 11; @11):
  758. 040020, copy 4, proceed 2
  759. 040051, copy 4, proceed 5
  760.  
  761. type 90:
  762. There is just one file; as you might imagine this proceeds 9 bytes.
  763. Which is really odd. The last digit/byte of the integer happens to be
  764. the same (remember that it's little endian so the last byte is on the left) and the
  765. compressor saw an option to optimize this.
  766. @12: 040001, copy 4?, proceed 3
  767.  
  768. type a1:
  769. proceed 0a bytes.
  770. type a2:
  771. same wtf
  772.  
  773. type b1:
  774. proceed 0b
  775.  
  776. Can't really compare like I'd like to if I don't know the header structure;
  777. on the other hand, how do I figure out the header structure if it is compressed?
  778.  
  779.  
  780. Well, simple, try to get the header structure by looking at the longer types,
  781. e.g. f429 probably proceeds past the first 20-30 bytes:
  782. 12gflechette_bpb.ebx header structure:
  783. absStringOffset = 4bytes
  784. lenStringToEOF = 4bytes
  785. numGUID = 4bytes
  786. 2bytes, numInstanceRepeater?
  787. 2bytes, numComplex?
  788. 2bytes, numField?
  789. 2bytes, 19
  790. 2bytes, 58
  791. 2bytes, size of keyword section?
  792. 4bytes, 40
  793. 4bytes, 2a
  794. 4bytes, 20f0, slightly smaller than lenStringToEOF (2320)
  795.  
  796. Try another file to fill in the gaps, bd_buildingskyscrapermatteyellow_top_01.ebx
  797. has a corresponding alpha file too in contrast to 12gflechette_bpb:
  798. alpha (all fields are 4 bytes long):
  799. self.absStringOffset = 1f40
  800. self.lenStringToEOF = 510
  801. self.numGUID = 1
  802. self.null = null
  803. self.numInstanceRepeater = 8
  804. self.numComplex = 36
  805. self.numField = c8
  806. self.lenName = e30
  807. self.lenString = 70
  808. self.numArrayRepeater = 8
  809. self.lenPayload = 430
  810.  
  811. beta (2 bytes or 4 bytes length):
  812. self.absStringOffset = 1e40
  813. self.lenStringToEOF = 430
  814. self.numGUID = 2
  815. 2bytes, 08, probably numInstanceRepeater
  816. 2bytes, 03, no idea
  817. 2bytes, 08, probably numArrayRepeater
  818. 2bytes, 34, probably numComplex
  819. 2bytes, c2, probably numField
  820. 2bytes, de0, probably size of keyword section
  821. 4bytes, 40, probably size of string section
  822. 4bytes, 07, no idea
  823. 4bytes, 380, probably size of payload (without arrays)
  824.  
  825. and the guids are directly after this header.
  826. In particular, the first half of the first guid pair (the file guid) is still there, unchanged.
  827. The second pair is replaced with nulls or something, which is then of course compressed.
  828.  
  829.  
  830. This is too confusing as it is, have the script cut off the compression header so it is easier
  831. to measure the right distances. While I'm at it, replace the ebx magic with the decompressed size.
  832. That way I can calculate the payload size even when compressed.
  833. import os
  834. from struct import unpack,pack #convert a sequence of bytes into ints or floats
  835. from binascii import hexlify #converts several bytes into a string of their hex representation,
  836. #e.g. "\x00\xab"=>"00ab": and similarly "doc"=>"646f63"
  837.  
  838. #utility function, by default Python gives back an error when trying to create a file in a nonexistent folder
  839. #this creates the folder and then the file; requires another function for long pathnames
  840. def open2(path,mode="rb"):
  841.     if mode=="wb":
  842.         #create folders if necessary and return the file handle
  843.
  844.         #first of all, create one folder level manually because makedirs might fail
  845.         path=path.replace("/","\\")
  846.         pathParts=path.split("\\")
  847.         manualPart="\\".join(pathParts[:2])
  848.         if not os.path.isdir(manualPart):
  849.             os.makedirs(manualPart)
  850.
  851.         #now handle the rest, including extra long path names
  852.         folderPath=lp(os.path.dirname(path))
  853.         if not os.path.isdir(folderPath): os.makedirs(folderPath)
  854.     return open(lp(path),mode)
  855. def lp(path): #long pathnames
  856.     if len(path)<=247 or path=="" or path[:4]=='\\\\?\\': return path
  857.     return unicode('\\\\?\\' + os.path.normpath(path))
  858.
  859. #go through all files in the ebx folder
  860. for dir0, dirs, ff in os.walk("ebx"):
  861.     for fnames in ff:
  862.         fname=dir0+"\\"+fnames
  863.         f=open(fname,"rb")
  864.
  865.         #grab the header
  866.
  867.         #totally forgot about indexing, need to take element 0; peculiarity of the unpack library.
  868.         decompressedSize=unpack(">I",f.read(4))[0] #read big endian unsigned int
  869.         constant=f.read(2) #0970
  870.         compressedSize=unpack(">H",f.read(2))[0] #read big endian unsigned half int
  871.
  872.         #check the bytes after the header, using the ebx magic as a reference
  873.         ebxMagic="\xce\xd1\xb2\x0f"
  874.         sample = f.read(10)
  875.         magicPos=sample.find(ebxMagic)
  876.
  877.         f.seek(-10,1) #move back to the start of the number I want, then grab it to order the files
  878.         proceedNum=hexlify(f.read(magicPos))
  879.
  880.         f2=open2("D:/hexing/bf4 beta dump sorted/bundles/"+proceedNum+"/"+fnames,"wb")
  881.         f.seek(4,1) #don't go back to the start, instead move 4 bytes too (past the ebx magic)
  882.         f2.write(pack("I",decompressedSize)) #write as little endian so it is easier to read when next to the other LE stuff
  883.         f2.write(f.read()) #copy the entire file to f2
  884.         f2.close()
  885.         #this overwrites if files in different beta folders happen
  886.         #to have the same name, but there should be enough files anyway
  887.  
  888.  
  889.  
  890.  
  891.  
  892. As a reference, a valid ebx header of a f429 file (without the compression header):
  893. CED1B20F 000F0000 20230000 01000000
  894. 0400 0200 0400 1900 5800 8005 40000000
  895. 2A000000 F0200000
  896.  
  897. Give them some letters:
  898. a        b        c        d
  899. e    f    g    h    i    j    k
  900. l        m
  901.  
  902. a) ebx magic
  903. b) meta size
  904. c) payload size (meta + payload = file size)
  905. d) number of external guid pairs (each pair is 20 bytes in total;
  906. and there is one internal guid for the file itself)
  907. e) numInstanceRepeater
  908. f) ?
  909. g) numArrayRepeater
  910. h) numComplex
  911. i) numField
  912. j) keyword section size
  913. k) string section size
  914. l) ?
  915. m) payload size without arrays
  916.  
  917. => 8 bytes less compared to the bf3 ebx header
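As a sketch, reading this new header would look like this (run on decompressed data; the field names follow the letter list above and are partly guesses, with the unknowns kept as such):

from struct import unpack
class BetaHeader:
    def __init__(self,f): #f: a decompressed ebx file, positioned at the start
        self.magic=f.read(4) #a, ced1b20f
        self.metaSize,self.payloadSize,self.numGUID=unpack("<III",f.read(12)) #b,c,d
        (self.numInstanceRepeater,self.unknown1,self.numArrayRepeater,
         self.numComplex,self.numField,self.lenName)=unpack("<6H",f.read(12)) #e,f,g,h,i,j
        self.lenString,self.unknown2,self.lenPayload=unpack("<III",f.read(12)) #k,l,m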
  918.  
  919. (a1 type) fontcollection_zh_fontmapcollectionwin32:
  920. proceed 0a
  921. 70030000 70020000 0001 030021 00010200F314
  922.  
  923. 030021 must be a substitute for something that's 4 bytes or longer.
  924. It refers to three bytes in the past. So at least one byte must be copied twice.
  925. I.e. 030021 refers to 000001 + at least 00 (and would then continue 0001, 000001, 000001)
  926.  
  927. 70030000 70020000 00010000 0100 (,0001,000001,...) 0001020200F314
  928.  
  929. Ignore the first three parts, so the next entry is the number of ext guids
  930. 0100 (,0001,000001,...) 0001020200F314
  931. the number of ext guids is obviously very small
  932. so 01000001 does not work, because it is a gigantic number (16777217 in decimal)
  933. it must be 010000, then another null is added from the byte afterward
  934.  
  935. 01000000 01 0200F314
  936.  
  937.  
  938. Therefore:
  939. a1: proceed 0a
  940. 030021: move 3, copy 5, proceed 1
  941.  
  942.  
  943. (a2 type) commanderkit:
  944. proceed 0a too
  945.  
  946. fuck, fuck, fuck
  947.  
  948. There are 11576 files with a2, and just 125 with a1.
  949. So it's at least not seeded by the filename or crap like that.
  950.  
  951.  
  952. A01C0000 E01B0000 C000 010011 02 0200F213
  953.  
  954. 010011, move 1, copy 4 or more, proceed 1
  955.  
  956. This file has no external guids so decompressed it is
  957. A01C0000 E01B0000 C0000000 00000000 02 0200F213
  958.  
  959. i.e. 010011, move 1, copy 6, proceed 1
  960.  
  961. So basically these 11576 files are all files with no external guid.
  962.  
  963.  
  964.  
  965.  
  966. c0 type:
  967. proceed 0c
  968.  
  969. lav25_mesh:
  970. 070011, copy 4, proceed 1
  971.  
  972. layer9_homebase_ch:
  973. 0700F31A, copy 4?, proceed?
  974. no idea
  975.  
  976.  
  977. proceeds:
  978. f011: proceed 20
  979. f012: proceed 21
  980. f015: proceed 24
  981. f016: proceed 25
  982. f004: proceed 13
  983.  
  984. by that logic, f000: proceed 0f
  985. or more generally:
  986. f0xy: proceed f + xy
  987.  
  988. and any values below that can be written directly as 80, e0, etc. to proceed 8, e
  989.  
  990. yup, indeed varint.
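So reading the proceed length found so far would be (a sketch; what happens when the extension byte itself is ff is still unknown at this point):

def readProceed(f):
    lengthByte=ord(f.read(1))
    proceed=lengthByte>>4
    if proceed==0xf:
        proceed+=ord(f.read(1)) #f0xy: proceed f + xy
    return proceed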
  991.  
  992.  
  993. f429: 38
  994. f42a: 39
  995.  
  996. So the second half of the first byte is not relevant for the proceed, it must have
  997. another purpose!
  998.  
  999.  
  1000. Oh man. This looks promising.
  1001.  
  1002. When an entry says 0100F23C, it means:
  1003. move 1, copy f(2) with some function f yet to be determined, proceed f+3c=4b
  1004.  
  1005. yup, works correctly along several entries.
  1006.  
  1007. Still got to figure out the function. And why does the very first opcode not always have 0?
  1008. The most reasonable guess is to say f(x)=x+4, because an opcode takes at least 3 bytes (so a match shorter than that would not be worth encoding).
  1009.  
  1010. The first opcode probably contains the bytes to copy for the next opcode.
  1011. Err... I had
  1012. a1, 030021: move 3, copy 5, proceed 1
  1013. a2, 010011, move 1, copy 6, proceed 1
  1014. So the 1 in a1 means copy 5, the 2 in a2 means copy 6.
  1015.  
  1016. So it was the typical "these elements are placed together so they belong together" syndrome; fool me twice :(
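To restate the insight as code, checked against the a1/a2 observations above:

lengthByte=0xa1
proceed=lengthByte>>4    #0a, matches "a1: proceed 0a"
copy=(lengthByte&0xf)+4  #1+4=5, matches the copy 5 of 030021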
  1017.  
  1018.  
  1019. Yup, have tested a small sample and it works. That fully explains it, well almost. The question
  1020. is what happens when the first byte is ff. I assume it then extends over three bytes or something like that.
  1021. It's not important yet to figure out the details, just make sure the script will
  1022. give an error when that happens so I can look into it.
  1023.  
  1024. Write a script to decompress all the blocks of each file. Need to see what happens at ff.
  1025. Also tidy up the script a lot. Decompressing the files until an error occurs. Don't create
  1026. the decompressed files yet; only when an error occurs, create a debug file to get the
  1027. decompressed data until the error.
  1028.  
  1029. Script with some improvements made due to errors (that will be stated shortly):
  1030. import os
  1031. from struct import unpack,pack #convert a sequence of bytes into ints or floats
  1032. from binascii import hexlify #converts several bytes into a string of their hex representation
  1033. from cStringIO import StringIO #create something that has the same functions as a file but is in memory only
  1034.
  1035. def readNum(f): #when byte is ff, read one more byte until not ff, add all
  1036.     total=0
  1037.     while 1:
  1038.         byte=ord(f.read(1))
  1039.         total+=byte
  1040.         if byte!=0xff: return total
  1041.
  1042. #go through all files in the ebx folder
  1043. for dir0, dirs, ff in os.walk("ebx"):
  1044.     for fname in ff:
  1045.         f=open(dir0+"\\"+fname,"rb")
  1046.
  1047.         #cheap way to get the end-of-file, i.e. the size of the file
  1048.         f.seek(0,2)
  1049.         EOF=f.tell()
  1050.         f.seek(0)
  1051.
  1052.         decompressedStream=StringIO() #write the decompressed data into memory only
  1053.
  1054.         while f.tell()<EOF:
  1055.             #grab the header of a compressed block
  1056.             decompressedSize, constant, compressedSize = unpack(">IHH",f.read(8)) #const 0970
  1057.             blockOffset=f.tell()
  1058.             if constant!=0x970: print "Block header constant is not 970, the script may or may not work correctly."
  1059.
  1060.             #go from one opcode to the next and write the decompressed data into the stream until the block is done
  1061.             while f.tell()-blockOffset<compressedSize:
  1062.
  1063.                 #read the length byte as a number, then split it in two numbers (from 0 to f each) with bitmasking and shifting
  1064.                 lengthByte=ord(f.read(1)) #e.g. 9e
  1065.                 proceedSize=lengthByte>>4 #=> 09
  1066.                 copySize =lengthByte&0xf #=> 0e
  1067.
  1068.                 #the revised version to deal with larger numbers; the original one is in the comment below
  1069.                 if proceedSize==0xf:
  1070.                     proceedSize+=readNum(f)
  1071.
  1072.                 ## #add the next byte to the proceedSize if the half-byte is f.
  1073.                 ## #Raise some errors if something new pops up so I can investigate.
  1074.                 ## if proceedSize==0xf:
  1075.                 ##     nextByte=ord(f.read(1))
  1076.                 ##     if nextByte==0xff: #the byte behaved normally when reaching a number larger than 128, so now check ff
  1077.                 ##         print dir0, fname, f.tell()
  1078.                 ##         f2=open("debug","wb")
  1079.                 ##         f2.write(decompressedStream.getvalue())
  1080.                 ##         f2.close()
  1081.                 ##         proceedSizeIsFF
  1082.                 ##     else: proceedSize+=nextByte
  1083.                 #### if copySize==0xf:
  1084.                 ####     print dir0, fname, f.tell()
  1085.                 ####     f2=open("debug","wb")
  1086.                 ####     f2.write(decompressedStream.getvalue())
  1087.                 ####     f2.close()
  1088.                 ####     copySizeIsF
  1089.
  1090.                 decompressedStream.write(f.read(proceedSize))
  1091.                 pos0=decompressedStream.tell()
  1092.
  1093.                 #this check was added later on, the very last bytes in the block are not compressed so there is no offset to read
  1094.                 if f.tell()-blockOffset==compressedSize:
  1095.                     break
  1096.
  1097.                 offset=unpack("<H",f.read(2))[0] #the offset is little endian
  1098.
  1099.                 #the revised version to deal with larger numbers; the original one is in the comment below
  1100.                 if copySize==0xf:
  1101.                     copySize+=readNum(f)
  1102.
  1103.                 ## #might be a varint. Not sure if this case happens even once
  1104.                 ## if copySize==0xf:
  1105.                 ##     print "#########"
  1106.                 ##     copySummand=ord(f.read(1))
  1107.                 ####     print copySummand, dir0, fname, f.tell()
  1108.                 ##     if copySummand>>7: #what happens if the first bit is set
  1109.                 ##         print dir0, fname, f.tell()
  1110.                 ##         f2=open("debug","wb")
  1111.                 ##         f2.write(decompressedStream.getvalue())
  1112.                 ##         f2.close()
  1113.                 ##         asdfasdf
  1114.                 ##     copySize+=copySummand
  1115.
  1116.                 copySize+=4
  1117.                 decompressedStream.seek(-offset,1) #go back to copy the data
  1118.
  1119.                 #make several copies if necessary
  1120.                 if offset<copySize:
  1121.                     times=copySize/offset
  1122.                     rest=copySize%offset
  1123.                     copy=decompressedStream.read(copySize) #the stream ends at pos0, so this reads just the offset-sized window
  1124.                     decompressedStream.seek(pos0)
  1125.                     for i in xrange(times): decompressedStream.write(copy)
  1126.                     decompressedStream.write(copy[:rest])
  1127.                 else:
  1128.                     copy=decompressedStream.read(copySize)
  1129.                     decompressedStream.seek(pos0)
  1130.                     decompressedStream.write(copy)
  1131.
  1132.         f.close()
  1133.  
  1134.  
  1135.  
  1136. Errors encountered:
  1137. ebx crowsweaponhudlogic.ebx 373
  1138. NameError: name 'copySizeIsF' is not defined
  1139. i.e. at offset 373 (decimal) in that file in the main ebx folder, an entry has copysize f.
  1140.  
  1141. So what happens in that case?
  1142.  
  1143. the most recent bytes:
  1144. FieldAccessType.FieldAccessType_Source.FieldAccessType_
  1145.  
  1146. the problematic expression:
  1147. 6f Target 2E0004 And
  1148.  
  1149. which the alpha files say become \00FieldAccessType_SourceAnd (note that the Target here belongs to
  1150. the FieldAccessType_Target after removing 6f; opcodes are not decompressed after all).
  1151.  
  1152. Therefore, 17 bytes are needed to produce \x00FieldAccessType_Source
  1153.  
  1154. And the info is to proceed 6 bytes (past Target), and copy f (whatever that means) with an
  1155. offset of 2e.
  1156.  
  1157. Alright, assume the same model as before, with copy = f + 4 = 13.
  1158.  
  1159. So it seems that a single byte is placed directly behind the offset, and its value is
  1160. then added to the number of bytes to copy.
  1161. Therefore it copies f+4+4 = 17 bytes.
  1162.  
  1163. So the proceed length is directly extended after the length byte, whereas the
  1164. copy length is extended at the end, even after the offset.
  1165. The next question is how the lengths are extended in detail. Either the extended bytes
  1166. are some sort of varint (so the first bit specifies whether to read one more byte or not)
  1167. or they just go from 0 to ff, with ff meaning that the next byte will be read too.
  1168.  
  1169.  
  1170. Another error occurred towards the end of the file.
  1171. The very last few bytes in that file were not compressed and there was no offset given afterwards.
  1172. Which means half a byte was wasted because it specified the number of bytes to copy,
  1173. while nothing was copied at all.
  1174.  
  1175. Anyway, don't try to read an offset then.
  1176. Though I wonder what happens if the last few bytes in a file are indeed compressed.
  1177.  
  1178. Next error (in another file, so apparently one file was fully decompressed already):
  1179. The second byte of the proceed size had its first bit set (could have been a varint).
  1180. However, when investigating, the file behaved normally. So it should
  1181. behave normally at least until it reaches ff (thus not a varint).
  1182.  
  1183. Next error:
  1184. uiawardsoverlaylogic.ebx has its extra copy byte with its first bit set @8796 (decimal).
  1185.  
  1186. Can't make any sense of it directly though because it's deep in the data.
  1187. However, the copy byte is set to exactly ff, followed by fb. If it was a varint,
  1188. the number would actually be way longer than the entire file in its
  1189. alpha version. Therefore read the number fffb as ff + fb (with ff also indicating that
  1190. the next byte, fb, is to be read). And apply that system to the proceed size too.
  1191.  
  1192. As a function:
  1193. def readNum(f): #when byte is ff, read one more byte until not ff, add all
  1194.     total=0
  1195.     while 1:
  1196.         byte=ord(f.read(1))
  1197.         total+=byte
  1198.         if byte!=0xff: return total
  1199.  
  1200. And with that, the script can handle thousands of files.
  1201. One file remains:
  1202. ebx\ui\static\sharedicons.ebx
  1203. Parts of the file give me back a warning about the block header not being 970.
  1204. This file is also the largest beta ebx file there is, at 347 KB.
  1205.  
  1206. The header of the second block: 00010000 00710000 3878
  1207.  
  1208. It's in some odd section containing random ascii letters. Skipping 7100 does not
  1209. get me to the start of the next section, however skipping 10000 does. So 00710000
  1210. says that this block is not compressed (the 3878 is part of the payload already).
  1211.  
  1212.  
  1213.  
  1214.  
Summarized, then:
As of the beta of Battlefield 4, the ebx files (containing binary XML) are compressed with an LZ77 algorithm.

A compressed file consists of several blocks, with no global metadata.
The blocks have a decompressed size of 0x010000 each, except for the last one, which is usually smaller.

Structure of a compressed block (big endian):
4 bytes: decompressed size (0x10000 or less)
2 bytes: compression type (0970 for LZ77, 0071 for uncompressed data)
2 bytes: compressed size of the payload, i.e. without the header (0000 for uncompressed data)
compressed payload

Decompress each block and glue the decompressed parts together to obtain the file.
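
A minimal sketch of that outer loop (assuming a single file starting at offset 0; handleLZ77Payload is a hypothetical placeholder for the opcode handling described next):

from struct import unpack

def decompressFile(f,fileSize):
    out=""
    while f.tell()<fileSize:
        #each block starts with the 8-byte header described above
        decompressedSize,compressionType,compressedSize=unpack(">IHH",f.read(8))
        if compressionType==0x71:    #uncompressed: payload is decompressedSize raw bytes
            out+=f.read(decompressedSize)
        elif compressionType==0x970: #LZ77: payload is compressedSize bytes of opcodes
            out+=handleLZ77Payload(f,compressedSize)
        else:
            raise ValueError("unknown compression type %04x"%compressionType)
    return out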

The compression is an LZ77 variant. It requires 3 parameters:
Copy offset: Move backwards by this amount of bytes and start copying a certain number of bytes following that position.
Copy length: How many bytes to copy. If the length is larger than the offset, wrap around to the offset position and copy the same values again.
Proceed length: The number of bytes that were not compressed and can be read directly.

Note that the offset is defined in regard to the already decompressed data, which e.g. does not contain any compression metadata.

The three values are split up, however: the copy length and proceed length are
stated together in a single byte placed before an uncompressed section, while the corresponding
offset is given after the uncompressed section.
Use the proceed length to read the uncompressed data, at which point you arrive at the start of the offset value.
Read this value, then move back by the offset and copy a number of bytes (given by the copy length)
to the decompressed data. Afterwards, the next copy and proceed length are given and the process starts anew.

The offset has a constant size of 2 bytes, in little endian.

The two lengths share the same byte. The first half of the byte (the high nibble) belongs to the proceed length,
whereas the second half (the low nibble) belongs to the copy length.
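
In code, splitting such a length byte could look like this (just a sketch; the constant 4 added to the copy length is explained below):

lengthByte=ord(f.read(1))     #e.g. b2
proceedLength=lengthByte>>4   #high nibble: b
copyLength=(lengthByte&0xf)+4 #low nibble plus a constant 4: 2+4 = 6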

When the half-byte of the proceed length is f, the length is extended by another byte,
which is placed directly after the byte that contains both lengths. The value of that byte
is added to the value of the proceed length (i.e. f). However, if the extra byte is ff, one more
byte is read (and so on) and all values are added together.

The copy length can be extended in the same manner. However, the possible extra bytes are
located at the end, right after the offset.
Additionally, a constant value of 4 is added to obtain the actual copy length.

Finally, it is possible that a block ends without specifying an offset (when the last few bytes
in the block were not compressed). The proceed length is not affected by that (and the copy
length is of no relevance).

As an example, consider the length byte B2:
Proceed length: B
Copy length: 2 + 4 = 6

Another example, F23C:
Proceed length: F + 3C = 4B
Copy length: 2 + 4 = 6

A full example (the whitespace is there to separate hex from ascii; it doesn't count):
0000001a 0970 0018 80 minimap. 0800 51 ature 0a00 40 mize

Header:
Decompressed size 1a
LZ77 compression (due to 0970)
Compressed size 18

Payload:
Compressed stream: 80 minimap. 0800 51 ature 0a00 40 mize
Decompressed stream: *empty*

The decompression is sequential, so start with the left part:
80 minimap. 0800

Read 8 uncompressed bytes into the decompressed stream.
Decompressed stream: minimap.

Move back by 8 bytes in the decompressed stream (to the start)
and copy 4 bytes (mini) to the decompressed stream.

Compressed stream: 51 ature 0a00 40 mize
Decompressed stream: minimap.mini

Perform the same step again:
51 ature 0a00

Read 5 uncompressed bytes into the decompressed stream.
Decompressed stream: minimap.miniature

Move back by 0a bytes in the decompressed stream
and copy 5 bytes (.mini) to the decompressed stream.

Compressed stream: 40 mize
Decompressed stream: minimap.miniature.mini

Read 4 uncompressed bytes into the decompressed stream (with no offset specified).

Decompressed stream: minimap.miniature.minimize

Clean up the script. Have it create a new folder with all decompressed ebx
to investigate the changes to the ebx format:

import os
from struct import unpack,pack
from cStringIO import StringIO

def open2(path,mode="rb"): #when used to write, create folders too
    if mode=="wb":
        #create folders if necessary and return the file handle

        #first of all, create one folder level manually because makedirs might fail
        path=path.replace("/","\\")
        pathParts=path.split("\\")
        manualPart="\\".join(pathParts[:2])
        if not os.path.isdir(manualPart):
            os.makedirs(manualPart)

        #now handle the rest, including extra long path names
        folderPath=lp(os.path.dirname(path))
        if not os.path.isdir(folderPath): os.makedirs(folderPath)
    return open(lp(path),mode)

def lp(path): #long pathnames
    if len(path)<=247 or path=="" or path[:4]=='\\\\?\\': return path
    return unicode('\\\\?\\' + os.path.normpath(path))

def readNum(f): #when byte is ff, read one more byte until not ff, add all
    total=0
    while 1:
        byte=ord(f.read(1))
        total+=byte
        if byte!=0xff: return total

def decompressLZ77(f,fileSize=None):
    #takes a file handle, gives back a decompressed string
    #allow file size to be specified to work from within archives

    #if file size not specified, get it now (will only work correctly on single files)
    if fileSize==None:
        f.seek(0,2)
        fileSize=f.tell()
        f.seek(0)
    fileOffset=f.tell() #0 for single files, much greater for archives

    #write the decompressed data into memory only, eventually return it
    decompressedStream=StringIO()

    #go through each block, filling the decompressed stream with data
    while f.tell()-fileOffset<fileSize:
        #grab the header of a compressed block
        decompressedSize, compressionType, compressedSize = unpack(">IHH",f.read(8))

        if compressionType==0x71:
            decompressedStream.write(f.read(decompressedSize))
            continue
        elif compressionType!=0x970: print "Unknown compression type: "+str(compressionType)

        #from here on, LZ77
        #go from one opcode to the next and write the decompressed data into the stream until the block is done
        blockOffset=f.tell()
        while f.tell()-blockOffset<compressedSize:

            #retrieve the two sizes from a single byte
            lengthByte=ord(f.read(1)) #e.g. 9e
            proceedSize=lengthByte>>4 #=> 09
            copySize   =lengthByte&0xf #=> 0e

            if proceedSize==0xf: proceedSize+=readNum(f)

            #add the uncompressed data to the stream
            decompressedStream.write(f.read(proceedSize))

            #it's possible that the very last bytes in the block are not compressed
            #so there is no offset to read; handle this case
            if f.tell()-blockOffset==compressedSize: break

            pos0=decompressedStream.tell() #data will be written to the end of the stream, so take note of it
            offset=unpack("<H",f.read(2))[0] #the offset is little endian

            if copySize==0xf: copySize+=readNum(f)
            copySize+=4

            decompressedStream.seek(-offset,1) #go back to copy the data

            #make several copies if necessary
            if offset<copySize:
                times=copySize/offset
                rest=copySize%offset

                copy=decompressedStream.read(offset) #either read offset or copySize; read() will yield the same
                decompressedStream.seek(pos0)
                for i in xrange(times): decompressedStream.write(copy)
                decompressedStream.write(copy[:rest])
            else:
                copy=decompressedStream.read(copySize)
                decompressedStream.seek(pos0)
                decompressedStream.write(copy)
    return decompressedStream.getvalue()


#go through all files in the ebx folder
for dir0, dirs, ff in os.walk(r"D:\hexing\bf4 beta dump\bundles\ebx"):
    for fname in ff:
        fullPath=dir0+"\\"+fname

        writePath=fullPath.replace(r"bf4 beta dump\bundles\ebx","bf4 decompressed ebx")
        f=open(fullPath,"rb")
        data=decompressLZ77(f)
        f.close()
        f2=open2(writePath,"wb")
        f2.write(data)
        f2.close()
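
As a quick sanity check, the function can be fed the worked example block from earlier, typed in by hand; it should reproduce the string derived manually:

example=("\x00\x00\x00\x1a\x09\x70\x00\x18"+ #header: decompressed size 1a, type 0970, compressed size 18
         "\x80"+"minimap."+"\x08\x00"+       #proceed 8 bytes, then offset 0008 and copy length 0+4
         "\x51"+"ature"+"\x0a\x00"+          #proceed 5 bytes, then offset 000a and copy length 1+4
         "\x40"+"mize")                      #proceed 4 bytes; the block ends without an offset
print decompressLZ77(StringIO(example))      #prints: minimap.miniature.minimize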


The files certainly look more tolerable now:

Before: http://i.imgur.com/RwjMdgi.png
After: http://i.imgur.com/xHkGjOD.jpg


Supplement:
While dealing with the new patched cas-enabled sbtoc I've stumbled upon two more compression types.
Type 0070 is almost the same as type 0071, but for 0070 the compressed size equals the decompressed size,
whereas the compressed size is zero for type 0071.

Another type is 0000, which only occurs when the decompressed and compressed sizes are zero too.
Basically the header is 8 nullbytes. In this case, return an empty string.
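
Folding those two cases into the block dispatch might look like this (a sketch only; handleLZ77Block is a hypothetical name for the opcode loop from the script above):

decompressedSize,compressionType,compressedSize=unpack(">IHH",f.read(8))
if compressionType in (0x70,0x71):
    #raw data; for 0070 compressedSize equals decompressedSize, for 0071 it is zero
    decompressedStream.write(f.read(decompressedSize))
elif compressionType==0:
    pass #all 8 header bytes are null: empty block, nothing to read or write
elif compressionType==0x970:
    handleLZ77Block(f,compressedSize,decompressedStream) #opcode loop as in the script above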