Nov 6th, 2013
  1. Next up: The patched archives don't work.
  2. As I already have the retail files I might as well work with those. I don't care about the beta.
  3. Try to dump the patched retail files.
  4.  
Error 1:
Traceback (most recent call last):
  File "D:\hexing\release tools\dumper.py", line 368, in <module>
    main()
  File "D:\hexing\release tools\dumper.py", line 365, in main
    dump(fname,outputfolder)
  File "D:\hexing\release tools\dumper.py", line 220, in dump
    casHandlePayload(entry,ebxPath+entry.elems["name"].content+".ebx")
  File "D:\hexing\release tools\dumper.py", line 329, in casHandlePayload
    catEntry=cat.entries[entry.elems["sha1"].content]
KeyError: '\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3'
  16.  
So a cascat-dependent sb file asks the cat for the payload with the SHA1 as seen above. But the cat
can't find that SHA1. Well, which cat is it and what's the hexlified form of the SHA1?
  19. >>> hexlify('\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3')
  20. 'df2bdbc728ef18bea96416005456eb54fcd61cb3'
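
For reference, the same conversion as a Python 3 sketch, where the digest is a bytes object instead of a str. The variable names are only illustrative.

```python
from binascii import hexlify, unhexlify

# The raw 20-byte sha1 from the KeyError above, as a bytes literal.
raw = b'\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3'

# hexlify returns bytes in Python 3, so decode it to get a normal string.
hex_form = hexlify(raw).decode("ascii")

# Round trip back to the raw digest, e.g. to look it up in a cat again.
assert unhexlify(hex_form) == raw
print(hex_form)
```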
  21.  
  22. So that's the sha1 that's apparently nowhere to be found in the cats. Have a look at the cats myself.
  23. Yup, the sha1 is simply not there. Well, fuck.
  24.  
  25. The archive with this sha1 is Globals.toc/sb. The toc is encrypted and I can't find my unXOR.py right now.
  26. Besides, the sb contains the sha1s etc. so take a look at it first.
  27. Well, the sha1 is definitely there: http://i.imgur.com/SIwp5Yh.png
  28. But I can't find it in the cats.
  29.  
  30. Cascat is redundant. The cas alone have all info necessary to recreate a cat.
  31. Maybe the cat is broken or something, ask the cas instead:
from struct import unpack
for i in xrange(1,22): #cas_01.cas to cas_21.cas
    f=open("cas_"+(str(i) if i>9 else "0"+str(i))+".cas","rb")
    f.seek(0,2)
    EOF=f.tell()
    f.seek(0)
    while f.tell()<EOF:
        f.read(4) #4 bytes in front of the sha1, not needed here
        sha1=f.read(20)
        size=unpack("I",f.read(4))[0] #payload size
        f.read(4) #4 more bytes that are not needed here
        if sha1=='\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3':
            print i
            asdf #deliberate NameError to halt on the first hit
        f.seek(size,1) #skip the payload
    f.close()
  48.  
  49. No hits, so the cas don't know about that sha1 either.
  50. Whatever, just continue with the next file when it can't find the sha1.
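
The cas scan above can be wrapped into a reusable generator. This is a Python 3 sketch that only assumes the entry layout used by the loop (4 bytes, 20 bytes sha1, 4 bytes size, 4 bytes, payload). The function name and the explicit little-endian "<I" are my assumptions; the original used the native byte order.

```python
import io
import struct

def iter_cas_entries(f):
    """Yield (sha1, payload_offset, size) for every entry of an open cas file."""
    f.seek(0, io.SEEK_END)
    end = f.tell()
    f.seek(0)
    while f.tell() < end:
        f.read(4)                                  # skipped, as in the loop above
        sha1 = f.read(20)
        (size,) = struct.unpack("<I", f.read(4))   # assumption: little endian
        f.read(4)                                  # skipped, as in the loop above
        payload_offset = f.tell()
        yield sha1, payload_offset, size
        # Seek absolutely so it does not matter if the caller moved the file position.
        f.seek(payload_offset + size)

def find_sha1(f, wanted):
    """Return (payload_offset, size) of the first entry with that sha1, else None."""
    for sha1, offset, size in iter_cas_entries(f):
        if sha1 == wanted:
            return offset, size
    return None
```

Rebuilding a lookup table from all cas files then comes down to collecting the yielded tuples into a dict keyed by sha1.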
  51.  
  52. Okay neat, it extracts about 1k files now.
  53.  
  54. Error 2:
Traceback (most recent call last):
  File "D:\hexing\release tools\dumper.py", line 379, in <module>
    main()
  File "D:\hexing\release tools\dumper.py", line 376, in main
    dump(fname,outputfolder)
  File "D:\hexing\release tools\dumper.py", line 220, in dump
    bundle=sbtoc.Entry(sb)
  File "D:\hexing\release tools\sbtoc.py", line 90, in __init__
    raise Exception("Entry does not start with \x82 or (rare) \x87 byte. Position: "+str(toc.tell()))
Exception: Entry does not start with ‚ or (rare) ‡ byte. Position: 22849
  65.  
  66.  
Occurred in MpCharacter.toc/sb. That error was the issue with the patched beta files too.
  68. Hm well, let's see if the unpatched files extract without complaining... done, 172k files and no issues.
  69.  
  70. Now back to the error, the toc says where to move in the sb file to read a single bundle.
  71. Now, I have the script output the offset as given by the toc file. Interestingly, the very first
  72. offset given by the toc is 22848. So it wants to read in the middle of the file. One would expect
  73. the offset to be in the single digit region. Well, maybe the bundles are not ordered and the one with
  74. the lower offset comes later on.
  75.  
  76. Ah this one has base = true while still being cas.
  77.  
  78. The patched archives have always been tricky. Sometimes it's necessary to cut pieces out
  79. of the unpatched archives to obtain the patched files. Previously that was restricted to
  80. patched non-cas files though.
  81.  
  82. Before going further, look through my script and recall what it does exactly. The archives
  83. require very different handling depending on some flags in the toc, and I honestly don't
  84. remember everything of it.
  85.  
dumper.py:
for each toc file:
    Read the toc in the superbundle format, then check if there is a cas flag.

    if cas is true (globally set by the toc):
        for each bundle metadata given in the toc entries:
            Go to the bundle offset in the sb.
            Read the bundle in a format similar to the superbundle format.

            I now know the name of every file, and its sha1.
            Ask the cat about the sha1, grab the payload and use the name
            to extract a file.

        for each chunk given in the toc entries:
            Just ask the cat directly and extract the file.

        #This approach used to work for both patched and unpatched cascat files.
        #For the patched files, ask the patched cat first, then fall back to the unpatched cat.

    if cas not true (globally set by the toc):
        for each bundle metadata given in the toc entries:
            Go to the bundle offset in the sb.

            if base is true (specified for each bundle):
                #patched non-cas bundle is the same as unpatched bundle
                Skip the process. This tells me to just use the unpatched
                bundle. As I expect the user to extract all unpatched files too,
                I can leave this one out.

            if base false, but delta true:
                #patched non-cas bundle
                In the patched sb, read some delta metadata.
                Then take the unpatched sb, but insert pieces of payload
                according to the metadata into the sb. The result is a
                valid bundle, which is then read.

            if base false, delta false:
                Just read the bundle.

            With the bundle parsed, use the info to extract the files.
            It's almost the same as before, but this time the payload
            is given in the sb directly.
  128.  
  129. The script does not check base or delta when dealing with cas sbtoc.
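
The branching above, condensed into a small Python 3 sketch. The function name and the returned labels are made up for illustration; the real dumper.py does the work inline and has no such function.

```python
def bundle_strategy(cas, base=False, delta=False):
    """Return a label for how the script, as summarized above, treats one bundle."""
    if cas:
        # base and delta are not inspected at all for cas archives (the gap noted above).
        return "read bundle from sb, fetch payloads from cat"
    if base:
        return "skip, identical to the unpatched bundle"
    if delta:
        return "rebuild from unpatched sb plus delta metadata, then read"
    return "read bundle directly, payload is in the sb"

# A cas bundle with base set takes the same path as any other cas bundle,
# which is why the offset 22848 ended up being applied to the wrong sb.
print(bundle_strategy(cas=True, base=True))
```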
  130.  
Structure of the toc in question:
bundles
    entry
        id
        offset
        size
        base
    entry
        id
        offset
        size
        delta
chunks
cas
  145.  
  146. First come the bundles with base, then the ones with delta.
  147. Thus the first bundle has base, and an offset of 22848 which makes no sense in the patched sb.
  148. Look at the unpatched sb instead. Yup, that works alright.
  149. The size of the bundle should be 14444. That's confirmed too by the unpatched sb; each entry starts
  150. with a 82 byte which comes right after that size.
  151.  
So this works similarly to the noncas archives. However, I think I can't skip the base bundles this time.
  153. It may be that the bundle requires the file from the patched cat (although this is unlikely as
  154. the sha1 depends on the payload). Nevertheless, implementing that is not that difficult anyway, so I'll do it.
  155.  
  156. Now, what about delta files.
  157. The offset given is for the patched sbtoc, the size too.
  158.  
  159. Grab such a patched delta bundle and get its structure:
path #ignore this, just dump all files to the same place
magicSalt #not necessary for extraction
ebx
    entry
        name
        sha1
        size
        originalSize #decompressed size IIRC
    entry
        name
        sha1
        size
        originalSize
        casPatchType #never seen this before
    entry
        name
        sha1
        size
        originalSize
        casPatchType
        baseSha1 #never seen this before
        deltaSha1 #never seen this before
res
    entry
        name
        sha1
        size
        originalSize
        resType
        resMeta
        resRid #never seen this before
    entry
        name
        sha1
        size
        originalSize
        resType
        resMeta
        resRid
        casPatchType
    entry
        name
        sha1
        size
        originalSize
        resType
        resMeta
        resRid
        casPatchType
        baseSha1
        deltaSha1
chunks
    entry
        id
        sha1
        size
        logicalOffset #previously, logicalOffset always had to appear together with rangeStart and rangeEnd. Not anymore.
        logicalSize #never seen this before
    entry
        id
        sha1
        size
        rangeStart
        rangeEnd
        logicalOffset
        logicalSize
    entry
        id
        sha1
        size
        logicalOffset
        logicalSize
        casPatchType
chunkMeta
    entry
        h32
        meta
alignMembers
ridSupport #never seen this before
storeCompressedSizes #never seen this before
totalSize
dbxTotalSize #not sure why this is mentioned separately
  242.  
  243.  
  244. Ugh, many new keywords:
  245. casPatchType #integer, either 1 or 2
  246. baseSha1 #indeed a sha1, I assume of the unpatched file
  247. deltaSha1 #sha1 of the new, patched file?
  248. resRid #8 bytes, some kind of hash maybe? http://en.wikipedia.org/wiki/Relative_ID
  249. logicalSize #integer, no clue, don't care
  250. ridSupport #bool, set to 1 (left out if 0 I suppose)
  251. storeCompressedSizes #bool, set to 0
  252. dbxTotalSize #integer, it's two megabytes for the bundle I'm looking at: 2093471
  253.  
  254.  
  255.  
  256.  
  257. Pick the first entry in this delta bundle.
  258. sha1: E4D44ADB1AF9CABDDB1827D288F3344221FE09C9
  259. Exists in the unpatched cat, but not the patched one.
  260. That entry has none of these fancy new keywords, so the current
  261. extraction script should be able to handle this case already.
  262.  
  263. Entry with casPatchType (set to 1) but no other new keywords:
  264. sha1: 81C73AB2A760D16B94D22982D916E91264A4C964
  265. Exists in the patched cat, but not the unpatched one.
  266.  
  267. Entry with casPatchType (set to 2), also contains baseSha1 and deltaSha1:
  268. sha1: 3C9AAEA1E3FA1117ED2CD458DC22E28003F6CFB3
  269. Does not exist in either cat.
  270.  
  271. baseSha1: AFB0C31A2D05F331B6BD013E26F435C2EBC2CB68
  272. Exists in the unpatched cat.
  273.  
  274. deltaSha1: 8614645E2C818F207A589F62E8B40752DF1E8F84
  275. Exists in the patched cat.
  276.  
  277. I don't care about the other keywords as they don't seem essential for extraction.
  278. Well, this is confusing. In the patched noncas bundles there was metadata at the beginning of the bundle
  279. which told me where to use pieces of the unpatched bundle and where to insert the patched data.
  280. This bundle here however does not have any metadata like that. Additionally, I don't have anything to
  281. work with except for the sha1s. So in a way it should probably not be that hard to deal with this.
  282.  
if casPatchType is 0 (or not specified):
    Grab the payload from the unpatched cat.
if casPatchType is 1:
    Grab the payload from the patched cat.
if casPatchType is 2:
    No clue.
  289.  
  290.  
  291. Look at strings to figure this one out.
  292. The bundle name is win32/persistence/unlocks/soldiers/visual/mp/ch/camo05/ch_assault_mp_appearance_camo05_bpb
  293. which also exists in the unpatched sb. Grab the bundle from the sb so I have both the patched and unpatched bundle
  294. to compare directly.
  295.  
  296. The patched casPatchType 2 entry has Name: persistence/unlocks/soldiers/visual/mp/ch/camo05/ch_assault_mp_appearance_camo05_bpb/meshvariationdb_win32
  297.  
  298. For whatever reason, this string appears twice (?!) in the unpatched bundle, with different sha1s though.
  299. Note, a bundle is always read in its entirety, so I have absolutely no idea what is going on.
  300. sha1: AFB0C31A2D05F331B6BD013E26F435C2EBC2CB68
  301. sha1: 19A0C8033C1264587F3E30306CB89C0329CE1524
  302.  
  303. Well, ignore that for now, though keep it in mind; I hadn't even considered the possibility that a single bundle contains two
  304. files with different content but the same name. Eventually I'll have to have the script compare the sha1 for every file
  305. while dumping to make sure that both files are extracted. Err, I'd rather not think about the performance hit and the implementation details.
  306.  
  307. The first sha1 is used as baseSha1, so retrieve the 2 payloads from cascat. It's an ebx file.
  308.  
  309. I suspect that the deltaSha1 in the patched cascat does not refer to an actual file, but instead gives me metadata to cut and glue together pieces
  310. from the unpatched/patched files to obtain the actual file (which then has the sha1 as specified).
  311.  
  312.  
  313. The delta file has just 76 bytes. It does look like guids that replace the original ones.
  314. So the challenge is to distinguish metadata from actual data that is to be inserted and make sure
  315. that the resulting sha1 is 3C9AAEA1E3FA1117ED2CD458DC22E28003F6CFB3.
  316.  
  317. delta file:
  318. 20000046 0D2F0180 424290E0 C8AA785A
  319. E2119BEB DAE5903E 29166F54 408804F6
  320. 823A942C 8D62362D 11F07A76 CE54AB76
  321. E211BE13 C8D7C07E 9A021BBE CE0140B7
  322. 3092179E B63C2010 5C3F6414
  323.  
The 0046 is pretty close to the number of bytes that remain in the file when counting
from after 0046, namely 0x48. So this might indicate the number of bytes to copy.
  326.  
  327. Now, I suspect these are guids, and parts of the guids might remain the same even in the patched version.
  328. Search for E2119BEB DAE5903E.
  329.  
  330. Yup, got a hit at c0 in the unpatched file. In fact 785A which comes right before is part of the guid too.
  331.  
  332. 76E211BE13C8D7C07E9A02 appears at 3ec.
  333.  
  334. So somehow the file tells me to move several hundred bytes in between.
  335.  
  336. Rearrange a bit:
  337. 2000 0046
  338. 0D2F01804242
  339. 90E0C8AA #unpatched: 157 to 15b; maybe random, but unlikely
  340. 785AE2119BEBDAE5903E #unpatched: BE to C8
  341. 2916
  342. 6F54408804F6823A942C8D62362D11F0 #unpatched: 15F to 16F
  343. 7A76CE54AB
  344. 76E211BE13C8D7C07E9A02 #unpatched: 3ec to 3f7
  345. 1BBECE0140B73092179EB63C20105C3F6414
  346.  
  347.  
  348. Meh, I don't get it. So what do you need if you want to patch stuff?
  349.  
  350. I would expect something like this:
  351. Offset and size of unpatched data
  352. Size of bytes to copy from the delta file
  353.  
  354. Grab pieces of the unpatched data and put delta pieces in between.
  355. Though it might also work with a relative offset.
  356.  
Alright, as interesting as it would be to solve this with just a single file, take
a look at another file too.
  359.  
Delta starts with 2000005C, and similar to before 5C is exactly the number
of bytes coming after that 5C, minus 2.
  362.  
  363. So I suppose that 2000 is some kind of magic without deeper meaning.
  364. And then there are 4 bytes, the first two being the size of the delta file minus the metadata.
  365. The third and fourth byte are then still part of the metadata with an unknown purpose.
  366.  
  367. The second half of the sixth byte is always F it seems.
  368.  
  369. Well, I do know the sha1 that I will get in the end.
  370. Assume that the delta file (which is very small) somehow described the replacement of a single
  371. piece in the unpatched file. As I'm not sure where the metadata is and where the replacement payload starts,
  372. just try all possibilities and check the sha1. Replace a slice of the unpatched file with a slice of the
  373. delta file:
f=open("delta","rb")
delta=list(f.read())
f.close()
f=open("actualfile","rb")
data=list(f.read())
f.close()

import copy
import hashlib
from binascii import hexlify

for dataPos in xrange(len(data)):
    for deltaPos in xrange(len(delta)):
        for deltaSize in xrange(len(delta)-deltaPos):
            data2=copy.deepcopy(data)
            data2[dataPos:dataPos+deltaSize+1]=delta[deltaPos:deltaPos+deltaSize+1]
            if hashlib.sha1("".join(data2)).digest()=='<\x9a\xae\xa1\xe3\xfa\x11\x17\xed,\xd4X\xdc"\xe2\x80\x03\xf6\xcf\xb3':
                asdf #deliberate NameError to halt on a match
    print dataPos #progress
  393.  
  394. Meh, does not work.
  395.  
  396. Retrieve all delta files and corresponding unpatched files.
  397.  
First of all, confirm that casPatchType 0 always relies on the unpatched cat
and casPatchType 1 always relies on the patched cat:
def casHandlePayload(entry,outPath): #this version searches the patched cat first
    #cat is the unpatched cat, cat2 the patched one; every asdf halts the script if an assumption fails
    if os.path.exists(lp(outPath)): return #don't overwrite existing files to speed up things
##    print outPath
    sha1=entry.elems["sha1"].content
    try:
        patchType=entry.elems["casPatchType"].content
    except:
        patchType=0
    if patchType==0:
        if "baseSha1" in entry.elems: asdf
        if "deltaSha1" in entry.elems: asdf
        if sha1 not in cat.entries: asdf
        if sha1 in cat2.entries: asdf

    elif patchType==1:
        if "baseSha1" in entry.elems: asdf
        if "deltaSha1" in entry.elems: asdf
        if sha1 in cat.entries: asdf
        if sha1 not in cat2.entries: asdf
    elif patchType==2:
        if "baseSha1" not in entry.elems: asdf
        if "deltaSha1" not in entry.elems: asdf

        baseSha1=entry.elems["baseSha1"].content
        deltaSha1=entry.elems["deltaSha1"].content
        if baseSha1 not in cat.entries: asdf
        if deltaSha1 not in cat2.entries: asdf
  427.  
  428. Oh great, this fails. For some reason with casPatchType 1 the sha1 is found in the unpatched cat but not the patched one.
  429. More precisely, for casPatchType 1 the sha1 may be found in either the unpatched or the patched cat. Will have to
  430. do the usual approach; try the patched first, then fall back if necessary.
  431.  
  432. The other two types work as expected though. Type 0 always has the sha1 in the unpatched cat.
  433. Type 2 has the base in the unpatched cat and the delta in the patched cat.
  434.  
  435. Also: baseSha1/deltaSha1 <=> patchType 2
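
The findings so far as a Python 3 sketch. Plain dicts stand in for the entry and the two cats; locate_payload and the "plain"/"delta" labels are hypothetical names, and what to do with a base/delta pair is still open at this point.

```python
def locate_payload(entry, unpatched_cat, patched_cat):
    """Pick the cat(s) for one bundle entry according to its casPatchType."""
    patch_type = entry.get("casPatchType", 0)      # a missing key counts as type 0
    sha1 = entry["sha1"]
    if patch_type == 0:
        return ("plain", unpatched_cat[sha1])      # always in the unpatched cat
    if patch_type == 1:
        # May live in either cat: try the patched one first, then fall back.
        if sha1 in patched_cat:
            return ("plain", patched_cat[sha1])
        return ("plain", unpatched_cat[sha1])
    if patch_type == 2:
        # The entry's own sha1 is in neither cat; only base and delta can be fetched.
        return ("delta",
                unpatched_cat[entry["baseSha1"]],
                patched_cat[entry["deltaSha1"]])
    raise ValueError("unknown casPatchType: %r" % (patch_type,))
```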
  436.  
  437.  
  438. Alright, ignore type 0 and 1 for now. Retrieve the deltas and bases.
def casHandlePayload(entry,outPath): #this version searches the patched cat first
    if os.path.exists(lp(outPath)): return #don't overwrite existing files to speed up things
##    print outPath
    sha1=entry.elems["sha1"].content
    try:
        patchType=entry.elems["casPatchType"].content
    except:
        patchType=0
    if patchType==2:
        baseSha1=entry.elems["baseSha1"].content
        deltaSha1=entry.elems["deltaSha1"].content

        deltaEntry=cat2.entries[deltaSha1]
        baseEntry=cat.entries[baseSha1]
        deltaPath=outPath+" delta"
        basePath=outPath+" base"
        out=open2(deltaPath,"wb")
        out.write(sha1) #write the sha1 in the beginning of the delta file for convenience
        out.write(cat2.grabPayload(deltaEntry))
        out.close()
        out=open2(basePath,"wb")
        out.write(cat.grabPayload(baseEntry))
        out.close()
  462.  
  463. Alright, 5140 files. So over 2500 base-delta pairs to work with.
  464. The shortest delta file (shaderdatabase_win32.shaderdatabase) is just 0a bytes payload:
  465. 20000004 203F1974 3000 (this is the entire file)
  466.  
  467. Once again, the total size after the first 4 bytes is given by 0004 + 2.
  468. Once again, the second half of the sixth byte is F.
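
The size rule is easy to check mechanically. A Python 3 sketch, assuming only what has been observed so far: 2 bytes type, 2 bytes size (big endian), and a payload that is size + 2 bytes long. The function name is made up, and the delta here is the bare payload without the sha1 that the dumper writes in front.

```python
import struct

def check_delta_header(delta):
    """Return (delta_type, size, consistent) for a raw delta payload."""
    delta_type, size = struct.unpack(">HH", delta[:4])
    # Everything after the first 4 bytes should be size + 2 bytes long.
    consistent = len(delta) - 4 == size + 2
    return delta_type, size, consistent

# The shaderdatabase delta from above: 20000004 203F1974 3000
shader_delta = bytes.fromhex("20000004203F19743000")
print(check_delta_header(shader_delta))
```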
  469.  
  470. Run the script over the base and delta to see if I get a matching sha1.
  471. The smaller the delta the less likely that there are several substitutions.
  472. That is of course assuming that this is what actually happens.
  473.  
  474. Meh, nothing.
  475.  
  476. mainmenuscreen.ebx:
  477. delta: 2000000C 014F0140 08087BBD 728586FA 25F2
  478. Total size after first 4 bytes: 000c + 2
Second half of sixth byte: F
  480.  
  481. Hmm well. This does look similar to the LZ77 algorithm.
  482. Proceed length is given by c. And the final two bytes are the offset in little endian?
  483. mainmenuscreen is a very small file though, so it can't be an offset.
  484.  
  485. Maybe the sha1s are for the decompressed files (although it was different in bf3).
  486. Nope, just checked that and the sha1 is always taken from the compressed file.
  487.  
  488.  
  489. Try something different with the algorithm (without that odd list thing):
f=open("mainmenuscreen.ebx delta","rb")
sha1=f.read(20) #the dumper wrote the target sha1 in front of the delta payload
delta=f.read()
f.close()
f=open("mainmenuscreen.ebx base","rb")
data=f.read()
f.close()

import hashlib
from binascii import hexlify

for dataPos in xrange(len(data)):
    for deltaPos in xrange(len(delta)):
        for deltaSize in xrange(len(delta)-deltaPos):
            data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
            if hashlib.sha1(data2).digest()==sha1:
                print dataPos, deltaPos, deltaSize
                asdf #deliberate NameError to stop at the first hit
    print dataPos #progress
  508.  
  509. Applied to mainmenuscreen:
  510. 259 10 7
  511.  
  512. Traceback (most recent call last):
  513. File "D:\hexing\bf4 test\Neuer Ordner\trial.py", line 17, in <module>
  514. asdf
  515. NameError: name 'asdf' is not defined
  516.  
  517. Got it. Now I know the resulting file. I suppose I just messed up the previous script.
  518. So at position 259 in the unpatched file, I replace 8 (7+1) bytes with bytes from the delta file.
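
The same brute force as a self-contained Python 3 function, in case it needs to be reused. It is only a sketch: it covers exactly one substitution of equal length, returns the real byte count instead of the count minus one, and the function name is made up.

```python
import hashlib

def find_single_substitution(base, delta, target_sha1):
    """Try every slice of delta at every position of base.

    Returns (data_pos, delta_pos, count) for the first candidate whose
    sha1 digest equals target_sha1, or None if nothing matches.
    """
    for data_pos in range(len(base)):
        for delta_pos in range(len(delta)):
            for count in range(1, len(delta) - delta_pos + 1):
                candidate = (base[:data_pos]
                             + delta[delta_pos:delta_pos + count]
                             + base[data_pos + count:])
                if hashlib.sha1(candidate).digest() == target_sha1:
                    return data_pos, delta_pos, count
    return None
```

The cost grows with len(base) * len(delta)^2 hash computations, so this is only practical for small bases and deltas.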
  519.  
  520. The delta file is:
  521. 2000000C 014F0140 0808 7BBD728586FA25F2
  522. And the 8 bytes on the right are the ones that were substituted.
  523.  
  524. Phew, so the replacement bytes are at the very end.
  525.  
  526. 259 in hex is 103, can't find that anywhere in the delta though.
  527. The two 08s are related to the number of bytes to replace I suppose.
  528.  
  529.  
  530. Dataversion.ebx has a rather small delta file too, just 1b bytes. However,
  531. apparently at least two things are substituted here because I can't find the sha1.
  532. So I possibly didn't fail with the script before but the files were just bad.
  533. Find more suitable files (with small bases).
  534.  
  535. fontcollection_ja_fontlibwin32.ebx:
  536. 260 10 7
  537.  
  538. delta: 2000000C 015F0150 0808 8D46479E3AD65FDD
  539.  
  540. Very neat, this delta is the same as the one before but differs only
  541. by exactly one in the offset. As a result it says 15 twice instead
  542. of 14. Note that a number is stored in 1.5 bytes apparently.
  543. 01 5F, but the number is 15.
  544.  
  545. That looks awfully redundant. Both the offset and the replacement
  546. bytes are specified twice. I think there's more to it though:
  547. The second offset/size could be there to say where to go on after
  548. placing the delta payload. In the case of these simple examples, a
  549. number of bytes is substituted, so the two numbers are identical.
  550. Will need further investigation of course.
  551.  
  552. actionscriptlibrary.ebx:
  553. 287 10 7
  554. 2000000C 017F0170 0808 3D9AC231DD6F15FE
  555.  
  556. campaignmissionsscreen.ebx:
  557. 270 10 7
  558. 2000000C 015F0150 0808 B34AD4896B370A2C
  559.  
  560. campaignmissionsscreen and fontcollection_ja_fontlibwin32 specify
  561. identical delta values, but the actual offset in the files is different.
  562.  
  563. mainmenuscreenpc.ebx:
  564. 263 10 7
  565. 2000000C 014F0140 0808 D7647AE7C8329AD3
  566.  
  567. Hm... there's no offset specified somehow.
  568. But there is a number that appears twice. I just don't know its purpose.
  569.  
  570. Well, one way or another this number has to tell me the offset. There are no other bytes left.
Have the script also give me the offset counting from the end instead of the start. Also fix it so
it gives the right number of bytes that are copied (deltaSize+1):
for dataPos in xrange(len(data)):
    for deltaPos in xrange(len(delta)):
        for deltaSize in xrange(len(delta)-deltaPos):
            data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
            if hashlib.sha1(data2).digest()==sha1:
                print dataPos, deltaPos, deltaSize+1, len(data)-dataPos
                asdf
    print dataPos
  581.  
  582. mainmenuscreenpc.ebx:
  583. presumed offset bytes: 014F0140
  584. 263 10 8 16
  585.  
  586. mainmenuscreen.ebx:
  587. presumed offset bytes: 014F0140
  588. 259 10 8 16
  589.  
  590. Looking good so far.
  591.  
  592. fontcollection_ja_fontlibwin32.ebx:
  593. presumed offset bytes: 015F0150
  594. 260 10 8 16
  595.  
  596. Nope. Not good at all.
  597.  
  598. actionscriptlibrary.ebx:
  599. presumed offset bytes: 017F0170
  600. 287 10 8 16
  601.  
  602. 16 everywhere? Is the script doing something wrong?
  603. Manually substituting the bytes does yield the wanted sha1 though.
  604. Odd.
  605.  
  606. I need some files that differ somehow yet make just one substitution.
  607.  
  608.  
  609.  
  610. ultimax_antstate_chunk.ebx:
  611. 306 14 16 28
  612. 20000024 019F0180 2020 00000000 F6FF267B5E455B0EB41D0644812E0DE7 5C9B01000000000000000000
  613.  
  614. The substitute part is F6FF267B5E455B0EB41D0644812E0DE7.
  615.  
  616. ss_stones_01.ebx:
  617. 319 10 1 61
  618. 20000005 01BF0177 0101 73
  619.  
  620. nogadget2.ebx:
  621. 579 10 1 66
  622. 20000005 035F0301 0101 43
  623.  
  624.  
  625. movietexture_shader_sp_prologue_gasexplosion:
  626. 20000005 01CF0178 0101 73
  627. No sha1 match which is odd because it looks so similar to the previous ones.
  628.  
  629.  
  630. 263 10 8 16
  631. 2000000c 014f0140 0808 d7647ae7c8329ad3
  632.  
  633. 259 10 8 16
  634. 2000000c 014f0140 0808 7bbd728586fa25f2
  635.  
  636. 260 10 8 16
  637. 2000000c 015f0150 0808 8d46479e3ad65fdd
  638.  
  639. 313 10 8 16
  640. 2000000c 019f0190 0808 178d137e63801f41
  641.  
  642. 289 10 8 16
  643. 2000000c 017f0170 0808 e5139873617cd410
  644.  
  645.  
  646.  
  647. Mmmh. ultimax_antstate_chunk.ebx again:
  648. 306 14 16 28
  649. 20000024 019F0180 2020 00000000 F6FF267B5E455B0EB41D0644812E0DE7 5C9B01000000000000000000
  650.  
  651. The substitute part is F6FF267B5E455B0EB41D0644812E0DE7.
  652.  
  653. In fact the substitute could also include all the bytes at the end too.
  654. It's strange, those bytes are the same before and after applying the delta.
  655.  
  656. The 2020 refer to all bytes to the right of them, including 4 nulls. Those nulls however
  657. are not substituted.
  658.  
  659.  
  660. While it will be horribly slow, use that sha1 script within the dumper script.
  661. Automatically dump all files that have just a single substitution.
import hashlib
from time import time

def validateSha1(data,delta,sha1):
    t0=time()
    for dataPos in xrange(len(data)):
        for deltaPos in xrange(len(delta)):
            for deltaSize in xrange(len(delta)-deltaPos):
                data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
                if hashlib.sha1(data2).digest()==sha1:
                    return [dataPos, deltaPos, deltaSize+1, len(data)-dataPos]
        if time()-t0>30: asdf #give up on this file after 30 seconds
    asdf #no single substitution yields the sha1
  672. Limit the time spent with each file to 30 seconds.
  673. Also don't consider delta files larger than 50 bytes.
  674. It should take just a few hours to go through all files.
  675.  
  676.  
  677. 1235 0A 04 37
  678. 20000008 1DAF 1D6C 0404 94B5AA0E
  679.  
  680. 11F3 0A 04 04
  681. 20000008 1D2F 1D2C 0404 ACF453A8
  682.  
  683.  
  684.  
  685. jet_mfd_q5_mesh.ebx:
  686. 2000000C 0D0F 0C20 0808 5158166D711EB911
  687. Decompressed base size: d10
  688.  
  689. So 0d0f is exactly one less than the base size. Either way, this number
  690. contains no useful information.
  691.  
  692.  
  693. Let me get this straight. The delta file contains offsets in the decompressed file.
  694. But the sha1 that is calculated is the one you get when compressing such a patched file.
  695.  
  696. Well, that makes zero sense as it is a waste of computational power to first decompress
  697. a file, then patch it, then compress it to confirm its sha1 is correct. So I can only assume
  698. that once again the sha1 is not checked at all. Ahh, never mind that. Both the base and
  699. delta have a sha1 so there's no way to bypass that anyway. I suppose the resulting sha1
  700. serves no real purpose.
  701.  
  702. Problem is, I don't have the compression algorithm so I cannot directly compare the
  703. sha1. However, I can do the usual sanity checks on the ebx which should suffice to
  704. deal with this.
  705.  
  706. Oh. The system is really obvious with the decompressed files.
  707.  
  708. ultimax_antstate_chunk.ebx again:
  709. 306 14 16 28
  710. 20000024 019F 0180 2020 00000000F6FF267B5E455B0EB41D0644812E0DE75C9B01000000000000000000
  711.  
The substitute part is the 0x20 bytes at the end.
  713. These bytes are inserted at 0x180 in the decompressed file.
  714. The decompressed file is then read until 19f+1, i.e. till the end of file.
  715.  
  716.  
  717.  
  718. ak5c_gunsway.ebx:
  719. Has several substitutions.
  720. Entire delta file:
  721. 2000 002A 12BF
  722. 0E38 0303 0AD7A3
  723. 0EC4 0303 0AD7A3
  724. 0F10 0303 0AD7A3
  725. 0F5C 0303 0AD7A3
  726. 0FA8 0303 0AD7A3
  727. 0FF4 0303 0AD7A3
  728.  
Header:
    2 (or 1) bytes: Delta type
    2 (or 3) bytes: Size without header
    2 bytes: Total decompressed file size-1 (at least for file size <0x10000); patched/unpatched file? Don't know yet.

while current position in file < size without header:
    2 bytes: Replacement offset (move there in the base file)
    1 byte: Number of replacement bytes (in the delta file)
    1 byte: Number of bytes to replace in the base file
    x bytes: The payload to replace.
  739.  
  740. The two single bytes are just guesses at the moment. It could be similar to
  741. LZ77 compression so maybe it's possible to e.g. make the second single byte
  742. twice as large as the first. Thus the delta payload is read just once, but
  743. applied twice to the base file. That's my idea so far anyway.
  744.  
  745. With that I should be able to handle at least some of the simpler files.
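
A Python 3 sketch of applying such a 2000 delta to a decompressed base, following the tentative layout above. The roles of the two count bytes are still guesses, and the examples here only cover the case where both are equal. The function name is made up, and the file size field is read but not used.

```python
import struct

def apply_simple_delta(base, delta):
    """Apply a type 0x2000 delta (raw payload, no sha1 prefix) to a decompressed base."""
    delta_type, body_size, last_index = struct.unpack(">HHH", delta[:6])
    if delta_type != 0x2000:
        raise ValueError("only the 0x2000 layout is sketched here")
    out = bytearray()
    base_pos = 0          # how far the base has been consumed
    pos = 6               # current position in the delta
    end = 6 + body_size   # body_size counts the records only
    while pos < end:
        # Guess from the notes: offset, bytes taken from the delta, bytes replaced in the base.
        offset, n_delta, n_base = struct.unpack(">HBB", delta[pos:pos + 4])
        pos += 4
        out += base[base_pos:offset]        # unchanged stretch of the base
        out += delta[pos:pos + n_delta]     # replacement bytes
        pos += n_delta
        base_pos = offset + n_base          # skip the replaced stretch
    out += base[base_pos:]                  # rest of the base
    return bytes(out)

# mainmenuscreen.ebx delta against a dummy base of the right size (0x14F + 1 bytes).
patched = apply_simple_delta(bytes(0x150), bytes.fromhex("2000000C014F014008087BBD728586FA25F2"))
print(patched[0x140:0x148].hex())
```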
  746.  
  747. I do recall seeing a few files with a non 2000 type. Grab them.
import os
import sys

for dir0, dirs,ff in os.walk(sys.path[0]):
    for filename in ff:
        if filename[-5:]!="delta": continue
        f=open(dir0+"\\"+filename,"rb")
        f.read(16)
        typ=f.read(2)
        f.close()
        if typ!="\x20\x00":
            print filename
  760.  
  761. This yields:
  762. m224_pda_mesh.ebx delta
  763. layer0_cinematic.ebx delta
  764. layer1_ui_schematic.ebx delta
  765. layer18_shipwreck.ebx delta
  766. layer1_ui_schematic.ebx delta
  767. hotel_wings_02_mesh.ebx delta
  768. twigs_01_mesh.ebx delta
  769. decal_plaster_02_mesh.ebx delta
  770. foresttree_l_01_rig_mesh.ebx delta
  771. foresttree_l_03_skin_mesh.ebx delta
  772. leaftree_full_l_01_mesh.ebx delta
  773.  
  774. m224_pda_mesh.ebx:
  775. Substitute the last 9 bytes in the delta file.
  776. 10000001 0C80 0018 0000 0018 0970 000E
  777. 1A000100 90
  778. 80378B11A531D30298
  779.  
  780. This one is rather different from the previous format.
  781. I think I could substitute the 9 bytes, then decompress the file,
  782. and get the offset after decompression. That should appear
  783. somewhere in this format I suppose.
  784.  
  785. Compressed offset: 84c
  786. Decompressed offset: c8f
  787.  
  788. Compressed base file size: 95c
  789. Decompressed base file size: f30
  790.  
  791. 0C80 in the delta is pretty close to the decompressed offset.
  792.  
  793.  
  794. decal_plaster_02_mesh.ebx:
  795. 10000001 0C00 0018 0000 0018 0970 000E
  796. 1A000100 90
  797. 803D811935F2361F1E
  798.  
  799. Almost the same as before.
  800.  
  801. Compressed offset: 7e0
  802. Decompressed offset: c0f
  803.  
  804. Compressed base file size: 85c
  805. Decompressed base file size: cd0
  806.  
  807. Once again, subtract F from the decompressed offset to get 0c00 from the delta.
  808.  
  809. Wait a sec.
  810. No. Please.
  811.  
  812. 0970 here? So the delta is compressed?
  813. Not only that, it's compressed in such a way that it actually wastes space.
  814.  
  815. Recall how compression works:
  816. Structure of a compressed block (big endian):
  817. 4 bytes: decompressed size (0x10000 or less)
  818. 2 bytes: compression type (0970 for LZ77, 0071 for uncompressed data)
  819. 2 bytes: compressed size (0000 for uncompressed data) of the payload (i.e. without the header)
  820. compressed payload
  821.  
  822. Decompress each block and glue the decompressed parts together to obtain the file.
  823.  
  824. The compression is an LZ77 variant. It requires 3 parameters:
  825. Copy offset: Move backwards by this amount of bytes and start copying a certain number of bytes following that position.
  826. Copy length: How many bytes to copy. If the length is larger than the offset, start at the offset again and copy the same values again.
  827. Proceed length: The number of bytes that were not compressed and can be read directly.
  828.  
Note that the offset is defined with regard to the already decompressed data, which e.g. does not contain any compression metadata.
  830.  
  831. The three values are split up however; while the copy length and proceed length are
  832. stated together in a single byte, before an uncompressed section, the relevant offset
  833. is given after the uncompressed section:
  834. Use the proceed length to read the uncompressed data, at which point you arrive at the start of the offset value.
  835. Read this value, then move to the offset and copy a number of bytes (given by copy length)
  836. to the decompressed data. Afterwards, the next copy and proceed length are given and the process starts anew.
  837.  
  838. The offset has a constant size of 2 bytes, in little endian.
  839.  
  840. The two lengths share the same byte. The first half of the byte belongs to the proceed length,
  841. whereas the second half belongs to the copy length.
  842.  
  843.  
  844. So the previous file is something like this:
  845. 10000001 0C00 0018
  846.  
  847. Compression header:
  848. 00000018 decompressed size
  849. 0970 LZ77
  850. 000E compressed size
1A, proceed 1 byte, copy A+4=E times
  852. 00, the payload to be copied E times
  853. 0100, the offset to move backwards to copy, namely the nullbyte
  854.  
  855. 90, proceed 9 bytes (till the end of the file)
  856. 803D811935F2361F1E, just add this to the decompressed payload
  857.  
  858. So the decompressed payload is (1 null, clone E times, then 9 bytes):
  859. 000000000000000000000000000000803D811935F2361F1E
  860.  
  861. And thus the delta file becomes (after decompression):
  862. 1000 0001 0C00 0018
  863. 000000000000000000000000000000803D811935F2361F1E
  864.  
  865. 0c00: offset in base file to substitute
  866. 0018: number of bytes to copy.
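
A minimal Python 3 sketch of the block layout and the LZ77 variant as described above, checked against the decal_plaster example. The function name is mine and not taken from the dumper script. The notes only cover lengths that fit into one nibble, so the sketch assumes that no length needs more than that. The +4 on the copy length is taken from the worked example (A+4=E).

```python
from struct import unpack

def decompress_block(data, pos=0):
    """Decompress one block starting at pos; return (payload, next position)."""
    # header: 4 bytes decompressed size, 2 bytes type, 2 bytes compressed size (big endian)
    decompressed_size, comp_type, compressed_size = unpack(">IHH", data[pos:pos + 8])
    pos += 8
    if comp_type == 0x0071:  # uncompressed payload, read it directly
        return data[pos:pos + decompressed_size], pos + decompressed_size
    if comp_type != 0x0970:
        raise ValueError("unknown compression type %04x" % comp_type)

    end = pos + compressed_size
    out = bytearray()
    while pos < end:
        token = data[pos]
        pos += 1
        proceed_length = token >> 4          # first half of the byte
        copy_length = (token & 0x0F) + 4     # second half of the byte, plus 4
        out += data[pos:pos + proceed_length]  # uncompressed bytes
        pos += proceed_length
        if pos >= end:                       # the last section has no offset
            break
        offset = unpack("<H", data[pos:pos + 2])[0]  # little endian
        pos += 2
        start = len(out) - offset
        for i in range(copy_length):         # byte by byte, so length > offset repeats
            out.append(out[start + i])
    return bytes(out), pos

# the compressed part of the decal_plaster delta
block = bytes.fromhex("00000018" "0970" "000E" "1A000100" "90" "803D811935F2361F1E")
payload, next_pos = decompress_block(block)
print(payload.hex())
```

Copying byte by byte is what makes a copy length larger than the offset work: the single null byte is read again and again while it is being appended.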
  867.  
  868.  
  869. Header (big endian):
  870. 2 (or 1) bytes: Delta type (2000 is uncompressed, 1000 is compressed)
  871. 2 (or 3) bytes: Size without header (set to 1 for compressed)
  872. 2 bytes (only if uncompressed): Total decompressed file size-1
  873.  
  874. while current position in file < size without header:
  875. 2 bytes: Replacement offset (move there in the base file)
  876. 1 byte: Number of replacement bytes (in the delta file; 0 for compressed)
  877. 1 byte: Number of bytes to replace in the base file
  878. x bytes: The payload to replace.
  879.  
  880. With that I should be able to handle the simple deltas without brute-forcing my way through them.
  881. But surely there are more types out there than just 10 and 20. It would be too easy otherwise.
  882. I can't find any though in those files that I have solved.
  883.  
  884. Run through the bundles again and throw some errors (for the moment, only consider small deltas):
if len(deltaData)>50: return
deltaStream=StringIO(deltaData)
deltaStream.seek(0)

typ,deltaSize=unpack(">HH",deltaStream.read(4))

if typ==0x1000:
    if deltaSize!=1: asdf #undefined name, throws a NameError on purpose
elif typ==0x2000:
    totalDecompressed=unpack(">H",deltaStream.read(2))[0]
    if totalDecompressed+1!=entry.elems["originalSize"].content:
        asdf
else:
    asdf
  899.  
  900. Error with file 9k22_tunguska_m.ebx; totalDecompressed does not match originalSize.
  901.  
  902. Delta:
  903. 2000 000C FFFF
  904. E926 0101 C0
  905. E948 0303 000000
  906.  
  907. 2000 0017 7FBF
  908. 1BA4 0606 CDCC0C3F0000
  909. 56C2 0202 7042
  910. 56CC 0303 333333
  911.  
So this delta has two blocks within a single file.
Each block can only handle FFFF+1=0x10000 bytes max.
  914.  
  915. This should be easy to fix. However, ignore the error for now, and continue.
  916. Next error, lav_ad.ebx has type 0.
  917.  
  918. Delta:
  919. 0000 0001
  920. 2000 0007 6F3F
  921. 1294 0303 E17A14
  922.  
  923. Two random ideas what this is about.
  924. 1) A way to seek past several blocks without having to specify all the other crap.
  925. 2) Substitution in the compressed file instead.
  926.  
  927. I know that exactly three bytes, E17A14, are substituted in the file.
  928. Just ask a script where exactly.
  929.  
  930. No hits.
  931.  
  932. Compressed size: BB86
  933. Decompressed size: 16F40
  934.  
  935. The value of 6f3f is pretty far from the expected size.
  936.  
  937. No clue what this is about, keep it in mind for later when everything else is done.
  938.  
  939.  
  940. Remove the restriction so larger deltas are okay too.
  941.  
  942. venicesoldierinputconcepts.ebx is type 10 but has deltaSize!=1:
  943. 1000 0002 0004 0024 0000 0024 0070 0024
  944.  
  945. So the value is 2 here. Problem is, that delta is 8kb so I can't manually analyze it.
  946. Keep that in mind too for later.
  947.  
  948.  
  949. Redefine the format a bit:
  950. All values in big endian.
  951. Header:
  952. 0.5 bytes: Delta type (2 is uncompressed, 1 is compressed, 0/3/4 possible too)
  953. 3.5 bytes: deltaSize, size without header (set to 1 for compressed)
  954. 2 bytes (only if uncompressed): Total decompressed file size-1
  955.  
  956. while current position in file < size without header:
  957. 2 bytes: Replacement offset (move there in the base file)
  958. 1 byte: Number of replacement bytes (in the delta file; 0 for compressed)
  959. 1 byte: Number of bytes to replace in the base file
  960. x bytes: The payload to replace.
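
A small Python 3 sketch of how the first four bytes split into the half-byte type and the 3.5 byte size. The function name is made up for illustration; the two inputs are the headers of 9k22_tunguska_m.ebx and venicesoldierinputconcepts.ebx from above.

```python
from struct import unpack

def split_delta_header(header_bytes):
    """Split the 4 header bytes into (type, deltaSize)."""
    value = unpack(">I", header_bytes)[0]
    typ = value >> 28                # upper 4 bits
    delta_size = value & 0x0FFFFFFF  # lower 28 bits
    return typ, delta_size

tunguska = split_delta_header(bytes.fromhex("2000000C"))
venice = split_delta_header(bytes.fromhex("10000002"))
print(tunguska, venice)
```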
  961.  
  962. Number of files depending on type:
  963. type 0: 15
  964. type 1: 1311
  965. type 2: 10075
  966. type 3: 12
  967. type 4: 245
  968.  
  969. All the more reason to finish type 2.
  970.  
  971. Hack something together:
deltaStream=StringIO(deltaData)
deltaStream.seek(0)

EOF=len(deltaData)
baseOffset=0

while deltaStream.tell()< EOF:
    deltaPos0=deltaStream.tell()
    tmpSize=unpack(">I",deltaStream.read(4))[0]
    typ=tmpSize>>28 #upper 4 bits
    deltaBlockSize=tmpSize&0xfffffff #lower 28 bits

    if typ!=2:
        if baseOffset!=0: asdf
        else: return

    baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #add to baseOffset at the end

    while deltaStream.tell()-deltaPos0 < deltaBlockSize:
        offset=unpack(">H",deltaStream.read(2))[0]
        byte1,byte2=unpack("BB",deltaStream.read(2))
        if byte1!=byte2: asdf
        substitute=deltaStream.read(byte1) #not used yet
  995.  
  996.  
  997.  
  998. multiplayerconsumableunlocksetup.ebx fails due to byte1!=byte2:
  999. 2 0000493 08CF
  1000. 0008 0101 20
  1001. 001C 0A0A D000000002000000F004
  1002. 0286 0101 16
  1003. 02A0 0101 16
  1004. 030B 0600 036107000387 (not sure what's going on)
  1005. 0300 048A FFFFA412 (offset went backwards, bad)
  1006. CB06 3444 922CDD837D4910F8500000002C (offset exceeds file size, completely lost track of it)
  1007.  
  1008. Try that again from the critical part.
  1009. 030B 0600
  1010. 0361 0700
  1011. 0387 0300
  1012. 048A FFFF
  1013. A412CB063444922CDD837D4910F8500000002C
  1014.  
  1015. Apparently it accumulates offsets and sizes and terminates with FFFF.
  1016.  
  1017. Have it skip the file when something like this occurs.
  1018.  
  1019. Error with dataversion.ebx:
  1020. 2 0000015 01CF
  1021. 0186 0002 4430
  1022. 019E 0200
  1023. 01B8 0202 4149
  1024. 01C4 0101 31
  1025.  
  1026. The second byte specifies the number of substitute bytes in the delta file.
  1027.  
The first byte does... something. It certainly requires an offset.
So I substitute 2 bytes at that offset, but where do I take them from?
Or do I just remove them? That could actually work. Ebx files
always have a size that is a multiple of 16, so in the lines above
I would add two bytes first, then remove two bytes later, so these cancel
each other out.
  1034.  
  1035. First byte: Remove this number of bytes at the offset
  1036. Second byte: Place this number of bytes at the offset (read the bytes in the delta file)
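
To check the remove/add reading, a Python 3 sketch that parses the dataversion.ebx delta from above into its operations and compares the number of removed and added bytes. The function and variable names are mine; the meaning of the two count bytes is still the guess stated above.

```python
from struct import unpack

def parse_type2_block(block):
    """Return (block size from the header, list of (offset, removeCount, addBytes))."""
    header = unpack(">I", block[0:4])[0]
    typ, delta_block_size = header >> 28, header & 0x0FFFFFFF
    if typ != 2:
        raise ValueError("not a type 2 block")
    block_size = unpack(">H", block[4:6])[0] + 1
    pos, end = 6, 6 + delta_block_size
    ops = []
    while pos < end:
        offset, remove_count, add_count = unpack(">HBB", block[pos:pos + 4])
        pos += 4
        ops.append((offset, remove_count, block[pos:pos + add_count]))
        pos += add_count
    return block_size, ops

dataversion = bytes.fromhex(
    "20000015" "01CF"
    "0186" "0002" "4430"
    "019E" "0200"
    "01B8" "0202" "4149"
    "01C4" "0101" "31")
block_size, ops = parse_type2_block(dataversion)
removed = sum(op[1] for op in ops)
added = sum(len(op[2]) for op in ops)
print(hex(block_size), removed, added)
```

Both sums come out the same for this delta, which fits the idea that the file size stays a multiple of 16.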
  1037.  
  1038. Still wondering why it went FFFF above. Or was it just the maximum amount of bytes that could be specified?
  1039. Indeed, if I read FF bytes in the multiplayerconsumableunlocksetup delta, the next bytes say 0589 FFFF.
  1040. Not sure how to code such a byte removal in an efficient manner. Well, there are more pressing matters
  1041. right now anyway.
  1042.  
  1043. Adjust the script a bit. Note that baseOffset is not increased yet, so the script returns as
  1044. soon as the type is changed:
while deltaStream.tell()< EOF:
    tmpSize=unpack(">I",deltaStream.read(4))[0]
    typ=tmpSize>>28
    deltaBlockSize=tmpSize&0xfffffff

    if typ!=2:
        if baseOffset!=0: asdf
        else: return

    baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #add to baseOffset at the end
    deltaPos0=deltaStream.tell()
    while deltaStream.tell()-deltaPos0 < deltaBlockSize:
        offset=unpack(">H",deltaStream.read(2))[0]
        removeCount,addCount=unpack("BB",deltaStream.read(2))
        substitute=deltaStream.read(addCount) #not used yet
  1060.  
  1061. Small issues (like most of the implementation) aside, the script should be able
  1062. to extract 9612 files (those which use type 2 delta blocks only).
  1063. The total number of casPatchType 2 files is 11658, so that's a fair share of files already.
  1064. A few hundred files that start as type 2 have gone missing, so they must've changed the type later on.
  1065.  
  1066. Now the final question is whether that delta file is applied sequentially or not.
  1067. Removing a few bytes from the middle of the file, then shifting everything after the cut
  1068. to the left does not seem viable.
  1069. That should be easy to figure out though:
if removeCount==255 and addCount==0:
    asdf
  1072.  
  1073. The relevant delta bytes:
  1074. feef ff 00
  1075. ffee 12 00
  1076.  
  1077. So in effect it removes all bytes from feef until ffee+12.
Try the same the other way around:
if removeCount==0 and addCount==255:
    asdf
  1081.  
  1082. The relevant delta bytes:
  1083. 83F1 00 FF *255 bytes*
  1084. 83F1 00 FF *255 bytes*
  1085.  
  1086. So the offsets do not adjust for that either.
  1087. I'm not sure what this means in practice. It could be that
  1088. the second operation puts the bytes before the bytes of the first operation.
  1089. The opposite case seems just as plausible though.
  1090. Well alright. Have a second stream and just put the bytes in it.
  1091.  
  1092. Keep in mind to decompress the base file before applying the delta.
  1093. Also, for some reason a compression type 0x70 appeared in the base files.
  1094. So a patched bundle accesses an unpatched file which is never used in any
  1095. unpatched bundle. Just handle 0x70 the same as 0x71, i.e. as uncompressed payload:
  1096.  
  1097. Snippet:
deltaEntry,deltaStream=cat2.getCas(deltaSha1)
baseEntry,compressedBase=cat.getCas(baseSha1)
baseStream=decompressLZ77(compressedBase,baseEntry.size)
compressedBase.close()

patchedStream=open2(outPath,"wb") #here be the new data

baseOffset=0 #to handle the base offsets when delta contains more than one block
while deltaStream.tell()-deltaEntry.offset < deltaEntry.size: #read one block
    tmpSize=unpack(">I",deltaStream.read(4))[0]
    typ=tmpSize>>28
    deltaBlockSize=tmpSize&0xfffffff

    if typ!=2:
        patchedStream.seek(0)
        patchedStream.write("abcd") #break the file magic so the ebx script does not get confused
        patchedStream.close() #this is actually better than deleting the file (I skip the file entirely if it already exists)
        return #todo

    baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #has no other purpose than to be added to baseOffset after the loop
    deltaPos0=deltaStream.tell()
    while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within one block
        offset=unpack(">H",deltaStream.read(2))[0]
        skipCount,addCount=unpack("BB",deltaStream.read(2))

        sizeUntilOffset=baseOffset+offset-baseStream.tell()
        patchedStream.write(baseStream.read(sizeUntilOffset))

        #skip the bytes, move to new position in the base stream and pretend the bytes were read
        baseStream.seek(skipCount,1)
        baseOffset+=skipCount

        #add the bytes
        patchedStream.write(deltaStream.read(addCount))

    baseOffset+=baseBlockSize

#add the remaining bytes of the base
patchedStream.write(baseStream.read())
patchedStream.close()
deltaStream.close()
  1139.  
  1140.  
  1141. It's still really slow. The individual blocks in the LZ77 file and in the delta file suggest that I should
  1142. decompress one block, then apply the delta on that block, then decompress the next block etc.
  1143. This does require that the deltas are always synchronized with the base file though. I have
  1144. added 1 to baseBlockSize to obtain the number of bytes in the decompressed file. Still, verify
  1145. this for all files before going further. Though first of all, see if the ebx script can handle
  1146. the files without errors.
  1147.  
  1148.  
  1149. Fail at file c_marine_01.ebx:
  1150. KeyError: 177537
  1151.  
  1152. Somehow there are random bytes in the keyword section.
  1153.  
  1154. Back to the beginning then.
  1155. The corresponding delta:
2 0000028 075F #type 2, delta block size 28, decompressed block is 75f+1 in total
  1157. 00C0 20 00 #remove 20 bytes at c0; these are the same bytes that are added in the next step
  1158. 0100 00 20 D66030DC07317746BAAC1CEEF8212E0F1B556D54D2A69448B3621769C50B227C #add these 20 bytes at 100
  1159.  
  1160. In effect this should remove the guid pair D660... from the middle of the list of external guids and
  1161. put it at the end of the list.
  1162.  
  1163. I know for certain that the keyword section must start at 100 (in both the unpatched and patched file).
  1164. The appropriate place to add the bytes is 100-20=e0. Though in fact, I don't even need
  1165. the offset for a pure add operation. I just attach the bytes to the end of
  1166. the patched stream. For some reason there are keywords at e0 and the guid pair at 100.
  1167. Keep a separate counter for removed bytes and subtract it from the offset?
  1168. This is starting to confuse me.
  1169.  
  1170. So what does the script do exactly:
  1171. 00c0 20 00:
  1172. First of all, write all bytes until c0 to the patched stream.
  1173. In the base stream, seek 20 bytes forwards.
  1174.  
  1175. 0100 0020:
  1176. Write all bytes until 100 to the patched stream (bad).
  1177. Add the 20 bytes to the patched stream.
  1178.  
  1179.  
  1180.  
  1181. Okaaay, with a new variable skipTotal it seems to work correctly:
skipTotal=0
while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within one block
    offset=unpack(">H",deltaStream.read(2))[0]
    skipCount,addCount=unpack("BB",deltaStream.read(2))

    sizeUntilOffset=baseOffset+offset-baseStream.tell()-skipTotal
    patchedStream.write(baseStream.read(sizeUntilOffset))

    #skip the bytes, move to new position in the base stream and pretend the bytes were read
    baseStream.seek(skipCount,1)
    baseOffset+=skipCount

    #add the bytes
    patchedStream.write(deltaStream.read(addCount))
    skipTotal+=skipCount #compensates for the baseOffset+=skipCount above
  1197.  
  1198. If there's an elegant solution to this, I can't see it right now.
  1199.  
Not too bad now, but it fails at 9k22_tunguska_m.ebx. That file is more than 0x10000 bytes.
  1201.  
  1202. Delta:
  1203. 2 000000C FFFF
  1204. E926 01 01 C0
  1205. E948 03 03 000000
  1206. 2 0000017 7FBF
  1207. 1BA4 06 06 CDCC0C3F0000
  1208. 56C2 02 02 7042
  1209. 56CC 03 03 333333
  1210.  
  1211. These are plain substitutions. I really need to check if the delta blocks always match compressed blocks.
  1212. That should simplify things at least a bit.
  1213.  
  1214. Well, that's interesting. The number of blocks always match, so one delta block means one compressed block.
  1215. However, the size given by the delta is not always the block size.
  1216.  
  1217. Grab the smallest delta file with size mismatch.
  1218.  
  1219. mp_naval_networkregistry_win32.ebx:
  1220. base decompressed block size: 6a0
  1221. delta block size: 680
  1222.  
  1223. delta:
  1224. 2 0000023 067F #block size 67f+1?
  1225. 0004 09 09 90050000 F000000022 #oh. substitute the metadata right after the magic
  1226. 0160 20 00
  1227. 05A0 01 01 22
  1228. 0610 01 01 22
  1229. 069C 04 04 00000000
  1230.  
  1231. The metadata says that the new file size is 90050000+f0000000 = 590+f0 = 680.
  1232.  
  1233. So the delta block size is the size the block must have in the end.
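
The arithmetic for mp_naval_networkregistry_win32.ebx as a quick Python 3 check. Reading the first eight substituted bytes as two little endian sizes is my interpretation from above, and the variable names are only for illustration.

```python
from struct import unpack

# the 9 bytes placed at offset 4, right after the magic
added = bytes.fromhex("90050000" "F0000000" "22")
first_size, second_size = unpack("<II", added[:8])  # little endian

# 2 bytes from the delta block header, plus 1
patched_block_size = 0x067F + 1

print(hex(first_size), hex(second_size), hex(first_size + second_size), hex(patched_block_size))
```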
  1234.  
  1235. Find a mismatch with at least two blocks and check the metadata again.
  1236.  
  1237. sp_airfield delta:
  1238. 2 0000103 FFD7
  1239. 0008 01 01 40 #patch one byte of the payload size (from 70) to 40
  1240. 9998 09 09 060400002F00000070
  1241. 99AC 01 01 AC
  1242. 99B8 01 01 B8
  1243. 99C4 01 01 D0
  1244. 99D0 04 04 D8740000
  1245. 99DC 01 01 00
  1246. 99E8 01 01 2C
  1247. 99F4 01 01 34
  1248. 9A00 01 01 50
  1249. 9A0C 01 01 58
  1250. 9A18 01 01 60
  1251. 9A24 01 01 6C
  1252. 9A30 01 01 84
  1253. 9A3C 01 01 8C
  1254. 9A48 01 01 A0
  1255. 9A54 01 01 A8
  1256. 9A60 01 01 B0
  1257. 9A6C 01 01 C8
  1258. 9A78 01 01 D0
  1259. 9A84 04 04 D8A70000
  1260. 9A90 02 02 E0A7
  1261. 9A9C 02 02 F0A7
  1262. 9AA8 01 01 80
  1263. 9AB4 01 01 90
  1264. 9AC0 02 02 D4BA
  1265. 9ACC 04 04 D0BC0000
  1266. 9AD8 02 02 D8BC
  1267. 9AE4 02 02 E0BC
  1268. 9AF0 01 01 00
  1269. 9AFC 01 01 30
  1270. 9B08 01 01 60
  1271. 9B14 01 01 90
  1272. 9B20 01 01 B4
  1273. 9B2C 01 01 C8
  1274. 9B38 02 02 D8BD
  1275. 9B44 01 01 00
  1276. 9B50 01 01 10
  1277. 9B5C 02 02 D4C1
  1278. 9B68 02 02 F8CA
  1279. 9B74 01 01 2C
  1280. 9B80 01 01 34
  1281. 9B8C 01 01 48
  1282. 9B98 01 01 54
  1283. 9BA4 01 01 5C
  1284. F270 01 01 06
  1285. F74C 28 00
  1286. 2 0000004 9B17
  1287. 7794 08 00
  1288.  
  1289. Meta section size: 9bb0 both before and after patching
  1290. Payload section size: ff70 (before), ff40 (after)
  1291. Total size: 19b20 (before), 19af0 (after)
  1292.  
  1293. delta block sizes: FFD7+1 + 9B17+1 = 19af0
  1294.  
  1295. I get the idea of how to implement this.
  1296. However, I really want to make sure I can treat each block separately.
  1297. That should increase performance and be much simpler to handle.
  1298.  
  1299. Try to find a delta file with more than one block. For any block
  1300. before the last one, check if its last delta operation has
  1301. offset+skipCount >= baseBlockSize. That would indicate
  1302. that the operation stretches over two blocks, but requires two
  1303. entries, one for each block the operation resides in.
  1304.  
  1305. multiplayertemplate.ebx matches this requirement.
  1306. However, the next block is of type 3. Ignore that file for now.
  1307.  
  1308.  
  1309. weaponsbundlesp/shaderdb.shaderdb:
  1310. offset+skipCount = 65149
  1311. baseBlockSize = 64632
  1312.  
  1313. In fact, the offset alone is greater. Hmm, the requirement is wrong.
  1314. I should compare offset+skipCount with the decompressed block size (in the base).
  1315. At least, that's what I think.
  1316.  
  1317.  
  1318.  
  1319. ind_servicebuilding_02_destruction_physics_win32:
  1320. offset+skipCount = 65536
  1321. decompressedBlockSize = 65536
  1322.  
  1323. Delta:
  1324. ...
  1325. FFA4 5C 00
  1326. 2 0002637 fedf
  1327. 0000 3c ff
  1328.  
  1329. I would prefer a delta that doesn't have an addCount.
  1330.  
  1331. ch_fac_dv15_sp_player.ebx:
  1332. 2 00015CB 764F
  1333. 0000 3F 6F
  1334.  
  1335. And these two are indeed the only files that satisfy the condition.
  1336. Well, fuck it. The data is pretty conclusive anyway.
  1337. It has offset 0 for both of these files, so immediately
  1338. at the start of the block the script skips some bytes.
  1339. If the offset was higher the script would first read some unpatched bytes.
  1340. As it is however, with offset 0 it does not really matter if it's
  1341. a pure skipCount or if there are some bytes added.
  1342. More importantly, there is not a single time when the skip count
  1343. exceeds the block size (it's equal in the two cases above).
  1344.  
  1345. Therefore, rewrite the LZ77 decompression to yield single decompressed blocks.
  1346.  
  1347.  
  1348.  
Still can't get the script to work correctly. Try to gather all remarkable features of
the delta format and describe it once more:
  1351.  
  1352. Special cases:
  1353. Skipping more than ff bytes:
  1354. feef ff 00
  1355. ffee 12 00 #offset moves through the base payload
  1356. => Do not read the bytes in the base file from feef to ffee+12 under any circumstances.
  1357.  
  1358. Adding more than ff bytes:
  1359. 83F1 00 FF *255 bytes*
  1360. 83F1 00 FF *255 bytes* #the offset remains the same as no bytes of the base payload are read in between
=> When only adding bytes, the base offset remains the same.
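
To make the two special cases concrete, a Python 3 sketch of how a long skip or a long add would have to be split into entries with counts of at most ff. This is a hypothetical helper seen from the writing side; I have no idea how the actual tool produces its deltas. It only reproduces the entries observed above.

```python
def split_operation(offset, skip_count, add_bytes=b""):
    """Split one large skip/add into entries of (offset, skipCount, addBytes)."""
    ops = []
    while skip_count > 0 or add_bytes:
        skip = min(skip_count, 0xFF)
        add, add_bytes = add_bytes[:0xFF], add_bytes[0xFF:]
        ops.append((offset, skip, add))
        offset += skip        # skipping moves through the base payload
        skip_count -= skip    # adding alone leaves the offset unchanged
    return ops

# skip 0x111 bytes starting at feef
long_skip = split_operation(0xFEEF, 0x111)
# add 2*255 bytes at 83f1
long_add = split_operation(0x83F1, 0, bytes(510))
print([(hex(o), hex(s), len(a)) for o, s, a in long_skip + long_add])
```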
  1362.  
  1363.  
  1364. General approach:
  1365. Decompress one LZ77 base block and parse the corresponding delta block.
  1366. Apply the delta on the decompressed base to obtain the patched block.
  1367.  
  1368. Delta block structure:
  1369. All values in big endian.
  1370. Header:
  1371. 0.5 bytes: Delta type (2 is uncompressed, 1 is compressed, 0/3/4 possible too)
  1372. 3.5 bytes: Delta block size; size without header (set to 1 for compressed)
  1373. 2 bytes (only if uncompressed): Final size of the patched block (must add 1 to get the actual value)
  1374.  
  1375. while current position in the block < delta block size:
  1376. 2 bytes: Base offset (add base data to the new patched file until reaching this offset)
  1377. 1 byte: Skip count, do not read these bytes from the base file (but seek past these bytes)
  1378. 1 byte: Add count
  1379. *The bytes to add, given by add count*
  1380.  
  1381. 1) Read all bytes from the current position in the decompressed base block until the base offset is reached.
  1382. 2) In the base block, (starting from the base offset) seek past the number of bytes given by skip count.
  1383. 3) Add the add-bytes to the end of the patched stream.
  1384.  
  1385. Finally, read more base bytes until the patched block has the size given by the header.
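
The three steps plus the final read, as a self-contained Python 3 sketch that works on plain bytes instead of streams. The names are mine and the test data is made up: it moves three bytes from the middle of a block further back, similar to the guid pair in c_marine_01.ebx.

```python
from struct import unpack

def apply_type2_block(delta, pos, base_block):
    """Apply one type 2 delta block to one decompressed base block.

    Returns (patched block, position after the delta block)."""
    header = unpack(">I", delta[pos:pos + 4])[0]
    typ, delta_block_size = header >> 28, header & 0x0FFFFFFF
    if typ != 2:
        raise ValueError("not a type 2 block")
    patched_block_size = unpack(">H", delta[pos + 4:pos + 6])[0] + 1
    pos += 6
    end = pos + delta_block_size

    out = bytearray()
    base_pos = 0
    while pos < end:
        offset, skip_count, add_count = unpack(">HBB", delta[pos:pos + 4])
        pos += 4
        out += base_block[base_pos:offset]             # 1) base bytes up to the offset
        base_pos = max(base_pos, offset) + skip_count  # 2) seek past the skipped bytes
        out += delta[pos:pos + add_count]              # 3) add the bytes from the delta
        pos += add_count

    # read more base bytes until the block has the size given by the header
    remaining = patched_block_size - len(out)
    out += base_block[base_pos:base_pos + remaining]
    return bytes(out), pos

base = b"0123456789"
delta = bytes.fromhex(
    "2000000B" "0009"      # type 2, delta block size b, patched size 9+1
    "0002" "03" "00"       # at 2: skip 3 bytes, add nothing
    "0008" "00" "03") + b"234"  # at 8: skip nothing, add 3 bytes
patched, next_pos = apply_type2_block(delta, 0, base)
print(patched)
```

Note that the offsets always refer to the base block, so no counter for the skipped bytes is needed.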
  1386.  
Phew, summarizing that was a great help, got it working right away and the ebx script could handle all extracted files.
  1388.  
  1389. Snippet:
patchedStream=open2(outPath,"wb") #write the new file in stream, might write directly too though.
deltaStreamSize=0 #got to keep track of the number of bytes written to the stream
for baseBlockStream in decompressLZ77(compressedStream,baseEntry.size):
    baseBlockStream.seek(0)

    tmpSize=unpack(">I",deltaStream.read(4))[0]
    typ=tmpSize>>28
    deltaBlockSize=tmpSize&0xfffffff

    if typ!=2:
        patchedStream.seek(0)
        patchedStream.write("abcd") #break the file magic so the ebx script does not get confused
        patchedStream.close() #this is actually better than deleting the file (I skip the file entirely if it already exists)
        return #todo

    patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
    deltaPos0=deltaStream.tell()

    currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
    while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
        #parse the delta
        offset=unpack(">H",deltaStream.read(2))[0]
        skipCount,addCount=unpack("BB",deltaStream.read(2))
        addBytes=deltaStream.read(addCount)

        sizeUntilOffset=offset-baseBlockStream.tell()
        patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
        baseBlockStream.seek(skipCount,1) #seek past the skip bytes
        patchedStream.write(addBytes) #add the bytes

        currentPatchedBlockSize+=(sizeUntilOffset+addCount)

    #read as many bytes necessary until the patchedStream has the correct size
    patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
patchedStream.close()
deltaStream.close()
  1426.  
  1427.  
  1428.  
  1429. Type 1 delta (rankparams.ebx):
  1430. 1 #type
  1431. 0000001 #maybe number of compressed blocks?
  1432. 3000 #decompressed base offset to substitute
  1433. 12EB #substitute size, also given in the compression header
  1434. Compressed block:
  1435. Header:
  1436. 000012EB 0970 10C0
  1437. *10c0 bytes compressed payload*
  1438.  
  1439.  
  1440. Alright, that's not too difficult to implement.
Find a delta where the presumed number of compressed blocks is greater than one.
  1442.  
  1443. venicesoldierinputconcepts.ebx (again):
  1444. 1 0000002 0004 #replace at offset 4
  1445. 0024 00000024 0070 0024 *24 bytes*
  1446. 4BEA #replace at offset 4bea?
  1447. 4846 000045F6 0970 1E53 *1e53 bytes*
  1448. And the delta ends after that.
  1449.  
  1450. But what's the purpose of 4846 here?
  1451.  
  1452. First of all, the header is replaced (with uncompressed data even).
  1453. So is the size of the file different?
  1454. before: 5870 + 3bc0 = 9430
  1455. after: 5810 + 39d0 = 91E0
  1456.  
  1457. Great, so the delta must remove 250 bytes.
  1458. Which is of course exactly the difference 4846-45f6.
  1459. 4846 is the number of bytes till the end of file.
  1460.  
  1461. I suppose then that 4846 specifies the number of bytes to skip
  1462. in favor of the new bytes that are added.
  1463.  
  1464.  
  1465.  
  1466. Type 1 delta (venicesoldierinputconcepts.ebx):
  1467. 1 #type
  1468. 0000002 #number of blocks
  1469.  
  1470. for each block:
  1471. 0004 #decompressed base offset
  1472. 0024 #skip count
  1473.  
  1474. Compression header:
  1475. 00000024 0070 0024
  1476. *0024 bytes compressed payload*
  1477.  
Skip the specified bytes and use the decompressed payload instead.
  1479.  
  1480.  
  1481.  
  1482. battlepacks.ebx fails:
  1483. For whatever reason, I end up 0c bytes before the end of the delta file.
  1484. And that's just after reading one of two blocks.
  1485.  
  1486. The last few bytes:
  1487. B70C 001C 00000000 0000 0000
  1488.  
  1489. I suppose it just tells me to skip these bytes and not add anything?
  1490. For the compression, if type is 0, then return empty-handed.
  1491. I bet the ebx script will fail anyway.
  1492.  
  1493.  
levellistreport.ebx cannot be handled by the ebx script.
In fact it seems to contain every line twice and is missing
lots of stuff. It contains just a single type 1 delta block.
  1497.  
  1498. 1 0000001
  1499. 0000 02A0 000002A0 0970 01BE *01BE bytes compressed payload*
  1500.  
  1501. Read till offset 0, i.e. read no bytes.
  1502. Remove 2a0 bytes.
  1503. Add the decompressed payload, which consists of 2a0 bytes.
  1504. Thus, the entire file is replaced.
  1505.  
  1506. So why does it fail so horribly? Simple, I forgot to read the rest of
  1507. the block once the bytes are skipped and replaced.
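
Putting the type 1 findings together (offset and skip count per entry, an empty compression header means nothing to add, and the rest of the base must be read at the end), here is a Python 3 sketch on plain bytes. All names are mine, the test data is made up, and the LZ77 part again assumes that lengths fit into one nibble.

```python
from struct import unpack

def read_block(data, pos):
    """Read one (possibly compressed) block; return (payload, next position)."""
    size, comp_type, comp_size = unpack(">IHH", data[pos:pos + 8])
    pos += 8
    if size == 0:                        # empty header as in battlepacks.ebx
        return b"", pos
    if comp_type in (0x0070, 0x0071):    # uncompressed payload
        return data[pos:pos + size], pos + size
    if comp_type != 0x0970:
        raise ValueError("unknown compression type %04x" % comp_type)
    end, out = pos + comp_size, bytearray()
    while pos < end:
        token = data[pos]
        pos += 1
        out += data[pos:pos + (token >> 4)]   # proceed length
        pos += token >> 4
        if pos >= end:
            break
        start = len(out) - unpack("<H", data[pos:pos + 2])[0]
        pos += 2
        for i in range((token & 0x0F) + 4):   # copy length
            out.append(out[start + i])
    return bytes(out), end

def apply_type1_block(delta, pos, base_block):
    """Apply one type 1 delta block to one decompressed base block."""
    header = unpack(">I", delta[pos:pos + 4])[0]
    typ, count = header >> 28, header & 0x0FFFFFFF
    if typ != 1:
        raise ValueError("not a type 1 block")
    pos += 4
    out, base_pos = bytearray(), 0
    for _ in range(count):
        offset, skip_count = unpack(">HH", delta[pos:pos + 4])
        add_bytes, pos = read_block(delta, pos + 4)
        out += base_block[base_pos:offset]  # unmodified base bytes
        out += add_bytes                    # decompressed replacement
        base_pos = offset + skip_count      # seek past the skipped bytes
    out += base_block[base_pos:]            # do not forget the rest of the block
    return bytes(out), pos

base = b"0123456789ABCDEF"
delta = (bytes.fromhex("10000002" "0004" "0002" "00000003" "0070" "0003") + b"xyz"
         + bytes.fromhex("000A" "0006" "00000000" "0000" "0000"))
patched, next_pos = apply_type1_block(delta, 0, base)
print(patched)
```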
  1508.  
  1509.  
  1510.  
  1511. vehicleshed_medium_mesh fails. Some parts of the string section are
  1512. right in the middle of the metadata. It's type 1 with 2 blocks within.
  1513.  
  1514. Delta:
  1515. 1 0000002
  1516. 0004 005C 0000007C 0970 007B *7b bytes compressed payload*
  1517. 0BAC 01E4 000002B4 0970 0193 *193 bytes compressed payload*
  1518. End of file
  1519.  
  1520. Just manually separate the pieces, decompress them and the base,
  1521. then figure out what to do. Meh.
  1522.  
  1523. So the first block changes the metadata and the size of the ebx.
  1524.  
  1525. Before: Size = bc0+1d0 = d90
  1526. After: Size = be0+2a0 = e80
  1527.  
  1528. Size of the file that the script put together using the delta: ee0
  1529. That's close, but not good enough.
  1530.  
  1531. Also note how the first delta block skips 5c bytes, but adds 7c. That means
  1532. that one guid pair (size exactly 20 bytes) is added. In fact, the new metadata
  1533. confirms this too (the number of guid pairs is increased from 2 to 3).
  1534.  
  1535. The first delta block is safe to apply.
  1536. So err, create another file in the hex editor, then grab base 4 bytes.
  1537. Then the 7c delta bytes.
  1538. Then skip 5c base bytes starting from 4, so move to 60.
  1539. Read base from 60 until 0bac.
  1540. Add 2b4 delta bytes. AND WITH THIS I HAVE REACHED E80.
  1541. Then skip 1e4 base bytes starting from bac, i.e. until D90 (which is the base EOF).
  1542.  
  1543. That looks all fine to me, including the file I get when manually doing this.
  1544. So why the heck did the script fail? Nvm, just a coding mistake. All fixed.
  1545.  
  1546.  
  1547. I get 1951 ebx files out of 2069 and the ebx script can handle them all.
  1548. Keep in mind that this number is without the duplicates throughout the different bundles.
  1549. That leaves about 100 unique files containing type 0,3 or 4.
  1550.  
  1551. Snippet:
patchedStream=open2(outPath,"wb") #write the new file in stream, might write directly too though.
deltaStreamSize=0 #got to keep track of the number of bytes written to the stream
for baseBlockStream in decompressLZ77(compressedStream,baseEntry.size):
    baseBlockStream.seek(0)

    tmpSize=unpack(">I",deltaStream.read(4))[0]
    typ=tmpSize>>28
    deltaBlockSize=tmpSize&0xfffffff

    if typ==2:
        patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
        deltaPos0=deltaStream.tell()

        currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
        while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
            #parse the delta
            offset=unpack(">H",deltaStream.read(2))[0]
            skipCount,addCount=unpack("BB",deltaStream.read(2))
            addBytes=deltaStream.read(addCount)

            sizeUntilOffset=offset-baseBlockStream.tell()
            patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
            baseBlockStream.seek(skipCount,1) #seek past the skip bytes
            patchedStream.write(addBytes) #add the bytes

            currentPatchedBlockSize+=(sizeUntilOffset+addCount)

        #read as many bytes necessary until the patchedStream has the correct size
        patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
    elif typ==1:
        for i in xrange(deltaBlockSize): #for type 1 this is the number of entries
            offset,skipCount=unpack(">HH",deltaStream.read(4))
            addBytes=decompressLZ77Block(deltaStream).getvalue()

            sizeUntilOffset=offset-baseBlockStream.tell()
            patchedStream.write(baseBlockStream.read(sizeUntilOffset))
            patchedStream.write(addBytes)
            baseBlockStream.seek(skipCount,1)
        patchedStream.write(baseBlockStream.read()) #the rest of the base block
    else:
        patchedStream.seek(0)
        patchedStream.write("abcd")
        patchedStream.close()
        return
  1596.  
  1597.  
  1598. Next types:
  1599.  
  1600. multiplayertemplate.ebx has type 3 (a bit further down the delta file).
  1601.  
  1602. 2 0001590 FFFF
  1603. ...
  1604. 3 0000001
  1605. 00000210 0970 0098 *98 bytes*
  1606. 1 0000002
  1607. 0D80 0039 00000039 0970 0021 *21 bytes*
  1608. A854 408C 0000435C 0970 1030 *1030 bytes*
  1609. EOF
  1610.  
  1611. Well isn't that grand. The base file has only two blocks while
  1612. the delta has three. Now, should I revert the script to a former
  1613. version and deal with the offset madness? For the moment I want
  1614. to hope that the new type 3 might explain everything.
  1615.  
  1616. To be honest I think the type 1 (compressed) does rely
  1617. on the block separation. Without each block that type wouldn't
  1618. know how many bytes to read at the end. Or put another way,
  1619. that type always expects me to read until the end of the block.
  1620.  
  1621. Let's ask other type 3 blocks about their opinion.
  1622.  
  1623. rhib.ebx:
  1624. 3 0000001 00000060 0970 003b *3b bytes*
  1625. 00000001
  1626. EOF
  1627.  
  1628. quadbike:
  1629. 3 0000001 00000060 0970 005b *5b bytes*
  1630. 00000001
  1631. EOF
  1632.  
  1633. ch_fav_lyt2021:
  1634. 3 0000001 00000060 0970 0032 *32 bytes*
  1635. 00000001
  1636. EOF
  1637.  
  1638. I can see a pattern there, lol.
  1639.  
  1640. vdv_buggy:
  1641. 3 00000001 ...
  1642. 2 0000007 5aaf 01e4 03 03 000001
  1643. EOF
  1644.  
  1645. Type 2 comes after a type 3. Oh man...
  1646. But the EOF always seems to come soon after.
  1647. Maybe type 0 or 4 are easier to understand.
  1648.  
  1649.  
  1650.  
  1651. weaponstatcategories has type 4.
  1652. Three meshvariationdb_win32 files too (mp_flooded/content, mp_thedish/content, sp_dam/citybridge).
  1653.  
  1654.  
  1655. weaponstatcategories:
  1656. 2 ...
  1657. 4 0000001 3000 0001 0000CC00 0970 A4D3 *a4d3 bytes*
  1658. EOF
  1659.  
  1660. The file also contains two blocks like the delta, phew.
  1661. So type 4 defines two shorts and then has the compressed payload.
  1662.  
  1663.  
  1664. mp_thedish/content:
  1665. 2 ...
  1666. 4 0000001 3000 0001 00006629 0970 33E6 *33e6 bytes*
  1667. EOF
  1668.  
  1669. Same behaviour.
  1670. This type certainly looks more tolerable than type 3, so it
  1671. should not be too hard to handle it.
  1672.  
  1673.  
  1674.  
  1675. weaponstatcategories:
  1676. Have the script apply all those little changes from type 2 already.
  1677. I.e. I have the first block all patched, but the second block missing
  1678. entirely at the moment.
  1679.  
  1680. Total expected patched size (according to header): 34d0+18b30 = 1C000
  1681. Size of the decompressed delta: CC00
  1682. Size of the patched first block: f400
  1683. cc00+f400 = 1c000. So apparently I do not need to do anything, just
  1684. replace the entire block with the new payload.
  1685.  
  1686. Oh. It's actually a prefix. It's 4 bytes of type 4, prefixing type 3.
  1687.  
  1688. But maybe type 0 is easier to understand?
  1689.  
  1690. mp_resort/content:
  1691. 2 000000C FFFF
  1692. C640 08 08 F977E76551DEC342
  1693. 0 0000002
  1694. EOF
  1695.  
  1696. The file has three base blocks. I suspect that type 0 simply means
  1697. that these blocks remain the same.
  1698.  
  1699. In fact I can't even verify this either way. The delta simply
  1700. replaces 8 bytes in the payload section (some number or guid that
  1701. the ebx script certainly will not complain about) and the rest
  1702. must remain the same or the file size would not match the metadata
  1703. (not to mention, end abruptly).
  1704. Still, it makes the most sense to me, plus it confirms that it
  1705. is mandatory to deal with each block individually.
  1706. However, so far I've expected one delta block for each base block,
  1707. which is not the case. The above example in particular shows that
  1708. I must loop over the delta blocks and not the base blocks.
  1709. When iterating over the base blocks I can't correctly perform
  1710. the 0 0000002 instruction above (unless I add some more variables).
  1711.  
  1712.  
  1713.  
  1714. 3 types done, 2 to go.
  1715.  
  1716. Now, about the number of blocks.
  1717.  
  1718. Recall that multiplayertemplate showed that type 3 acts independently of the base blocks:
  1719.  
  1720. 2 0001590 FFFF
  1721. ...
  1722. 3 0000001
  1723. 00000210 0970 0098 *98 bytes*
  1724. 1 0000002
  1725. 0D80 0039 00000039 0970 0021 *21 bytes*
  1726. A854 408C 0000435C 0970 1030 *1030 bytes*
  1727. EOF
  1728.  
  1729. with 2 base blocks, but three delta blocks. Similarly, weaponstatcategories had two base
  1730. blocks, and the type 3 delta is probably once again independent of the base blocks:
  1731. 2 ...
  1732. 4 0000001
  1733. 3 0000001 0000CC00 0970 A4D3 *a4d3 bytes*
  1734. EOF
  1735.  
  1736. As type 0 keeps one block unchanged, I think type 4 could be about skipping one block entirely.
  1737. Type 3 seems to just insert new payload independently of base blocks.
  1738.  
  1739. Give that a shot. Though, what happens when type 3 has a number different from 1? Does that mean
  1740. several compressed pieces because a single piece would exceed 10000 bytes?
  1741.  
  1742.  
  1743. mp_prison/materialgrid_win32 has such a case:
  1744. 2 ...
  1745. 3 0000002
  1746. 00010000 0970 581B ... #yup, 10000 right there
  1747. 000051D7 0970 23B8 ...
  1748. 1 0000002
  1749. 0000 9A64 #read till offset 0, skip 9a64 bytes
  1750. 00000000 0000 0000 #so this should return an empty string?
  1751. 9B1C 64E4
  1752. 00005851 0970 3DD8 ... #ordinary compression
  1753. 4 0000001 #skip one block?
  1754. 3 0000001
  1755. 00008324 0970 574B ... #use this block instead
  1756. 4 0000002 #skip two blocks
  1757. 3 0000001 00004C89 0970 231C ... #use this block
  1758. 3 0000003 00010000 0970 5B3B ... #use these blocks. But why isn't there just one type 3 assignment with 4 blocks?
  1759. 4 0000001 #skip a block
  1760. EOF
  1761.  
  1762. The base file has 6 blocks.
  1763. Delta: type2, type1, type4, type4(*2),type4 => 6 blocks
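The block count above can be double-checked with a small Python 3 sketch. It assumes the reading used in these notes: types 1 and 2 each use up exactly one base block, types 0 and 4 use up as many base blocks as their count says, and type 3 uses none. The function name and the list layout are made up for illustration; the type 2 entry carries a byte size instead of a count (elided in the listing), so None stands in for it.

```python
def base_blocks_consumed(delta_blocks):
    """Count how many base blocks a sequence of (deltaType, count) pairs uses up."""
    total = 0
    for delta_type, count in delta_blocks:
        if delta_type in (1, 2):
            total += 1          # one base block is patched
        elif delta_type in (0, 4):
            total += count      # count base blocks are copied (0) or skipped (4)
        elif delta_type == 3:
            pass                # payload comes from the delta only
        else:
            raise ValueError("unknown deltaType %r" % (delta_type,))
    return total

# Delta block sequence of mp_prison/materialgrid_win32 as listed above.
mp_prison = [(2, None), (3, 2), (1, 2), (4, 1), (3, 1), (4, 2), (3, 1), (3, 3), (4, 1)]
print(base_blocks_consumed(mp_prison))
```

For the mp_prison sequence this should match the 6 blocks of the base file.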
  1764.  
  1765. So what's up with that odd type 0 compression?
  1766.  
  1767. It's been there with battlepacks.ebx too:
  1768. 2 ...
  1769. 1 0000002
  1770. 338B 002E
  1771. 00000029 0070 0029 ...
  1772. A7F1 16A4
  1773. 0000166C 0970 0B9B ...
  1774. 1 0000002
  1775. 42F1 65EC
  1776. 000064F0 0970 5872 ...
  1777. B70C 001C
  1778. 00000000 0000 0000
  1779. EOF
  1780.  
  1781. Alright, returning an empty string makes sense. I've also just realized that compression
  1782. type 70 apparently uses compressedSize=decompressedSize, whereas type 71
  1783. uses compressedSize=0 (note that both types have uncompressed payload).
  1784.  
  1785. The ebx script seems to run fine over all files:
  1786. >>> correct
  1787. [68160, 3494, 36387]
  1788. >>> incorrect
  1789. [0, 0, 0]
  1790.  
  1791.  
  1792. Well those were two excellent guesses about these last two types I suppose. 2069 files in total now.
  1793. Enable the other casPatchTypes again.
  1794.  
  1795.  
  1796. Recall that:
  1797. casPatchType 0 has the sha1 in the unpatched cat.
  1798. casPatchType 1 may have the sha1 in either the unpatched or the patched cat.
  1799. casPatchType 2 has the base in the unpatched cat and the delta in the patched cat.
  1800.  
  1801. Go for the safest approach and treat casPatchType 0 and 1 the same way, i.e. try the patched cat first, then the unpatched cat.
  1802.  
  1803. So what about base and delta. Can they appear together?
  1804.  
  1805. Number of files for each combination:
  1806. +base +delta => 0 files
  1807. +base -delta => 955 files
  1808. -base +delta => 686 files
  1809. -base -delta => 0 files
  1810.  
  1811. And for the unpatched files there are only files without base and delta.
  1812. So there are three different cases. As I have different functions
  1813. depending on whether or not there's "Update" in the tocRoot, I can
  1814. ignore the unpatched part.
  1815.  
  1816. Put it all together, and it fails miserably after a few thousand files.
  1817.  
  1818. sbtoc with error: MpCharacter
  1819.  
  1820. It can read the toc file without issues. Apparently the sb causes some trouble.
  1821. It fails at a base/nondelta bundle. Ah. I forgot that base requires me to
  1822. open up the unpatched sb (in the unpatched folder).
  1823. Hack something together to get the unpatched path. The patched noncas required that too,
  1824. so maybe I can grab the lines from there. Meh, it was different there.
  1825.  
  1826. Unpatched path is: ...\bf4\Data\Win32
  1827. Patched path is: ...\bf4\Update\Patch\Data\Win32
  1828.  
  1829. Something like this should work:
  1830. unpatchedPath=toc.fullpath.replace(r"patched\bf4\Update\Patch\Data\Win32",r"bf4\Data\Win32")
  1831.  
  1832. Alright, I got a working version.
  1833.  
  1834.  
  1835. Time for a format description:
  1836. The sbtoc (superbundle/table of contents) format has a new way of handling patched cascat files.
  1837. Previously, the sbtoc did not contain any patch specific info. The most sensible approach was to
  1838. check if the archives were located in the Update folder and use the patched cascat in that case.
  1839. Now however the sbtoc contains metadata to handle a rather wide range of different types of patches.
  1840.  
  1841. Unchanged from the previous format is the cas flag (found in the toc) which states that all bundles
  1842. inside the sb corresponding to the toc have their assets stored in the cascat archives.
  1843. If the flag does not exist or is set to false, the assets are instead directly stored in the sb file.
  1844.  
  1845. For a cas-enabled sbtoc, the toc does now give additional metadata for every single bundle.
  1846. Each patched bundle may have either a base or delta flag.
  1847.  
  1848. If the base flag is set (for the bundle), then the entire bundle does not require any patching. Note that the game
  1849. apparently relies on the patched files only, which then make references back to the unpatched files.
  1850. The sbtoc in the Update folder contains all the necessary info to retrieve files from the unpatched archives.
  1851.  
  1852. If the delta flag is set (for the bundle), then a casPatchType is specified for each
  1853. file within the bundle, which may take one of three values:
  1854. casPatchType 0 has the sha1 in the unpatched cat.
  1855. casPatchType 1 may have the sha1 in either the unpatched or the patched cat.
  1856. casPatchType 2 has the base in the unpatched cat and the delta in the patched cat.
  1857.  
  1858. If the type is not specified, assume type 0.
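A minimal Python 3 sketch of the lookup rules above. Everything here is a stand-in: entries are plain dicts, the two cats are plain dicts keyed by sha1, and locate_payload is a hypothetical helper rather than part of the dumper. Types 0 and 1 are handled alike (patched cat first, then unpatched), which is the cautious reading used in these notes.

```python
def locate_payload(entry, unpatched_cat, patched_cat):
    """Return a list of (cat name, sha1) pairs needed to rebuild one file."""
    patch_type = entry.get("casPatchType", 0)   # missing type means type 0
    if patch_type == 2:
        # base from the unpatched cat, delta from the patched cat
        return [("unpatched", entry["baseSha1"]), ("patched", entry["deltaSha1"])]
    sha1 = entry["sha1"]
    if sha1 in patched_cat:
        return [("patched", sha1)]
    if sha1 in unpatched_cat:
        return [("unpatched", sha1)]
    raise KeyError("sha1 not found in either cat")

# Toy cats: the values would normally describe cas file, offset and size.
unpatched = {b"aa": None, b"bb": None}
patched = {b"cc": None, b"dd": None}

plain = locate_payload({"sha1": b"aa"}, unpatched, patched)
either = locate_payload({"sha1": b"cc", "casPatchType": 1}, unpatched, patched)
delta = locate_payload(
    {"sha1": b"ee", "casPatchType": 2, "baseSha1": b"bb", "deltaSha1": b"dd"},
    unpatched, patched)
```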
  1859.  
  1860. casPatchType 2 defines two more variables, baseSha1 and deltaSha1, which specify the
  1861. sha1s of the unpatched (base) file in the unpatched cascat and of the delta file
  1862. in the patched cascat. The delta file contains the info to patch the base file.
  1863.  
  1864. Note that an ordinary (third) sha1 is still specified. That sha1 belongs to the compressed
  1865. patched file. This is rather odd because the patching process is applied to the
  1866. decompressed file. As both the base sha1 and delta sha1 are given though, there is
  1867. no way to bypass file integrity checks (assuming there are any).
  1868.  
  1869.  
  1870. casPatchType 2 in detail:
  1871. The operations described in the delta file rely on the individual LZ77 blocks.
  1872. It is not possible to decompress the base completely and then apply the patch,
  1873. nor is it possible to apply the patch to the compressed base file.
  1874.  
  1875. A typical delta file contains several blocks with no global header.
  1876. The file is in big endian.
  1877.  
  1878. Each block starts with:
  1879. 0.5 bytes deltaType
  1880. 3.5 bytes deltaBlockSize/blockCount
  1881.  
  1882. The 3.5 bytes specify a blockCount for all deltaTypes except type 2 (which is the most common type).
  1883. I've sorted the types by frequency of occurrence.
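The 0.5 + 3.5 byte split of the block header can be sketched in Python 3 like this (parse_delta_header is a made-up name; the dumper does the same thing inline with a shift and a mask):

```python
import struct

def parse_delta_header(raw4):
    """Split the 4 header bytes into (deltaType, deltaBlockSize or blockCount)."""
    (value,) = struct.unpack(">I", raw4)    # big endian, like the rest of the delta file
    return value >> 28, value & 0x0FFFFFFF  # upper 4 bits, lower 28 bits

# Headers taken from the multiplayertemplate listing: "2 0001590" and "3 0000001".
type_a, size_a = parse_delta_header(bytes.fromhex("20001590"))
type_b, count_b = parse_delta_header(bytes.fromhex("30000001"))
```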
  1884.  
  1885. deltaType 2:
  1886. (This type contains information about lots of small changes, usually less than ff bytes,
  1887. which are applied to the base block to obtain the patched block.)
  1888.  
  1889. In the base file, decompress a single block. Hereafter, when talking about
  1890. the base block I mean the decompressed base block. The compression is of no importance.
  1891.  
  1892. In the delta file, read 2 bytes: The expected size of the resulting patched block.
  1893. Add 1 to obtain the actual size, so e.g. ffff becomes 10000.
  1894.  
  1895. In the delta file, read all operations belonging to this delta block.
  1896. Its size is given by deltaBlockSize (note that the previous bytes do not count towards the size).
  1897. Each operation has this structure:
  1898. First come 4 bytes:
  1899. 2 bytes offset
  1900. 1 byte skipCount
  1901. 1 byte addCount
  1902.  
  1903. Write parts of the base block to the patched file until
  1904. the offset given above is reached in the base block.
  1905.  
  1906. In the base block, advance the position by skipCount bytes.
  1907. Those bytes must not appear in the patched file.
  1908.  
  1909. Read addCount bytes from the delta file and add them to the patched file.
  1910.  
  1911. Finally, read as many bytes as necessary from the base block
  1912. until the patched file size equals the expected size (as read from the delta file).
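The deltaType 2 steps can be tried out on in-memory bytes with the Python 3 sketch below. It is a simplified stand-in for the dumper code: apply_type2 is a hypothetical helper, the base block is assumed to be decompressed already, and the 4 byte block header plus the 2 byte size field are assumed to be parsed by the caller.

```python
import io
import struct

def apply_type2(base_block, operations, patched_size):
    """Apply the operations of one deltaType 2 block to one decompressed base block."""
    base = io.BytesIO(base_block)
    ops = io.BytesIO(operations)
    out = bytearray()
    while ops.tell() < len(operations):
        offset, skip_count, add_count = struct.unpack(">HBB", ops.read(4))
        out += base.read(offset - base.tell())  # copy unchanged bytes up to the offset
        base.seek(skip_count, 1)                # drop skipCount base bytes
        out += ops.read(add_count)              # insert addCount delta bytes
    out += base.read(patched_size - len(out))   # fill up to the expected size
    return bytes(out)

# Toy example: at offset 2, drop 3 bytes and insert 1 byte.
small = apply_type2(b"ABCDEFGHIJ", struct.pack(">HBB", 2, 3, 1) + b"x", 8)

# The single operation from mp_resort/content: "C640 08 08 F977E76551DEC342".
resort_ops = bytes.fromhex("C6400808F977E76551DEC342")
resort = apply_type2(bytes(0x10000), resort_ops, 0xFFFF + 1)
```

Note that the mp_resort operation is exactly 0xC bytes long, which agrees with the deltaBlockSize in its header (2 000000C).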
  1913.  
  1914. deltaType 1:
  1915. (This type contains information about larger changes made to the file.
  1916. The delta itself contains LZ77 blocks which are decompressed and used to
  1917. substitute large chunks of the base block.)
  1918.  
  1919. In the base file, decompress a single block.
  1920.  
  1921. Iterate blockCount times doing the following:
  1922. Read 4 bytes from the delta file:
  1923. 2 bytes offset
  1924. 2 bytes skipCount
  1925.  
  1926. Write parts of the base block to the patched file until
  1927. the offset given above is reached in the base block.
  1928.  
  1929. Read one LZ77 block in the delta file and add it to the patched file.
  1930.  
  1931. In the base block, advance the position by skipCount bytes.
  1932. Those bytes must not appear in the patched file.
  1933.  
  1934. Finally, add all remaining bytes of the base block to the patched file.
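A Python 3 sketch of the deltaType 1 loop. The real delta payload is usually compressed (type 0970), and the decompressor is not reproduced here. Instead the block reader is passed in as a parameter, and the default read_stored_block only understands the uncompressed type 0070 and the empty 00000000 0000 0000 block seen in the listings. Both function names are made up; the 8 byte block header layout (4 bytes decompressed size, 2 bytes compression type, 2 bytes compressed size) is the one the dumper unpacks with ">IHH".

```python
import io
import struct

def read_stored_block(stream):
    """Read one block, handling only uncompressed (0070) and empty (0000) blocks."""
    decompressed_size, compression_type, compressed_size = struct.unpack(">IHH", stream.read(8))
    if compression_type == 0x0070:
        return stream.read(decompressed_size)
    if compression_type == 0 and decompressed_size == 0:
        return b""
    raise NotImplementedError("compressed blocks are not handled in this sketch")

def apply_type1(base_block, delta_stream, block_count, read_block=read_stored_block):
    """Apply one deltaType 1 block to one decompressed base block."""
    base = io.BytesIO(base_block)
    out = bytearray()
    for _ in range(block_count):
        offset, skip_count = struct.unpack(">HH", delta_stream.read(4))
        out += base.read(offset - base.tell())  # copy unchanged bytes up to the offset
        out += read_block(delta_stream)         # insert the payload from the delta
        base.seek(skip_count, 1)                # drop skipCount base bytes
    out += base.read()                          # copy the rest of the base block
    return bytes(out)

# Toy delta: replace 3 bytes at offset 2 with "XY", then drop the last 2 bytes.
toy_delta = io.BytesIO(
    struct.pack(">HH", 2, 3) + struct.pack(">IHH", 2, 0x0070, 2) + b"XY"
    + struct.pack(">HH", 8, 2) + struct.pack(">IHH", 0, 0, 0))
toy_result = apply_type1(b"0123456789", toy_delta, 2)
```

The second operation in the toy delta mirrors the "skip bytes, add an empty string" case from battlepacks.ebx.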
  1935.  
  1936. deltaType 0:
  1937. In the base file, decompress a number of blocks (given by blockCount)
  1938. and add them to the patched file directly.
  1939.  
  1940. deltaType 3:
  1941. In the delta file, read a number of LZ77 blocks (given by blockCount) and add them to the patched file.
  1942. This is the only operation which does not depend on the base file at all.
  1943.  
  1944. deltaType 4:
  1945. In the base file, skip this number of blocks entirely, i.e. move past them
  1946. but do not add them to the patched file.
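Skipping base blocks (deltaType 4) does not require decompression, only the block headers. A Python 3 sketch, assuming the same 8 byte block header as above and the observation that types 0070 and 0071 store their payload uncompressed; skip_blocks is a made-up name and the payload bytes in the example are dummies.

```python
import io
import struct

def skip_blocks(stream, block_count):
    """Move the stream past block_count blocks without decompressing them."""
    for _ in range(block_count):
        decompressed_size, compression_type, compressed_size = struct.unpack(">IHH", stream.read(8))
        if compression_type in (0x0070, 0x0071):
            stream.seek(decompressed_size, 1)   # payload is stored uncompressed
        elif compression_type == 0x0970:
            stream.seek(compressed_size, 1)     # payload is compressed
        else:
            raise ValueError("unexpected compression type %04x" % compression_type)

# Two dummy blocks followed by a marker that must remain readable afterwards.
dummy = io.BytesIO(
    struct.pack(">IHH", 3, 0x0070, 3) + b"abc"
    + struct.pack(">IHH", 100, 0x0970, 2) + b"\x00\x00"
    + b"END")
skip_blocks(dummy, 2)
rest = dummy.read()
```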
  1947.  
  1948.  
  1949. The relevant casPatchType 2 section:
  1950. else: #casPatchType == 2
  1951.     baseSha1=entry.elems["baseSha1"].content
  1952.     deltaSha1=entry.elems["deltaSha1"].content
  1953.  
  1954.     deltaEntry,deltaStream=cat2.getCas(deltaSha1)
  1955.     baseEntry,compressedStream=cat.getCas(baseSha1)
  1956.  
  1957.     patchedStream=open2(outPath,"wb") #write the new file directly
  1958.     while deltaStream.tell()-deltaEntry.offset<deltaEntry.size:
  1959.         tmpSize=unpack(">I",deltaStream.read(4))[0]
  1960.         deltaType=tmpSize>>28
  1961.         deltaBlockSize=tmpSize&0xfffffff #if type!=2, this number actually specifies repetitions. Type 2 is by far the most frequent type so the name is justified
  1962.  
  1963.         if deltaType==2:
  1964.             #small changes (usually less than FF bytes) within a single decompressed base block
  1965.             baseBlockStream=decompressLZ77BlockWrapper(compressedStream)
  1966.             patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
  1967.             deltaPos0=deltaStream.tell()
  1968.  
  1969.             currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
  1970.             while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
  1971.                 #parse the delta
  1972.                 offset=unpack(">H",deltaStream.read(2))[0]
  1973.                 skipCount,addCount=unpack("BB",deltaStream.read(2))
  1974.                 addBytes=deltaStream.read(addCount)
  1975.  
  1976.                 sizeUntilOffset=offset-baseBlockStream.tell()
  1977.                 patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
  1978.                 baseBlockStream.seek(skipCount,1) #seek past the skip bytes
  1979.                 patchedStream.write(addBytes) #add the bytes
  1980.  
  1981.                 currentPatchedBlockSize+=(sizeUntilOffset+addCount)
  1982.  
  1983.             #read as many bytes as necessary until the patched block has the correct size
  1984.             patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
  1985.  
  1986.         elif deltaType==1:
  1987.             #similar to type 2 but with compressed delta payload and the replacement of larger sections
  1988.             baseBlockStream=decompressLZ77BlockWrapper(compressedStream)
  1989.             for i in xrange(deltaBlockSize):
  1990.                 offset,skipCount=unpack(">HH",deltaStream.read(4))
  1991.                 addBytes=decompressLZ77BlockWrapper(deltaStream).getvalue()
  1992.  
  1993.                 sizeUntilOffset=offset-baseBlockStream.tell()
  1994.                 patchedStream.write(baseBlockStream.read(sizeUntilOffset))
  1995.                 patchedStream.write(addBytes)
  1996.                 baseBlockStream.seek(skipCount,1)
  1997.             patchedStream.write(baseBlockStream.read()) #add all remaining bytes of the base block
  1998.  
  1999.         elif deltaType==4:
  2000.             #skip these base blocks entirely. (manually seeking should be faster than using the decompression script)
  2001.             for i in xrange(deltaBlockSize):
  2002.                 decompressedSize, compressionType, compressedSize = unpack(">IHH",compressedStream.read(8))
  2003.                 if compressionType in (0x70,0x71): compressedStream.seek(decompressedSize,1)
  2004.                 elif compressionType==0x970: compressedStream.seek(compressedSize,1)
  2005.                 elif compressionType==0: raise ValueError("compression type 0 in a skipped base block") #not seen so far, crash on purpose
  2006.  
  2007.         elif deltaType==0:
  2008.             #read the base blocks entirely without modifying them
  2009.             for i in xrange(deltaBlockSize):
  2010.                 patchedStream.write(decompressLZ77BlockWrapper(compressedStream).read())
  2011.  
  2012.         elif deltaType==3:
  2013.             #add payload in between base blocks. This is the only type that does not depend on the base file at all
  2014.             for i in xrange(deltaBlockSize):
  2015.                 patchedStream.write(decompressLZ77BlockWrapper(deltaStream).read())
  2016.  
  2017.         else: raise ValueError("unknown deltaType %d" % deltaType) #not seen so far, crash on purpose
  2018.  
  2019.     patchedStream.close()
  2020.     deltaStream.close()