- Next up: The patched archives don't work.
- As I already have the retail files I might as well work with those. I don't care about the beta.
- Try to dump the patched retail files.
- Error 1:
- Traceback (most recent call last):
- File "D:\hexing\release tools\dumper.py", line 368, in <module>
- main()
- File "D:\hexing\release tools\dumper.py", line 365, in main
- dump(fname,outputfolder)
- File "D:\hexing\release tools\dumper.py", line 220, in dump
- casHandlePayload(entry,ebxPath+entry.elems["name"].content+".ebx")
- File "D:\hexing\release tools\dumper.py", line 329, in casHandlePayload
- catEntry=cat.entries[entry.elems["sha1"].content]
- KeyError: '\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3'
- So a cascat-dependent sb file asks the cat for the payload with the SHA1 as seen above. But the cat
- can't find that SHA1. Well, which cat is it and what's the hexlified form of the SHA1?
- >>> hexlify('\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3')
- 'df2bdbc728ef18bea96416005456eb54fcd61cb3'
- So that's the sha1 that's apparently nowhere to be found in the cats. Have a look at the cats myself.
- Yup, the sha1 is simply not there. Well, fuck.
- The archive with this sha1 is Globals.toc/sb. The toc is encrypted and I can't find my unXOR.py right now.
- Besides, the sb contains the sha1s etc. so take a look at it first.
- Well, the sha1 is definitely there: http://i.imgur.com/SIwp5Yh.png
- But I can't find it in the cats.
- Cascat is redundant. The cas alone have all info necessary to recreate a cat.
- Maybe the cat is broken or something, ask the cas instead:
- from struct import unpack
- for i in xrange(1,22):
-     f=open("cas_"+(str(i) if i>9 else "0"+str(i))+".cas","rb")
-     f.seek(0,2)
-     EOF=f.tell()
-     f.seek(0)
-     while f.tell()<EOF:
-         f.read(4) #4 bytes preceding the sha1 in each cas entry
-         sha1=f.read(20)
-         size=unpack("I",f.read(4))[0]
-         f.read(4) #4 more bytes before the payload
-         if sha1=='\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3':
-             print i
-             asdf #deliberate NameError to stop the script on a hit
-         f.seek(size,1)
-     f.close()
- No hits, so the cas don't know about that sha1 either.
- Whatever, just continue with the next file when it can't find the sha1.
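- Roughly like this (a sketch; the wrapper name is made up, in practice the try/except just goes around the casHandlePayload call in dump):
- def casHandlePayloadSafe(entry,outPath):
-     try:
-         casHandlePayload(entry,outPath)
-     except KeyError:
-         pass #sha1 not in any cat; skip this file and carry on with the next one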
- Okay neat, it extracts about 1k files now.
- Error 2:
- Traceback (most recent call last):
- File "D:\hexing\release tools\dumper.py", line 379, in <module>
- main()
- File "D:\hexing\release tools\dumper.py", line 376, in main
- dump(fname,outputfolder)
- File "D:\hexing\release tools\dumper.py", line 220, in dump
- bundle=sbtoc.Entry(sb)
- File "D:\hexing\release tools\sbtoc.py", line 90, in __init__
- raise Exception("Entry does not start with \x82 or (rare) \x87 byte. Position: "+str(toc.tell()))
- Exception: Entry does not start with ‚ or (rare) ‡ byte. Position: 22849
- Occurred in MpCharacter.toc/sb. That error was the issue with the patched beta files too.
- Hm well, let's see if the unpatched files extract without complaining... done, 172k files and no issues.
- Now back to the error, the toc says where to move in the sb file to read a single bundle.
- Now, I have the script output the offset as given by the toc file. Interestingly, the very first
- offset given by the toc is 22848. So it wants to read in the middle of the file. One would expect
- the offset to be in the single digit region. Well, maybe the bundles are not ordered and the one with
- the lower offset comes later on.
- Ah this one has base = true while still being cas.
- The patched archives have always been tricky. Sometimes it's necessary to cut pieces out
- of the unpatched archives to obtain the patched files. Previously that was restricted to
- patched non-cas files though.
- Before going further, look through my script and recall what it does exactly. The archives
- require very different handling depending on some flags in the toc, and I honestly don't
- remember everything of it.
- dumper.py:
- for each toc file:
-     Read the toc in the superbundle format, then check if there is a cas flag.
-     if cas is true (globally set by the toc):
-         for each bundle metadata given in the toc entries:
-             Go to the bundle offset in the sb.
-             Read the bundle in a format similar to the superbundle format.
-             I now know the name of every file, and its sha1.
-             Ask the cat about the sha1, grab the payload and use the name
-             to extract a file.
-         for each chunk given in the toc entries:
-             Just ask the cat directly and extract the file.
-         #This approach used to work for both patched and unpatched cascat files.
-         #For the patched files, ask the patched cat first, then fall back to the unpatched cat.
-     if cas not true (globally set by the toc):
-         for each bundle metadata given in the toc entries:
-             Go to the bundle offset in the sb.
-             if base is true (specified for each bundle):
-                 #patched non-cas bundle is the same as unpatched bundle
-                 Skip the process. This tells me to just use the unpatched
-                 bundle. As I expect the user to extract all unpatched files too,
-                 I can leave this one out.
-             if base false, but delta true:
-                 #patched non-cas bundle
-                 In the patched sb, read some delta metadata.
-                 Then take the unpatched sb, but insert pieces of payload
-                 according to the metadata into the sb. The result is a
-                 valid bundle, which is then read.
-             if base false, delta false:
-                 Just read the bundle.
-             With the bundle parsed, use the info to extract the files.
-             It's almost the same as before, but this time the payload
-             is given in the sb directly.
- The script does not check base or delta when dealing with cas sbtoc.
- Structure of the toc in question:
- bundles
-     entry
-         id
-         offset
-         size
-         base
-     entry
-         id
-         offset
-         size
-         delta
- chunks
- cas
- First come the bundles with base, then the ones with delta.
- Thus the first bundle has base, and an offset of 22848 which makes no sense in the patched sb.
- Look at the unpatched sb instead. Yup, that works alright.
- The size of the bundle should be 14444. That's confirmed too by the unpatched sb; each entry starts
- with a 82 byte which comes right after that size.
- So this works similarly to the noncas archives. However, I think I can't skip the base bundles this time.
- It may be that the bundle requires the file from the patched cat (although this is unlikely as
- the sha1 depends on the payload). Nevertheless, implementing that is not that difficult anyway, so I'll do it.
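- A rough sketch of the idea (unpatchedSb is an assumed handle on the unpatched sb file; the elems access mirrors the dumper):
- if "base" in entry.elems: #base bundle: the offset from the patched toc points into the unpatched sb
-     unpatchedSb.seek(entry.elems["offset"].content)
-     bundle=sbtoc.Entry(unpatchedSb)
- else:
-     sb.seek(entry.elems["offset"].content)
-     bundle=sbtoc.Entry(sb)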
- Now, what about delta files.
- The offset given is for the patched sbtoc, the size too.
- Grab such a patched delta bundle and get its structure:
- path #ignore this, just dump all files to the same place
- magicSalt #not necessary for extraction
- ebx
-     entry
-         name
-         sha1
-         size
-         originalSize #decompressed size IIRC
-     entry
-         name
-         sha1
-         size
-         originalSize
-         casPatchType #never seen this before
-     entry
-         name
-         sha1
-         size
-         originalSize
-         casPatchType
-         baseSha1 #never seen this before
-         deltaSha1 #never seen this before
- res
-     entry
-         name
-         sha1
-         size
-         originalSize
-         resType
-         resMeta
-         resRid #never seen this before
-     entry
-         name
-         sha1
-         size
-         originalSize
-         resType
-         resMeta
-         resRid
-         casPatchType
-     entry
-         name
-         sha1
-         size
-         originalSize
-         resType
-         resMeta
-         resRid
-         casPatchType
-         baseSha1
-         deltaSha1
- chunks
-     entry
-         id
-         sha1
-         size
-         logicalOffset #previously, logicalOffset always had to appear together with rangeStart and rangeEnd. Not anymore.
-         logicalSize #never seen this before
-     entry
-         id
-         sha1
-         size
-         rangeStart
-         rangeEnd
-         logicalOffset
-         logicalSize
-     entry
-         id
-         sha1
-         size
-         logicalOffset
-         logicalSize
-         casPatchType
- chunkMeta
-     entry
-         h32
-         meta
- alignMembers
- ridSupport #never seen this before
- storeCompressedSizes #never seen this before
- totalSize
- dbxTotalSize #not sure why this is mentioned separately
- Ugh, many new keywords:
- casPatchType #integer, either 1 or 2
- baseSha1 #indeed a sha1, I assume of the unpatched file
- deltaSha1 #sha1 of the new, patched file?
- resRid #8 bytes, some kind of hash maybe? http://en.wikipedia.org/wiki/Relative_ID
- logicalSize #integer, no clue, don't care
- ridSupport #bool, set to 1 (left out if 0 I suppose)
- storeCompressedSizes #bool, set to 0
- dbxTotalSize #integer, it's two megabytes for the bundle I'm looking at: 2093471
- Pick the first entry in this delta bundle.
- sha1: E4D44ADB1AF9CABDDB1827D288F3344221FE09C9
- Exists in the unpatched cat, but not the patched one.
- That entry has none of these fancy new keywords, so the current
- extraction script should be able to handle this case already.
- Entry with casPatchType (set to 1) but no other new keywords:
- sha1: 81C73AB2A760D16B94D22982D916E91264A4C964
- Exists in the patched cat, but not the unpatched one.
- Entry with casPatchType (set to 2), also contains baseSha1 and deltaSha1:
- sha1: 3C9AAEA1E3FA1117ED2CD458DC22E28003F6CFB3
- Does not exist in either cat.
- baseSha1: AFB0C31A2D05F331B6BD013E26F435C2EBC2CB68
- Exists in the unpatched cat.
- deltaSha1: 8614645E2C818F207A589F62E8B40752DF1E8F84
- Exists in the patched cat.
- I don't care about the other keywords as they don't seem essential for extraction.
- Well, this is confusing. In the patched noncas bundles there was metadata at the beginning of the bundle
- which told me where to use pieces of the unpatched bundle and where to insert the patched data.
- This bundle here however does not have any metadata like that. Additionally, I don't have anything to
- work with except for the sha1s. So in a way it should probably not be that hard to deal with this.
- if casPatchType is 0 (or not specified):
-     Grab the payload from the unpatched cat.
- if casPatchType is 1:
-     Grab the payload from the patched cat.
- if casPatchType is 2:
-     No clue.
- Look at strings to figure this one out.
- The bundle name is win32/persistence/unlocks/soldiers/visual/mp/ch/camo05/ch_assault_mp_appearance_camo05_bpb
- which also exists in the unpatched sb. Grab the bundle from the sb so I have both the patched and unpatched bundle
- to compare directly.
- The patched casPatchType 2 entry has Name: persistence/unlocks/soldiers/visual/mp/ch/camo05/ch_assault_mp_appearance_camo05_bpb/meshvariationdb_win32
- For whatever reason, this string appears twice (?!) in the unpatched bundle, with different sha1s though.
- Note, a bundle is always read in its entirety, so I have absolutely no idea what is going on.
- sha1: AFB0C31A2D05F331B6BD013E26F435C2EBC2CB68
- sha1: 19A0C8033C1264587F3E30306CB89C0329CE1524
- Well, ignore that for now, though keep it in mind; I hadn't even considered the possibility that a single bundle contains two
- files with different content but the same name. Eventually I'll have to have the script compare the sha1 for every file
- while dumping to make sure that both files are extracted. Err, I'd rather not think about the performance hit and the implementation details.
- The first sha1 is used as baseSha1, so retrieve the 2 payloads from cascat. It's an ebx file.
- I suspect that the deltaSha1 in the patched cascat does not refer to an actual file, but instead gives me metadata to cut and glue together pieces
- from the unpatched/patched files to obtain the actual file (which then has the sha1 as specified).
- The delta file has just 76 bytes. It does look like guids that replace the original ones.
- So the challenge is to distinguish metadata from actual data that is to be inserted and make sure
- that the resulting sha1 is 3C9AAEA1E3FA1117ED2CD458DC22E28003F6CFB3.
- delta file:
- 20000046 0D2F0180 424290E0 C8AA785A
- E2119BEB DAE5903E 29166F54 408804F6
- 823A942C 8D62362D 11F07A76 CE54AB76
- E211BE13 C8D7C07E 9A021BBE CE0140B7
- 3092179E B63C2010 5C3F6414
- The 0046 is pretty close to the number of bytes that remain in the file when counting
- from after 0046, namely 48. So this might indicate the number of bytes to copy.
- Now, I suspect these are guids, and parts of the guids might remain the same even in the patched version.
- Search for E2119BEB DAE5903E.
- Yup, got a hit at c0 in the unpatched file. In fact 785A which comes right before is part of the guid too.
- 76E211BE13C8D7C07E9A02 appears at 3ec.
- So somehow the file tells me to move several hundred bytes in between.
- Rearrange a bit:
- 2000 0046
- 0D2F01804242
- 90E0C8AA #unpatched: 157 to 15b; maybe random, but unlikely
- 785AE2119BEBDAE5903E #unpatched: BE to C8
- 2916
- 6F54408804F6823A942C8D62362D11F0 #unpatched: 15F to 16F
- 7A76CE54AB
- 76E211BE13C8D7C07E9A02 #unpatched: 3ec to 3f7
- 1BBECE0140B73092179EB63C20105C3F6414
- Meh, I don't get it. So what do you need if you want to patch stuff?
- I would expect something like this:
- Offset and size of unpatched data
- Size of bytes to copy from the delta file
- Grab pieces of the unpatched data and put delta pieces in between.
- Though it might also work with a relative offset.
- Alright, as interesting it would be to solve this with just a single file, take
- a look at another file too.
- Delta starts with 2000005C, and similar to before, 5C is exactly the number
- of bytes coming after that 5C, minus 2.
- So I suppose that 2000 is some kind of magic without deeper meaning.
- And then there are 4 bytes, the first two being the size of the delta file minus the metadata.
- The third and fourth byte are then still part of the metadata with an unknown purpose.
- The second half of the sixth byte is always F it seems.
- Well, I do know the sha1 that I will get in the end.
- Assume that the delta file (which is very small) somehow described the replacement of a single
- piece in the unpatched file. As I'm not sure where the metadata is and where the replacement payload starts,
- just try all possibilities and check the sha1. Replace a slice of the unpatched file with a slice of the
- delta file:
- f=open("delta","rb")
- delta=list(f.read())
- f.close()
- f=open("actualfile","rb")
- data=list(f.read())
- f.close()
- import copy
- import hashlib
- from binascii import hexlify
- for dataPos in xrange(len(data)):
- for deltaPos in xrange(len(delta)):
- for deltaSize in xrange(len(delta)-deltaPos):
- data2=copy.deepcopy(data)
- data2[dataPos:dataPos+deltaSize+1]=delta[deltaPos:deltaPos+deltaSize+1]
- if hashlib.sha1("".join(data2)).digest()=='<\x9a\xae\xa1\xe3\xfa\x11\x17\xed,\xd4X\xdc"\xe2\x80\x03\xf6\xcf\xb3':
- asdf
- print dataPos
- Meh, does not work.
- Retrieve all delta files and corresponding unpatched files.
- First of all, confirm that casPatchType 0 always relies on the unpatched cat
- and casPatchType 1 always relies on the patched cat
- def casHandlePayload(entry,outPath): #this version searches the patched cat first
-     if os.path.exists(lp(outPath)): return #don't overwrite existing files to speed up things
-     ## print outPath
-     sha1=entry.elems["sha1"].content
-     try:
-         patchType=entry.elems["casPatchType"].content
-     except:
-         patchType=0
-     if patchType==0:
-         if "baseSha1" in entry.elems: asdf
-         if "deltaSha1" in entry.elems: asdf
-         if sha1 not in cat.entries: asdf
-         if sha1 in cat2.entries: asdf
-     elif patchType==1:
-         if "baseSha1" in entry.elems: asdf
-         if "deltaSha1" in entry.elems: asdf
-         if sha1 in cat.entries: asdf
-         if sha1 not in cat2.entries: asdf
-     elif patchType==2:
-         if "baseSha1" not in entry.elems: asdf
-         if "deltaSha1" not in entry.elems: asdf
-         baseSha1=entry.elems["baseSha1"].content
-         deltaSha1=entry.elems["deltaSha1"].content
-         if baseSha1 not in cat.entries: asdf
-         if deltaSha1 not in cat2.entries: asdf
- Oh great, this fails. For some reason with casPatchType 1 the sha1 is found in the unpatched cat but not the patched one.
- More precisely, for casPatchType 1 the sha1 may be found in either the unpatched or the patched cat. Will have to
- do the usual approach; try the patched first, then fall back if necessary.
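- As a small sketch (cat being the unpatched and cat2 the patched cat, as in the snippets around here):
- if sha1 in cat2.entries: #patched cat first
-     payload=cat2.grabPayload(cat2.entries[sha1])
- else: #fall back to the unpatched cat
-     payload=cat.grabPayload(cat.entries[sha1])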
- The other two types work as expected though. Type 0 always has the sha1 in the unpatched cat.
- Type 2 has the base in the unpatched cat and the delta in the patched cat.
- Also: baseSha1/deltaSha1 <=> patchType 2
- Alright, ignore type 0 and 1 for now. Retrieve the deltas and bases.
- def casHandlePayload(entry,outPath): #this version searches the patched cat first
-     if os.path.exists(lp(outPath)): return #don't overwrite existing files to speed up things
-     ## print outPath
-     sha1=entry.elems["sha1"].content
-     try:
-         patchType=entry.elems["casPatchType"].content
-     except:
-         patchType=0
-     if patchType==2:
-         baseSha1=entry.elems["baseSha1"].content
-         deltaSha1=entry.elems["deltaSha1"].content
-         deltaEntry=cat2.entries[deltaSha1]
-         baseEntry=cat.entries[baseSha1]
-         deltaPath=outPath+" delta"
-         basePath=outPath+" base"
-         out=open2(deltaPath,"wb")
-         out.write(sha1) #write the sha1 in the beginning of the delta file for convenience
-         out.write(cat2.grabPayload(deltaEntry))
-         out.close()
-         out=open2(basePath,"wb")
-         out.write(cat.grabPayload(baseEntry))
-         out.close()
- Alright, 5140 files. So over 2500 base-delta pairs to work with.
- The shortest delta file (shaderdatabase_win32.shaderdatabase) is just 0a bytes payload:
- 20000004 203F1974 3000 (this is the entire file)
- Once again, the total size after the first 4 bytes is given by 0004 + 2.
- Once again, the second half of the sixth byte is F.
- Run the script over the base and delta to see if I get a matching sha1.
- The smaller the delta the less likely that there are several substitutions.
- That is of course assuming that this is what actually happens.
- Meh, nothing.
- mainmenuscreen.ebx:
- delta: 2000000C 014F0140 08087BBD 728586FA 25F2
- Total size after first 4 bytes: 000c + 2
- Second half of sixths byte: F
- Hmm well. This does look similar to the LZ77 algorithm.
- Proceed length is given by c. And the final two bytes are the offset in little endian?
- mainmenuscreen is a very small file though, so it can't be an offset.
- Maybe the sha1s are for the decompressed files (although it was different in bf3).
- Nope, just checked that and the sha1 is always taken from the compressed file.
- Try something different with the algorithm (without that odd list thing):
- f=open("mainmenuscreen.ebx delta","rb")
- sha1=f.read(20)
- delta=f.read()
- f.close()
- f=open("mainmenuscreen.ebx base","rb")
- data=f.read()
- f.close()
- import hashlib
- from binascii import hexlify
- for dataPos in xrange(len(data)):
- for deltaPos in xrange(len(delta)):
- for deltaSize in xrange(len(delta)-deltaPos):
- data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
- if hashlib.sha1(data2).digest()==sha1:
- print dataPos, deltaPos, deltaSize
- print dataPos
- Applied to mainmenuscreen:
- 259 10 7
- Traceback (most recent call last):
- File "D:\hexing\bf4 test\Neuer Ordner\trial.py", line 17, in <module>
- asdf
- NameError: name 'asdf' is not defined
- Got it. Now I know the resulting file. I suppose I just messed up the previous script.
- So at position 259 in the unpatched file, I replace 8 (7+1) bytes with bytes from the delta file.
- The delta file is:
- 2000000C 014F0140 0808 7BBD728586FA25F2
- And the 8 bytes on the right are the ones that were substituted.
- Phew, so the replacement bytes are at the very end.
- 259 in hex is 103, can't find that anywhere in the delta though.
- The two 08s are related to the number of bytes to replace I suppose.
- Dataversion.ebx has a rather small delta file too, just 1b bytes. However,
- apparently at least two things are substituted here because I can't find the sha1.
- So I possibly didn't fail with the script before but the files were just bad.
- Find more suitable files (with small bases).
- fontcollection_ja_fontlibwin32.ebx:
- 260 10 7
- delta: 2000000C 015F0150 0808 8D46479E3AD65FDD
- Very neat, this delta is the same as the one before but differs only
- by exactly one in the offset. As a result it says 15 twice instead
- of 14. Note that a number is stored in 1.5 bytes apparently.
- 01 5F, but the number is 15.
- That looks awfully redundant. Both the offset and the replacement
- bytes are specified twice. I think there's more to it though:
- The second offset/size could be there to say where to go on after
- placing the delta payload. In the case of these simple examples, a
- number of bytes is substituted, so the two numbers are identical.
- Will need further investigation of course.
- actionscriptlibrary.ebx:
- 287 10 7
- 2000000C 017F0170 0808 3D9AC231DD6F15FE
- campaignmissionsscreen.ebx:
- 270 10 7
- 2000000C 015F0150 0808 B34AD4896B370A2C
- campaignmissionsscreen and fontcollection_ja_fontlibwin32 specify
- identical delta values, but the actual offset in the files is different.
- mainmenuscreenpc.ebx:
- 263 10 7
- 2000000C 014F0140 0808 D7647AE7C8329AD3
- Hm... there's no offset specified somehow.
- But there is a number that appears twice. I just don't know its purpose.
- Well, one way or another this number has to tell me the offset. There are no other bytes left.
- Have the script also give me the offset counting from the end instead of the start. Also fix it so
- it gives the right number of bytes that are copied (deltaSize+1):
- for dataPos in xrange(len(data)):
-     for deltaPos in xrange(len(delta)):
-         for deltaSize in xrange(len(delta)-deltaPos):
-             data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
-             if hashlib.sha1(data2).digest()==sha1:
-                 print dataPos, deltaPos, deltaSize+1, len(data)-dataPos
-                 asdf
-     print dataPos
- mainmenuscreenpc.ebx:
- presumed offset bytes: 014F0140
- 263 10 8 16
- mainmenuscreen.ebx:
- presumed offset bytes: 014F0140
- 259 10 8 16
- Looking good so far.
- fontcollection_ja_fontlibwin32.ebx:
- presumed offset bytes: 015F0150
- 260 10 8 16
- Nope. Not good at all.
- actionscriptlibrary.ebx:
- presumed offset bytes: 017F0170
- 287 10 8 16
- 16 everywhere? Is the script doing something wrong?
- Manually substituting the bytes does yield the wanted sha1 though.
- Odd.
- I need some files that differ somehow yet make just one substitution.
- ultimax_antstate_chunk.ebx:
- 306 14 16 28
- 20000024 019F0180 2020 00000000 F6FF267B5E455B0EB41D0644812E0DE7 5C9B01000000000000000000
- The substitute part is F6FF267B5E455B0EB41D0644812E0DE7.
- ss_stones_01.ebx:
- 319 10 1 61
- 20000005 01BF0177 0101 73
- nogadget2.ebx:
- 579 10 1 66
- 20000005 035F0301 0101 43
- movietexture_shader_sp_prologue_gasexplosion:
- 20000005 01CF0178 0101 73
- No sha1 match which is odd because it looks so similar to the previous ones.
- 263 10 8 16
- 2000000c 014f0140 0808 d7647ae7c8329ad3
- 259 10 8 16
- 2000000c 014f0140 0808 7bbd728586fa25f2
- 260 10 8 16
- 2000000c 015f0150 0808 8d46479e3ad65fdd
- 313 10 8 16
- 2000000c 019f0190 0808 178d137e63801f41
- 289 10 8 16
- 2000000c 017f0170 0808 e5139873617cd410
- Mmmh. ultimax_antstate_chunk.ebx again:
- 306 14 16 28
- 20000024 019F0180 2020 00000000 F6FF267B5E455B0EB41D0644812E0DE7 5C9B01000000000000000000
- The substitute part is F6FF267B5E455B0EB41D0644812E0DE7.
- In fact the substitute could also include all the bytes at the end too.
- It's strange, those bytes are the same before and after applying the delta.
- The 2020 refer to all bytes to the right of them, including 4 nulls. Those nulls however
- are not substituted.
- While it will be horribly slow, use that sha1 script within the dumper script.
- Automatically dump all files that have just a single substitution.
- def validateSha1(data,delta,sha1):
-     t0=time()
-     for dataPos in xrange(len(data)):
-         for deltaPos in xrange(len(delta)):
-             for deltaSize in xrange(len(delta)-deltaPos):
-                 data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
-                 if hashlib.sha1(data2).digest()==sha1:
-                     return [dataPos, deltaPos, deltaSize+1, len(data)-dataPos]
-         if time()-t0>30: asdf
-     asdf
- Limit the time spent with each file to 30 seconds.
- Also don't consider delta files larger than 50 bytes.
- It should take just a few hours to go through all files.
- 1235 0A 04 37
- 20000008 1DAF 1D6C 0404 94B5AA0E
- 11F3 0A 04 04
- 20000008 1D2F 1D2C 0404 ACF453A8
- jet_mfd_q5_mesh.ebx:
- 2000000C 0D0F 0C20 0808 5158166D711EB911
- Decompressed base size: d10
- So 0d0f is exactly one less than the base size. Either way, this number
- contains no useful information.
- Let me get this straight. The delta file contains offsets in the decompressed file.
- But the sha1 that is calculated is the one you get when compressing such a patched file.
- Well, that makes zero sense as it is a waste of computational power to first decompress
- a file, then patch it, then compress it to confirm its sha1 is correct. So I can only assume
- that once again the sha1 is not checked at all. Ahh, never mind that. Both the base and
- delta have a sha1 so there's no way to bypass that anyway. I suppose the resulting sha1
- serves no real purpose.
- Problem is, I don't have the compression algorithm so I cannot directly compare the
- sha1. However, I can do the usual sanity checks on the ebx which should suffice to
- deal with this.
- Oh. The system is really obvious with the decompressed files.
- ultimax_antstate_chunk.ebx again:
- 306 14 16 28
- 20000024 019F 0180 2020 00000000F6FF267B5E455B0EB41D0644812E0DE75C9B01000000000000000000
- The substitute part are the 0x20 bytes at the end.
- These bytes are inserted at 0x180 in the decompressed file.
- The decompressed file is then read until 19f+1, i.e. till the end of file.
- ak5c_gunsway.ebx:
- Has several substitutions.
- Entire delta file:
- 2000 002A 12BF
- 0E38 0303 0AD7A3
- 0EC4 0303 0AD7A3
- 0F10 0303 0AD7A3
- 0F5C 0303 0AD7A3
- 0FA8 0303 0AD7A3
- 0FF4 0303 0AD7A3
- Header:
-     2 (or 1) bytes: Delta type
-     2 (or 3) bytes: Size without header
-     2 bytes: Total decompressed file size-1 (at least for file size <0x10000); patched/unpatched file? Don't know yet.
- while current position in file < size without header:
-     2 bytes: Replacement offset (move there in the base file)
-     1 byte: Number of replacement bytes (in the delta file)
-     1 byte: Number of bytes to replace in the base file
-     x bytes: The payload to replace.
- The two single bytes are just guesses at the moment. It could be similar to
- LZ77 compression so maybe it's possible to e.g. make the second single byte
- twice as large as the first. Thus the delta payload is read just once, but
- applied twice to the base file. That's my idea so far anyway.
- With that I should be able to handle at least some of the simpler files.
- I do recall seeing a few files with a non 2000 type. Grab them.
- import os
- import sys
- for dir0, dirs,ff in os.walk(sys.path[0]):
-     for filename in ff:
-         if filename[-5:]!="delta": continue
-         f=open(dir0+"\\"+filename,"rb")
-         f.read(16)
-         typ=f.read(2)
-         f.close()
-         if typ!="\x20\x00":
-             print filename
- This yields:
- m224_pda_mesh.ebx delta
- layer0_cinematic.ebx delta
- layer1_ui_schematic.ebx delta
- layer18_shipwreck.ebx delta
- layer1_ui_schematic.ebx delta
- hotel_wings_02_mesh.ebx delta
- twigs_01_mesh.ebx delta
- decal_plaster_02_mesh.ebx delta
- foresttree_l_01_rig_mesh.ebx delta
- foresttree_l_03_skin_mesh.ebx delta
- leaftree_full_l_01_mesh.ebx delta
- m224_pda_mesh.ebx:
- Substitute the last 9 bytes in the delta file.
- 10000001 0C80 0018 0000 0018 0970 000E
- 1A000100 90
- 80378B11A531D30298
- This one is rather different from the previous format.
- I think I could substitute the 9 bytes, then decompress the file,
- and get the offset after decompression. That should appear
- somewhere in this format I suppose.
- Compressed offset: 84c
- Decompressed offset: c8f
- Compressed base file size: 95c
- Decompressed base file size: f30
- 0C80 in the delta is pretty close to the decompressed offset.
- decal_plaster_02_mesh.ebx:
- 10000001 0C00 0018 0000 0018 0970 000E
- 1A000100 90
- 803D811935F2361F1E
- Almost the same as before.
- Compressed offset: 7e0
- Decompressed offset: c0f
- Compressed base file size: 85c
- Decompressed base file size: cd0
- Once again, subtract F from the decompressed offset to get 0c00 from the delta.
- Wait a sec.
- No. Please.
- 0970 here? So the delta is compressed?
- Not only that, it's compressed in such a way that it actually wastes space.
- Recall how compression works:
- Structure of a compressed block (big endian):
-     4 bytes: decompressed size (0x10000 or less)
-     2 bytes: compression type (0970 for LZ77, 0071 for uncompressed data)
-     2 bytes: compressed size (0000 for uncompressed data) of the payload (i.e. without the header)
-     compressed payload
- Decompress each block and glue the decompressed parts together to obtain the file.
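- As a sketch of that block walking (made-up function name, not the dumper's real decompressLZ77; decompressLZ77Block is assumed to read one header plus payload, sketched after the LZ77 description below):
- from cStringIO import StringIO
- def decompressAllBlocks(f,totalCompressedSize):
-     start=f.tell()
-     parts=[]
-     while f.tell()-start<totalCompressedSize:
-         parts.append(decompressLZ77Block(f).read()) #one block at a time, glued together afterwards
-     return StringIO("".join(parts))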
- The compression is an LZ77 variant. It requires 3 parameters:
- Copy offset: Move backwards by this amount of bytes and start copying a certain number of bytes following that position.
- Copy length: How many bytes to copy. If the length is larger than the offset, start at the offset again and copy the same values again.
- Proceed length: The number of bytes that were not compressed and can be read directly.
- Note that the offset is defined in regards to the already decompressed data which e.g. does not contain any compression metadata.
- The three values are split up however; while the copy length and proceed length are
- stated together in a single byte, before an uncompressed section, the relevant offset
- is given after the uncompressed section:
- Use the proceed length to read the uncompressed data, at which point you arrive at the start of the offset value.
- Read this value, then move to the offset and copy a number of bytes (given by copy length)
- to the decompressed data. Afterwards, the next copy and proceed length are given and the process starts anew.
- The offset has a constant size of 2 bytes, in little endian.
- The two lengths share the same byte. The first half of the byte belongs to the proceed length,
- whereas the second half belongs to the copy length.
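- A hedged sketch of one block's decompression as just described (the +4 on the copy length is inferred from the 1A example below; real files may well have longer-length cases this sketch ignores):
- from cStringIO import StringIO
- from struct import unpack
- def decompressLZ77Block(f): #a sketch, not the dumper's actual function
-     decompressedSize,compressionType,compressedSize=unpack(">IHH",f.read(8))
-     if compressionType!=0x970: #0071 (and, as turns up later, 0070) means the payload is stored uncompressed
-         return StringIO(f.read(decompressedSize))
-     out=""
-     end=f.tell()+compressedSize
-     while len(out)<decompressedSize and f.tell()<end:
-         lengths=ord(f.read(1))
-         proceedLength=lengths>>4 #first half of the byte: bytes to read directly
-         copyLength=(lengths&0xf)+4 #second half: bytes to copy from the decompressed data
-         out+=f.read(proceedLength)
-         if len(out)>=decompressedSize or f.tell()>=end:
-             break #e.g. the trailing 90 token: proceed to the end of the block, no offset follows
-         copyOffset=unpack("<H",f.read(2))[0] #little endian, counted back from the end of the decompressed data
-         for i in xrange(copyLength):
-             out+=out[-copyOffset] #one byte at a time, so copyLength>copyOffset repeats the same values
-     return StringIO(out)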
- So the previous file is something like this:
- 10000001 0C00 0018
- Compression header:
- 00000018 decompressed size
- 0970 LZ77
- 000E compressed size
- 1A, proceed 1 byte, copy A+4=E times
- 00, the payload to be copied E times
- 0100, the offset to move backwards to copy, namely the nullbyte
- 90, proceed 9 bytes (till the end of the file)
- 803D811935F2361F1E, just add this to the decompressed payload
- So the decompressed payload is (1 null, clone E times, then 9 bytes):
- 000000000000000000000000000000803D811935F2361F1E
- And thus the delta file becomes (after decompression):
- 1000 0001 0C00 0018
- 000000000000000000000000000000803D811935F2361F1E
- 0c00: offset in base file to substitute
- 0018: number of bytes to copy.
- Header (big endian):
-     2 (or 1) bytes: Delta type (2000 is uncompressed, 1000 is compressed)
-     2 (or 3) bytes: Size without header (set to 1 for compressed)
-     2 bytes (only if uncompressed): Total decompressed file size-1
- while current position in file < size without header:
-     2 bytes: Replacement offset (move there in the base file)
-     1 byte: Number of replacement bytes (in the delta file; 0 for compressed)
-     1 byte: Number of bytes to replace in the base file
-     x bytes: The payload to replace.
- With that I should be able to handle the simple deltas without brute-forcing my way through them.
- But surely there are more types out there than just 10 and 20. It would be too easy otherwise.
- I can't find any though in those files that I have solved.
- Run through the bundles again and throw some errors (for the moment, only consider small deltas):
- if len(deltaData)>50: return
- deltaStream=StringIO(deltaData)
- deltaStream.seek(0)
- typ,deltaSize=unpack(">HH",deltaStream.read(4))
- if typ==0x1000:
-     if deltaSize!=1: asdf
- elif typ==0x2000:
-     totalDecompressed=unpack(">H",deltaStream.read(2))[0]
-     if totalDecompressed+1!=entry.elems["originalSize"].content:
-         asdf
- else:
-     asdf
- Error with file 9k22_tunguska_m.ebx; totalDecompressed does not match originalSize.
- Delta:
- 2000 000C FFFF
- E926 0101 C0
- E948 0303 000000
- 2000 0017 7FBF
- 1BA4 0606 CDCC0C3F0000
- 56C2 0202 7042
- 56CC 0303 333333
- So this delta has two blocks within a single file.
- Each block can only handle FFFF+1=10000 bytes max.
- This should be easy to fix. However, ignore the error for now, and continue.
- Next error, lav_ad.ebx has type 0.
- Delta:
- 0000 0001
- 2000 0007 6F3F
- 1294 0303 E17A14
- Two random ideas what this is about.
- 1) A way to seek past several blocks without having to specify all the other crap.
- 2) Substitution in the compressed file instead.
- I know that exactly three bytes, E17A14, are substituted in the file.
- Just ask a script where exactly.
- No hits.
- Compressed size: BB86
- Decompressed size: 16F40
- The value of 6f3f is pretty far from the expected size.
- No clue what this is about, keep it in mind for later when everything else is done.
- Remove the restriction so larger deltas are okay too.
- venicesoldierinputconcepts.ebx is type 10 but has deltaSize!=1:
- 1000 0002 0004 0024 0000 0024 0070 0024
- So the value is 2 here. Problem is, that delta is 8kb so I can't manually analyze it.
- Keep that in mind too for later.
- Redefine the format a bit:
- All values in big endian.
- Header:
-     0.5 bytes: Delta type (2 is uncompressed, 1 is compressed, 0/3/4 possible too)
-     3.5 bytes: deltaSize, size without header (set to 1 for compressed)
-     2 bytes (only if uncompressed): Total decompressed file size-1
- while current position in file < size without header:
-     2 bytes: Replacement offset (move there in the base file)
-     1 byte: Number of replacement bytes (in the delta file; 0 for compressed)
-     1 byte: Number of bytes to replace in the base file
-     x bytes: The payload to replace.
- Number of files depending on type:
- type 0: 15
- type 1: 1311
- type 2: 10075
- type 3: 12
- type 4: 245
- All the more reason to finish type 2.
- Hack something together:
- deltaStream=StringIO(deltaData)
- deltaStream.seek(0)
- EOF=len(deltaData)
- baseOffset=0
- while deltaStream.tell()< EOF:
-     deltaPos0=deltaStream.tell()
-     tmpSize=unpack(">I",deltaStream.read(4))[0]
-     typ=tmpSize>>28
-     deltaBlockSize=tmpSize&0xfffffff
-     if typ!=2:
-         if baseOffset!=0: asdf
-         else: return
-     baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #add to baseOffset at the end
-     while deltaStream.tell()-deltaPos0 < deltaBlockSize:
-         offset=unpack(">H",deltaStream.read(2))[0]
-         byte1,byte2=unpack("BB",deltaStream.read(2))
-         if byte1!=byte2: asdf
-         substitute=deltaStream.read(byte1) #not used yet
- multiplayerconsumableunlocksetup.ebx fails due to byte1!=byte2:
- 2 0000493 08CF
- 0008 0101 20
- 001C 0A0A D000000002000000F004
- 0286 0101 16
- 02A0 0101 16
- 030B 0600 036107000387 (not sure what's going on)
- 0300 048A FFFFA412 (offset went backwards, bad)
- CB06 3444 922CDD837D4910F8500000002C (offset exceeds file size, completely lost track of it)
- Try that again from the critical part.
- 030B 0600
- 0361 0700
- 0387 0300
- 048A FFFF
- A412CB063444922CDD837D4910F8500000002C
- Apparently it accumulates offsets and sizes and terminates with FFFF.
- Have it skip the file when something like this occurs.
- Error with dataversion.ebx:
- 2 0000015 01CF
- 0186 0002 4430
- 019E 0200
- 01B8 0202 4149
- 01C4 0101 31
- The second byte specifies the number of substitute bytes in the delta file.
- The first byte does... something. It certainly requires an offset.
- So I substitute 2 bytes at that offset, but where do I take them from?
- Or do I just remove them? That could actually work. Ebx files
- always have a size that is a multiple of 16, so in the lines above
- I would add two bytes first, then remove two bytes later, so these cancel
- each other out.
- First byte: Remove this number of bytes at the offset
- Second byte: Place this number of bytes at the offset (read the bytes in the delta file)
- Still wondering why it went FFFF above. Or was it just the maximum amount of bytes that could be specified?
- Indeed, if I read FF bytes in the multiplayerconsumableunlocksetup delta, the next bytes say 0589 FFFF.
- Not sure how to code such a byte removal in an efficient manner. Well, there are more pressing matters
- right now anyway.
- Adjust the script a bit. Note that baseOffset is not increased yet, so the script returns as
- soon as the type is changed:
- while deltaStream.tell()< EOF:
-     tmpSize=unpack(">I",deltaStream.read(4))[0]
-     typ=tmpSize>>28
-     deltaBlockSize=tmpSize&0xfffffff
-     if typ!=2:
-         if baseOffset!=0: asdf
-         else: return
-     baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #add to baseOffset at the end
-     deltaPos0=deltaStream.tell()
-     while deltaStream.tell()-deltaPos0 < deltaBlockSize:
-         offset=unpack(">H",deltaStream.read(2))[0]
-         removeCount,addCount=unpack("BB",deltaStream.read(2))
-         substitute=deltaStream.read(addCount) #not used yet
- Small issues (like most of the implementation) aside, the script should be able
- to extract 9612 files (those which use type 2 delta blocks only).
- The total number of casPatchType 2 files is 11658, so that's a fair share of files already.
- A few hundred files that start as type 2 have gone missing, so they must've changed the type later on.
- Now the final question is whether that delta file is applied sequentially or not.
- Removing a few bytes from the middle of the file, then shifting everything after the cut
- to the left does not seem viable.
- That should be easy to figure out though:
- if removeCount==255 and addCount==0:
- asdf
- The relevant delta bytes:
- feef ff 00
- ffee 12 00
- So in effect it removes all bytes from feef until ffee+12.
- Try the same the other way around:
- if removeCount==255 and addCount==0:
- asdf
- The relevant delta bytes:
- 83F1 00 FF *255 bytes*
- 83F1 00 FF *255 bytes*
- So the offsets do not adjust for that either.
- I'm not sure what this means in practice. It could be that
- the second operation puts the bytes before the bytes of the first operation.
- The opposite case seems just as plausible though.
- Well alright. Have a second stream and just put the bytes in it.
- Keep in mind to decompress the base file before applying the delta.
- Also, for some reason a compression type 0x70 appeared in the base files.
- So a patched bundle accesses an unpatched file which is never used in any
- unpatched bundle. Just handle 0x70 the same as 0x71, i.e. as uncompressed payload:
- Snippet:
- deltaEntry,deltaStream=cat2.getCas(deltaSha1)
- baseEntry,compressedBase=cat.getCas(baseSha1)
- baseStream=decompressLZ77(compressedBase,baseEntry.size)
- compressedBase.close()
- patchedStream=open2(outPath,"wb") #here be the new data
- baseOffset=0 #to handle the base offsets when delta contains more than one block
- while deltaStream.tell()-deltaEntry.offset < deltaEntry.size: #read one block
-     tmpSize=unpack(">I",deltaStream.read(4))[0]
-     typ=tmpSize>>28
-     deltaBlockSize=tmpSize&0xfffffff
-     if typ!=2:
-         patchedStream.seek(0)
-         patchedStream.write("abcd") #break the file magic so the ebx script does not get confused
-         patchedStream.close() #this is actually better than deleting the file (I skip the file entirely if it already exists)
-         return #todo
-     baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #has no other purpose than to be added to baseOffset after the loop
-     deltaPos0=deltaStream.tell()
-     while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within one block
-         offset=unpack(">H",deltaStream.read(2))[0]
-         skipCount,addCount=unpack("BB",deltaStream.read(2))
-         sizeUntilOffset=baseOffset+offset-baseStream.tell()
-         patchedStream.write(baseStream.read(sizeUntilOffset))
-         #skip the bytes, move to new position in the base stream and pretend the bytes were read
-         baseStream.seek(skipCount,1)
-         baseOffset+=skipCount
-         #add the bytes
-         patchedStream.write(deltaStream.read(addCount))
-     baseOffset+=baseBlockSize
- #add the remaining bytes of the base
- patchedStream.write(baseStream.read())
- patchedStream.close()
- deltaStream.close()
- It's still really slow. The individual blocks in the LZ77 file and in the delta file suggest that I should
- decompress one block, then apply the delta on that block, then decompress the next block etc.
- This does require that the deltas are always synchronized with the base file though. I have
- added 1 to baseBlockSize to obtain the number of bytes in the decompressed file. Still, verify
- this for all files before going further. Though first of all, see if the ebx script can handle
- the files without errors.
- Fail at file c_marine_01.ebx:
- KeyError: 177537
- Somehow there are random bytes in the keyword section.
- Back to the beginning then.
- The corresponding delta:
- 2 0000028 075F #type 2, 28 bytes replaced, decompressed block is 75f+1 in total
- 00C0 20 00 #remove 20 bytes at c0; these are the same bytes that are added in the next step
- 0100 00 20 D66030DC07317746BAAC1CEEF8212E0F1B556D54D2A69448B3621769C50B227C #add these 20 bytes at 100
- In effect this should remove the guid pair D660... from the middle of the list of external guids and
- put it at the end of the list.
- I know for certain that the keyword section must start at 100 (in both the unpatched and patched file).
- The appropriate place to add the bytes is 100-20=e0. Though in fact, I don't even need
- the offset for a pure add operation. I just attach the bytes to the end of
- the patched stream. For some reason there are keywords at e0 and the guid pair at 100.
- Keep a separate counter for removed bytes and subtract it from the offset?
- This is starting to confuse me.
- So what does the script do exactly:
- 00c0 20 00:
- First of all, write all bytes until c0 to the patched stream.
- In the base stream, seek 20 bytes forwards.
- 0100 0020:
- Write all bytes until 100 to the patched stream (bad).
- Add the 20 bytes to the patched stream.
- Okaaay, with a new variable skipTotal it seems to work correctly:
- skipTotal=0
- while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within one block
-     offset=unpack(">H",deltaStream.read(2))[0]
-     skipCount,addCount=unpack("BB",deltaStream.read(2))
-     sizeUntilOffset=baseOffset+offset-baseStream.tell()-skipTotal
-     patchedStream.write(baseStream.read(sizeUntilOffset))
-     #skip the bytes, move to new position in the base stream and pretend the bytes were read
-     baseStream.seek(skipCount,1)
-     baseOffset+=skipCount
-     #add the bytes
-     patchedStream.write(deltaStream.read(addCount))
-     skipTotal+=skipCount
- If there's an elegant solution to this, I can't see it right now.
- Not too bad now, but it fails at 9k22_tunguska_m.ebx. That file is more than 10000 bytes.
- Delta:
- 2 000000C FFFF
- E926 01 01 C0
- E948 03 03 000000
- 2 0000017 7FBF
- 1BA4 06 06 CDCC0C3F0000
- 56C2 02 02 7042
- 56CC 03 03 333333
- These are plain substitutions. I really need to check if the delta blocks always match compressed blocks.
- That should simplify things at least a bit.
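- A quick sketch of the check (helper names are made up; the layouts are the ones worked out above):
- from struct import unpack
- from cStringIO import StringIO
- def countBaseBlocks(f,compressedSize): #walk the compressed base without decompressing it
-     start=f.tell()
-     count=0
-     while f.tell()-start<compressedSize:
-         decompressedSize,compressionType,payloadSize=unpack(">IHH",f.read(8))
-         f.seek(decompressedSize if compressionType!=0x970 else payloadSize,1)
-         count+=1
-     return count
- def countDeltaBlocks(deltaData): #only type 2 blocks are handled in this sketch
-     f=StringIO(deltaData)
-     count=0
-     while f.tell()<len(deltaData):
-         tmpSize=unpack(">I",f.read(4))[0]
-         typ,deltaBlockSize=tmpSize>>28,tmpSize&0xfffffff
-         if typ!=2: break
-         f.read(2) #patched block size
-         f.seek(deltaBlockSize,1) #skip the change entries and their payloads
-         count+=1
-     return count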
- Well, that's interesting. The number of blocks always match, so one delta block means one compressed block.
- However, the size given by the delta is not always the block size.
- Grab the smallest delta file with size mismatch.
- mp_naval_networkregistry_win32.ebx:
- base decompressed block size: 6a0
- delta block size: 680
- delta:
- 2 0000023 067F #block size 67f+1?
- 0004 09 09 90050000 F000000022 #oh. substitute the metadata right after the magic
- 0160 20 00
- 05A0 01 01 22
- 0610 01 01 22
- 069C 04 04 00000000
- The metadata says that the new file size is 90050000+f0000000 = 590+f0 = 680.
- So the delta block size is the size the block must have in the end.
- Find a mismatch with at least two blocks and check the metadata again.
- sp_airfield delta:
- 2 0000103 FFD7
- 0008 01 01 40 #patch one byte of the payload size (from 70) to 40
- 9998 09 09 060400002F00000070
- 99AC 01 01 AC
- 99B8 01 01 B8
- 99C4 01 01 D0
- 99D0 04 04 D8740000
- 99DC 01 01 00
- 99E8 01 01 2C
- 99F4 01 01 34
- 9A00 01 01 50
- 9A0C 01 01 58
- 9A18 01 01 60
- 9A24 01 01 6C
- 9A30 01 01 84
- 9A3C 01 01 8C
- 9A48 01 01 A0
- 9A54 01 01 A8
- 9A60 01 01 B0
- 9A6C 01 01 C8
- 9A78 01 01 D0
- 9A84 04 04 D8A70000
- 9A90 02 02 E0A7
- 9A9C 02 02 F0A7
- 9AA8 01 01 80
- 9AB4 01 01 90
- 9AC0 02 02 D4BA
- 9ACC 04 04 D0BC0000
- 9AD8 02 02 D8BC
- 9AE4 02 02 E0BC
- 9AF0 01 01 00
- 9AFC 01 01 30
- 9B08 01 01 60
- 9B14 01 01 90
- 9B20 01 01 B4
- 9B2C 01 01 C8
- 9B38 02 02 D8BD
- 9B44 01 01 00
- 9B50 01 01 10
- 9B5C 02 02 D4C1
- 9B68 02 02 F8CA
- 9B74 01 01 2C
- 9B80 01 01 34
- 9B8C 01 01 48
- 9B98 01 01 54
- 9BA4 01 01 5C
- F270 01 01 06
- F74C 28 00
- 2 0000004 9B17
- 7794 08 00
- Meta section size: 9bb0 both before and after patching
- Payload section size: ff70 (before), ff40 (after)
- Total size: 19b20 (before), 19af0 (after)
- delta block sizes: FFD7+1 + 9B17+1 = 19af0
- I get the idea of how to implement this.
- However, I really want to make sure I can treat each block separately.
- That should increase performance and be much simpler to handle.
- Try to find a delta file with more than one block. For any block
- before the last one, check if its last delta operation has
- offset+skipCount >= baseBlockSize. That would indicate
- that the operation stretches over two blocks, but requires two
- entries, one for each block the operation resides in.
- multiplayertemplate.ebx matches this requirement.
- However, the next block is of type 3. Ignore that file for now.
- weaponsbundlesp/shaderdb.shaderdb:
- offset+skipCount = 65149
- baseBlockSize = 64632
- In fact, the offset alone is greater. Hmm, the requirement is wrong.
- I should compare offset+skipCount with the decompressed block size (in the base).
- At least, that's what I think.
- ind_servicebuilding_02_destruction_physics_win32:
- offset+skipCount = 65536
- decompressedBlockSize = 65536
- Delta:
- ...
- FFA4 5C 00
- 2 0002637 fedf
- 0000 3c ff
- I would prefer a delta that doesn't have an addCount.
- ch_fac_dv15_sp_player.ebx:
- 2 00015CB 764F
- 0000 3F 6F
- And these two are indeed the only files that satisfy the condition.
- Well, fuck it. The data is pretty conclusive anyway.
- It has offset 0 for both of these files, so immediately
- at the start of the block the script skips some bytes.
- If the offset was higher the script would first read some unpatched bytes.
- As it is however, with offset 0 it does not really matter if it's
- a pure skipCount or if there are some bytes added.
- More importantly, there is not a single time when the skip count
- exceeds the block size (it's equal in the two cases above).
- Therefore, rewrite the LZ77 decompression to yield single decompressed blocks.
- Still can't get the script to work correctly. Try to gather all remarkable features of
- the delta format and describe it once more:
- Special cases:
-     Skipping more than ff bytes:
-         feef ff 00
-         ffee 12 00 #offset moves through the base payload
-         => Do not read the bytes in the base file from feef to ffee+12 under any circumstances.
-     Adding more than ff bytes:
-         83F1 00 FF *255 bytes*
-         83F1 00 FF *255 bytes* #the offset remains the same as no bytes of the base payload are read in between
-         => When adding bytes only, the base offset remains the same.
- General approach:
- Decompress one LZ77 base block and parse the corresponding delta block.
- Apply the delta on the decompressed base to obtain the patched block.
- Delta block structure:
- All values in big endian.
- Header:
-     0.5 bytes: Delta type (2 is uncompressed, 1 is compressed, 0/3/4 possible too)
-     3.5 bytes: Delta block size; size without header (set to 1 for compressed)
-     2 bytes (only if uncompressed): Final size of the patched block (must add 1 to get the actual value)
- while current position in the block < delta block size:
-     2 bytes: Base offset (add base data to the new patched file until reaching this offset)
-     1 byte: Skip count, do not read these bytes from the base file (but seek past these bytes)
-     1 byte: Add count
-     *The bytes to add, given by add count*
-     1) Read all bytes from the current position in the decompressed base block until the base offset is reached.
-     2) In the base block, (starting from the base offset) seek past the number of bytes given by skip count.
-     3) Add the add-bytes to the end of the patched stream.
- Finally, read more base bytes until the patched block has the size given by the header.
- Phew, summarizing it like that was a great help; got it working right away and the ebx script could handle all extracted files.
- Snippet:
- patchedStream=open2(outPath,"wb") #write the new file in stream, might write directly too though.
- deltaStreamSize=0 #got to keep track of the number of bytes written to the stream
- for baseBlockStream in decompressLZ77(compressedStream,baseEntry.size):
-     baseBlockStream.seek(0)
-     tmpSize=unpack(">I",deltaStream.read(4))[0]
-     typ=tmpSize>>28
-     deltaBlockSize=tmpSize&0xfffffff
-     if typ!=2:
-         patchedStream.seek(0)
-         patchedStream.write("abcd") #break the file magic so the ebx script does not get confused
-         patchedStream.close() #this is actually better than deleting the file (I skip the file entirely if it already exists)
-         return #todo
-     patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
-     deltaPos0=deltaStream.tell()
-     currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
-     while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
-         #parse the delta
-         offset=unpack(">H",deltaStream.read(2))[0]
-         skipCount,addCount=unpack("BB",deltaStream.read(2))
-         addBytes=deltaStream.read(addCount)
-         sizeUntilOffset=offset-baseBlockStream.tell()
-         patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
-         baseBlockStream.seek(skipCount,1) #seek past the skip bytes
-         patchedStream.write(addBytes) #add the bytes
-         currentPatchedBlockSize+=(sizeUntilOffset+addCount)
-     #read as many bytes necessary until the patchedStream has the correct size
-     patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
- patchedStream.close()
- deltaStream.close()
- Type 1 delta (rankparams.ebx):
- 1 #type
- 0000001 #maybe number of compressed blocks?
- 3000 #decompressed base offset to substitute
- 12EB #substitute size, also given in the compression header
- Compressed block:
- Header:
- 000012EB 0970 10C0
- *10c0 bytes compressed payload*
- Alright, that's not too difficult to implement.
- Find a delta where the presumed number of compressed blocks is greater than one.
- venicesoldierinputconcepts.ebx (again):
- 1 0000002 0004 #replace at offset 4
- 0024 00000024 0070 0024 *24 bytes*
- 4BEA #replace at offset 4bea?
- 4846 000045F6 0970 1E53 *1e53 bytes*
- And the delta ends after that.
- But what's the purpose of 4846 here?
- First of all, the header is replaced (with uncompressed data even).
- So is the size of the file different?
- before: 5870 + 3bc0 = 9430
- after: 5810 + 39d0 = 91E0
- Great, so the delta must remove 250 bytes.
- Which is of course exactly the difference 4846-45f6.
- 4846 is the number of bytes till the end of file.
- I suppose then that 4846 specifies the number of bytes to skip
- in favor of the new bytes that are added.
- Type 1 delta (venicesoldierinputconcepts.ebx):
- 1 #type
- 0000002 #number of blocks
- for each block:
-     0004 #decompressed base offset
-     0024 #skip count
-     Compression header:
-         00000024 0070 0024
-     *0024 bytes compressed payload*
- Skip the specified bytes and decompress the payload to use instead.
- battlepacks.ebx fails:
- For whatever reason, I end up 0c bytes before the end of the delta file.
- And that's just after reading one of two blocks.
- The last few bytes:
- B70C 001C 00000000 0000 0000
- I suppose it just tells me to skip these bytes and not add anything?
- For the compression, if type is 0, then return empty-handed.
- I bet the ebx script will fail anyway.
- levellistreport.ebx cannot be handled by the ebx script.
- In fact it seems to contain every line twice and is missing
- lots of stuff. It contains just a single type 1 delta block.
- 1 0000001
- 0000 02A0 000002A0 0970 01BE *01BE bytes compressed payload*
- Read till offset 0, i.e. read no bytes.
- Remove 2a0 bytes.
- Add the decompressed payload, which consists of 2a0 bytes.
- Thus, the entire file is replaced.
- So why does it fail so horribly? Simple, I forgot to read the rest of
- the block once the bytes are skipped and replaced.
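- In code terms, the type 1 branch just needs a tail copy after its loop, something like this (names as in the snippets here):
-     patchedStream.write(baseBlockStream.read()) #copy whatever is left of the decompressed base block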
- vehicleshed_medium_mesh fails. Some parts of the string section are
- right in the middle of the metadata. It's type 1 with 2 blocks within.
- Delta:
- 1 0000002
- 0004 005C 0000007C 0970 007B *7b bytes compressed payload*
- 0BAC 01E4 000002B4 0970 0193 *193 bytes compressed payload*
- End of file
- Just manually separate the pieces, decompress them and the base,
- then figure out what to do. Meh.
- So the first block changes the metadata and the size of the ebx.
- Before: Size = bc0+1d0 = d90
- After: Size = be0+2a0 = e80
- Size of the file that the script put together using the delta: ee0
- That's close, but not good enough.
- Also note how the first delta block skips 5c bytes, but adds 7c. That means
- that one guid pair (size exactly 20 bytes) is added. In fact, the new metadata
- confirms this too (the number of guid pairs is increased from 2 to 3).
- The first delta block is safe to apply.
- So err, create another file in the hex editor, then grab base 4 bytes.
- Then the 7c delta bytes.
- Then skip 5c base bytes starting from 4, so move to 60.
- Read base from 60 until 0bac.
- Add 2b4 delta bytes. AND WITH THIS I HAVE REACHED E80.
- Then skip 1e4 base bytes starting from bac, i.e. until D90 (which is the base EOF).
- That looks all fine to me, including the file I get when manually doing this.
- So why the heck did the script fail? Nvm, just a coding mistake. All fixed.
- I get 1951 ebx files out of 2069 and the ebx script can handle them all.
- Keep in mind that this number is without the duplicates throughout the different bundles.
- That leaves about 100 unique files containing type 0,3 or 4.
- Snippet:
- patchedStream=open2(outPath,"wb") #write the new file in stream, might write directly too though.
- deltaStreamSize=0 #got to keep track of the number of bytes written to the stream
- for baseBlockStream in decompressLZ77(compressedStream,baseEntry.size):
-     baseBlockStream.seek(0)
-     tmpSize=unpack(">I",deltaStream.read(4))[0]
-     typ=tmpSize>>28
-     deltaBlockSize=tmpSize&0xfffffff
-     if typ==2:
-         patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
-         deltaPos0=deltaStream.tell()
-         currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
-         while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
-             #parse the delta
-             offset=unpack(">H",deltaStream.read(2))[0]
-             skipCount,addCount=unpack("BB",deltaStream.read(2))
-             addBytes=deltaStream.read(addCount)
-             sizeUntilOffset=offset-baseBlockStream.tell()
-             patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
-             baseBlockStream.seek(skipCount,1) #seek past the skip bytes
-             patchedStream.write(addBytes) #add the bytes
-             currentPatchedBlockSize+=(sizeUntilOffset+addCount)
-         #read as many bytes necessary until the patchedStream has the correct size
-         patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
-     elif typ==1:
-         for i in xrange(deltaBlockSize):
-             offset,skipCount=unpack(">HH",deltaStream.read(4))
-             addBytes=decompressLZ77Block(deltaStream).getvalue()
-             sizeUntilOffset=offset-baseBlockStream.tell()
-             patchedStream.write(baseBlockStream.read(sizeUntilOffset))
-             patchedStream.write(addBytes)
-             baseBlockStream.seek(skipCount,1)
-         patchedStream.write(baseBlockStream.read())
-     else:
-         patchedStream.seek(0)
-         patchedStream.write("abcd")
-         patchedStream.close()
-         return
- Next types:
- multiplayertemplate.ebx has type 3 (a bit further down the delta file).
- 2 0001590 FFFF
- ...
- 3 0000001
- 00000210 0970 0098 *98 bytes*
- 1 0000002
- 0D80 0039 00000039 0970 0021 *21 bytes*
- A854 408C 0000435C 0970 1030 *1030 bytes*
- EOF
- Well isn't that grand. The base file has only two blocks while
- the delta has three. Now, should I revert the script to a former
- version and deal with the offset madness? For the moment I want
- to hope that the new type 3 might explain everything.
- To be honest I think the type 1 (compressed) does rely
- on the block separation. Without each block that type wouldn't
- know how many bytes to read at the end. Or put another way,
- that type always expects me to read until the end of the block.
- Let's ask other type 3 blocks about their opinion.
- rhib.ebx:
- 3 0000001 00000060 0970 003b *3b bytes*
- 00000001
- EOF
- quadbike:
- 3 0000001 00000060 0970 005b *5b bytes*
- 00000001
- EOF
- ch_fav_lyt2021:
- 3 0000001 00000060 0970 0032 *32 bytes*
- 00000001
- EOF
- I can see a pattern there, lol.
- vdv_buggy:
- 3 00000001 ...
- 2 0000007 5aaf 01e4 03 03 000001
- EOF
- Type 2 comes after a type 3. Oh man...
- But the EOF always seems to come soon after.
- Maybe type 0 or 4 are easier to understand.
- weaponstatcategories has type 4.
- Three meshvariationdb_win32 files too (mp_flooded/content, mp_thedish/content, sp_dam/citybridge).
- weaponstatcategories:
- 2 ...
- 4 0000001 3000 0001 0000CC00 0970 A4D3 *a4d3 bytes*
- EOF
- The file also contains two blocks like the delta, phew.
- So type 4 defines two shorts and then has the compressed payload.
- mp_thedish/content:
- 2 ...
- 4 0000001 3000 0001 00006629 0970 33E6 *33e6 bytes*
- EOF
- Same behaviour.
- This type certainly looks more tolerable than type 3, so it
- should not be too hard to handle it.
- weaponstatcategories:
- Have the script apply all those little changes from type 2 already.
- I.e. I have the first block all patched, but the second block missing
- entirely at the moment.
- Total expected patched size (according to header): 34d0+18b30 = 1C000
- Size of the decompressed delta: CC00
- Size of the patched first block: f400
- cc00+f400 = 1c000. So apparently I do not need to do anything, just
- replace the entire block with the new payload.
- Oh. It's actually a prefix: the type 4 entry is only those 4 bytes, and the supposed two shorts are really the header of a type 3 entry that follows.
- But maybe type 0 is easier to understand?
- mp_resort/content:
- 2 000000C FFFF
- C640 08 08 F977E76551DEC342
- 0 0000002
- EOF
- The file has three base blocks. I suspect that type 0 simply means
- that these blocks remain the same.
- In fact I can't even verify this either way. The delta simply
- replaces 8 bytes in the payload section (some number or guid that
- the ebx script certainly will not complain about) and the rest
- must remain the same or the file size would not match the metadata
- (not to mention, end abruptly).
- Still, it makes the most sense to me, plus it confirms that it
- is mandatory to deal with each block individually.
- However, so far I've expected one delta block for each base block,
- which is not the case. The above example in particular shows that
- I must loop over the delta blocks and not the base blocks.
- When iterating over the base blocks I can't correctly perform
- the 0 0000002 instruction above (unless I add some more variables).
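- Roughly, the outer loop then has to look like this (just a skeleton of my current understanding;
- whoever consumes the (type, count) pairs must read that block's payload from deltaStream before
- asking for the next pair):
- from struct import unpack
- def iterDeltaBlocks(deltaStream,deltaSize):
-     start=deltaStream.tell()
-     while deltaStream.tell()-start<deltaSize: #iterate over the delta blocks, not the base blocks
-         tmpSize=unpack(">I",deltaStream.read(4))[0]
-         deltaType=tmpSize>>28 #top 4 bits: delta type
-         count=tmpSize&0xfffffff #bottom 28 bits: block size (type 2) or repetition count (other types)
-         yield deltaType,count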
- 3 types done, 2 to go.
- Now, about the number of blocks.
- Recall that multiplayertemplate showed that type 3 acts independently of the base blocks:
- 2 0001590 FFFF
- ...
- 3 0000001
- 00000210 0970 0098 *98 bytes*
- 1 0000002
- 0D80 0039 00000039 0970 0021 *21 bytes*
- A854 408C 0000435C 0970 1030 *1030 bytes*
- EOF
- with 2 base blocks, but three delta blocks. Similarly, weaponstatcategories had two base
- blocks, and the type 3 delta is probably once again independent of the base blocks:
- 2 ...
- 4 0000001
- 3 0000001 0000CC00 0970 A4D3 *a4d3 bytes*
- EOF
- As type 0 keeps one block unchanged, I think type 4 could be about skipping one block entirely.
- Type 3 seems to just insert new payload independently of base blocks.
- Give that a shot. Though, what happens when type 3 has a number different from 1? Does that mean
- several compressed pieces because a single piece would exceed 10000 bytes?
- mp_prison/materialgrid_win32 has such a case:
- 2 ...
- 3 0000002
- 00010000 0970 581B ... #yup, 10000 right there
- 000051D7 0970 23B8 ...
- 1 0000002
- 0000 9A64 #read till offset 0, skip 9a64 bytes
- 00000000 0000 0000 #so this should return an empty string?
- 9B1C 64E4
- 00005851 0970 3DD8 ... #ordinary compression
- 4 0000001 #skip one block?
- 3 0000001
- 00008324 0970 574B ... #use this block instead
- 4 0000002 #skip two blocks
- 3 0000001 00004C89 0970 231C ... #use this block
- 3 0000003 00010000 0970 5B3B ... #use these blocks. But why isn't there just one type 3 assignment with 4 blocks?
- 4 0000001 #skip a block
- EOF
- The base file has 6 blocks.
- Delta (only the types that consume base blocks): type 2 (1), type 1 (1), type 4 (1), type 4 (2), type 4 (1) => 6 blocks
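- So the bookkeeping seems to be: types 0 and 4 consume blockCount base blocks each, types 1 and 2
- work within exactly one base block, and type 3 consumes none. A quick throwaway check on
- already-parsed (type, count) pairs:
- def countBaseBlocks(deltaBlocks):
-     total=0
-     for deltaType,count in deltaBlocks:
-         if deltaType in (0,4): total+=count #keep or skip count base blocks
-         elif deltaType in (1,2): total+=1 #both operate on a single base block
-         #type 3 is pure insertion and consumes no base block at all
-     return total
- >>> countBaseBlocks([(2,0x1590),(3,1),(1,2)]) #multiplayertemplate from above
- 2
- The materialgrid delta works out to 6 in the same way, matching its 6 base blocks.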
- So what's up with that odd compression type 0?
- It's been there with battlepacks.ebx too:
- 2 ...
- 1 0000002
- 338B 002E
- 00000029 0070 0029 ...
- A7F1 16A4
- 0000166C 0970 0B9B ...
- 1 0000002
- 42F1 65EC
- 000064F0 0970 5872 ...
- B70C 001C
- 00000000 0000 0000
- EOF
- Alright, returning an empty string makes sense. I've also just realized that compression
- type 70 apparently uses compressedSize=decompressedSize, whereas type 71
- uses compressedSize=0 (note that both types have uncompressed payload).
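- In other words, the block header seems to boil down to this (a small sketch only; readBlockPayload
- is a made-up helper and the actual LZ77 decompression of 0970 payload is not shown):
- from struct import unpack
- def readBlockPayload(f):
-     decompressedSize,compressionType,compressedSize=unpack(">IHH",f.read(8))
-     if decompressedSize==0: return "" #the 00000000 0000 0000 header, i.e. an empty block
-     if compressionType in (0x70,0x71): return f.read(decompressedSize) #payload stored uncompressed
-     if compressionType==0x970: return f.read(compressedSize) #LZ77-compressed payload, still needs decompressing
-     raise ValueError("unknown compression type: %04x"%compressionType)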
- The ebx script seems to run fine over all files:
- >>> correct
- [68160, 3494, 36387]
- >>> incorrect
- [0, 0, 0]
- Well those were two excellent guesses about these last two types I suppose. 2069 files in total now.
- Enable the other casPatchTypes again.
- Recall that:
- casPatchType 0 has the sha1 in the unpatched cat.
- casPatchType 1 may have the sha1 in either the unpatched or the patched cat.
- casPatchType 2 has the base in the unpatched cat and the delta in the patched cat.
- Go for the safest approach and treat casPatchType 1 like casPatchType 0, i.e. try the patched cat first, then the unpatched one.
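- A rough sketch of that lookup (assuming the cat objects raise a KeyError for an unknown sha1,
- like the plain dictionary lookup did earlier):
- def getCasEither(patchedCat,unpatchedCat,sha1):
-     try:
-         return patchedCat.getCas(sha1) #the patched cat takes priority
-     except KeyError:
-         return unpatchedCat.getCas(sha1) #fall back to the unpatched cat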
- So what about base and delta. Can they appear together?
- Number of files for each combination:
- +base +delta => 0 files
- +base -delta => 955 files
- -base +delta => 686 files
- -base -delta => 0 files
- And for the unpatched files there are only files without base and delta.
- So there are three different cases. As I have different functions
- depending on whether or not there's "Update" in the tocRoot, I can
- at least ignore the unpatched part.
- Put it all together, and it fails miserably after a few thousand files.
- sbtoc with error: MpCharacter
- It can read the toc file without issues. Apparently the sb causes some trouble.
- It fails at a base/nondelta bundle. Ah. I forgot that base requires me to
- open up the unpatched sb (in the unpatched folder).
- Hack something together to get the unpatched path. The patched noncas required that too,
- so maybe I can grab the lines from there. Meh, it was different there.
- Unpatched path is: ...\bf4\Data\Win32
- Patched path is: ...\bf4\Update\Patch\Data\Win32
- Something like this should work:
- unpatchedPath=toc.fullpath.replace(r"patched\bf4\Update\Patch\Data\Win32",r"bf4\Data\Win32")
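- Or, slightly more defensive (purely a sketch; the "patched" folder is just my local dump directory):
- import os
- def getUnpatchedPath(patchedPath):
-     unpatchedPath=patchedPath.replace(r"patched\bf4\Update\Patch\Data\Win32",r"bf4\Data\Win32")
-     if not os.path.isfile(unpatchedPath): raise IOError("unpatched counterpart missing: "+unpatchedPath)
-     return unpatchedPath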
- Alright, I got a working version.
- Time for a format description:
- The sbtoc (superbundle/table of contents) format has a new way of handling patched cascat files.
- Previously, the sbtoc did not contain any patch specific info. The most sensible approach was to
- check if the archives were located in the Update folder and use the patched cascat in that case.
- Now however the sbtoc contains metadata to handle a rather wide range of different types of patches.
- Unchanged from the previous format is the cas flag (found in the toc) which states that all bundles
- inside the sb corresponding to the toc have their assets stored in the cascat archives.
- If the flag does not exist or is set to false, the assets are instead directly stored in the sb file.
- For a cas-enabled sbtoc, the toc does now give additional metadata for every single bundle.
- Each patched bundle may have either a base or delta flag.
- If the base flag is set (for the bundle), then the entire bundle does not require any patching. Note that the game
- apparently relies on the patched files only, which then make references back to the unpatched files.
- The sbtocs in the Update folder contain all the necessary info to retrieve files from the unpatched archives.
- If the delta flag is set (for the bundle), then a casPatchType is specified for each
- file within the bundle, which may take one of three values:
- casPatchType 0 has the sha1 in the unpatched cat.
- casPatchType 1 may have the sha1 in either the unpatched or the patched cat.
- casPatchType 2 has the base in the unpatched cat and the delta in the patched cat.
- If the type is not specified, assume type 0.
- casPatchType 2 defines two more variables, baseSha1 and deltaSha1, which specify the
- sha1s of the unpatched (base) file in the unpatched cascat and of the delta file
- in the patched cascat. The delta file contains the info to patch the base file.
- Note that an ordinary (third) sha1 is still specified. That sha1 belongs to the compressed
- patched file. This is rather odd because the patching process is applied to the
- decompressed file. As both the base sha1 and delta sha1 are given though, there is
- no way to bypass file integrity checks (assuming there are any).
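- To condense the flags into one place (a toy restatement only; bundle and fileEntry are plain dicts
- here, not the real sbtoc objects):
- def classify(bundle,fileEntry):
-     #returns where the payload of one file in a patched, cas-enabled bundle has to be looked up
-     if bundle.get("base"): return "unpatched archives" #whole bundle unchanged
-     if bundle.get("delta"):
-         casPatchType=fileEntry.get("casPatchType",0) #not specified means type 0
-         if casPatchType==0: return "unpatched cat"
-         if casPatchType==1: return "patched cat, else unpatched cat"
-         if casPatchType==2: return "base from unpatched cat, delta from patched cat"
-         raise ValueError("unknown casPatchType")
-     return "patched cat" #neither flag set; this combination did not occur in the counts above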
- casPatchType 2 in detail:
- The operations described in the delta file rely on the individual LZ77 blocks.
- It is not possible to decompress the base completely and then apply the patch,
- nor is it possible to apply the patch to the compressed base file.
- A typical delta file contains several blocks with no global header.
- The file is in big endian.
- Each block starts with:
- 0.5 bytes deltaType
- 3.5 bytes deltaBlockSize/blockCount
- The 3.5 bytes specify a blockCount for all deltaTypes except type 2 (which is the most common type).
- I've sorted the types by frequency of occurrence.
- deltaType 2:
- (This type contains information about lots of small changes, usually less than ff bytes,
- which are applied to the base block to obtain the patched block.)
- In the base file, decompress a single block. Hereafter, when talking about
- the base block I mean the decompressed base block. The compression is of no importance.
- In the delta file, read 2 bytes: The expected size of the resulting patched block.
- Add 1 to obtain the actual size, so e.g. ffff becomes 10000.
- In the delta file, read all operations belonging to this delta block.
- Its size is given by deltaBlockSize (neither the 4 header bytes nor the 2 size bytes count towards it).
- Each operation has this structure:
- First come 4 bytes:
- 2 bytes offset
- 1 byte skipCount
- 1 byte addCount
- Write parts of the base block to the patched file until
- the offset given above is reached in the base block.
- In the base block, advance the position by skipCount bytes.
- Those bytes must not appear in the patched file.
- Read addCount bytes from the delta file and add them to the patched file.
- Finally, read as many bytes necessary from the base block
- until the patched file size equals the expected size (as read from the delta file).
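- The same steps, condensed into a standalone function (a sketch only; the full context is in the
- script excerpt further down, and baseBlock here is the already decompressed base block as a
- file-like object positioned at 0):
- from struct import unpack
- def applyDeltaType2(deltaStream,deltaBlockSize,baseBlock,patchedStream):
-     patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #e.g. ffff means a 10000 byte result
-     end=deltaStream.tell()+deltaBlockSize #the 2 size bytes do not count towards deltaBlockSize
-     written=0
-     while deltaStream.tell()<end:
-         offset=unpack(">H",deltaStream.read(2))[0]
-         skipCount,addCount=unpack("BB",deltaStream.read(2))
-         untouched=baseBlock.read(offset-baseBlock.tell()) #copy unchanged base bytes up to offset
-         patchedStream.write(untouched)
-         baseBlock.seek(skipCount,1) #drop skipCount base bytes
-         patchedStream.write(deltaStream.read(addCount)) #insert addCount new bytes
-         written+=len(untouched)+addCount
-     patchedStream.write(baseBlock.read(patchedBlockSize-written)) #top up with base bytes to the expected size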
- deltaType 1:
- (This type contains information about larger changes made to the file.
- The delta itself contains LZ77 blocks which are decompressed and used to
- substitute large chunks of the base block.)
- In the base file, decompress a single block.
- Iterate blockCount times doing the following:
- Read 4 bytes from the delta file:
- 2 bytes offset
- 2 bytes skipCount
- Write parts of the base block to the patched file until
- the offset given above is reached in the base block.
- Read one LZ77 block in the delta file and add it to the patched file.
- In the base block, advance the position by skipCount bytes.
- Those bytes must not appear in the patched file.
- Finally, add all remaining bytes of the base block to the patched file.
- deltaType 0:
- In the base file, decompress a number of blocks (given by blockCount)
- and add them to the patched file directly.
- deltaType 3:
- In the delta file, read a number of LZ77 blocks (given by blockCount) and add them to the patched file.
- This is the only operation which does not depend on the base file at all.
- deltaType 4:
- In the base file, skip blockCount blocks entirely, i.e. move past them
- but do not add them to the patched file.
- The relevant casPatchType 2 section:
- else: #casPatchType == 2
-     baseSha1=entry.elems["baseSha1"].content
-     deltaSha1=entry.elems["deltaSha1"].content
-     deltaEntry,deltaStream=cat2.getCas(deltaSha1)
-     baseEntry,compressedStream=cat.getCas(baseSha1)
-     patchedStream=open2(outPath,"wb") #write the new file directly
-     while deltaStream.tell()-deltaEntry.offset<deltaEntry.size:
-         tmpSize=unpack(">I",deltaStream.read(4))[0]
-         deltaType=tmpSize>>28
-         deltaBlockSize=tmpSize&0xfffffff #if type!=2, this number actually specifies repetitions. Type 2 is by far the most frequent type so the name is justified
-         if deltaType==2:
-             #small changes (usually less than FF bytes) within a single decompressed base block
-             baseBlockStream=decompressLZ77BlockWrapper(compressedStream)
-             patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
-             deltaPos0=deltaStream.tell()
-             currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
-             while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
-                 #parse the delta
-                 offset=unpack(">H",deltaStream.read(2))[0]
-                 skipCount,addCount=unpack("BB",deltaStream.read(2))
-                 addBytes=deltaStream.read(addCount)
-                 sizeUntilOffset=offset-baseBlockStream.tell()
-                 patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
-                 baseBlockStream.seek(skipCount,1) #seek past the skip bytes
-                 patchedStream.write(addBytes) #add the bytes
-                 currentPatchedBlockSize+=(sizeUntilOffset+addCount)
-             #read as many bytes necessary until the patchedStream has the correct size
-             patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
-         elif deltaType==1:
-             #similar to type 2 but with compressed delta payload and the replacement of larger sections
-             baseBlockStream=decompressLZ77BlockWrapper(compressedStream)
-             for i in xrange(deltaBlockSize):
-                 offset,skipCount=unpack(">HH",deltaStream.read(4))
-                 addBytes=decompressLZ77BlockWrapper(deltaStream).getvalue()
-                 sizeUntilOffset=offset-baseBlockStream.tell()
-                 patchedStream.write(baseBlockStream.read(sizeUntilOffset))
-                 patchedStream.write(addBytes)
-                 baseBlockStream.seek(skipCount,1)
-             patchedStream.write(baseBlockStream.read())
-         elif deltaType==4:
-             #skip these base blocks entirely. (manually seeking should be faster than using the decompression script)
-             for i in xrange(deltaBlockSize):
-                 decompressedSize, compressionType, compressedSize = unpack(">IHH",compressedStream.read(8))
-                 if compressionType in (0x70,0x71): compressedStream.seek(decompressedSize,1) #uncompressed payload
-                 elif compressionType==0x970: compressedStream.seek(compressedSize,1) #LZ77-compressed payload
-                 elif compressionType==0: asdf #deliberately crash (undefined name) to take a look if this ever comes up
-         elif deltaType==0:
-             #read the base blocks entirely without modifying them
-             for i in xrange(deltaBlockSize):
-                 patchedStream.write(decompressLZ77BlockWrapper(compressedStream).read())
-         elif deltaType==3:
-             #add payload in between base blocks. This is the only type that does not depend on the base file at all
-             for i in xrange(deltaBlockSize):
-                 patchedStream.write(decompressLZ77BlockWrapper(deltaStream).read())
-         else: asdf #deliberately crash on unknown delta types
-     patchedStream.close()
-     deltaStream.close()