
Next up: The patched archives don't work.
As I already have the retail files I might as well work with those. I don't care about the beta.
Try to dump the patched retail files.

Error 1:
	Traceback (most recent call last):
	  File "D:\hexing\release tools\dumper.py", line 368, in <module>
		main()
	  File "D:\hexing\release tools\dumper.py", line 365, in main
		dump(fname,outputfolder)
	  File "D:\hexing\release tools\dumper.py", line 220, in dump
		casHandlePayload(entry,ebxPath+entry.elems["name"].content+".ebx")
	  File "D:\hexing\release tools\dumper.py", line 329, in casHandlePayload
		catEntry=cat.entries[entry.elems["sha1"].content]
	KeyError: '\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3'

So a cascat-dependent sb file asks the cat for the payload with the SHA1 as seen above. But the cat
can't find that SHA1. Well, which cat is it and what's the hexlified form of the SHA1?
	>>> hexlify('\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3')
	'df2bdbc728ef18bea96416005456eb54fcd61cb3'

So that's the sha1 that's apparently nowhere to be found in the cats. Have a look at the cats myself.
Yup, the sha1 is simply not there. Well, fuck.

The archive with this sha1 is Globals.toc/sb. The toc is encrypted and I can't find my unXOR.py right now.
Besides, the sb contains the sha1s etc. so take a look at it first.
Well, the sha1 is definitely there: http://i.imgur.com/SIwp5Yh.png
But I can't find it in the cats.

Cascat is redundant. The cas alone have all info necessary to recreate a cat.
Maybe the cat is broken or something, ask the cas instead:
	from struct import unpack
	target='\xdf+\xdb\xc7(\xef\x18\xbe\xa9d\x16\x00TV\xebT\xfc\xd6\x1c\xb3'
	for i in xrange(1,22):
		f=open("cas_%02d.cas"%i,"rb") #cas_01.cas to cas_21.cas
		f.seek(0,2)
		EOF=f.tell()
		f.seek(0)
		while f.tell()<EOF:
			f.read(4) #start of the entry header, not needed here
			sha1=f.read(20)
			size=unpack("I",f.read(4))[0]
			f.read(4) #not needed either
			if sha1==target:
				print i
				asdf #undefined name; crash on purpose to stop at the first hit
			f.seek(size,1) #skip the payload
		f.close()

No hits, so the cas don't know about that sha1 either.
Whatever, just continue with the next file when it can't find the sha1.
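
The loop above implies a simple record layout in each cas: 4 skipped bytes, a 20-byte sha1, a 4-byte size, 4 more skipped bytes, then the payload. A minimal Python 3 sketch (the notes themselves are Python 2) of how that could be turned into a sha1 lookup table, i.e. roughly what a cat provides. The name index_cas is made up, and little endian for the size is an assumption (the original "I" uses the native byte order).

```python
import io
import struct

def index_cas(stream):
    """Map sha1 -> (payload offset, payload size) for one cas stream."""
    index = {}
    stream.seek(0, io.SEEK_END)
    eof = stream.tell()
    stream.seek(0)
    while stream.tell() < eof:
        stream.read(4)                                   # skipped, purpose not established in the notes
        sha1 = stream.read(20)
        size = struct.unpack("<I", stream.read(4))[0]    # assumed little endian
        stream.read(4)                                   # skipped as well
        index[sha1] = (stream.tell(), size)
        stream.seek(size, io.SEEK_CUR)                   # jump over the payload
    return index
```

Running this over all cas files and merging the dictionaries would answer "which cas knows this sha1" without the hand-written comparison.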

Okay neat, it extracts about 1k files now.

Error 2:
	Traceback (most recent call last):
	  File "D:\hexing\release tools\dumper.py", line 379, in <module>
		main()
	  File "D:\hexing\release tools\dumper.py", line 376, in main
		dump(fname,outputfolder)
	  File "D:\hexing\release tools\dumper.py", line 220, in dump
		bundle=sbtoc.Entry(sb)
	  File "D:\hexing\release tools\sbtoc.py", line 90, in __init__
		raise Exception("Entry does not start with \x82 or (rare) \x87 byte. Position: "+str(toc.tell()))
	Exception: Entry does not start with \x82 or (rare) \x87 byte. Position: 22849


Occurred in MpCharacter.toc/sb. That error was the issue with the patched beta files too.
Hm well, let's see if the unpatched files extract without complaining... done, 172k files and no issues.

Now back to the error, the toc says where to move in the sb file to read a single bundle.
Now, I have the script output the offset as given by the toc file. Interestingly, the very first
offset given by the toc is 22848. So it wants to read in the middle of the file. One would expect
the offset to be in the single digit region. Well, maybe the bundles are not ordered and the one with
the lower offset comes later on.

Ah this one has base = true while still being cas.

The patched archives have always been tricky. Sometimes it's necessary to cut pieces out
of the unpatched archives to obtain the patched files. Previously that was restricted to
patched non-cas files though.

Before going further, look through my script and recall what it does exactly. The archives
require very different handling depending on some flags in the toc, and I honestly don't
remember everything of it.

dumper.py:
	for each toc file:
		Read the toc in the superbundle format, then check if there is a cas flag.

		if cas is true (globally set by the toc):
			for each bundle metadata given in the toc entries:
				Go to the bundle offset in the sb.
				Read the bundle in a format similar to the superbundle format.

				I now know the name of every file, and its sha1.
				Ask the cat about the sha1, grab the payload and use the name
				to extract a file.

			for each chunk given in the toc entries:
				Just ask the cat directly and extract the file.

			#This approach used to work for both patched and unpatched cascat files.
			#For the patched files, ask the patched cat first, then fall back to the unpatched cat.

		if cas not true (globally set by the toc):
			for each bundle metadata given in the toc entries:
				Go to the bundle offset in the sb.

				if base is true (specified for each bundle):
					#patched non-cas bundle is the same as unpatched bundle
					Skip the process. This tells me to just use the unpatched
					bundle. As I expect the user to extract all unpatched files too,
					I can leave this one out.

				if base false, but delta true:
					#patched non-cas bundle
					In the patched sb, read some delta metadata.
					Then take the unpatched sb, but insert pieces of payload
					according to the metadata into the sb. The result is a
					valid bundle, which is then read.

				if base false, delta false:
					Just read the bundle.

				With the bundle parsed, use the info to extract the files.
				It's almost the same as before, but this time the payload
				is given in the sb directly.

The script does not check base or delta when dealing with cas sbtoc.
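
The outline above boils down to a small decision table. A Python 3 sketch of it, only to make the gap visible; bundle_strategy and the returned labels are made up for illustration and are not part of dumper.py.

```python
def bundle_strategy(toc_cas, base=False, delta=False):
    """Return a label for how the outline above treats one bundle entry."""
    if toc_cas:
        # base/delta are never looked at on this path, which is the gap noted above
        return "cas: parse bundle in patched sb, resolve sha1s via cat"
    if base:
        return "noncas: skip, same as the unpatched bundle"
    if delta:
        return "noncas: rebuild from unpatched sb plus delta pieces"
    return "noncas: read bundle from patched sb"
```

With cas set, base=True and base=False give the same answer, which is why a base bundle with an offset into the unpatched sb trips the parser.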

Structure of the toc in question:
	bundles
		entry
			id
			offset
			size
			base
		entry
			id
			offset
			size
			delta
	chunks
	cas

First come the bundles with base, then the ones with delta.
Thus the first bundle has base, and an offset of 22848 which makes no sense in the patched sb.
Look at the unpatched sb instead. Yup, that works alright.
The size of the bundle should be 14444. That's confirmed too by the unpatched sb; each entry starts
with an 82 byte which comes right after that size.

So this works similarly to the noncas archives. However, I think I can't skip the base bundles this time.
It may be that the bundle requires the file from the patched cat (although this is unlikely as
the sha1 depends on the payload). Nevertheless, implementing that is not that difficult anyway, so I'll do it.

Now, what about delta files.
The offset given is for the patched sbtoc, the size too.

Grab such a patched delta bundle and get its structure:
	path #ignore this, just dump all files to the same place
	magicSalt #not necessary for extraction
	ebx
		entry
			name
			sha1
			size
			originalSize #decompressed size IIRC
		entry
			name
			sha1
			size
			originalSize
			casPatchType #never seen this before
		entry
			name
			sha1
			size
			originalSize
			casPatchType
			baseSha1 #never seen this before
			deltaSha1 #never seen this before
	res
		entry
			name
			sha1
			size
			originalSize
			resType
			resMeta
			resRid #never seen this before
		entry
			name
			sha1
			size
			originalSize
			resType
			resMeta
			resRid
			casPatchType
		entry
			name
			sha1
			size
			originalSize
			resType
			resMeta
			resRid
			casPatchType
			baseSha1
			deltaSha1
	chunks
		entry
			id
			sha1
			size
			logicalOffset #previously, logicalOffset always had to appear together with rangeStart and rangeEnd. Not anymore.
			logicalSize #never seen this before
		entry
			id
			sha1
			size
			rangeStart
			rangeEnd
			logicalOffset
			logicalSize
		entry
			id
			sha1
			size
			logicalOffset
			logicalSize
			casPatchType
	chunkMeta
		entry
			h32
			meta
	alignMembers
	ridSupport #never seen this before
	storeCompressedSizes #never seen this before
	totalSize
	dbxTotalSize #not sure why this is mentioned separately


Ugh, many new keywords:
	casPatchType #integer, either 1 or 2
	baseSha1 #indeed a sha1, I assume of the unpatched file
	deltaSha1 #sha1 of the new, patched file?
	resRid #8 bytes, some kind of hash maybe? http://en.wikipedia.org/wiki/Relative_ID
	logicalSize #integer, no clue, don't care
	ridSupport #bool, set to 1 (left out if 0 I suppose)
	storeCompressedSizes #bool, set to 0
	dbxTotalSize #integer, it's two megabytes for the bundle I'm looking at: 2093471


Pick the first entry in this delta bundle.
	sha1: E4D44ADB1AF9CABDDB1827D288F3344221FE09C9
	Exists in the unpatched cat, but not the patched one.
	That entry has none of these fancy new keywords, so the current
	extraction script should be able to handle this case already.

Entry with casPatchType (set to 1) but no other new keywords:
	sha1: 81C73AB2A760D16B94D22982D916E91264A4C964
	Exists in the patched cat, but not the unpatched one.

Entry with casPatchType (set to 2), also contains baseSha1 and deltaSha1:
	sha1: 3C9AAEA1E3FA1117ED2CD458DC22E28003F6CFB3
	Does not exist in either cat.

	baseSha1: AFB0C31A2D05F331B6BD013E26F435C2EBC2CB68
	Exists in the unpatched cat.

	deltaSha1: 8614645E2C818F207A589F62E8B40752DF1E8F84
	Exists in the patched cat.

I don't care about the other keywords as they don't seem essential for extraction.
Well, this is confusing. In the patched noncas bundles there was metadata at the beginning of the bundle
which told me where to use pieces of the unpatched bundle and where to insert the patched data.
This bundle here however does not have any metadata like that. Additionally, I don't have anything to
work with except for the sha1s. So in a way it should probably not be that hard to deal with this.

if casPatchType is 0 (or not specified):
	Grab the payload from the unpatched cat.
if casPatchType is 1:
	Grab the payload from the patched cat.
if casPatchType is 2:
	No clue.


Look at strings to figure this one out.
	The bundle name is win32/persistence/unlocks/soldiers/visual/mp/ch/camo05/ch_assault_mp_appearance_camo05_bpb
	which also exists in the unpatched sb. Grab the bundle from the sb so I have both the patched and unpatched bundle
	to compare directly.

	The patched casPatchType 2 entry has Name: persistence/unlocks/soldiers/visual/mp/ch/camo05/ch_assault_mp_appearance_camo05_bpb/meshvariationdb_win32

	For whatever reason, this string appears twice (?!) in the unpatched bundle, with different sha1s though.
	Note, a bundle is always read in its entirety, so I have absolutely no idea what is going on.
	sha1: AFB0C31A2D05F331B6BD013E26F435C2EBC2CB68
	sha1: 19A0C8033C1264587F3E30306CB89C0329CE1524

	Well, ignore that for now, though keep it in mind; I hadn't even considered the possibility that a single bundle contains two
	files with different content but the same name. Eventually I'll have to have the script compare the sha1 for every file
	while dumping to make sure that both files are extracted. Err, I'd rather not think about the performance hit and the implementation details.

	The first sha1 is used as baseSha1, so retrieve the 2 payloads from cascat. It's an ebx file.

I suspect that the deltaSha1 in the patched cascat does not refer to an actual file, but instead gives me metadata to cut and glue together pieces
from the unpatched/patched files to obtain the actual file (which then has the sha1 as specified).


The delta file has just 76 bytes. It does look like guids that replace the original ones.
So the challenge is to distinguish metadata from actual data that is to be inserted and make sure
that the resulting sha1 is 3C9AAEA1E3FA1117ED2CD458DC22E28003F6CFB3.

delta file:
	20000046 0D2F0180 424290E0 C8AA785A
	E2119BEB DAE5903E 29166F54 408804F6
	823A942C 8D62362D 11F07A76 CE54AB76
	E211BE13 C8D7C07E 9A021BBE CE0140B7
	3092179E B63C2010 5C3F6414

	The 0046 is pretty close to the number of bytes that remain in the file when counting
	from after 0046, namely 48. So this might indicate the number of bytes to copy.

	Now, I suspect these are guids, and parts of the guids might remain the same even in the patched version.
	Search for E2119BEB DAE5903E.

	Yup, got a hit at c0 in the unpatched file. In fact 785A which comes right before is part of the guid too.

	76E211BE13C8D7C07E9A02 appears at 3ec.

	So somehow the file tells me to move several hundred bytes in between.

	Rearrange a bit:
	2000 0046
	0D2F01804242
	90E0C8AA                         #unpatched: 157 to 15b; maybe random, but unlikely
	785AE2119BEBDAE5903E             #unpatched: BE to C8
	2916
	6F54408804F6823A942C8D62362D11F0 #unpatched: 15F to 16F
	7A76CE54AB
	76E211BE13C8D7C07E9A02           #unpatched: 3ec to 3f7
	1BBECE0140B73092179EB63C20105C3F6414


	Meh, I don't get it. So what do you need if you want to patch stuff?

	I would expect something like this:
		Offset and size of unpatched data
		Size of bytes to copy from the delta file

		Grab pieces of the unpatched data and put delta pieces in between.
		Though it might also work with a relative offset.

	Alright, as interesting as it would be to solve this with just a single file, take
	a look at another file too.

	Delta starts with 2000005C, and similar to before 5C is exactly the number
	of bytes coming after that 5C, minus 2.

	So I suppose that 2000 is some kind of magic without deeper meaning.
	And then there are 4 bytes, the first two being the size of the delta file minus the metadata.
	The third and fourth byte are then still part of the metadata with an unknown purpose.

The second half of the sixth byte is always F it seems.

Well, I do know the sha1 that I will get in the end.
Assume that the delta file (which is very small) somehow described the replacement of a single
piece in the unpatched file. As I'm not sure where the metadata is and where the replacement payload starts,
just try all possibilities and check the sha1. Replace a slice of the unpatched file with a slice of the
delta file:
	f=open("delta","rb")
	delta=list(f.read())
	f.close()
	f=open("actualfile","rb")
	data=list(f.read())
	f.close()

	import copy
	import hashlib
	from binascii import hexlify

	for dataPos in xrange(len(data)):
		for deltaPos in xrange(len(delta)):
			for deltaSize in xrange(len(delta)-deltaPos):
				data2=copy.deepcopy(data)
				data2[dataPos:dataPos+deltaSize+1]=delta[deltaPos:deltaPos+deltaSize+1]
				if hashlib.sha1("".join(data2)).digest()=='<\x9a\xae\xa1\xe3\xfa\x11\x17\xed,\xd4X\xdc"\xe2\x80\x03\xf6\xcf\xb3':
					asdf
		print dataPos

Meh, does not work.

Retrieve all delta files and corresponding unpatched files.

First of all, confirm that casPatchType 0 always relies on the unpatched cat
and casPatchType 1 always relies on the patched cat:
    def casHandlePayload(entry,outPath): #this version searches the patched cat first
        if os.path.exists(lp(outPath)): return #don't overwrite existing files to speed up things
##        print outPath
        sha1=entry.elems["sha1"].content
        try:
            patchType=entry.elems["casPatchType"].content
        except:
            patchType=0
        if patchType==0:
            if "baseSha1" in entry.elems: asdf
            if "deltaSha1" in entry.elems: asdf
            if sha1 not in cat.entries: asdf
            if sha1 in cat2.entries: asdf

        elif patchType==1:
            if "baseSha1" in entry.elems: asdf
            if "deltaSha1" in entry.elems: asdf
            if sha1 in cat.entries: asdf
            if sha1 not in cat2.entries: asdf
        elif patchType==2:
            if "baseSha1" not in entry.elems: asdf
            if "deltaSha1" not in entry.elems: asdf

            baseSha1=entry.elems["baseSha1"].content
            deltaSha1=entry.elems["deltaSha1"].content
            if baseSha1 not in cat.entries: asdf
            if deltaSha1 not in cat2.entries: asdf

Oh great, this fails. For some reason with casPatchType 1 the sha1 is found in the unpatched cat but not the patched one.
More precisely, for casPatchType 1 the sha1 may be found in either the unpatched or the patched cat. Will have to
do the usual approach; try the patched first, then fall back if necessary.

The other two types work as expected though. Type 0 always has the sha1 in the unpatched cat.
Type 2 has the base in the unpatched cat and the delta in the patched cat.

Also: baseSha1/deltaSha1 <=> patchType 2
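
The findings so far can be written down as a lookup rule. A Python 3 sketch, assuming each entry is a plain dict and each cat is a dict keyed by sha1 (the real cat objects in dumper.py are not shown here, so locate_payload and these shapes are stand-ins).

```python
def locate_payload(entry, unpatched_cat, patched_cat):
    """Pick the cat entries needed for one bundle entry, per the checks above."""
    patch_type = entry.get("casPatchType", 0)      # missing keyword treated as 0
    if patch_type == 2:
        # base lives in the unpatched cat, delta in the patched cat
        return {"base": unpatched_cat[entry["baseSha1"]],
                "delta": patched_cat[entry["deltaSha1"]]}
    sha1 = entry["sha1"]
    if patch_type == 1:
        # may be in either cat: try patched first, then fall back
        if sha1 in patched_cat:
            return {"payload": patched_cat[sha1]}
        return {"payload": unpatched_cat[sha1]}
    if patch_type == 0:
        return {"payload": unpatched_cat[sha1]}
    raise ValueError("unexpected casPatchType: %r" % patch_type)
```

For type 2 this only fetches the two ingredients; how to combine them is still the open question at this point.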


Alright, ignore type 0 and 1 for now. Retrieve the deltas and bases.
    def casHandlePayload(entry,outPath): #this version searches the patched cat first
        if os.path.exists(lp(outPath)): return #don't overwrite existing files to speed up things
##        print outPath
        sha1=entry.elems["sha1"].content
        try:
            patchType=entry.elems["casPatchType"].content
        except:
            patchType=0
        if patchType==2:
            baseSha1=entry.elems["baseSha1"].content
            deltaSha1=entry.elems["deltaSha1"].content

            deltaEntry=cat2.entries[deltaSha1]
            baseEntry=cat.entries[baseSha1]
            deltaPath=outPath+" delta"
            basePath=outPath+" base"
            out=open2(deltaPath,"wb")
            out.write(sha1)  #write the sha1 in the beginning of the delta file for convenience
            out.write(cat2.grabPayload(deltaEntry))
            out.close()
            out=open2(basePath,"wb")
            out.write(cat.grabPayload(baseEntry))
            out.close()

Alright, 5140 files. So over 2500 base-delta pairs to work with.
The shortest delta file (shaderdatabase_win32.shaderdatabase) is just 0a bytes payload:
	20000004 203F1974 3000 (this is the entire file)

	Once again, the total size after the first 4 bytes is given by 0004 + 2.
	Once again, the second half of the sixth byte is F.

	Run the script over the base and delta to see if I get a matching sha1.
	The smaller the delta the less likely that there are several substitutions.
	That is of course assuming that this is what actually happens.

	Meh, nothing.

mainmenuscreen.ebx:
	delta: 2000000C 014F0140 08087BBD 728586FA 25F2
	Total size after first 4 bytes: 000c + 2
	Second half of sixth byte: F

	Hmm well. This does look similar to the LZ77 algorithm.
	Proceed length is given by c. And the final two bytes are the offset in little endian?
	mainmenuscreen is a very small file though, so it can't be an offset.

Maybe the sha1s are for the decompressed files (although it was different in bf3).
Nope, just checked that and the sha1 is always taken from the compressed file.


Try something different with the algorithm (without that odd list thing):
	f=open("mainmenuscreen.ebx delta","rb")
	sha1=f.read(20)
	delta=f.read()
	f.close()
	f=open("mainmenuscreen.ebx base","rb")
	data=f.read()
	f.close()

	import hashlib
	from binascii import hexlify

	for dataPos in xrange(len(data)):
		for deltaPos in xrange(len(delta)):
			for deltaSize in xrange(len(delta)-deltaPos):
				data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
				if hashlib.sha1(data2).digest()==sha1:
					print dataPos, deltaPos, deltaSize
		print dataPos

Applied to mainmenuscreen:
	259 10 7

	Traceback (most recent call last):
	  File "D:\hexing\bf4 test\Neuer Ordner\trial.py", line 17, in <module>
		asdf
	NameError: name 'asdf' is not defined

Got it. Now I know the resulting file. I suppose I just messed up the previous script.
So at position 259 in the unpatched file, I replace 8 (7+1) bytes with bytes from the delta file.

The delta file is:
	2000000C 014F0140 0808 7BBD728586FA25F2
	And the 8 bytes on the right are the ones that were substituted.

	Phew, so the replacement bytes are at the very end.

	259 in hex is 103, can't find that anywhere in the delta though.
	The two 08s are related to the number of bytes to replace I suppose.


Dataversion.ebx has a rather small delta file too, just 1b bytes. However,
apparently at least two things are substituted here because I can't find the sha1.
So I possibly didn't fail with the script before but the files were just bad.
Find more suitable files (with small bases).

fontcollection_ja_fontlibwin32.ebx:
	260 10 7

	delta: 2000000C 015F0150 0808 8D46479E3AD65FDD

	Very neat, this delta is the same as the one before but differs only
	by exactly one in the offset. As a result it says 15 twice instead
	of 14. Note that a number is stored in 1.5 bytes apparently.
	01 5F, but the number is 15.

	That looks awfully redundant. Both the offset and the replacement
	bytes are specified twice. I think there's more to it though:
	The second offset/size could be there to say where to go on after
	placing the delta payload. In the case of these simple examples, a
	number of bytes is substituted, so the two numbers are identical.
	Will need further investigation of course.

actionscriptlibrary.ebx:
	287 10 7
	2000000C 017F0170 0808 3D9AC231DD6F15FE

campaignmissionsscreen.ebx:
	270 10 7
	2000000C 015F0150 0808 B34AD4896B370A2C

campaignmissionsscreen and fontcollection_ja_fontlibwin32 specify
identical delta values, but the actual offset in the files is different.

mainmenuscreenpc.ebx:
	263 10 7
	2000000C 014F0140 0808 D7647AE7C8329AD3

Hm... there's no offset specified somehow.
But there is a number that appears twice. I just don't know its purpose.

Well, one way or another this number has to tell me the offset. There are no other bytes left.
Have the script also give me the offset counting from the end instead of the start. Also fix it so
it gives the right number of bytes that are copied (deltaSize+1):
	for dataPos in xrange(len(data)):
		for deltaPos in xrange(len(delta)):
			for deltaSize in xrange(len(delta)-deltaPos):
				data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
				if hashlib.sha1(data2).digest()==sha1:
					print dataPos, deltaPos, deltaSize+1, len(data)-dataPos
					asdf
		print dataPos

mainmenuscreenpc.ebx:
	presumed offset bytes: 014F0140
	263 10 8 16

mainmenuscreen.ebx:
	presumed offset bytes: 014F0140
	259 10 8 16

Looking good so far.

fontcollection_ja_fontlibwin32.ebx:
	presumed offset bytes: 015F0150
	260 10 8 16

	Nope. Not good at all.

actionscriptlibrary.ebx:
	presumed offset bytes: 017F0170
	287 10 8 16

	16 everywhere? Is the script doing something wrong?
	Manually substituting the bytes does yield the wanted sha1 though.
	Odd.

I need some files that differ somehow yet make just one substitution.


ultimax_antstate_chunk.ebx:
	306 14 16 28
	20000024 019F0180 2020 00000000  F6FF267B5E455B0EB41D0644812E0DE7  5C9B01000000000000000000

	The substitute part is F6FF267B5E455B0EB41D0644812E0DE7.

ss_stones_01.ebx:
	319 10 1 61
	20000005 01BF0177 0101 73

nogadget2.ebx:
	579 10 1 66
	20000005 035F0301 0101 43


movietexture_shader_sp_prologue_gasexplosion:
	20000005 01CF0178 0101 73
	No sha1 match which is odd because it looks so similar to the previous ones.


263 10 8 16
2000000c 014f0140 0808 d7647ae7c8329ad3

259 10 8 16
2000000c 014f0140 0808 7bbd728586fa25f2

260 10 8 16
2000000c 015f0150 0808 8d46479e3ad65fdd

313 10 8 16
2000000c 019f0190 0808 178d137e63801f41

289 10 8 16
2000000c 017f0170 0808 e5139873617cd410


Mmmh. ultimax_antstate_chunk.ebx again:
	306 14 16 28
	20000024 019F0180 2020 00000000  F6FF267B5E455B0EB41D0644812E0DE7  5C9B01000000000000000000

	The substitute part is F6FF267B5E455B0EB41D0644812E0DE7.

	In fact the substitute could also include all the bytes at the end too.
	It's strange, those bytes are the same before and after applying the delta.

	The 2020 refer to all bytes to the right of them, including 4 nulls. Those nulls however
	are not substituted.


While it will be horribly slow, use that sha1 script within the dumper script.
Automatically dump all files that have just a single substitution.
	def validateSha1(data,delta,sha1):
		t0=time()
		for dataPos in xrange(len(data)):
			for deltaPos in xrange(len(delta)):
				for deltaSize in xrange(len(delta)-deltaPos):
					data2=data[:dataPos]+delta[deltaPos:deltaPos+deltaSize+1]+data[dataPos+deltaSize+1:]
					if hashlib.sha1(data2).digest()==sha1:
						return [dataPos, deltaPos, deltaSize+1, len(data)-dataPos]
			if time()-t0>30: asdf
		asdf
Limit the time spent with each file to 30 seconds.
Also don't consider delta files larger than 50 bytes.
It should take just a few hours to go through all files.


1235 0A 04 37
20000008 1DAF 1D6C 0404 94B5AA0E

11F3 0A 04 04
20000008 1D2F 1D2C 0404 ACF453A8


jet_mfd_q5_mesh.ebx:
	2000000C 0D0F 0C20 0808 5158166D711EB911
	Decompressed base size: d10

	So 0d0f is exactly one less than the base size. Either way, this number
	contains no useful information.


Let me get this straight. The delta file contains offsets in the decompressed file.
But the sha1 that is calculated is the one you get when compressing such a patched file.

Well, that makes zero sense as it is a waste of computational power to first decompress
a file, then patch it, then compress it to confirm its sha1 is correct. So I can only assume
that once again the sha1 is not checked at all. Ahh, never mind that. Both the base and
delta have a sha1 so there's no way to bypass that anyway. I suppose the resulting sha1
serves no real purpose.

Problem is, I don't have the compression algorithm so I cannot directly compare the
sha1. However, I can do the usual sanity checks on the ebx which should suffice to
deal with this.

Oh. The system is really obvious with the decompressed files.

ultimax_antstate_chunk.ebx again:
	306 14 16 28
	20000024 019F 0180 2020 00000000F6FF267B5E455B0EB41D0644812E0DE75C9B01000000000000000000

	The substitute part is the 0x20 bytes at the end.
	These bytes are inserted at 0x180 in the decompressed file.
	The decompressed file is then read until 19f+1, i.e. till the end of file.


ak5c_gunsway.ebx:
	Has several substitutions.
	Entire delta file:
		2000 002A 12BF
		0E38 0303 0AD7A3
		0EC4 0303 0AD7A3
		0F10 0303 0AD7A3
		0F5C 0303 0AD7A3
		0FA8 0303 0AD7A3
		0FF4 0303 0AD7A3

	Header:
		2 (or 1) bytes: Delta type
		2 (or 3) bytes: Size without header
		2 bytes: Total decompressed file size-1 (at least for file size <0x10000); patched/unpatched file? Don't know yet.

	while current position in file < size without header:
		2 bytes: Replacement offset (move there in the base file)
		1 byte: Number of replacement bytes (in the delta file)
		1 byte: Number of bytes to replace in the base file
		x bytes: The payload to replace.

		The two single bytes are just guesses at the moment. It could be similar to
		LZ77 compression so maybe it's possible to e.g. make the second single byte
		twice as large as the first. Thus the delta payload is read just once, but
		applied twice to the base file. That's my idea so far anyway.

		With that I should be able to handle at least some of the simpler files.
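
A Python 3 sketch of the format as guessed above, for a single 2000 block applied to the decompressed base. apply_type2_block is a made-up name, and unequal byte counts are refused because only equal pairs (0303, 0808, ...) have shown up so far.

```python
import struct

def apply_type2_block(base, delta):
    """Apply one uncompressed (type 2000) delta block to a decompressed base file."""
    typ, delta_size = struct.unpack_from(">HH", delta, 0)
    if typ != 0x2000:
        raise ValueError("not a 2000 delta")
    (size_minus_one,) = struct.unpack_from(">H", delta, 4)
    out = bytearray(base)
    pos = 6                      # records start after the 6-byte header
    end = 6 + delta_size
    while pos < end:
        offset, n_delta, n_base = struct.unpack_from(">HBB", delta, pos)
        pos += 4
        if n_delta != n_base:
            # guess from the notes: the two counts might differ, but no such case seen yet
            raise NotImplementedError("unequal replacement sizes")
        out[offset:offset + n_base] = delta[pos:pos + n_delta]
        pos += n_delta
    return bytes(out), size_minus_one + 1
```

Fed with the ak5c_gunsway delta listed above, this writes 0AD7A3 at six offsets and reports a size of 0x12C0.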

I do recall seeing a few files with a non 2000 type. Grab them.
	import os
	import sys

	for dir0, dirs,ff in os.walk(sys.path[0]):
		for filename in ff:
			if filename[-5:]!="delta": continue
			f=open(dir0+"\\"+filename,"rb")
			f.read(20) #skip the 20-byte sha1 that was prepended to each delta file
			typ=f.read(2)
			f.close()
			if typ!="\x20\x00":
				print filename

This yields:
	m224_pda_mesh.ebx delta
	layer0_cinematic.ebx delta
	layer1_ui_schematic.ebx delta
	layer18_shipwreck.ebx delta
	layer1_ui_schematic.ebx delta
	hotel_wings_02_mesh.ebx delta
	twigs_01_mesh.ebx delta
	decal_plaster_02_mesh.ebx delta
	foresttree_l_01_rig_mesh.ebx delta
	foresttree_l_03_skin_mesh.ebx delta
	leaftree_full_l_01_mesh.ebx delta

m224_pda_mesh.ebx:
	Substitute the last 9 bytes in the delta file.
	10000001 0C80 0018 0000 0018 0970 000E
	1A000100 90
	80378B11A531D30298

	This one is rather different from the previous format.
	I think I could substitute the 9 bytes, then decompress the file,
	and get the offset after decompression. That should appear
	somewhere in this format I suppose.

	Compressed offset: 84c
	Decompressed offset: c8f

	Compressed base file size: 95c
	Decompressed base file size: f30

	0C80 in the delta is pretty close to the decompressed offset.


decal_plaster_02_mesh.ebx:
	10000001 0C00 0018 0000 0018 0970 000E
	1A000100 90
	803D811935F2361F1E

	Almost the same as before.

	Compressed offset: 7e0
	Decompressed offset: c0f

	Compressed base file size: 85c
	Decompressed base file size: cd0

	Once again, subtract F from the decompressed offset to get 0c00 from the delta.

Wait a sec.
No. Please.

0970 here? So the delta is compressed?
Not only that, it's compressed in such a way that it actually wastes space.

Recall how compression works:
	Structure of a compressed block (big endian):
		4 bytes: decompressed size (0x10000 or less)
		2 bytes: compression type (0970 for LZ77, 0071 for uncompressed data)
		2 bytes: compressed size (0000 for uncompressed data) of the payload (i.e. without the header)
		compressed payload

	Decompress each block and glue the decompressed parts together to obtain the file.

	The compression is an LZ77 variant. It requires 3 parameters:
		Copy offset: Move backwards by this amount of bytes and start copying a certain number of bytes following that position.
		Copy length: How many bytes to copy. If the length is larger than the offset, start at the offset again and copy the same values again.
		Proceed length: The number of bytes that were not compressed and can be read directly.

	Note that the offset is defined with regard to the already decompressed data, which e.g. does not contain any compression metadata.

	The three values are split up however; while the copy length and proceed length are
	stated together in a single byte, before an uncompressed section, the relevant offset
	is given after the uncompressed section:
		Use the proceed length to read the uncompressed data, at which point you arrive at the start of the offset value.
		Read this value, then move to the offset and copy a number of bytes (given by copy length)
		to the decompressed data. Afterwards, the next copy and proceed length are given and the process starts anew.

	The offset has a constant size of 2 bytes, in little endian.

	The two lengths share the same byte. The first half of the byte belongs to the proceed length,
	whereas the second half belongs to the copy length.


So the previous file is something like this:
	10000001 0C00 0018

	Compression header:
		00000018 decompressed size
		0970 LZ77
		000E compressed size
	1A, proceed 1 byte, copy A+4=E times
	00, the payload to be copied E times
	0100, the offset to move backwards to copy, namely the nullbyte

	90, proceed 9 bytes (till the end of the file)
	803D811935F2361F1E, just add this to the decompressed payload

	So the decompressed payload is (1 null, clone E times, then 9 bytes):
		000000000000000000000000000000803D811935F2361F1E

	And thus the delta file becomes (after decompression):
		1000 0001 0C00 0018
		000000000000000000000000000000803D811935F2361F1E

		0c00: offset in base file to substitute
		0018: number of bytes to copy.
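
The walkthrough above can be checked mechanically. A Python 3 sketch of the block header and the token scheme exactly as described here (proceed length in the first half of the byte, copy length in the second half plus 4, 2-byte little endian offset after the literals). lz_decompress and read_block are made-up names; a length nibble of F is refused because the notes do not say how longer lengths are encoded, and only compression type 0970 is handled.

```python
import struct

def lz_decompress(payload):
    """Decompress one payload following the token layout described above."""
    out = bytearray()
    pos = 0
    while pos < len(payload):
        token = payload[pos]
        pos += 1
        proceed = token >> 4           # first half of the byte
        copy_nibble = token & 0x0F     # second half of the byte
        if proceed == 0xF or copy_nibble == 0xF:
            raise NotImplementedError("saturated length nibble not covered by the notes")
        out += payload[pos:pos + proceed]
        pos += proceed
        if pos >= len(payload):
            break                      # the last literal run has no offset after it
        (offset,) = struct.unpack_from("<H", payload, pos)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("bad copy offset")
        start = len(out) - offset
        for i in range(copy_nibble + 4):
            out.append(out[start + i]) # byte by byte, so overlapping copies repeat data
    return bytes(out)

def read_block(data, pos=0):
    """Read one compressed block; return (decompressed bytes, position after the block)."""
    size, ctype, csize = struct.unpack_from(">IHH", data, pos)
    if ctype != 0x0970:
        raise NotImplementedError("only type 0970 is handled in this sketch")
    out = lz_decompress(data[pos + 8:pos + 8 + csize])
    if len(out) != size:
        raise ValueError("decompressed size mismatch")
    return out, pos + 8 + csize
```

Calling read_block on the decal_plaster_02_mesh delta at position 8 (after 1000 0001 0C00 0018) gives the 15 nulls followed by the 9 replacement bytes.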


	Header (big endian):
		2 (or 1) bytes: Delta type (2000 is uncompressed, 1000 is compressed)
		2 (or 3) bytes: Size without header (set to 1 for compressed)
		2 bytes (only if uncompressed): Total decompressed file size-1

	while current position in file < size without header:
		2 bytes: Replacement offset (move there in the base file)
		1 byte: Number of replacement bytes (in the delta file; 0 for compressed)
		1 byte: Number of bytes to replace in the base file
		x bytes: The payload to replace.

With that I should be able to handle the simple deltas without brute-forcing my way through them.
But surely there are more types out there than just 10 and 20. It would be too easy otherwise.
I can't find any though in those files that I have solved.

Run through the bundles again and throw some errors (for the moment, only consider small deltas):
    if len(deltaData)>50: return
    deltaStream=StringIO(deltaData)
    deltaStream.seek(0)

    typ,deltaSize=unpack(">HH",deltaStream.read(4))

    if typ==0x1000:
        if deltaSize!=1: asdf
    elif typ==0x2000:
        totalDecompressed=unpack(">H",deltaStream.read(2))[0]
        if totalDecompressed+1!=entry.elems["originalSize"].content:
            asdf
    else:
        asdf

Error with file 9k22_tunguska_m.ebx; totalDecompressed does not match originalSize.

Delta:
	2000 000C FFFF
		E926 0101 C0
		E948 0303 000000

	2000 0017 7FBF
		1BA4 0606 CDCC0C3F0000
		56C2 0202 7042
		56CC 0303 333333

	So this delta has two blocks within a single file.
	Each block can only handle FFFF+1=10000 bytes max.
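To double-check the block layout, here is a minimal sketch that walks the 9k22_tunguska_m delta block by block. It is written for Python 3 (unlike the Python 2 snippets in these notes), `split_delta_blocks` is a made-up name, and it only knows the 2000 type as understood so far.

```python
from struct import unpack

def split_delta_blocks(delta):
    """Yield (type, body_size, block_size, body) for each uncompressed block."""
    pos = 0
    while pos < len(delta):
        # 2 bytes type, 2 bytes body size, 2 bytes "size-1" (all big endian)
        typ, body_size, size_minus_one = unpack(">HHH", delta[pos:pos + 6])
        if typ != 0x2000:
            raise ValueError("unhandled delta type %04x" % typ)
        body = delta[pos + 6:pos + 6 + body_size]
        yield typ, body_size, size_minus_one + 1, body
        pos += 6 + body_size

# The delta of 9k22_tunguska_m.ebx as dumped above.
delta_9k22 = bytes.fromhex(
    "2000 000C FFFF" "E926 0101 C0" "E948 0303 000000"
    "2000 0017 7FBF" "1BA4 0606 CDCC0C3F0000" "56C2 0202 7042" "56CC 0303 333333")

blocks = list(split_delta_blocks(delta_9k22))
for typ, body_size, block_size, body in blocks:
    print("type %04x, %d body bytes, block size %x" % (typ, body_size, block_size))
```

If the third header field really is a per-block size, the two block sizes should add up to originalSize for this file. I have not checked that here.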

This should be easy to fix. However, ignore the error for now, and continue.
Next error, lav_ad.ebx has type 0.

Delta:
	0000 0001
	2000 0007 6F3F
		1294 0303 E17A14

	Two random ideas what this is about.
		1) A way to seek past several blocks without having to specify all the other crap.
		2) Substitution in the compressed file instead.

	I know that exactly three bytes, E17A14, are substituted in the file.
	Just ask a script where exactly.

	No hits.

	Compressed size: BB86
	Decompressed size: 16F40

	The value of 6f3f is pretty far from the expected size.

No clue what this is about, keep it in mind for later when everything else is done.


Remove the restriction so larger deltas are okay too.

venicesoldierinputconcepts.ebx is type 10 but has deltaSize!=1:
	1000 0002 0004 0024 0000 0024 0070 0024

	So the value is 2 here. Problem is, that delta is 8kb so I can't manually analyze it.
	Keep that in mind too for later.


Redefine the format a bit:
	All values in big endian.
	Header:
		0.5 bytes: Delta type (2 is uncompressed, 1 is compressed, 0/3/4 possible too)
		3.5 bytes: deltaSize, size without header (set to 1 for compressed)
		2 bytes (only if uncompressed): Total decompressed file size-1

	while current position in file < size without header:
		2 bytes: Replacement offset (move there in the base file)
		1 byte: Number of replacement bytes (in the delta file; 0 for compressed)
		1 byte: Number of bytes to replace in the base file
		x bytes: The payload to replace.
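The half-byte type is easier to see in code. A small sketch (Python 3, `split_block_header` is just an illustrative name) that splits the first four bytes of a block into the 4-bit type and the 28-bit deltaSize:

```python
from struct import unpack

def split_block_header(header4):
    """Split a 4-byte big-endian block header into (type, deltaSize)."""
    word = unpack(">I", header4)[0]
    return word >> 28, word & 0x0FFFFFFF  # upper nibble, lower 28 bits

print(split_block_header(bytes.fromhex("2000000C")))  # uncompressed block
print(split_block_header(bytes.fromhex("10000002")))  # compressed block
```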

Number of files depending on type:
	type 0: 15
	type 1: 1311
	type 2: 10075
	type 3: 12
	type 4: 245

All the more reason to finish type 2.

Hack something together:
    deltaStream=StringIO(deltaData)
    deltaStream.seek(0)

    EOF=len(deltaData)
    baseOffset=0

    while deltaStream.tell()< EOF:
        deltaPos0=deltaStream.tell()
        tmpSize=unpack(">I",deltaStream.read(4))[0]
        typ=tmpSize>>28
        deltaBlockSize=tmpSize&0xfffffff

        if typ!=2:
            if baseOffset!=0: asdf
            else: return

        baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #add to baseOffset at the end

        while deltaStream.tell()-deltaPos0 < deltaBlockSize:
            offset=unpack(">H",deltaStream.read(2))[0]
            byte1,byte2=unpack("BB",deltaStream.read(2))
            if byte1!=byte2: asdf
            substitute=deltaStream.read(byte1) #not used yet


multiplayerconsumableunlocksetup.ebx fails due to byte1!=byte2:
	2 0000493 08CF
		0008 0101 20
		001C 0A0A D000000002000000F004
		0286 0101 16
		02A0 0101 16
		030B 0600 036107000387 (not sure what's going on)
		0300 048A FFFFA412 (offset went backwards, bad)
		CB06 3444 922CDD837D4910F8500000002C (offset exceeds file size, completely lost track of it)

		Try that again from the critical part.
		030B 0600
		0361 0700
		0387 0300
		048A FFFF
		A412CB063444922CDD837D4910F8500000002C

		Apparently it accumulates offsets and sizes and terminates with FFFF.

Have it skip the file when something like this occurs.

Error with dataversion.ebx:
	2 0000015 01CF
		0186 0002 4430
		019E 0200
		01B8 0202 4149
		01C4 0101 31

	The second byte specifies the number of substitute bytes in the delta file.

	The first byte does... something. It certainly requires an offset.
	So I substitute 2 bytes at that offset, but where do I take them from?
	Or do I just remove them? That could actually work. Ebx files
	always have a size that is a multiple of 16, so in the lines above
	I would add two bytes first, then remove two bytes later, so these cancel
	each other out.

	First byte: Remove this number of bytes at the offset
	Second byte: Place this number of bytes at the offset (read the bytes in the delta file)
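A quick sketch to test the remove/add reading on the dataversion.ebx block body (Python 3; `parse_records` is a hypothetical helper, and the body is the part after the 6-byte header). If the interpretation holds, the added and removed byte counts cancel out so the file size stays a multiple of 16:

```python
from struct import unpack

def parse_records(body):
    """Return a list of (offset, remove_count, added_bytes) tuples."""
    records = []
    pos = 0
    while pos < len(body):
        offset, remove_count, add_count = unpack(">HBB", body[pos:pos + 4])
        pos += 4
        records.append((offset, remove_count, body[pos:pos + add_count]))
        pos += add_count
    return records

# Body of the dataversion.ebx delta block (0x15 bytes).
dataversion_body = bytes.fromhex(
    "0186 0002 4430" "019E 0200" "01B8 0202 4149" "01C4 0101 31")

records = parse_records(dataversion_body)
net_change = sum(len(added) - removed for _, removed, added in records)
print(records)
print("net size change:", net_change)
```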

Still wondering why it went FFFF above. Or was it just the maximum amount of bytes that could be specified?
Indeed, if I read FF bytes in the multiplayerconsumableunlocksetup delta, the next bytes say 0589 FFFF.
Not sure how to code such a byte removal in an efficient manner. Well, there are more pressing matters
right now anyway.

Adjust the script a bit. Note that baseOffset is not increased yet, so the script returns as
soon as the type is changed:
    while deltaStream.tell()< EOF:
        tmpSize=unpack(">I",deltaStream.read(4))[0]
        typ=tmpSize>>28
        deltaBlockSize=tmpSize&0xfffffff

        if typ!=2:
            if baseOffset!=0: asdf
            else: return

        baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #add to baseOffset at the end
        deltaPos0=deltaStream.tell()
        while deltaStream.tell()-deltaPos0 < deltaBlockSize:
            offset=unpack(">H",deltaStream.read(2))[0]
            removeCount,addCount=unpack("BB",deltaStream.read(2))
            substitute=deltaStream.read(addCount) #not used yet

Small issues (like most of the implementation) aside, the script should be able
to extract 9612 files (those which use type 2 delta blocks only).
The total number of casPatchType 2 files is 11658, so that's a fair share of files already.
A few hundred files that start as type 2 have gone missing, so they must've changed the type later on.

Now the final question is whether that delta file is applied sequentially or not.
Removing a few bytes from the middle of the file, then shifting everything after the cut
to the left does not seem viable.
That should be easy to figure out though:
    if removeCount==255 and addCount==0:
        asdf

The relevant delta bytes:
	feef ff 00
	ffee 12 00

So in effect it removes all bytes from feef until ffee+12.
Try the same the other way around:
    if removeCount==0 and addCount==255:
        asdf

The relevant delta bytes:
	83F1 00 FF *255 bytes*
	83F1 00 FF *255 bytes*

So the offsets do not adjust for that either.
I'm not sure what this means in practice. It could be that
the second operation puts the bytes before the bytes of the first operation.
The opposite case seems just as plausible though.
Well alright. Have a second stream and just put the bytes in it.

Keep in mind to decompress the base file before applying the delta.
Also, for some reason a compression type 0x70 appeared in the base files.
So a patched bundle accesses an unpatched file which is never used in any
unpatched bundle. Just handle 0x70 the same as 0x71, i.e. as uncompressed payload:

Snippet:
    deltaEntry,deltaStream=cat2.getCas(deltaSha1)
    baseEntry,compressedBase=cat.getCas(baseSha1)
    baseStream=decompressLZ77(compressedBase,baseEntry.size)
    compressedBase.close()

    patchedStream=open2(outPath,"wb") #here be the new data

    baseOffset=0  #to handle the base offsets when delta contains more than one block
    while deltaStream.tell()-deltaEntry.offset < deltaEntry.size: #read one block
        tmpSize=unpack(">I",deltaStream.read(4))[0]
        typ=tmpSize>>28
        deltaBlockSize=tmpSize&0xfffffff

        if typ!=2:
            patchedStream.seek(0)
            patchedStream.write("abcd") #break the file magic so the ebx script does not get confused
            patchedStream.close() #this is actually better than deleting the file (I skip the file entirely if it already exists)
            return #todo

        baseBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #has no other purpose than to be added to baseOffset after the loop
        deltaPos0=deltaStream.tell()
        while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within one block
            offset=unpack(">H",deltaStream.read(2))[0]
            skipCount,addCount=unpack("BB",deltaStream.read(2))

            sizeUntilOffset=baseOffset+offset-baseStream.tell()
            patchedStream.write(baseStream.read(sizeUntilOffset))

            #skip the bytes, move to new position in the base stream and pretend the bytes were read
            baseStream.seek(skipCount,1)
            baseOffset+=skipCount

            #add the bytes
            patchedStream.write(deltaStream.read(addCount))

        baseOffset+=baseBlockSize

    #add the remaining bytes of the base
    patchedStream.write(baseStream.read())
    patchedStream.close()
    deltaStream.close()


It's still really slow. The individual blocks in the LZ77 file and in the delta file suggest that I should
decompress one block, then apply the delta on that block, then decompress the next block etc.
This does require that the deltas are always synchronized with the base file though. I have
added 1 to baseBlockSize to obtain the number of bytes in the decompressed file. Still, verify
this for all files before going further. Though first of all, see if the ebx script can handle
the files without errors.


Fail at file c_marine_01.ebx:
	KeyError: 177537

	Somehow there are random bytes in the keyword section.

Back to the beginning then.
The corresponding delta:
	2 0000028 075F #type 2, 28 bytes replaced, decompressed block is 75f+1 in total
		00C0 20 00 #remove 20 bytes at c0; these are the same bytes that are added in the next step
		0100 00 20 D66030DC07317746BAAC1CEEF8212E0F1B556D54D2A69448B3621769C50B227C #add these 20 bytes at 100

	In effect this should remove the guid pair D660... from the middle of the list of external guids and
	put it at the end of the list.

I know for certain that the keyword section must start at 100 (in both the unpatched and patched file).
The appropriate place to add the bytes is 100-20=e0. Though in fact, I don't even need
the offset for a pure add operation. I just attach the bytes to the end of
the patched stream. For some reason there are keywords at e0 and the guid pair at 100.
Keep a separate counter for removed bytes and subtract it from the offset?
This is starting to confuse me.

So what does the script do exactly:
	00c0 20 00:
		First of all, write all bytes until c0 to the patched stream.
		In the base stream, seek 20 bytes forwards.

	0100 0020:
		Write all bytes until 100 to the patched stream (bad).
		Add the 20 bytes to the patched stream.


Okaaay, with a new variable skipTotal it seems to work correctly:
    skipTotal=0
    while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within one block
        offset=unpack(">H",deltaStream.read(2))[0]
        skipCount,addCount=unpack("BB",deltaStream.read(2))

        sizeUntilOffset=baseOffset+offset-baseStream.tell()-skipTotal
        patchedStream.write(baseStream.read(sizeUntilOffset))

        #skip the bytes, move to new position in the base stream and pretend the bytes were read
        baseStream.seek(skipCount,1)
        baseOffset+=skipCount

        #add the bytes
        patchedStream.write(deltaStream.read(addCount))
        skipTotal+=skipCount

If there's an elegant solution to this, I can't see it right now.

Not too bad now, but it fails at 9k22_tunguska_m.ebx. That file is more than 10000 bytes.

Delta:
	2 000000C FFFF
		E926 01 01 C0
		E948 03 03 000000
	2 0000017 7FBF
		1BA4 06 06 CDCC0C3F0000
		56C2 02 02 7042
		56CC 03 03 333333

These are plain substitutions. I really need to check if the delta blocks always match compressed blocks.
That should simplify things at least a bit.

Well, that's interesting. The number of blocks always match, so one delta block means one compressed block.
However, the size given by the delta is not always the block size.

Grab the smallest delta file with size mismatch.

mp_naval_networkregistry_win32.ebx:
	base decompressed block size: 6a0
	delta block size: 680

	delta:
		2 0000023 067F #block size 67f+1?
			0004 09 09 90050000 F000000022 #oh. substitute the metadata right after the magic
			0160 20 00
			05A0 01 01 22
			0610 01 01 22
			069C 04 04 00000000

	The metadata says that the new file size is 90050000+f0000000 = 590+f0 = 680.

	So the delta block size is the size the block must have in the end.
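The arithmetic above as a tiny check (Python 3). The two sizes are stored little endian inside the ebx, while the delta header itself is big endian. The variable names are mine.

```python
from struct import unpack

# The 9 bytes substituted at offset 4 of mp_naval_networkregistry_win32.ebx.
new_meta = bytes.fromhex("90050000 F0000000 22")

size_a, size_b = unpack("<II", new_meta[:8])  # little endian: 0x590 and 0xf0
patched_file_size = size_a + size_b

header_field = 0x067F        # from "2 0000023 067F"
base_block_size = 0x6A0      # decompressed base block
skipped_bytes = 0x20         # from "0160 20 00", nothing is added back

print(hex(patched_file_size), hex(header_field + 1), hex(base_block_size - skipped_bytes))
```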

Find a mismatch with at least two blocks and check the metadata again.

sp_airfield delta:
	2 0000103 FFD7
		0008 01 01 40 #patch one byte of the payload size (from 70) to 40
		9998 09 09 060400002F00000070
		99AC 01 01 AC
		99B8 01 01 B8
		99C4 01 01 D0
		99D0 04 04 D8740000
		99DC 01 01 00
		99E8 01 01 2C
		99F4 01 01 34
		9A00 01 01 50
		9A0C 01 01 58
		9A18 01 01 60
		9A24 01 01 6C
		9A30 01 01 84
		9A3C 01 01 8C
		9A48 01 01 A0
		9A54 01 01 A8
		9A60 01 01 B0
		9A6C 01 01 C8
		9A78 01 01 D0
		9A84 04 04 D8A70000
		9A90 02 02 E0A7
		9A9C 02 02 F0A7
		9AA8 01 01 80
		9AB4 01 01 90
		9AC0 02 02 D4BA
		9ACC 04 04 D0BC0000
		9AD8 02 02 D8BC
		9AE4 02 02 E0BC
		9AF0 01 01 00
		9AFC 01 01 30
		9B08 01 01 60
		9B14 01 01 90
		9B20 01 01 B4
		9B2C 01 01 C8
		9B38 02 02 D8BD
		9B44 01 01 00
		9B50 01 01 10
		9B5C 02 02 D4C1
		9B68 02 02 F8CA
		9B74 01 01 2C
		9B80 01 01 34
		9B8C 01 01 48
		9B98 01 01 54
		9BA4 01 01 5C
		F270 01 01 06
		F74C 28 00
	2 0000004 9B17
		7794 08 00

	Meta section size: 9bb0 both before and after patching
	Payload section size: ff70 (before), ff40 (after)
	Total size: 19b20 (before), 19af0 (after)

	delta block sizes: FFD7+1 + 9B17+1 = 19af0
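Same sanity check for sp_airfield in code form (Python 3, all numbers copied from the dump above). It also confirms that the only two pure skip records account for the whole size difference:

```python
block_fields = [0xFFD7, 0x9B17]                  # size fields of the two delta blocks
patched_total = sum(field + 1 for field in block_fields)

meta_size = 0x9BB0                               # same before and after patching
payload_before, payload_after = 0xFF70, 0xFF40
skipped = 0x28 + 0x08                            # "F74C 28 00" and "7794 08 00"

print(hex(patched_total))
print(hex(meta_size + payload_after))
print(hex(meta_size + payload_before - skipped))
```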

I get the idea of how to implement this.
However, I really want to make sure I can treat each block separately.
That should increase performance and be much simpler to handle.

Try to find a delta file with more than one block. For any block
before the last one, check if its last delta operation has
offset+skipCount >= baseBlockSize. That would indicate
that the operation stretches over two blocks, but requires two
entries, one for each block the operation resides in.

multiplayertemplate.ebx matches this requirement.
However, the next block is of type 3. Ignore that file for now.


weaponsbundlesp/shaderdb.shaderdb:
	offset+skipCount = 65149
	baseBlockSize = 64632

In fact, the offset alone is greater. Hmm, the requirement is wrong.
I should compare offset+skipCount with the decompressed block size (in the base).
At least, that's what I think.


ind_servicebuilding_02_destruction_physics_win32:
	offset+skipCount = 65536
	decompressedBlockSize = 65536

	Delta:
	...
		FFA4 5C 00
	2 0002637 fedf
		0000 3c ff

I would prefer a delta that doesn't have an addCount.

ch_fac_dv15_sp_player.ebx:
	2 00015CB 764F
		0000 3F 6F

And these two are indeed the only files that satisfy the condition.
Well, fuck it. The data is pretty conclusive anyway.
It has offset 0 for both of these files, so immediately
at the start of the block the script skips some bytes.
If the offset was higher the script would first read some unpatched bytes.
As it is however, with offset 0 it does not really matter if it's
a pure skipCount or if there are some bytes added.
More importantly, there is not a single time when the skip count
exceeds the block size (it's equal in the two cases above).

Therefore, rewrite the LZ77 decompression to yield single decompressed blocks.


Still can't get the script to work correctly. Try to gather all remarkable features of
the delta format and describe it once more:

Special cases:
	Skipping more than ff bytes:
		feef ff 00
		ffee 12 00 #offset moves through the base payload
		=> Do not read the bytes in the base file from feef to ffee+12 under any circumstances.

	Adding more than ff bytes:
		83F1 00 FF *255 bytes*
		83F1 00 FF *255 bytes* #the offset remains the same as no bytes of the base payload are read in between
		=> When only bytes are added, the base offset remains the same.


General approach:
	Decompress one LZ77 base block and parse the corresponding delta block.
	Apply the delta on the decompressed base to obtain the patched block.

Delta block structure:
	All values in big endian.
	Header:
		0.5 bytes: Delta type (2 is uncompressed, 1 is compressed, 0/3/4 possible too)
		3.5 bytes: Delta block size; size without header (set to 1 for compressed)
		2 bytes (only if uncompressed): Final size of the patched block (must add 1 to get the actual value)

	while current position in the block < delta block size:
		2 bytes: Base offset (add base data to the new patched file until reaching this offset)
		1 byte: Skip count, do not read these bytes from the base file (but seek past these bytes)
		1 byte: Add count
		*The bytes to add, given by add count*

		1) Read all bytes from the current position in the decompressed base block until the base offset is reached.
		2) In the base block, (starting from the base offset) seek past the number of bytes given by skip count.
		3) Add the add-bytes to the end of the patched stream.

	Finally, read more base bytes until the patched block has the size given by the header.
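A minimal in-memory sketch of the three steps for a single type 2 block (Python 3 with `BytesIO`; `apply_type2_block` is a made-up name, and the base block below is synthetic filler, not real ebx data). The delta is the c_marine_01.ebx one, with the moved guid pair taken from the synthetic base so the effect is easy to see:

```python
from io import BytesIO
from struct import unpack

def apply_type2_block(base_block, delta_stream):
    """Apply one type 2 delta block to one decompressed base block."""
    base = BytesIO(base_block)
    out = BytesIO()
    word = unpack(">I", delta_stream.read(4))[0]
    typ, delta_block_size = word >> 28, word & 0x0FFFFFFF
    if typ != 2:
        raise ValueError("not a type 2 block: %d" % typ)
    patched_block_size = unpack(">H", delta_stream.read(2))[0] + 1
    start = delta_stream.tell()
    while delta_stream.tell() - start < delta_block_size:
        offset, skip_count, add_count = unpack(">HBB", delta_stream.read(4))
        out.write(base.read(offset - base.tell()))  # 1) copy untouched base bytes
        base.seek(skip_count, 1)                     # 2) seek past the skipped bytes
        out.write(delta_stream.read(add_count))      # 3) append the add-bytes
    # Finally, fill up with base bytes until the block has its final size.
    out.write(base.read(patched_block_size - out.tell()))
    return out.getvalue()

# Synthetic 0x760-byte base block (assumption, just recognisable filler).
base_block = bytes(i % 251 for i in range(0x760))
guid_pair = base_block[0xC0:0xE0]
delta = bytes.fromhex("20000028 075F" "00C0 20 00" "0100 00 20") + guid_pair

patched = apply_type2_block(base_block, BytesIO(delta))
print(len(patched) == 0x760, patched[0xE0:0x100] == guid_pair)
```

Chained records like `feef ff 00` followed by `ffee 12 00` need no special handling here, because the second record simply copies zero base bytes before skipping again.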

Phew, summarizing that was a great help, got it working right away and the ebx script could handle all extracted files.

Snippet:
    patchedStream=open2(outPath,"wb") #write the new file in stream, might write directly too though.
    deltaStreamSize=0 #got to keep track of the number of bytes written to the stream
    for baseBlockStream in decompressLZ77(compressedStream,baseEntry.size):
        baseBlockStream.seek(0)

        tmpSize=unpack(">I",deltaStream.read(4))[0]
        typ=tmpSize>>28
        deltaBlockSize=tmpSize&0xfffffff

        if typ!=2:
            patchedStream.seek(0)
            patchedStream.write("abcd") #break the file magic so the ebx script does not get confused
            patchedStream.close() #this is actually better than deleting the file (I skip the file entirely if it already exists)
            return #todo

        patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
        deltaPos0=deltaStream.tell()

        currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
        while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
            #parse the delta
            offset=unpack(">H",deltaStream.read(2))[0]
            skipCount,addCount=unpack("BB",deltaStream.read(2))
            addBytes=deltaStream.read(addCount)

            sizeUntilOffset=offset-baseBlockStream.tell()
            patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
            baseBlockStream.seek(skipCount,1) #seek past the skip bytes
            patchedStream.write(addBytes) #add the bytes

            currentPatchedBlockSize+=(sizeUntilOffset+addCount)

        #read as many bytes necessary until the patchedStream has the correct size
        patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
    patchedStream.close()
    deltaStream.close()


Type 1 delta (rankparams.ebx):
	1 #type
	0000001  #maybe number of compressed blocks?
	3000 #decompressed base offset to substitute
	12EB #substitute size, also given in the compression header
	Compressed block:
		Header:
			000012EB 0970 10C0
		*10c0 bytes compressed payload*


Alright, that's not too difficult to implement.
Find a delta where the presumed number of compressed blocks is greater than one.

venicesoldierinputconcepts.ebx (again):
1 0000002 0004 #replace at offset 4
	0024 00000024 0070 0024 *24 bytes*
4BEA #replace at offset 4bea?
	4846 000045F6 0970 1E53 *1e53 bytes*
And the delta ends after that.

But what's the purpose of 4846 here?

First of all, the header is replaced (with uncompressed data even).
So is the size of the file different?
before: 5870 + 3bc0 = 9430
after: 5810 + 39d0 = 91E0

Great, so the delta must remove 250 bytes.
Which is of course exactly the difference 4846-45f6.
4846 is the number of bytes till the end of file.

I suppose then that 4846 specifies the number of bytes to skip
in favor of the new bytes that are added.


Type 1 delta (venicesoldierinputconcepts.ebx):
	1 #type
	0000002  #number of blocks

	for each block:
		0004 #decompressed base offset
		0024 #skip count

		Compression header:
			00000024 0070 0024
		*0024 bytes compressed payload*

		Skip the specified bytes and use the decompressed payload instead.
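A sketch of reading one such entry (Python 3). `read_type1_entry` and the `decompress` callback are hypothetical: the real LZ77 routine is not reproduced here, so only the uncompressed 0070 case is handled directly. The 0x24 payload bytes are placeholders, since the real ones are not in these notes.

```python
from io import BytesIO
from struct import unpack

def read_type1_entry(stream, decompress=None):
    """Read offset, skip count, compression header and payload of one entry."""
    offset, skip_count = unpack(">HH", stream.read(4))
    decompressed_size, comp_type, compressed_size = unpack(">IHH", stream.read(8))
    raw = stream.read(compressed_size)
    if comp_type == 0x0070:
        payload = raw                      # stored uncompressed
    elif decompress is not None:
        payload = decompress(raw, decompressed_size)  # e.g. for 0970
    else:
        raise NotImplementedError("compression type %04x" % comp_type)
    return offset, skip_count, payload

# First entry of venicesoldierinputconcepts.ebx with placeholder payload bytes.
placeholder = bytes(range(0x24))
entry = bytes.fromhex("0004 0024 00000024 0070 0024") + placeholder

stream = BytesIO(entry)
offset, skip_count, payload = read_type1_entry(stream)
print(hex(offset), hex(skip_count), len(payload))
```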


battlepacks.ebx fails:
	For whatever reason, I end up 0c bytes before the end of the delta file.
	And that's just after reading one of two blocks.

	The last few bytes:
		B70C 001C 00000000 0000 0000

	I suppose it just tells me to skip these bytes and not add anything?
	For the compression, if type is 0, then return empty-handed.
	I bet the ebx script will fail anyway.


levellistreport.ebx cannot be handled by the ebx script.
In fact it seems to contain every line twice and it is missing
lots of stuff. It contains just a single type 1 delta block.

1 0000001
	0000 02A0 000002A0 0970 01BE *01BE bytes compressed payload*

Read till offset 0, i.e. read no bytes.
Remove 2a0 bytes.
Add the decompressed payload, which consists of 2a0 bytes.
Thus, the entire file is replaced.

So why does it fail so horribly? Simple, I forgot to read the rest of
the block once the bytes are skipped and replaced.


vehicleshed_medium_mesh fails. Some parts of the string section are
right in the middle of the metadata. It's type 1 with 2 blocks within.

Delta:
	1 0000002
		0004 005C 0000007C 0970 007B *7b bytes compressed payload*
		0BAC 01E4 000002B4 0970 0193 *193 bytes compressed payload*
	End of file

Just manually separate the pieces, decompress them and the base,
then figure out what to do. Meh.

So the first block changes the metadata and the size of the ebx.

Before: Size = bc0+1d0 = d90
After: Size = be0+2a0 = e80

Size of the file that the script put together using the delta: ee0
That's close, but not good enough.

Also note how the first delta block skips 5c bytes, but adds 7c. That means
that one guid pair (size exactly 20 bytes) is added. In fact, the new metadata
confirms this too (the number of guid pairs is increased from 2 to 3).

The first delta block is safe to apply.
So err, create another file in the hex editor, then grab base 4 bytes.
Then the 7c delta bytes.
Then skip 5c base bytes starting from 4, so move to 60.
Read base from 60 until 0bac.
Add 2b4 delta bytes. AND WITH THIS I HAVE REACHED E80.
Then skip 1e4 base bytes starting from bac, i.e. until D90 (which is the base EOF).
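The manual hex editor procedure as a sketch (Python 3). `apply_type1_block` is a made-up name, the payloads are assumed to be decompressed already, and both the base and the payloads below are dummy filler of the right sizes, so only the bookkeeping is checked, not the actual content.

```python
def apply_type1_block(base_block, entries):
    """entries: list of (offset, skip_count, decompressed_payload)."""
    out = bytearray()
    pos = 0
    for offset, skip_count, payload in entries:
        out += base_block[pos:offset]   # base bytes up to the offset
        out += payload                  # the new bytes
        pos = offset + skip_count       # skip the replaced base bytes
    out += base_block[pos:]             # rest of the block (easy to forget)
    return bytes(out)

# vehicleshed_medium_mesh: base size d90, two entries, expected result e80.
base_block = bytes(i % 251 for i in range(0xD90))
entries = [
    (0x0004, 0x005C, b"\xAA" * 0x7C),
    (0x0BAC, 0x01E4, b"\xBB" * 0x2B4),
]
patched = apply_type1_block(base_block, entries)
print(hex(len(patched)))
```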

That looks all fine to me, including the file I get when manually doing this.
So why the heck did the script fail? Nvm, just a coding mistake. All fixed.


I get 1951 ebx files out of 2069 and the ebx script can handle them all.
Keep in mind that this number is without the duplicates throughout the different bundles.
That leaves about 100 unique files containing type 0,3 or 4.

Snippet:
    patchedStream=open2(outPath,"wb") #write the new file in stream, might write directly too though.
    deltaStreamSize=0 #got to keep track of the number of bytes written to the stream
    for baseBlockStream in decompressLZ77(compressedStream,baseEntry.size):
        baseBlockStream.seek(0)

        tmpSize=unpack(">I",deltaStream.read(4))[0]
        typ=tmpSize>>28
        deltaBlockSize=tmpSize&0xfffffff

        if typ==2:
            patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
            deltaPos0=deltaStream.tell()

            currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
            while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
                #parse the delta
                offset=unpack(">H",deltaStream.read(2))[0]
                skipCount,addCount=unpack("BB",deltaStream.read(2))
                addBytes=deltaStream.read(addCount)

                sizeUntilOffset=offset-baseBlockStream.tell()
                patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
                baseBlockStream.seek(skipCount,1) #seek past the skip bytes
                patchedStream.write(addBytes) #add the bytes

                currentPatchedBlockSize+=(sizeUntilOffset+addCount)

            #read as many bytes necessary until the patchedStream has the correct size
            patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))
        elif typ==1:
            for i in xrange(deltaBlockSize):
                offset,skipCount=unpack(">HH",deltaStream.read(4))
                addBytes=decompressLZ77Block(deltaStream).getvalue()

                sizeUntilOffset=offset-baseBlockStream.tell()
                patchedStream.write(baseBlockStream.read(sizeUntilOffset))
                patchedStream.write(addBytes)
                baseBlockStream.seek(skipCount,1)
            patchedStream.write(baseBlockStream.read())
        else:
            patchedStream.seek(0)
            patchedStream.write("abcd")
            patchedStream.close()
            return


Next types:

multiplayertemplate.ebx has type 3 (a bit further down the delta file).

2 0001590 FFFF
	...
3 0000001
	00000210 0970 0098 *98 bytes*
1 0000002
	0D80 0039 00000039 0970 0021 *21 bytes*
	A854 408C 0000435C 0970 1030 *1030 bytes*
EOF

Well isn't that grand. The base file has only two blocks while
the delta has three. Now, should I revert the script to a former
version and deal with the offset madness? For the moment I want
to hope that the new type 3 might explain everything.

To be honest I think the type 1 (compressed) does rely
on the block separation. Without each block that type wouldn't
know how many bytes to read at the end. Or put another way,
that type always expects me to read until the end of the block.

Let's ask other type 3 blocks about their opinion.

rhib.ebx:
	3 0000001 00000060 0970 003b *3b bytes*
		00000001
	EOF

quadbike:
	3 0000001 00000060 0970 005b *5b bytes*
		00000001
	EOF

ch_fav_lyt2021:
	3 0000001 00000060 0970 0032 *32 bytes*
		00000001
	EOF

I can see a pattern there, lol.

vdv_buggy:
	3 00000001 ...
		2 0000007 5aaf 01e4 03 03 000001
	EOF

Type 2 comes after a type 3. Oh man...
But the EOF always seems to come soon after.
Maybe type 0 or 4 are easier to understand.


weaponstatcategories has type 4.
Three meshvariationdb_win32 files too (mp_flooded/content, mp_thedish/content, sp_dam/citybridge).


weaponstatcategories:
	2 ...
	4 0000001 3000 0001  0000CC00 0970 A4D3 *a4d3 bytes*
	EOF

The file also contains two blocks like the delta, phew.
So type 4 defines two shorts and then has the compressed payload.


mp_thedish/content:
	2 ...
	4 0000001 3000 0001  00006629 0970 33E6 *33e6 bytes*
	EOF

Same behaviour.
This type certainly looks more tolerable than type 3, so it
should not be too hard to handle it.


weaponstatcategories:
	Have the script apply all those little changes from type 2 already.
	I.e. I have the first block all patched, but the second block missing
	entirely at the moment.

	Total expected patched size (according to header): 34d0+18b30 = 1C000
	Size of the decompressed delta: CC00
	Size of the patched first block: f400
	cc00+f400 = 1c000. So apparently I do not need to do anything, just
	replace the entire block with the new payload.

Oh. It's actually a prefix. It's 4 bytes of type 4, prefixing type 3.

But maybe type 0 is easier to understand?

mp_resort/content:
	2 000000C FFFF
		C640 08 08 F977E76551DEC342
	0 0000002
	EOF

The file has three base blocks. I suspect that type 0 simply means
that these blocks remain the same.

In fact I can't even verify this either way. The delta simply
replaces 8 bytes in the payload section (some number or guid that
the ebx script certainly will not complain about) and the rest
must remain the same or the file size would not match the metadata
(not to mention, end abruptly).
Still, it makes the most sense to me, plus it confirms that it
is mandatory to deal with each block individually.
However, so far I've expected one delta block for each base block,
which is not the case. The above example in particular shows that
I must loop over the delta blocks and not the base blocks.
When iterating over the base blocks I can't correctly perform
the 0 0000002 instruction above (unless I add some more variables).


3 types done, 2 to go.

Now, about the number of blocks.

Recall that multiplayertemplate showed that type 3 acts independently of the base blocks:

2 0001590 FFFF
	...
3 0000001
	00000210 0970 0098 *98 bytes*
1 0000002
	0D80 0039 00000039 0970 0021 *21 bytes*
	A854 408C 0000435C 0970 1030 *1030 bytes*
EOF

with 2 base blocks, but three delta blocks. Similarly, weaponstatcategories had two base
blocks, and the type 3 delta is probably once again independent of the base blocks:
	2 ...
	4 0000001
	3 0000001 0000CC00 0970 A4D3 *a4d3 bytes*
	EOF

As type 0 keeps one block unchanged, I think type 4 could be about skipping one block entirely.
Type 3 seems to just insert new payload independently of base blocks.

Give that a shot. Though, what happens when type 3 has a number different from 1? Does that mean
several compressed pieces because a single piece would exceed 10000 bytes?


mp_prison/materialgrid_win32 has such a case:
	2 ...
	3 0000002
		00010000 0970 581B ... #yup, 10000 right there
		000051D7 0970 23B8 ...
	1 0000002
		0000 9A64 #read till offset 0, skip 9a64 bytes
			00000000 0000 0000 #so this should return an empty string?
		9B1C 64E4
			00005851 0970 3DD8 ... #ordinary compression
	4 0000001 #skip one block?
	3 0000001
		00008324 0970 574B ... #use this block instead
	4 0000002 #skip two blocks
	3 0000001 00004C89 0970 231C ...  #use this block
	3 0000003 00010000 0970 5B3B ... #use these blocks. But why isn't there just one type 3 assignment with 4 blocks?
	4 0000001 #skip a block
	EOF

The base file has 6 blocks.
Delta: type2, type1, type4, type4(*2),type4 => 6 blocks
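The counting rule I am assuming here, as a sketch (Python 3, `base_blocks_consumed` is an illustrative name): types 1 and 2 each work on exactly one base block, types 0 and 4 cover as many base blocks as their number says, and type 3 touches no base block at all. The lists are the type/number pairs from the dumps in these notes. For type 2 the number is the delta block size, which does not matter for counting, so it is left as None.

```python
def base_blocks_consumed(delta_blocks):
    """delta_blocks: list of (type, number) pairs in delta file order."""
    consumed = 0
    for typ, number in delta_blocks:
        if typ in (1, 2):
            consumed += 1        # patches a single base block
        elif typ in (0, 4):
            consumed += number   # keep (0) or skip (4) that many base blocks
        elif typ == 3:
            pass                 # inserts new payload, independent of the base
        else:
            raise ValueError("unknown delta type %d" % typ)
    return consumed

mp_prison = [(2, None), (3, 2), (1, 2), (4, 1), (3, 1), (4, 2), (3, 1), (3, 3), (4, 1)]
mp_resort = [(2, None), (0, 2)]
multiplayertemplate = [(2, None), (3, 1), (1, 2)]

print(base_blocks_consumed(mp_prison))            # base file has 6 blocks
print(base_blocks_consumed(mp_resort))            # base file has 3 blocks
print(base_blocks_consumed(multiplayertemplate))  # base file has 2 blocks
```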

So what's up with that odd type 0 compression?

It's been there with battlepacks.ebx too:
	2 ...
	1 0000002
		338B 002E
			00000029 0070 0029 ...
		A7F1 16A4
			0000166C 0970 0B9B ...
	1 0000002
		42F1 65EC
			000064F0 0970 5872 ...
		B70C 001C
			00000000 0000 0000
	EOF

Alright, returning an empty string makes sense. I've also just realized that compression
type 70 apparently uses compressedSize=decompressedSize, whereas type 71
uses compressedSize=0 (note that both types have uncompressed payload).

The ebx script seems to run fine over all files:
	>>> correct
	[68160, 3494, 36387]
	>>> incorrect
	[0, 0, 0]


Well those were two excellent guesses about these last two types I suppose. 2069 files in total now.
Enable the other casPatchTypes again.


Recall that:
	casPatchType 0 has the sha1 in the unpatched cat.
	casPatchType 1 may have the sha1 in either the unpatched or the patched cat.
	casPatchType 2 has the base in the unpatched cat and the delta in the patched cat.

Go for the safest approach and treat casPatchType 1 like casPatchType 0, i.e. try patched first, then unpatched.

So what about base and delta. Can they appear together?

Number of files for each combination:
	+base +delta => 0 files
	+base -delta => 955 files
	-base +delta => 686 files
	-base -delta => 0 files

And for the unpatched files there are only files without base and delta.
So there are three different cases. As I have different functions
depending on whether or not there's "Update" in the tocRoot, I can
ignore the unpatched part.

Put it all together, and it fails miserably after a few thousand files.

sbtoc with error: MpCharacter

It can read the toc file without issues. Apparently the sb causes some trouble.
It fails at a base/nondelta bundle. Ah. I forgot that base requires me to
open up the unpatched sb (in the unpatched folder).
Hack something together to get the unpatched path. The patched noncas required that too,
so maybe I can grab the lines from there. Meh, it was different there.

Unpatched path is: ...\bf4\Data\Win32
Patched path is:   ...\bf4\Update\Patch\Data\Win32

Something like this should work:
	unpatchedPath=toc.fullpath.replace(r"patched\bf4\Update\Patch\Data\Win32",r"bf4\Data\Win32")

Alright, I got a working version.


Time for a format description:
	The sbtoc (superbundle/table of contents) format has a new way of handling patched cascat files.
	Previously, the sbtoc did not contain any patch specific info. The most sensible approach was to
	check if the archives were located in the Update folder and use the patched cascat in that case.
	Now however the sbtoc contains metadata to handle a rather wide range of different types of patches.

	Unchanged from the previous format is the cas flag (found in the toc) which states that all bundles
	inside the sb corresponding to the toc have their assets stored in the cascat archives.
	If the flag does not exist or is set to false, the assets are instead directly stored in the sb file.

	For a cas-enabled sbtoc, the toc now gives additional metadata for every single bundle.
	Each patched bundle has either a base or a delta flag (never both and never neither in the files I checked).

	If the base flag is set (for the bundle), then the entire bundle does not require any patching. Note that the game
	apparently relies on the patched files only, which then make references back to the unpatched files.
	The sbtoc in the Update folder contains all the necessary info to retrieve files from the unpatched archives.

	If the delta flag is set (for the bundle), then a casPatchType is specified for each
	file within the bundle, which may take one of three values:
		casPatchType 0 has the sha1 in the unpatched cat.
		casPatchType 1 may have the sha1 in either the unpatched or the patched cat.
		casPatchType 2 has the base in the unpatched cat and the delta in the patched cat.

	If the type is not specified, assume type 0.

	casPatchType 2 defines two more variables, baseSha1 and deltaSha1, which specify the
	sha1s of the unpatched (base) file in the unpatched cascat and of the delta file
	in the patched cascat. The delta file contains the info to patch the base file.

	Note that an ordinary (third) sha1 is still specified. That sha1 belongs to the compressed
	patched file. This is rather odd because the patching process is applied to the
	decompressed file. As both the base sha1 and delta sha1 are given though, there is
	no way to bypass file integrity checks (assuming there are any).


	casPatchType 2 in detail:
		The operations described in the delta file rely on the individual LZ77 blocks.
		It is not possible to decompress the base completely and then apply the patch,
		nor is it possible to apply the patch to the compressed base file.

		A typical delta file contains several blocks with no global header.
		The file is in big endian.

		Each block starts with:
			0.5 bytes deltaType
			3.5 bytes deltaBlockSize/blockCount

		The 3.5 bytes specify a blockCount for all deltaTypes except type 2 (which is the most common type).
		I've sorted the types by frequency of occurrence.
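
Splitting the 4 header bytes into the two fields, as a minimal sketch. split_delta_header is a made-up name and the sample bytes are invented.

```python
from struct import unpack

def split_delta_header(raw):
    """Split a 4-byte big-endian delta block header into (deltaType, sizeOrCount)."""
    value = unpack(">I", raw)[0]
    deltaType = value >> 28            # upper 4 bits (the "0.5 bytes")
    sizeOrCount = value & 0x0fffffff   # lower 28 bits (the "3.5 bytes")
    return deltaType, sizeOrCount

# type 2 with a deltaBlockSize of 0x10, type 4 with a blockCount of 3
print(split_delta_header(b"\x20\x00\x00\x10"))
print(split_delta_header(b"\x40\x00\x00\x03"))
```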

		deltaType 2:
			(This type contains information about lots of small changes, usually less than 0xff bytes,
			which are applied to the base block to obtain the patched block.)

			In the base file, decompress a single block. Hereafter, when talking about
			the base block I mean the decompressed base block. The compression is of no importance.

			In the delta file, read 2 bytes: The expected size of the resulting patched block, minus 1.
			Add 1 to obtain the actual size, so e.g. 0xffff becomes 0x10000.

			In the delta file, read all operations belonging to this delta block.
			Its size is given by deltaBlockSize (note that the previous bytes do not count towards the size).
			Each operation has this structure:
				First come 4 bytes:
					2 bytes offset
					1 byte skipCount
					1 byte addCount

				Write parts of the base block to the patched file until
				the offset given above is reached in the base block.

				In the base block, advance the position by skipCount bytes.
				Those bytes must not appear in the patched file.

				Read addCount bytes from the delta file and add them to the patched file.

			Finally, read as many bytes as necessary from the base block
			until the patched file size equals the expected size (as read from the delta file).
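
The deltaType 2 steps on plain byte strings instead of streams, as a sketch. It assumes the base block is already decompressed and that patchedBlockSize already has the +1 applied; apply_delta_type2 and the test data are made up.

```python
from struct import pack, unpack

def apply_delta_type2(baseBlock, patchedBlockSize, ops):
    """Apply the operations of one deltaType 2 block to a decompressed base block.

    baseBlock: decompressed base block
    patchedBlockSize: expected size of the patched block (the 2-byte value plus 1)
    ops: the raw operation bytes (deltaBlockSize bytes from the delta file)
    """
    out = bytearray()
    basePos = 0
    deltaPos = 0
    while deltaPos < len(ops):
        offset, skipCount, addCount = unpack(">HBB", ops[deltaPos:deltaPos + 4])
        deltaPos += 4
        out += baseBlock[basePos:offset]            # bytes that need no modification
        basePos = offset + skipCount                # skipped bytes never reach the output
        out += ops[deltaPos:deltaPos + addCount]    # bytes supplied by the delta
        deltaPos += addCount
    # top up from the base block until the expected size is reached
    out += baseBlock[basePos:basePos + patchedBlockSize - len(out)]
    return bytes(out)

# replace "CDE" with "xy" at offset 2
ops = pack(">HBB", 2, 3, 2) + b"xy"
print(apply_delta_type2(b"ABCDEFGHIJ", 9, ops))
```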

		deltaType 1:
			(This type contains information about larger changes made to the file.
			The delta itself contains LZ77 blocks which are decompressed and used to
			substitute large chunks of the base block.)

			In the base file, decompress a single block.

			Iterate blockCount times doing the following:
				Read 4 bytes from the delta file:
					2 bytes offset
					2 bytes skipCount

				Write parts of the base block to the patched file until
				the offset given above is reached in the base block.

				Read one LZ77 block in the delta file and add it to the patched file.

				In the base file, advance the position by skipCount bytes.
				Those bytes must not appear in the patched file.

			Finally, add all remaining bytes of the base block to the patched file.
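
Same kind of sketch for deltaType 1, again on byte strings. To keep it self-contained the LZ77 step is left out: each replacement is assumed to be already decompressed. apply_delta_type1 and the sample data are made up.

```python
def apply_delta_type1(baseBlock, replacements):
    """Apply deltaType 1 replacements to a decompressed base block.

    replacements: list of (offset, skipCount, newBytes) tuples, where newBytes
    is the already decompressed content of the LZ77 block read from the delta.
    """
    out = bytearray()
    basePos = 0
    for offset, skipCount, newBytes in replacements:
        out += baseBlock[basePos:offset]   # unchanged bytes up to the offset
        out += newBytes                    # the substituted chunk
        basePos = offset + skipCount       # the replaced base bytes are dropped
    out += baseBlock[basePos:]             # everything left in the base block
    return bytes(out)

print(apply_delta_type1(b"0123456789", [(2, 3, b"XX"), (7, 1, b"Y")]))
```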

		deltaType 0:
			In the base file, decompress a number of blocks (given by blockCount)
			and add them to the patched file directly.

		deltaType 3:
			In the delta file, read a number of LZ77 blocks (given by blockCount) and add them to the patched file.
			This is the only operation which does not depend on the base file at all.

		deltaType 4:
			In the base file, skip this number of blocks entirely, i.e. move past them
			but do not add them to the patched file.


The relevant casPatchType 2 section:
    else: #casPatchType == 2
        baseSha1=entry.elems["baseSha1"].content
        deltaSha1=entry.elems["deltaSha1"].content

        deltaEntry,deltaStream=cat2.getCas(deltaSha1)
        baseEntry,compressedStream=cat.getCas(baseSha1)

        patchedStream=open2(outPath,"wb") #write the new file directly
        while deltaStream.tell()-deltaEntry.offset<deltaEntry.size:
            tmpSize=unpack(">I",deltaStream.read(4))[0]
            deltaType=tmpSize>>28
            deltaBlockSize=tmpSize&0xfffffff #if type!=2, this number actually specifies repetitions. Type 2 is by far the most frequent type so the name is justified

            if deltaType==2:
                #small changes (usually less than FF bytes) within a single decompressed base block
                baseBlockStream=decompressLZ77BlockWrapper(compressedStream)
                patchedBlockSize=unpack(">H",deltaStream.read(2))[0]+1 #usually equals the uncompressed base block, but not always
                deltaPos0=deltaStream.tell()

                currentPatchedBlockSize=0 #use this size to calculate the remaining base bytes to read (after the loop)
                while deltaStream.tell()-deltaPos0 < deltaBlockSize: #go through the individual changes described within the delta block
                    #parse the delta
                    offset=unpack(">H",deltaStream.read(2))[0]
                    skipCount,addCount=unpack("BB",deltaStream.read(2))
                    addBytes=deltaStream.read(addCount)

                    sizeUntilOffset=offset-baseBlockStream.tell()
                    patchedStream.write(baseBlockStream.read(sizeUntilOffset)) #write bytes that require no modification
                    baseBlockStream.seek(skipCount,1) #seek past the skip bytes
                    patchedStream.write(addBytes) #add the bytes

                    currentPatchedBlockSize+=(sizeUntilOffset+addCount)

                #read as many bytes necessary until the patchedStream has the correct size
                patchedStream.write(baseBlockStream.read(patchedBlockSize-currentPatchedBlockSize))

            elif deltaType==1:
                #similar to type 2 but with compressed delta payload and the replacement of larger sections
                baseBlockStream=decompressLZ77BlockWrapper(compressedStream)
                for i in xrange(deltaBlockSize):
                    offset,skipCount=unpack(">HH",deltaStream.read(4))
                    addBytes=decompressLZ77BlockWrapper(deltaStream).getvalue()

                    sizeUntilOffset=offset-baseBlockStream.tell()
                    patchedStream.write(baseBlockStream.read(sizeUntilOffset))
                    patchedStream.write(addBytes)
                    baseBlockStream.seek(skipCount,1)
                patchedStream.write(baseBlockStream.read())

            elif deltaType==4:
                #skip these base blocks entirely. (manually seeking should be faster than using the decompression script)
                for i in xrange(deltaBlockSize):
                    decompressedSize, compressionType, compressedSize = unpack(">IHH",compressedStream.read(8))
                    if compressionType in (0x70,0x71): compressedStream.seek(decompressedSize,1)
                    elif compressionType==0x970: compressedStream.seek(compressedSize,1)
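                    elif compressionType==0: raise Exception("compression type 0 is not handled here") #abort instead of guessing how to skip this block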

            elif deltaType==0:
                #read the base blocks entirely without modifying them
                for i in xrange(deltaBlockSize):
                    patchedStream.write(decompressLZ77BlockWrapper(compressedStream).read())

            elif deltaType==3:
                #add payload in between base blocks. This is the only type that does not depend on the base file at all
                for i in xrange(deltaBlockSize):
                    patchedStream.write(decompressLZ77BlockWrapper(deltaStream).read())

            else: raise Exception("unknown deltaType: %d" % deltaType) #abort on anything not described above

        patchedStream.close()
        deltaStream.close()
        compressedStream.close() #the base stream was opened by getCas as well