Tritonio

Lar 1.1

Aug 5th, 2019
#!/usr/bin/lua
local FORMAT_VERSION=1
local PROGRAM_VERSION=1
local HASH_SIZE=64
math.randomseed(io.popen("tr -cd \"[:digit:]\" < /dev/urandom | head -c 14","r"):read("*n")-os.time())
local fifofn="/tmp/lar-fifo-"..math.random()
local blockdir="./lar-blocks-"..math.random()
local total=0
local saved=0
local totalblocks=0
local cut=string.char(
    146,29,145,10,
    42,57,141,0,
    201,198,43,27,
    124,130,66,183
)

---------------
--- Helpers ---
---------------
function showusage()
    io.stderr:write([==[
lar - Liquid Archive version ]==]..FORMAT_VERSION.."."..PROGRAM_VERSION..[==[

Lar reads archives from stdin and writes archives to stdout. Existing files are
overwritten during extraction, if permissions allow that. Liquid archives are
designed to be streamable both during creation and during extraction. They also
eliminate redundancy on a 1MiB block level (or smaller for files smaller than
1MiB). A side effect of these two design goals is that during extraction with
inclusion or exclusion filters, all the data needs to be extracted on disk and
then partially discarded. Lar ignores file ownership both when archiving and
when extracting. For excellent and efficient compression of lar archives we
suggest piping the output through "mbuffer -q -m 1G" and then through "lzip -9"
if you have these tools available.

Usage: lar -c [-v] [-p|-P] [-B SIZ] [-s SNC] [-d NUM] [-b DIR] [-i INC] [-e EXC]
       lar -x [-v] [-p|-P] [-d NUM] [-i INC] [-e EXC] [-f]
       lar -l [-v] [-p|-P] [-d NUM] [-i INC] [-e EXC]
       lar -m [-v]
       lar -C N,T
       lar -H [-v]

    -c  Create archive. Outputs to stdout an archive of DIR as seen from
        the current directory.

    -x  Extract archive. Reads an archive from stdin and extracts the
        contents to the current working directory.

    -l  List contents. Same as extract but it will not touch your local
        filesystem. You can even combine it with -i and -e to view just
        some of the files. It does not fully test the archive for all
        kinds of inconsistencies though. Some will only be detected when
        you actually extract the archive.

    -m  Create an empty archive, useful if you want to create empty base
        archives for a differential archive.

    -i  Only include files and directories that match the INC Lua
        pattern. Works with -c, -x and -l. Additionally, "|" may be used
        to separate multiple Lua patterns as a logical OR.
        https://www.lua.org/manual/5.3/manual.html#6.4.1

    -e  Exclude files that match the EXC Lua pattern. Works with -c,
        -x and -l.

    -v  Be more verbose.

    -b  DIR is the directory or file that will be added to the archive.
        It can only be used with -c.

    -d  Create or extract a differential archive. You need to
        sequentially pipe NUM base archives in the stdin when you create
        a differential archive. The same base archives need to be passed
        in the stdin (in any order) when you want to extract the
        resulting differential archive. Differential archives will not
        repeat blocks of data that exist in the base archives, so they
        are ideal for incremental backups. You may use differential
        archives as base archives for a new differential archive. Doing
        so will not cause the new differential archive to require the
        base archives of its base archives during its extraction. This
        allows you to create a sequence of differential archives where
        each one depends on the NUM previous archives.

    -p  Create a text only archive that contains only printable
        characters and whitespace at the cost of an increased archive
        size and slower archival. It can only be combined with -c.

    -P  Same as -p but this time even whitespace is disallowed.

    -f  Force extraction of files with missing blocks. This will allow
        you to partially extract a differential archive even if some of
        its base archives are missing.

    -B  Change the block size to SIZ. Default is 1048576 (1MiB). Bigger
        blocks are faster but deduplication is done at the block level
        so smaller blocks (but not very small) will result in more data
        savings. Don't change this if you are not sure. It can only be
        used with -c.

    -s  Enable self-synchronizing block splitting for files matching the
        SNC Lua pattern. Data blocks will not be of a fixed size
        anymore, they will vary between 66% and 134% of the size defined
        by -B (or the default 1MiB). The sizes are picked in such a way
        that if you have two files with different sizes but a substantial
        amount of common data at their end, there is a high probability
        that the blocks will synchronize, improving the deduplication of
        data. This is especially useful if, for example, you are trying
        to make differential backups of a MySQL dump file every day when
        only a little data changes but also the size of the file
        changes. Archival will be much slower. It can only be used
        with -c.

        Suppose you have these two data streams:
        abdefghijklmnopqrstuvwxyz
        abdefghijklm123nopqrstuvwxyz

        If you split them with fixed size blocks they will look like:
        ab|de|fg|hi|jk|lm|no|pq|rs|tu|vw|xy|z
        ab|de|fg|hi|jk|lm|12|3n|op|qr|st|uv|wx|yz

        Deduplication will work fine for the blocks before the digits
        but after the digits the blocks are offset in such a way that
        they will never resynchronize and even though the ends of both
        streams are the same, deduplication is impossible.

        Self-synchronizing blocks will split these data streams like:
        ab|def|g|h|ij|kl|mno|pq|rs|t|u|vwx|yz
        ab|def|g|h|ij|kl|m1|23n|op|q|rs|t|u|vwx|yz

        The blocks eventually resynchronize (after "rs" in the example)
        so deduplication of the end of this data stream is possible.

    -C  This is a handy calculator for differential backups. N is the
        number that you intend to pass to -d, i.e. the number of base
        archives that each of your differential archives will be based
        on. T is the number of differential archives that you intend to
        keep. Type those two numbers in (e.g. -C 4,8) and the calculator
        will give you information about how much space they will take
        and how many archives will be recoverable.

    -H  If you pipe a Liquid archive through "lar -H", you will get
        a special "thin" version of the archive in the output that
        contains only the hashes of the blocks in the input. This can be
        used to greatly reduce the amount of data transmitted over the
        wire in cases where you are creating a differential backup with
        the base files being piped over the network as it will move the
        block hashing to the remote server instead of transferring the
        data to be hashed locally.

Examples:

lar -cv -b Images > Images.lar
    Archive all files in the Images folder and name the archive Images.lar.
    Verbose output.

lar -xv < Images.lar
    Recreate the Images folder that was archived in the previous example
    under the current working directory. Verbose output.

lar -xv -i '%.c$|%.h$' -e 'example' < code.lar
    Extract the archive code.lar into the current directory. Only extract
    files and directories that end with ".c" or ".h". Do not extract files
    or directories that include the word "example" in their full path.

cat old.lar older.lar | lar -cd 2 -b Images > new.lar
    Archive all files in the Images folder and name the archive new.lar. The
    output archive will be a differential archive and will not contain any
    blocks of data that exist in old.lar and older.lar.

cat old.lar older.lar new.lar | lar -xd 2
    Extract the archive that was created in the previous example. Only
    new.lar will be extracted but old.lar and older.lar are needed because
    they contain data that was omitted from new.lar during its creation. The
    order of old.lar and older.lar may be reversed but new.lar, the archive
    that you are actually trying to extract, must be the last one.

(cat DB2.lar.gz | gunzip; cat DB.lar.gz | gunzip) | lar -cvb DB -d 2 | gzip > DB3.lar.gz
    Archive all files in the DB folder, pass through gzip to compress the
    archive and name it DB3.lar.gz. The output archive will be a
    differential archive and will not contain any blocks of data that exist
    in DB2.lar.gz and DB.lar.gz.

cat yesterday.lar | lar -d 1 -cvb serverbackup/ -s '%.sql$|%.csv$' > today.lar
    Archive all the files in the serverbackup folder and turn on
    self-synchronizing blocks for files ending in ".sql" or ".csv". Do not
    include the data blocks that exist in yesterday.lar so the resulting
    today.lar will be a differential archive. Self-synchronization helps
    with redundancy detection in files like SQL dumps or other files that
    may have data inserted or removed from one archive to the next one.

(ssh 'user@192.168.0.5' "cat /home/user/backup1.lar.gz" | gunzip | lar -H; ssh 'user@192.168.0.5' "cat /home/user/backup2.lar.gz" | gunzip | lar -H; ) | lar -v -d 2 -c -b . | gzip | ssh 'root@192.168.0.5' "cat > /home/user/backup0.lar.gz"
ssh 'root@192.168.0.5' "rm /home/user/backup5.lar.gz"
ssh 'root@192.168.0.5' "mv /home/user/backup4.lar.gz /home/user/backup5.lar.gz"
ssh 'root@192.168.0.5' "mv /home/user/backup3.lar.gz /home/user/backup4.lar.gz"
ssh 'root@192.168.0.5' "mv /home/user/backup2.lar.gz /home/user/backup3.lar.gz"
ssh 'root@192.168.0.5' "mv /home/user/backup1.lar.gz /home/user/backup2.lar.gz"
ssh 'root@192.168.0.5' "mv /home/user/backup0.lar.gz /home/user/backup1.lar.gz"
    Create a differential archive of the current directory with 2 base
    archives. The base archives are stored on a remote server at 192.168.0.5
    and are remotely processed with "lar -H" so that only the hashes of the
    blocks they contain are transferred over the wire. The resulting
    differential archive is also stored back at the same remote server.
    After the archival is done the archives are renamed and 5 of them in
    total are kept. backup1.lar.gz will be the newest archive and
    backup5.lar.gz will be the oldest.

    To see what 5 differential archives (each based on the previous 2
    archives) mean, you can run "lar -C 2,4" which will give you the
    following output:

    If your differential archives are based on the last 2 archives (-d 2)
    and you keep a total of 5 archives, then you should expect to have 3
    recoverable archives. Archives older than the last 3 will not have all
    their base archives available and you will therefore be unable to
    extract them. You should expect that all 5 archives together will take
    about the same space as 1.7 full size (non-differential) archives, but
    in some cases they will take up about the same space as 2 full size
    archives.


Verbose output:

3% (input=134B output=111.74MiB) ./readme.txt (regular file) [NNNHP]
|          |              |            |            |           |
|          |   Data written to stdout  |        File type       |
|          |                    Current filename                |
|  Data read from stdin                                         |
|                            Current file's blocks (N=new, P=previously seen in
Percentage of files done     current archive, H=previously seen in base archive)


Copyright 2019-2020 Tritonio (www.inshame.com)

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License version 3 as published by the Free
Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.  See the GNU General Public License for more details.

If you have not received a copy of the GNU General Public License along with
this program then see https://www.gnu.org/licenses/gpl-3.0.txt or download it
using BitTorrent: magnet:?xt=urn:btih:ftlm2r4zr3eepxypisejceebq7gx3wn7

]==])
end
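-- The -s option in the usage text describes self-synchronizing block
-- splitting. What follows is a minimal, hypothetical sketch of the general
-- idea only; it is NOT the splitting algorithm that lar itself uses. The name
-- splitoncontent and the rule "cut after every byte whose value is a multiple
-- of modulus" are assumptions made up for illustration. Because every cut
-- depends only on the data next to it, block boundaries line up again shortly
-- after an insertion.
local function splitoncontent(data,modulus)
    local chunks,start={},1
    for i=1,#data do
        if string.byte(data,i)%modulus==0 then
            table.insert(chunks,string.sub(data,start,i))
            start=i+1
        end
    end
    -- Whatever follows the last cut becomes the final chunk.
    if start<=#data then table.insert(chunks,string.sub(data,start)) end
    return chunks
end
do
    -- The two streams from the usage text. With modulus 4 the cuts fall after
    -- "d", "h", "l", "p", "t" and "x", so both streams end with the same
    -- chunks "qrst", "uvwx" and "yz" despite the inserted digits.
    local a=table.concat(splitoncontent("abdefghijklmnopqrstuvwxyz",4),"|")
    local b=table.concat(splitoncontent("abdefghijklm123nopqrstuvwxyz",4),"|")
    assert(a=="abd|efgh|ijkl|mnop|qrst|uvwx|yz")
    assert(b=="abd|efgh|ijkl|m123nop|qrst|uvwx|yz")
end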
do
    local last=0
    function timehaspassed()
        local now=os.time()
        if now-last>=1 then
            last=now
            return true
        else
            return false
        end
    end
end
function round(n,d)
    d=d or 0
    return math.floor(0.5+n*10^d)/10^d
end
function formatbytes(bytes)
    if bytes>=1024^4 then
        return round(bytes/1024^4,2).."TiB"
    elseif bytes>=1024^3 then
        return round(bytes/1024^3,2).."GiB"
    elseif bytes>=1024^2 then
        return round(bytes/1024^2,2).."MiB"
    elseif bytes>=1024 then
        return round(bytes/1024,2).."KiB"
    else
        return bytes.."B"
    end
end
function getext(filename)
    return string.upper(string.match(filename,"[^/]%.([^/%.]+)$") or "")
end
function stripdirs(filename)
    return string.upper(string.match(filename,"([^/]*)$"))
end

function getdepth(filename)
    return #string.gsub(filename,"[^/]+","")
end
function parselarheader(instream)
    local magic=instream:read(3)
    assert(magic,"Not enough data. Did you forget to pipe some file to stdin?")
    assert(magic=="LAR","Not a Liquid Archive.")
    local archiveversion=readnumber(instream)
    assert(archiveversion==FORMAT_VERSION,"Unsupported Liquid Archive version: "..archiveversion)
end
do
    local counters={}
    function count(what,howmany)
        counters[what]=(counters[what] or 0)+howmany
    end
    function getcounters()
        local str={}
        for i,v in pairs(counters) do
            table.insert(str,i.."="..formatbytes(v))
        end
        return table.concat(str," ")
    end
end
function reallyread(instream,size,couldbeless)
    local data=""
    while #data<size do
        local moredata=instream:read(size-#data)
        if not moredata then
            if couldbeless then
                break
            else
                error("Unexpected end of archive.")
            end
        end
        data=(data or "")..moredata
    end
    if data=="" then
        return nil
    else
        return data
    end
end
do
    local h={[0]="0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F"}
    local hex={}
    for i=0,255 do
        hex[string.char(i)]=h[math.floor(i/16)]..h[i%16]
    end
    function totextual(data,nospaces)
        local textual=string.gsub(data,"[^%w`!@#$%%%^&%*%(%)_%+%-=%[%]{};':\",%./<>%?\\|"..(nospaces and "" or " \x0A\x0D\t").."]",function (badchar)
            return "~"..hex[badchar].."~"
        end)
        textual=string.gsub(textual,"~([^~])~",hex)
        return string.gsub(textual,"~~","")
    end
    function fromtextual(text)
        return string.gsub(text,"~(%x+)~",function (hexword)
            return string.gsub(hexword,"%x%x",function (hh)
                return string.char(tonumber(hh,16))
            end)
        end)
    end
    assert(fromtextual(totextual("la r"))=="la r")
    assert(fromtextual(totextual("\0la\nr~a\0a\0a\0aaa\0aa\0"))=="\0la\nr~a\0a\0a\0aaa\0aa\0")
    assert(fromtextual(totextual("la r",true))=="la r")
    assert(fromtextual(totextual("\0la\nr~a\0a\0a\0aaa\0aa\0",true))=="\0la\nr~a\0a\0a\0aaa\0aa\0")
end
function multimatches(str,patterns)
    for pattern in string.gmatch(patterns,"[^|]+") do
        if string.match(str,pattern) then
            return true
        end
    end
    return false
end
function exploitcheck(filename)
    -- Reject absolute paths, ".." path components (including a trailing "/..") and double slashes.
    if string.match(filename,"^%.%.") or string.match(filename,"^/") or string.match(filename,"/%.%.[%W ]") or string.match(filename,"/%.%.$") or string.match(filename,"//") then
        error("Invalid filename: "..filename..' Filenames must not be absolute and must not contain parent directories (i.e. "..") or double slashes.')
    end
end
function escapequotes(filename)
    return string.gsub(filename,"'","'\"'\"'")
end
function parent(filename)
    return string.match(filename,"^(.-)/[^/]*$")
end

---------------------
--- Data encoding ---
---------------------
function writedatapacket(data,outstream,textual,nospaces)
    data=textual and totextual(data,nospaces) or data
    outstream:write(#data..">")
    outstream:write(data)
end
function writenumber(number,outstream)
    outstream:write(tonumber(number).."|")
end
function readnumber(instream)
    --io.stderr:write("RN"..instream:seek().."\n")
    local n=tonumber(instream:read("*n"))
    assert(n,"Corrupt archive. Unexpected non-numerical data.")
    assert(instream:read(1)=="|","Corrupt archive. Unexpected number separator.")
    return n
end
function readdatapacket(instream,textual)
    --io.stderr:write("RDP"..instream:seek().."\n")
    local size=tonumber(instream:read("*n"))
    assert(size,"Corrupt archive. Missing data packet size.")
    assert(instream:read(1)==">","Corrupt archive. Unexpected data packet separator.")
    local somedata=reallyread(instream,size)
    return textual and fromtextual(somedata) or somedata
end

--------------------------
--- Modes of operation ---
--------------------------
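-- The data packet framing used above is a decimal length, a ">" separator and
-- then the raw bytes. The two helpers below are a hypothetical, string-only
-- sketch of that framing for illustration; the names sketchencodepacket and
-- sketchdecodepacket are made up here and the rest of the program does not
-- use them. They ignore the textual (~XX~) encoding entirely.
local function sketchencodepacket(data)
    return #data..">"..data
end
local function sketchdecodepacket(buffer)
    -- Find the length prefix and its separator at the start of the buffer.
    local _,sep,size=string.find(buffer,"^(%d+)>")
    assert(sep,"Missing data packet size.")
    size=tonumber(size)
    local data=string.sub(buffer,sep+1,sep+size)
    assert(#data==size,"Truncated data packet.")
    -- Return the payload and whatever follows it in the buffer.
    return data,string.sub(buffer,sep+size+1)
end
do
    assert(sketchencodepacket("hello")=="5>hello")
    local data,rest=sketchdecodepacket("5>hello3>lar")
    assert(data=="hello" and rest=="3>lar")
end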
function create(dir,instream,outstream,include,exclude,verbose,textual,nospaces,differential,blocksize,selfsync)
    if verbose then io.stderr:write("Adding what matches \""..include.."\" and does not match \""..exclude.."\"...\n") end
    textual=textual or nospaces --nospaces implies textual
    outstream:write("LAR")
    writenumber(FORMAT_VERSION,outstream)
    for i=1,differential do
        if verbose then io.stderr:write("("..getcounters()..") Reading blocks from base archive number "..i.."...\n") end
        parselarheader(instream)
        while not witnessstream(instream,true) do
            if verbose and timehaspassed() then io.stderr:write("("..getcounters()..") Still reading blocks from base archive number "..i.."...\n") end
        end
    end
    local filenames=listfiles(dir,include,exclude,verbose)
    if verbose then io.stderr:write("("..getcounters()..") Creating archive...\n") end
    for i,filename in ipairs(filenames) do
        addfile(filename,outstream,textual,nospaces,round(i/#filenames*100),blocksize,selfsync)
    end
    flushcommandbuffer()
    outstream:write(".")
end
function extract(instream,include,exclude,verbose,dryrun,differential,thinoutputstream,force)
    if verbose then
        if not thinoutputstream then
            io.stderr:write("Extracting "..(dryrun and "(dry run) " or "").."what matches \""..include.."\" and does not match \""..exclude.."\"...\n")
            if force then
                io.stderr:write("Forcing extraction of files even if some of their blocks are missing...\n")
            end
        else
            io.stderr:write("Hashing blocks in input to create a thin version in output...\n")
        end
    end
    for i=1,differential do
        if verbose then io.stderr:write("("..getcounters()..") Reading blocks from base archive number "..i.."...\n") end
        parselarheader(instream)
        while not witnessstream(instream,dryrun) do
            if verbose and timehaspassed() then io.stderr:write("("..getcounters()..") Still reading blocks from base archive number "..i.."...\n") end
        end
    end
    if verbose then io.stderr:write("("..getcounters()..") Reading archive...\n") end
    parselarheader(instream)
    if thinoutputstream then
        thinoutputstream:write("LAR")
        writenumber(FORMAT_VERSION,thinoutputstream)
    end
    local commandbuffer={}
    while true do
        local stop,delayedcommand=executestream(instream,include,exclude,dryrun,thinoutputstream,force)
        if stop then break end
        table.insert(commandbuffer,delayedcommand)
    end
    if thinoutputstream then
        thinoutputstream:write(".")
    end
    if verbose then io.stderr:write("Applying "..(dryrun and "(dry run) " or "").."permissions, modification dates etc...\n") end
    --Run the delayed commands deepest path first. The comparator used to compare a[1] with itself, which never reordered anything.
    table.sort(commandbuffer,function (a,b) return a[1]>b[1] end)
    for _,command in ipairs(commandbuffer) do
        command[2]()
    end
end
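-- Each delayed command collected by extract() is a {depth,function} pair, and
-- the buffer is sorted on that depth before the commands run. The helper below
-- is a hypothetical, stand-alone sketch of a deepest-first ordering; the name
-- sketchdeepestfirst is made up for illustration and nothing else calls it.
-- That deepest-first is the desired order is an assumption: it would let a
-- directory's permissions and timestamp be applied after those of its
-- contents.
local function sketchdeepestfirst(paths)
    local entries={}
    for _,path in ipairs(paths) do
        -- The second result of gsub is the number of "/" characters in the path.
        local _,depth=string.gsub(path,"/","")
        table.insert(entries,{depth,path})
    end
    table.sort(entries,function (a,b) return a[1]>b[1] end)
    local sorted={}
    for i,entry in ipairs(entries) do
        sorted[i]=entry[2]
    end
    return sorted
end
do
    local sorted=sketchdeepestfirst({"a","a/b/c","a/b"})
    assert(table.concat(sorted," ")=="a/b/c a/b a")
end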
do
    local hashedblocks={}
    local hbc=0
    function witnessstream(instream,justlook)
        local chunktype=instream:read(1)
        if not chunktype then
            error("Unexpected end of archive.")
        elseif chunktype=="T" or chunktype=="B" then
            local blockdata=readdatapacket(instream,chunktype=="T")
            local ha=hash(blockdata)
            local fn=blockdir.."/w"..hbc
            hbc=hbc+1
            hashedblocks[ha]=fn
            if not justlook then
                local bh=io.open(fn,"wb")
                bh:write(blockdata)
                bh:close()
            end
        elseif chunktype=="t" then
            local ha=readdatapacket(instream,false)
            hbc=hbc+1
            hashedblocks[ha]=true
            if not justlook then
                error("You are trying to extract a differential archive but instead of passing the actual base archives in the input you are passing the thin versions of them created by \"lar -H\". The actual data on the base archives is needed so you should pass the actual base archives.")
            end
        elseif chunktype=="H" then
            readdatapacket(instream,true)
        elseif chunktype=="D" then
            readdatapacket(instream,true)
            readnumber(instream)
            readnumber(instream)
        elseif chunktype=="P" then
            readdatapacket(instream,true)
            readnumber(instream)
            readnumber(instream)
        elseif chunktype=="S" then
            readdatapacket(instream,true)
            readnumber(instream)
            readdatapacket(instream,true)
        elseif chunktype=="F" then
            readdatapacket(instream,true)
            readnumber(instream)
            readnumber(instream)
            local blockcount=readnumber(instream)
            for i=1,blockcount do
                readnumber(instream)
            end
        elseif chunktype=="." then
            return true
        else
            error("Corrupt archive. Unknown chunk type: "..chunktype)
        end
    end
    function iswitnessedblock(ha)
        return not not hashedblocks[ha]
    end
    function getwitnessedblock(ha,force)
        local bfn=hashedblocks[ha]
        if bfn then
            local bh=io.open(bfn,"rb")
            local blockdata=bh:read("*a")
            bh:close()
            return blockdata,bfn
        else
            if force then
                return "",false
            else
                error("Unwitnessed hashed block. You are trying to extract a differential archive but you are probably missing one of its base archives.")
            end
        end
    end
end
do
    local blocks={}
    function printfileinfoline(filename,ftype,dontprintcounters)
        io.stderr:write((dontprintcounters and "" or ("("..getcounters()..") "))..filename.." ("..ftype..")\n")
    end
    function executestream(instream,include,exclude,dryrun,thinoutputstream,force)
        local chunktype=instream:read(1)
        if not chunktype then
            error("Unexpected end of archive.")
        elseif chunktype=="T" or chunktype=="B" then
            local blockdata=readdatapacket(instream,chunktype=="T")
            local blockid=#blocks+1
            local block={tmp=true,location=blockdir.."/"..blockid,offset=0,size=#blockdata}
            if thinoutputstream then
                thinoutputstream:write("t")
                writedatapacket(hash(blockdata),thinoutputstream,false,false)
            else
                if not dryrun then
                    local bh=io.open(blockdir.."/"..blockid,"wb")
                    bh:write(blockdata)
                    bh:close()
                end
            end
            table.insert(blocks,block)
            if verbose and timehaspassed() then io.stderr:write("("..getcounters()..") Still reading archive...\n") end
        elseif chunktype=="H" then
            local ha=readdatapacket(instream,true)
            if not thinoutputstream then
                local blockdata,location
                if not dryrun then blockdata,location=getwitnessedblock(ha,force) end
                local blockid=#blocks+1
                local block={tmp=true,location=location,offset=0,size=not dryrun and #blockdata}
                table.insert(blocks,block)
                if verbose and timehaspassed() then io.stderr:write("("..getcounters()..") Still reading archive...\n") end
            end
        elseif chunktype=="t" then
            error("You are trying to extract or list the contents of a thin archive created with the -H option. Thin archives do not contain data so they can only be used instead of base archives when creating differential archives.")
        elseif chunktype=="D" then
            local filename=readdatapacket(instream,true)
            exploitcheck(filename)
            local attrs=readnumber(instream)
            local modtimestamp=readnumber(instream)
            if not thinoutputstream then
                if multimatches(filename,exclude) or not multimatches(filename,include) then return end
                if not dryrun then mkdir(filename) end
                if verbose or dryrun then printfileinfoline(filename,"directory",dryrun and not verbose) end
                return false,{getdepth(filename),function ()
                    if not dryrun then
                        setattrs(filename,attrs)
                        settimestamp(filename,modtimestamp)
                    end
                end}
            end
  585.         elseif chunktype=="P" then
  586.             local filename=readdatapacket(instream,true)
  587.             exploitcheck(filename)
  588.             local attrs=readnumber(instream)
  589.             local modtimestamp=readnumber(instream)
  590.             if not thinoutputstream then
  591.                 if multimatches(filename,exclude) or not multimatches(filename,include) then return end
  592.                 if not dryrun then
  593.                     mkdir(parent(filename))
  594.                     mkfifo(filename)
  595.                 end
  596.                 if verbose or dryrun then printfileinfoline(filename,"fifo",dryrun and not verbose) end
  597.                 return false,{getdepth(filename),function ()
  598.                     if not dryrun then
  599.                         setattrs(filename,attrs)
  600.                         settimestamp(filename,modtimestamp)
  601.                     end
  602.                 end}
  603.             end
  604.         elseif chunktype=="S" then
  605.             local filename=readdatapacket(instream,true)
  606.             exploitcheck(filename)
  607.             local modtimestamp=readnumber(instream)
  608.             local target=readdatapacket(instream,true)
  609.             if not thinoutputstream then
  610.                 if multimatches(filename,exclude) or not multimatches(filename,include) then return end
  611.                 if not dryrun then
  612.                     mkdir(parent(filename))
  613.                 end
  614.                 if verbose or dryrun then printfileinfoline(filename,"symbolic link",dryrun and not verbose) end
  615.                 return false,{getdepth(filename),function ()
  616.                     if not dryrun then
  617.                         mksymlink(filename,target)
  618.                         settimestamp(filename,modtimestamp)
  619.                     end
  620.                 end}
  621.             end
  622.         elseif chunktype=="F" then
  623.             local filename=readdatapacket(instream,true)
  624.             exploitcheck(filename)
  625.             local attrs=readnumber(instream)
  626.             local modtimestamp=readnumber(instream)
  627.             local blockcount=readnumber(instream)
  628.             if thinoutputstream or multimatches(filename,exclude) or not multimatches(filename,include) then
  629.                 for i=1,blockcount do
  630.                     readnumber(instream)
  631.                 end
  632.                 return
  633.             end
  634.             if not dryrun then
  635.                 mkdir(parent(filename))
  636.                 local fh=assert(io.open(filename,"wb")) --fail with a readable message if the file cannot be created
  637.                 local fhoffset=0
  638.                 for i=1,blockcount do
  639.                     local blockid=readnumber(instream)
  640.                     local block=blocks[blockid]
  641.                     if not block then
  642.                         error("Corrupt archive. Invalid block id.")
  643.                     else
  644.                         if block.location~=false then
  645.                             local bh=assert(io.open(block.location,"rb")) --fail with a readable message if the block source is missing
  646.                             bh:seek("set",block.offset)
  647.                             local lastoffset=fh:seek()
  648.                             local blockdata=reallyread(bh,block.size,true)
  649.                             fh:write(blockdata)
  650.                             bh:close()
  651.                             if block.tmp then
  652.                                 deletefile(block.location)
  653.                                 fh:flush() --flush otherwise we may not find the block later in the new location
  654.                                 block.location=filename
  655.                                 block.offset=lastoffset
  656.                                 block.tmp=false
  657.                             end
  658.                         end
  659.                     end
  660.                 end
  661.                 fh:close()
  662.             else
  663.                 for i=1,blockcount do
  664.                     readnumber(instream)
  665.                 end
  666.             end
  667.             if verbose or dryrun then printfileinfoline(filename,"regular file",dryrun and not verbose) end
  668.             return false,{getdepth(filename),function ()
  669.                 if not dryrun then
  670.                     setattrs(filename,attrs)
  671.                     settimestamp(filename,modtimestamp)
  672.                 end
  673.             end}
  674.         elseif chunktype=="." then
  675.             return true
  676.         else
  677.             error("Corrupt archive. Unknown chunk type.")
  678.         end
  679.     end
  680. end
  681. do
  682.     local known={}
  683.     local hasharray={}
  684.     local commandbuffer={}
  685.     function addfile(filename,outstream,textual,nospaces,progress,blocksize,selfsync)
  686.         local fileinfo=getfileinfo(filename)
  687.         if not fileinfo then
  688.             if verbose then io.stderr:write("File disappeared: "..filename.."\n") end
  689.             return
  690.         end
  691.         local fileinfostring=progress.."% ("..getcounters()..")".." "..filename.." ("..(fileinfo.type or "unknown type")..") ["
  692.         if fileinfo.type=="regular file" or fileinfo.type=="regular empty file" then
  693.             if verbose then io.stderr:write(fileinfostring) end
  694.             local fh=io.open(filename,"rb")
  695.             if not fh then
  696.                 io.stderr:write((verbose and "]\n" or "").."File disappeared: "..filename.."\n")
  697.                 return
  698.             end
  699.             local fileblockindices={}
  700.             local leftovers=""
  701.             --[[
  702.             function morph(t)
  703.                 local r={} for l in string.gmatch(t,".") do table.insert(r,string.byte(l)) end
  704.                 return table.concat(r,",")
  705.             end
  706.             --]]
  707.             while 1 do
  708.                 local buf
  709.                 if selfsync and multimatches(filename,selfsync) then
  710.                     buf=reallyread(fh,math.floor(blocksize*1.25)-#leftovers,true)
  711.                     if not buf and leftovers=="" then break end
  712.                     buf=leftovers..(buf or "")
  713.                     local best,winner=string.rep("\0",17),#buf --was +1
  714.                     for i=math.floor(0.66*blocksize),math.min(#buf,math.floor(1.34*blocksize)) do
  715.                         local candidate=string.sub(buf,i-15,i)
  716.                         if #candidate==16 and ((candidate>=best and (best>cut or cut>=candidate)) or (candidate<=cut and best>cut)) then
  717.                             best=candidate
  718.                             winner=i
  719.                         end
  720.                     end
  721.                     buf,leftovers=string.sub(buf,1,winner+#best),string.sub(buf,winner+#best+1)
  722.                     --io.stderr:write(morph(best).."|"..#buf.."\n")
  723.                     --string.sub(buf,-16)
  724.                 else
  725.                     buf=reallyread(fh,blocksize,true)
  726.                     if not buf then break end
  727.                 end
  728.                 local bufhash=hash(buf)
  729.                 if not known[bufhash] then
  730.                     totalblocks=totalblocks+1
  731.                     if #bufhash==HASH_SIZE and iswitnessedblock(bufhash) then
  732.                         if verbose then io.stderr:write("H") end
  733.                         outstream:write("H")
  734.                         writedatapacket(bufhash,outstream,true,nospaces)
  735.                         saved=saved+#buf-#bufhash
  736.                     else
  737.                         if verbose then io.stderr:write("N") end
  738.                         if textual then
  739.                             outstream:write("T")
  740.                             writedatapacket(buf,outstream,true,nospaces)
  741.                         else
  742.                             outstream:write("B")
  743.                             writedatapacket(buf,outstream,false,nospaces)
  744.                         end
  745.                     end
  746.                     table.insert(hasharray,bufhash)
  747.                     known[bufhash]=#hasharray
  748.                 else
  749.                     if verbose then io.stderr:write("P") end
  750.                     saved=saved+#buf
  751.                 end
  752.                 total=total+#buf
  753.                 table.insert(fileblockindices,known[bufhash])
  754.             end
  755.             fh:close()
  756.             table.insert(commandbuffer,function ()
  757.                 outstream:write("F")
  758.                 writedatapacket(filename,outstream,true,nospaces)
  759.                 writenumber(fileinfo.attrs,outstream)
  760.                 writenumber(fileinfo.modts,outstream)
  761.                 writenumber(#fileblockindices,outstream)
  762.                 for i,blockhash in ipairs(fileblockindices) do
  763.                     writenumber(blockhash,outstream)
  764.                 end
  765.             end)
  766.         elseif fileinfo.type=="directory" then
  767.             if verbose then io.stderr:write(fileinfostring) end
  768.             table.insert(commandbuffer,function ()
  769.                 outstream:write("D")
  770.                 writedatapacket(filename,outstream,true,nospaces)
  771.                 writenumber(fileinfo.attrs,outstream)
  772.                 writenumber(fileinfo.modts,outstream)
  773.             end)
  774.         elseif fileinfo.type=="fifo" then
  775.             if verbose then io.stderr:write(fileinfostring) end
  776.             table.insert(commandbuffer,function ()
  777.                 outstream:write("P")
  778.                 writedatapacket(filename,outstream,true,nospaces)
  779.                 writenumber(fileinfo.attrs,outstream)
  780.                 writenumber(fileinfo.modts,outstream)
  781.             end)
  782.         elseif fileinfo.type=="symbolic link" then
  783.             if verbose then io.stderr:write(fileinfostring) end
  784.             local symbolictarget=getsymbolictarget(filename)
  785.             if not symbolictarget then
  786.                 io.stderr:write((verbose and "]\n" or "").."File disappeared: "..filename.."\n")
  787.                 return
  788.             end
  789.             table.insert(commandbuffer,function ()
  790.                 outstream:write("S")
  791.                 writedatapacket(filename,outstream,true,nospaces)
  792.                 writenumber(fileinfo.modts,outstream)
  793.                 writedatapacket(symbolictarget,outstream,true,nospaces)
  794.             end)
  795.         end
  796.         if verbose then io.stderr:write("]\n") end
  797.         if #commandbuffer>=10000 then
  798.             flushcommandbuffer()
  799.         end
  800.     end
  801.     function flushcommandbuffer()
  802.         for _,command in ipairs(commandbuffer) do
  803.             command()
  804.         end
  805.         commandbuffer={}
  806.     end
  807. end
  808.  
  809. ---------------------
  810. --- OS / external ---
  811. ---------------------
  812. function getfileinfo(filename)
  813.     local p=io.popen("stat -c '%a|%Y|%F' '"..escapequotes(filename).."'")
  814.     if p then
  815.         local infoline=p:read("*l")
  816.         p:close() --close before any early return so the pipe is not leaked
  817.         if not infoline or infoline=="" then return false end
  818.         local attrs,modts,typ=string.match(infoline,"([^|]+)|([^|]+)|([^|]+)")
  819.         return {attrs=attrs,modts=modts,type=typ}
  820.     else
  821.         return false
  822.     end
  823. end
  824. --[[
  825. function getowner(filename)
  826.     local p=io.popen("stat -c %u '"..escapequotes(filename).."'")
  827.     local owner=p:read("*l")
  828.     p:close()
  829.     return owner
  830. end
  831. function getattrs(filename)
  832.     local p=io.popen("stat -c %a '"..escapequotes(filename).."'")
  833.     local attrs=p:read("*l")
  834.     p:close()
  835.     return attrs
  836. end
  837. function getmodtimestamp(filename)
  838.     local p=io.popen("stat -c %Y '"..escapequotes(filename).."'")
  839.     local modts=p:read("*l")
  840.     p:close()
  841.     return modts
  842. end
  843. function getfiletype(filename)
  844.     local p=io.popen("stat -c %F '"..escapequotes(filename).."'")
  845.     local typ=p:read("*l")
  846.     p:close()
  847.     return typ
  848. end
  849. --]]
  850. function getsymbolictarget(filename)
  851.     local p=io.popen("readlink '"..escapequotes(filename).."'")
  852.     local target=p:read("*l")
  853.     p:close()
  854.     return target
  855. end
  856. function settimestamp(filename,modtimestamp)
  857.     os.execute("touch -h --date=@"..modtimestamp.." '"..escapequotes(filename).."'")
  858. end
  859. function setattrs(filename,attrs)
  860.     os.execute("chmod "..attrs.." '"..escapequotes(filename).."'")
  861. end
  862. function mkdir(filename)
  863.     if filename then
  864.         os.execute("mkdir -p '"..escapequotes(filename).."'")
  865.     end
  866. end
  867. function deletefile(filename)
  868.     os.execute("rm '"..escapequotes(filename).."'")
  869. end
  870. function deletedir(filename)
  871.     os.execute("rm -rf '"..escapequotes(filename).."'")
  872. end
  873. function mksymlink(filename,target)
  874.     os.execute("ln -s '"..escapequotes(target).."' '"..escapequotes(filename).."'")
  875. end
  876. function mkfifo(filename)
  877.     os.execute("mkfifo '"..escapequotes(filename).."'")
  878. end
  879. function listfiles(dir,include,exclude,verbose)
  880.     if verbose then io.stderr:write("Listing files...\n") end
  881.     local p=io.popen("find '"..escapequotes(dir).."'")
  882.     local allfiles={}
  883.     while true do
  884.         local filename=p:read("*l")
  885.         if not filename then break end
  886.         if not multimatches(filename,exclude) and multimatches(filename,include) then
  887.             table.insert(allfiles,filename)
  888.         end
  889.     end
  890.     p:close()
  891.     if verbose then io.stderr:write("Sorting "..#allfiles.." filenames...\n") end
  892.     table.sort(allfiles,function (a,b)
  893.         local exta,extb=getext(a),getext(b)
  894.         if exta==extb then
  895.             local fna,fnb=stripdirs(a),stripdirs(b)
  896.             if fna==fnb then
  897.                 return a<b
  898.             else
  899.                 return fna<fnb
  900.             end
  901.         else
  902.             return exta<extb
  903.         end
  904.     end)
  905.     return allfiles
  906. end
  907. function hash(data)
  908.     if #data<HASH_SIZE then return data end
  909.     ---[[
  910.     local p=io.popen("sha256sum "..fifofn,"r")
  911.     local fifoh=io.open(fifofn,"wb")
  912.     fifoh:write(data)
  913.     fifoh:close()
  914.     local h=reallyread(p,HASH_SIZE)
  915.     p:close()
  916.     return h
  917.     --]]
  918.     --[[
  919.     local hs={0,7,42,1337,1,2,3,4}
  920.     local rots={1,9,25,31,3,7,15,11}
  921.     for ci=1,#data do
  922.         local v=string.byte(data,ci)
  923.         for p,h in pairs(hs) do
  924.             hs[p]=bit32.bxor(bit32.rrotate(h,rots[p]),v*(p+ci%19))
  925.         end
  926.     end
  927.     print("Hashed: "..#data,bit32.bxor(unpack(hs)),table.concat(hs,"|"))
  928.     return table.concat(hs,"|")
  929.     --]]
  930. end
  931.  
  932. ------------------------
  933. --- Argument parsing ---
  934. ------------------------
  935.  
  936. function parseargs(args,types)
  937.     local parsed={}
  938.     local seen={}
  939.     local i=1
  940.     while args[i] do
  941.         if string.match(args[i],"^%-") then
  942.             for letter in string.gmatch(args[i],"%w") do
  943.                 assert(not seen[letter],"Command line argument -"..letter.." is defined twice.")
  944.                 seen[letter]=true
  945.                 if types[letter]=="boolean" then
  946.                     parsed[letter]=true
  947.                 elseif types[letter]=="string" then
  948.                     assert(string.match(args[i],"%w$")==letter,"Command line argument -"..letter.." expects a string after it so it must be the last one in a group of letters.")
  949.                     parsed[letter]=args[i+1]
  950.                     i=i+1
  951.                 else
  952.                     error("Unknown command line argument: -"..letter)
  953.                 end
  954.             end
  955.             i=i+1
  956.         else
  957.             error("Unexpected argument value: "..args[i])
  958.         end
  959.     end
  960.     return parsed
  961. end
  962.  
  963. ------------------------
  964. --- Hook read/writes ---
  965. ------------------------
  966. do
  967.     local realin,realout=io.stdin,io.stdout
  968.     io.stdin={read=function (s,a)
  969.         local data=realin:read(a)
  970.         if type(data)=="number" then
  971.             count("input",#tostring(data))
  972.         elseif type(data)=="string" then
  973.             count("input",#data)
  974.         else
  975.             count("input",1)
  976.         end
  977.         return data
  978.     end}
  979.     io.stdout={write=function (s,data)
  980.         local res=realout:write(data)
  981.         count("output",#data)
  982.         return res
  983.     end}
  984. end
  985. ------------
  986. --- Main ---
  987. ------------
  988.  
  989. local mustremoveblockdir,mustremovefifo,mustshowinfo=false,false,false
  990. local res,errmsg=pcall(function ()
  991.     local args=parseargs(arg,{
  992.         v='boolean',
  993.         c='boolean',
  994.         x='boolean',
  995.         l='boolean',
  996.         b='string',
  997.         i='string',
  998.         e='string',
  999.         h='boolean',
  1000.         p='boolean',
  1001.         P='boolean',
  1002.         d='string',
  1003.         C='string',
  1004.         m='boolean',
  1005.         B='string',
  1006.         s='string',
  1007.         H='boolean',
  1008.         f='boolean',
  1009.     })
  1010.     args.i=args.i or "."
  1011.     args.e=args.e or "/////"
  1012.     verbose=args.v
  1013.     args.d=args.d or 0
  1014.     args.d=tonumber(args.d)
  1015.     assert(args.d and args.d%1==0,"-d must be followed by an integer.")
  1016.     assert((args.c and 1 or 0)+(args.x and 1 or 0)+(args.l and 1 or 0)+(args.C and 1 or 0)+(args.m and 1 or 0)+(args.H and 1 or 0)<=1,"At most one of -c, -x, -l, -C, -m, -H may be set.")
  1017.     assert(not (args.p and args.P),"You cannot set both -p and -P. Perhaps you meant to set only -P?")
  1018.     if args.h then
  1019.         showusage()
  1020.     elseif args.c then
  1021.         args.b=args.b or "."
  1022.         assert(not args.B or (tonumber(args.B) and args.B%1==0 and tonumber(args.B)>0),"-B must be followed by a positive integer")
  1023.         args.B=tonumber(args.B or 1024*1024)
  1024.         mkfifo(fifofn)
  1025.         mustremovefifo=true
  1026.         mustshowinfo=true
  1027.         exploitcheck(args.b)
  1028.         create(args.b,io.stdin,io.stdout,args.i,args.e,args.v,args.p,args.P,args.d,args.B,args.s)
  1029.     elseif args.m then
  1030.         assert(not args.f and not args.b and not args.B and not args.s and args.i=="." and args.e=="/////" and not args.p and not args.P and args.d==0,"You cannot set -b, -B, -s, -i, -e, -p, -P, -f nor -d when you are creating an empty archive with -m.")
  1031.         args.b="/dev/null"
  1032.         mkfifo(fifofn)
  1033.         mustremovefifo=true
  1034.         mustshowinfo=true
  1035.         create(args.b,io.stdin,io.stdout,args.i,args.e,args.v,args.p,args.P,args.d)
  1036.     elseif args.x then
  1037.         assert(not args.b and not args.B and not args.s and not args.p and not args.P,"Command line option -x cannot be combined with -p, -B, -s, -P nor -b.")
  1038.         mkfifo(fifofn)
  1039.         mustremovefifo=true
  1040.         mkdir(blockdir)
  1041.         mustremoveblockdir=true
  1042.         extract(io.stdin,args.i,args.e,args.v,false,args.d,false,args.f)
  1043.     elseif args.l then
  1044.         assert(not args.b and not args.B and not args.s and not args.p and not args.f and not args.P,"Command line option -l cannot be combined with -p, -B, -s, -P, -f nor -b.")
  1045.         mkfifo(fifofn)
  1046.         mustremovefifo=true
  1047.         extract(io.stdin,args.i,args.e,args.v,true,args.d,false,false)
  1048.     elseif args.C then
  1049.         assert(not args.f and not args.b and not args.p and not args.P and args.d==0 and not args.s and not args.B,"Command line option -C cannot be combined with -p, -B, -s, -P, -f, -b nor -d.")
  1050.         local n,t=string.match(args.C,"^(%d+)%s*,%s*(%d+)$")
  1051.         n,t=tonumber(n),tonumber(t)
  1052.         assert(n and t and n%1==0 and t%1==0,"N and T (after -C) must be integers.")
  1053.         local avg=round(t/(n+1),1)
  1054.         local max=math.ceil(t/(n+1))
  1055.         assert(n<t,"If your differential archives are based on the last "..n.." archives, then you need to keep at least "..n+1 .." archives available.")
  1056.         print("If your differential archives are based on the last "..n.." archives (-d "..n..") and you keep a total of "..t.." archives, then you should expect to have "..t-n.." recoverable archives. Archives older than the last "..t-n.." will not have all their base archives available and you will therefore be unable to extract them. You should expect that all "..t.." archives together will take about the same space as "..avg.." full size (non-differential) archives"..(avg~=max and ", but in some cases they will take up about the same space as "..max.." full size archives." or "."))
  1057.     elseif args.H then
  1058.         assert(not args.f and not args.b and args.i=="." and args.e=="/////" and not args.p and not args.P and args.d==0 and not args.s and not args.B,"Command line option -H cannot be combined with -b, -i, -e, -p, -P, -f, -s, -B nor -d.")
  1059.         mkfifo(fifofn)
  1060.         mustremovefifo=true
  1061.         extract(io.stdin,args.i,args.e,args.v,true,0,io.stdout,false)
  1062.     else
  1063.         showusage()
  1064.     end
  1065. end)
  1066. if mustremovefifo then deletefile(fifofn) end
  1067. if mustremoveblockdir then deletedir(blockdir) end
  1068. if mustshowinfo and verbose then
  1069.     io.stderr:write("Deduplication savings: "..formatbytes(saved).." of "..formatbytes(total).." bytes. ("..(total>0 and round(saved/total*100) or 0).."%)\n") --avoid 0/0 when nothing was archived
  1070.     io.stderr:write("Total blocks: "..totalblocks.."\n")
  1071. end
  1072. if verbose then io.stderr:write("Final counters: "..getcounters().."\n") end
  1073. if errmsg then
  1074.     io.stderr:write(errmsg.."\n")
  1075.     os.exit(1)
  1076. end