Tritonio

Aperito 1.6.1 help page

Jul 28th, 2023 (edited)
  1. Aperito - Duplicate file management - v1.6.1 by Tritonio (www.inshame.com)
  2.  
  3. Aperito is a somewhat scriptable duplicate file manager. Cleaning up duplicate
  4. files from a directory is as simple as running:
  5.  
  6. aperito scancleanup MyDirectory
  7.  
  8. But Aperito allows you to perform much more complex deduplication. For example,
  9. if you want to delete all files under Dir3 that also exist under Dir1 or Dir2
  10. but not touch any files under Dir1 and Dir2 nor deduplicate any files which show
  11. up multiple times within Dir3, you could run this:
  12.  
  13. aperito scan Dir1 scan Dir2 cleanup Dir3
  14.  
  15. If you wanted to do the same but also deduplicate the files that show up more
  16. than once inside Dir3, you slightly change the command:
  17.  
  18. aperito scan Dir1 scan Dir2 scancleanup Dir3
  19.  
  20. Or, let's assume Dir1 is actually an external drive that you don't keep mounted
  21. all the time. In that case you could scan Dir1 when it's mounted with:
  22.  
  23. aperito scan Dir1 save dir1-files.asd
  24.  
  25. which would create a file that can later be used like this:
  26.  
  27. aperito load dir1-files.asd scan Dir2 scancleanup Dir3
  28.  
  29. Aperito will try to parallelize operations to some extent if it thinks that the
  30. results will be predictable. For example, in the above command, it will load the
  31. savefile while scanning Dir2 at the same time.
  32.  
  33. You could also ignore files that don't match certain criteria. For example, to
  34. only consider files in Dir1, modified between 2 and 3 days ago, which have "A"
  35. in their full filename (including the path) but don't have "B" in it, you could
  36. run this:
  37.  
  38. aperito include A exclude B onlybefore 2d onlyafter 3d scancleanup Dir1
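
The filters in the command above can be modelled roughly as follows. This is only a sketch of the documented behavior, not Aperito's code: the function names are made up for illustration, and Python's re module is assumed to stand in for whatever regex dialect Aperito actually uses.

```python
import re
import time

# Seconds per unit for the Nd / Nh / Nm shorthands.
_UNITS = {"d": 86400, "h": 3600, "m": 60}


def to_timestamp(spec, now=None):
    """Turn 'Nd', 'Nh' or 'Nm' into a unix timestamp N units in the past.

    Anything else is treated as a plain unix timestamp.
    """
    now = time.time() if now is None else now
    if spec and spec[-1] in _UNITS:
        return now - float(spec[:-1]) * _UNITS[spec[-1]]
    return float(spec)


def is_considered(path, mtime, include=None, exclude=None,
                  onlybefore=None, onlyafter=None, now=None):
    """Return True if a file passes all active filters (hypothetical helper)."""
    # include/exclude match a substring of the full path, so use re.search.
    if include is not None and not re.search(include, path):
        return False
    # Exclusions are applied after inclusions.
    if exclude is not None and re.search(exclude, path):
        return False
    # onlybefore: modified strictly before the cutoff.
    if onlybefore is not None and not mtime < to_timestamp(onlybefore, now):
        return False
    # onlyafter: modified at or after the cutoff.
    if onlyafter is not None and not mtime >= to_timestamp(onlyafter, now):
        return False
    return True
```

With include "A", exclude "B", onlybefore "2d" and onlyafter "3d", a file named Dir1/A.txt that was modified two and a half days ago passes, while one modified yesterday or four days ago does not.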
  39.  
  40. Aperito will never delete any duplicate files; instead it will create a new
  41. directory and move them there. For example, if you run:
  42.  
  43. aperito scancleanup Dir4
  44.  
  45. any duplicate files like Dir4/subdir/filename.ext will be moved to:
  46. Dir4-Aperito-duplicates/subdir/filename.ext. That way, if you want to revert the
  47. deduplication you can simply move the contents of Dir4-Aperito-duplicates to
  48. Dir4 and let your OS handle the merges. Aperito will apply the same permissions
  49. and modification dates to directories created under the "-Aperito-duplicates"
  50. hierarchy as those that they had in their original places. Owner, group and
  51. other metadata are not guaranteed to be copied over for the directories. The
  52. files are moved using the standard OS facilities so they should retain all the
  53. metadata that is retained when files are moved.
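
The path mapping described above can be sketched like this. The helper name is hypothetical and the code only illustrates the naming rule from the example; it does not move any files and is not taken from Aperito itself.

```python
from pathlib import PurePosixPath

SUFFIX = "-Aperito-duplicates"


def duplicates_path(root, file_path):
    """Map a file under `root` to its place in the duplicates hierarchy."""
    root = PurePosixPath(root)
    # Path of the file relative to the cleaned-up directory.
    relative = PurePosixPath(file_path).relative_to(root)
    # Sibling directory with the suffix appended, same inner structure.
    return root.with_name(root.name + SUFFIX) / relative


print(duplicates_path("Dir4", "Dir4/subdir/filename.ext"))
# prints Dir4-Aperito-duplicates/subdir/filename.ext
```

Because the inner directory structure is preserved, moving the contents of the duplicates directory back over the original directory restores every file to its old place.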
  54.  
  55. Aperito can also copy files that it has never seen before to a new location. For
  56. example, you can incrementally backup new files from your home directory to an
  57. external drive with something like this:
  58.  
  59. aperito load backed.asd scancopynew ~ /mnt/external_hdd save backed.asd
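
The incremental backup idea can be illustrated with a small in-memory model. This is a sketch under assumptions: directory trees are plain dicts of path to file contents, SHA-256 stands in for whatever hash Aperito uses, and the set `seen` plays the role of the state kept in backed.asd.

```python
import hashlib


def scancopynew(seen, tree):
    """Return the paths that would be copied, marking every file as seen."""
    to_copy = []
    for path, data in tree.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            to_copy.append(path)   # never seen before: copy it
            seen.add(digest)       # and remember it for later runs
    return to_copy


seen = set()                                   # first run: nothing loaded yet
home = {"~/notes.txt": b"notes", "~/photo.jpg": b"pixels"}
first = scancopynew(seen, home)                # both files are new

home["~/todo.txt"] = b"todo"                   # a file added between runs
second = scancopynew(seen, home)               # only the new file is copied
```

Here `first` holds both original paths and `second` holds only ~/todo.txt, which is why saving the state after each run keeps later backups small.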
  60.  
  61. Aperito starts up with an empty internal state, assuming that no files have been
  62. seen and starts reading commands from its command line in sequence. These
  63. commands may add files into Aperito's internal state as "seen" or they may
  64. deduplicate (move away to a separate directory, as described above) files that
  65. have been "seen" more than one time.
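
A toy model of this internal state may help when reading the command list that follows. It is a simplified sketch, not Aperito's implementation: trees are dicts of path to contents, SHA-256 is an assumed hash, files are visited in dict order, and the "keep deep", "keep shallow" and "ask" rules are ignored, so scancleanup here simply keeps the first copy it meets.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


class State:
    """Hypothetical stand-in for Aperito's set of "seen" files."""

    def __init__(self):
        self.seen = set()

    def scan(self, tree):
        # Mark everything as seen, deduplicate nothing.
        for data in tree.values():
            self.seen.add(digest(data))

    def cleanup(self, tree):
        # Report files already seen; do not record anything new.
        return [p for p, data in tree.items() if digest(data) in self.seen]

    def scancleanup(self, tree):
        # Report files already seen and record the rest as seen.
        moved = []
        for path, data in tree.items():
            h = digest(data)
            if h in self.seen:
                moved.append(path)
            else:
                self.seen.add(h)
        return moved


dir1 = {"Dir1/a": b"alpha"}
dir2 = {"Dir2/b": b"beta"}
dir3 = {"Dir3/a-copy": b"alpha", "Dir3/x1": b"xray", "Dir3/x2": b"xray"}

state = State()
state.scan(dir1)
state.scan(dir2)
only_external = state.cleanup(dir3)      # like: scan Dir1 scan Dir2 cleanup Dir3

state = State()
state.scan(dir1)
state.scan(dir2)
also_internal = state.scancleanup(dir3)  # like: scan Dir1 scan Dir2 scancleanup Dir3
```

In this model `only_external` contains just Dir3/a-copy, while `also_internal` additionally contains Dir3/x2, the second copy inside Dir3.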
  66.  
  67. The available commands are:
  68.  
  69. scan "directory" Scans a directory tree and adds all the files
  70. in it to the internal state of Aperito as
  71. "seen". It will not deduplicate anything
  72. though.
  73.  
  74. cleanup "directory" Scans a directory tree and deduplicates all
  75. the files that have already been seen. It will
  76. not add the scanned files into the internal
  77. state as "seen" though, therefore, if a file
  78. shows up twice in this directory tree, it will
  79. not be deduplicated. To be deduplicated a file
  80. under this tree needs to have been "seen"
  81. before the cleanup command was run.
  82.  
  89. scancleanup "directory" Like the cleanup command but this time it will
  90. not only deduplicate "seen" files, but it will
  91. also add all the files into the internal state
  92. of Aperito as "seen". Therefore if a file shows
  93. up twice (or more times) under this directory
  94. tree, it will be deduplicated. Of course, files
  95. "seen" before this command is run will also
  96. be deduplicated even the first time they are
  97. encountered within this directory.
  98.  
  99. copynew "dir1" "dir2" Like the cleanup command but instead of moving
  100. seen files from dir1 to the duplicates
  101. directory it will instead copy unseen (new)
  102. files into dir2, maintaining the directory
  103. structure. Any files it sees will not be added
  104. to the list of "seen" files.
  105.  
  106. scancopynew "dir1" "dir2" Like the scancleanup command but instead of
  107. moving seen files from dir1 to the duplicates
  108. directory it will instead copy unseen (new)
  109. files into dir2, maintaining the directory
  110. structure.
  111.  
  112. copyallnew "dir1" "dir2" Like copynew but if a never seen before file
  113. shows up multiple times under dir1, all of them
  114. will be copied over to dir2, not just the first
  115. one.
  116.  
  117. scancopyallnew "dir1" "dir2" Like scancopynew but if a never seen before
  118. file shows up multiple times under dir1, all of
  119. them will be copied over to dir2, not just the
  120. first one.
  121.  
  122. save "savefile.asd" Saves the internal state of Aperito to a file
  123. so that you can load it some other time. Useful
  124. for scanning external drives once and then
  125. being able to deduplicate files from other
  126. drives as if the "saved" drive was present. Can
  127. also be used to speed up scanning of
  128. directories that you know to be unchanged.
  129.  
  130. load "savefile.asd" Loads a saved state. The saved state is merged
  131. with the current internal state of Aperito so
  132. you can write this command multiple times to
  133. load multiple files.
  134.  
  135. reset Resets the internal state of Aperito. All
  136. "seen" files will be forgotten after this
  137. command.
  138.  
  139. keep shallow These two commands affect the behavior of any
  140. keep deep scancleanup commands that follow. "Keep
  141. shallow" will cause scancleanup to keep the
  142. file which is closest to the root when one or
  143. more duplicates are found while "keep deep"
  144. does the opposite and keeps the most deeply
  145. nested file (this is the default behavior).
  146.  
  147. ask Similar to the previous two commands but this
  148. time it will make Aperito ask you which file
  149. you want to keep. You will also be given the
  150. choice to select any parent directory of each
  151. file so that all files under that directory
  152. will be kept. If you select two directories so
  153. that all files under them will be kept, and
  154. then a duplicate file which exists under both
  155. of them is found, it will be kept in both
  156. directories.
  157.  
  158. wait Waits for all previous commands to finish before
  159. proceeding to the next command(s) even if they
  160. could be run in parallel.
  161.  
  162. threads n Number of threads that will be used to hash the
  163. contents of files per command that runs in
  164. parallel. By default n=2. Affects commands
  165. after it only.
  166.  
  167. compare "saved.asd" Compare the currently "seen" files with the
  168. comparediff "saved.asd" hashes stored in the given saved state file.
  169. compareboth "saved.asd" The difference between these five commands is
  170. comparefileonly "saved.asd" what information is printed. Compareboth will
  171. compareseenonly "saved.asd" show files that were both seen and mentioned in
  172. the saved.asd file. Comparefileonly will show
  173. files mentioned in saved.asd that were not seen.
  174. Compareseenonly will show files that were
  175. seen but are not mentioned in saved.asd.
  176. Comparediff will show the combined information
  177. from compareseenonly and comparefileonly. The
  178. plain compare will show all the above info.
  179. Useful for checking if two locations have the
  180. same data, without comparing the actual
  181. directory tree structure.
  182.  
  183. include "regex" Include files whose path and filename contain
  184. a substring that matches the given regular
  185. expression. This command affects commands that
  186. follow it. Loading a saved state is not
  187. affected by inclusions. Exclusions (see below)
  188. will be applied after inclusions. Inclusions
  189. cannot be stacked, using this command a second
  190. time will override the previous use. If you
  191. need combined inclusions use the "|" character
  192. in your regex.
  193.  
  194. exclude "regex" Exclude files whose path and filename contain a
  195. substring that matches the given regular
  196. expression. Matching files will not be scanned
  197. at all. This command affects any commands that
  198. follow it. Loading a saved state is not
  199. affected by exclusions. Exclusions are applied
  200. after inclusions. Exclusions cannot be stacked,
  201. using this command a second time will override
  202. the previous use. If you need combined
  203. exclusions use the "|" character in your regex.
  204.  
  205. onlybefore timestamp Will only include files that were modified
  206. strictly before the given unix timestamp. As an
  207. exception you may also type Nd, where N is any
  208. positive number, which will be converted to a
  209. timestamp that is N days in the past. E.g. 7d
  210. will exclude any files that were modified in the last
  211. 7 days. Similarly you can use Nh for hours and Nm
  212. for minutes. Using this command a second time
  213. will override the previous use.
  214.  
  215. onlyafter timestamp Will include only files that were modified
  216. after or at the given unix timestamp. As an
  217. exception you may also type Nd, where N is any
  218. positive number, which will be converted to a
  219. timestamp that is N days in the past. E.g. 7d
  220. will include only files that were modified in
  221. the last 7 days. Similarly you can use Nh for
  222. hours and Nm for minutes. Using this command
  223. a second time will override the previous use.
  224.  
  225. noinclude If you have used the include command, noinclude
  226. can be used to remove all inclusions for all
  227. the commands that follow it.
  228.  
  229. noexclude If you have used the exclude command, noexclude
  230. can be used to remove all exclusions for all
  231. the commands that follow it.
  232.  
  233. dry Any commands that follow this command will not
  234. alter the filesystem. Make sure you use this
  235. command as the very first one, unless you know
  236. what you are doing.
  237.  
  238. nonstop Ignore errors while trying to move duplicate
  239. files or copying new files instead of halting
  240. the whole process. This will affect only
  241. commands that come after it. Be cautious when
  242. using this command with scancopyallnew or
  243. copyallnew while also using the "and" command
  244. to combine multiple source directories: if a
  245. file comes from multiple directories and has
  246. different data in each, only one instance will be
  247. copied but all instances will become "seen", so
  248. they will never be copied in the future either.
  249. Some errors will still stop the process.
  250.  
  251. dostop Reverts the effect of "nonstop" for commands
  252. that follow. This is the default behavior.
  253.  
  254. and Not exactly a command by itself but can be used
  255. right after the directory paths of scan,
  256. cleanup and scancleanup as well as after the
  257. first directory path of the copy commands to
  258. instruct those commands to process multiple
  259. paths as if they were one. The difference
  260. between using "and" and simply using the
  261. command twice, once for each directory, for the
  262. scan and cleanup is a minor one: When using
  263. "and" the number of threads will be used to
  264. scan these directories as if they were a single
  265. directory, while using the commands multiple
  266. times will allow Aperito to run the multiple
  267. scan or cleanup commands in parallel,
  268. multiplying the number of threads used. On the
  269. other hand, the effect on the scancleanup
  270. command is more pronounced: Using "and" instead
  271. of two scancleanup commands will cause any
  272. files that are duplicated in these two
  273. directories to be deduplicated properly
  274. according to the rules (deepest, shallowest or
  275. by asking the user), while using two
  276. scancleanup commands (one for each directory)
  277. will cause files that exist in both directories
  278. to be deduplicated away from the second
  279. directory even if, for example, you have
  280. elected to keep the deepest duplicate and the
  281. duplicate in the second directory is the
  282. deepest. The reason for this behavior is that
  283. scancleanup commands do not run in parallel and
  284. they behave like a regular cleanup command with
  285. regard to files seen by previous commands (so
  286. files seen by the first scancleanup command
  287. will always be removed if seen by following
  288. scancleanup commands, regardless of rules). You
  289. may use "and" as many times as you like. E.g.:
  290. "aperito scancleanup dir1 and dir2 and dir3"
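
The compare family from the list above boils down to set operations on content hashes. The sketch below is an interpretation, not Aperito's code: the function is hypothetical, hashes are plain strings, and compareseenonly is read as "seen but absent from the saved file", as its name suggests.

```python
def compare(seen, saved):
    """Return what each compare variant would report, as sets of hashes."""
    seen, saved = set(seen), set(saved)
    return {
        "compareboth": seen & saved,       # seen and mentioned in the file
        "comparefileonly": saved - seen,   # in the file but not seen
        "compareseenonly": seen - saved,   # seen but not in the file
        "comparediff": seen ^ saved,       # the two "only" sets combined
    }


report = compare({"h1", "h2"}, {"h2", "h3"})
identical = not report["comparediff"]      # False here: h1 and h3 differ
```

An empty comparediff set means both locations hold the same data, regardless of how the directory trees are laid out.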
  291.  
  292. Remember that the internal state (which files have been "seen") is not preserved
  293. between runs unless you use the save command to save it to a file and then load
  294. it with the load command during the next run.
  295.  
  296. Commands that can be run in parallel if they appear sequentially are:
  297.  
  298. * Save, cleanup, copynew, copyallnew and compare*. Even multiple of them.
  299. * Load and scan. Even multiple of them.
  300.  
  301. Reset, wait, scancopynew, scancopyallnew and scancleanup are never run in
  302. parallel with other commands. Keep and ask will wait for any pending scancleanup
  303. to finish before being run.
  304.  
  305. Any other command not mentioned (e.g. "dry") will be run immediately but will
  306. affect only commands that follow it in the script. Any commands that were before
  307. it in the script, even if they are still running, will not be affected.
  308.  
  309. If you have 3 scan commands one after the other and the default number of
  310. threads (i.e. 2), that will give you 2*3=6 threads processing file contents in
  311. parallel. If all three directories you are scanning are on the same disk, and if
  312. the disk is rotational and not an SSD, this may cause more overhead due to seek
  313. time, so you should consider either reducing threads per scan (threads 1) or
  314. putting wait commands between the scan commands.
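
The grouping rules and the 2*3=6 arithmetic above can be modelled roughly as follows. This is a sketch under several assumptions that the help text does not state: commands from the two parallel groups never overlap with each other, load and save do no hashing, and option commands such as "dry" or "threads" are left out entirely. All names in the code are made up for illustration.

```python
# The two groups of commands that may run in parallel with their own kind.
GROUP_A = {"save", "cleanup", "copynew", "copyallnew", "compare",
           "comparediff", "compareboth", "comparefileonly", "compareseenonly"}
GROUP_B = {"load", "scan"}
# Commands that never run in parallel with anything else.
BARRIERS = {"reset", "wait", "scancopynew", "scancopyallnew", "scancleanup"}


def group_of(name):
    if name in GROUP_A:
        return "A"
    if name in GROUP_B:
        return "B"
    if name in BARRIERS:
        return None
    raise ValueError(f"command not modelled in this sketch: {name}")


def batches(commands):
    """Split a command sequence into batches that could run concurrently."""
    result, current, kind = [], [], None
    for name in commands:
        group = group_of(name)
        # A barrier or a change of group closes the running batch.
        if current and (group is None or group != kind):
            result.append(current)
            current = []
        kind = group
        if group is not None:
            current.append(name)
        elif name != "wait":          # "wait" only separates batches
            result.append([name])
    if current:
        result.append(current)
    return result


def hashing_threads(batch, threads=2):
    """Threads hashing file contents while a batch runs (assumed model)."""
    return threads * sum(1 for name in batch if name not in {"load", "save"})
```

For example, batches(["scan", "scan", "scan"]) yields a single batch of three scans, for which hashing_threads gives 6, while inserting a wait between the scans yields three batches of one scan each.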
  315.  
  316. When Aperito explains why it's moving a file to the duplicates directory, the
  317. second path may start with [?] which means that this is a path loaded from a
  318. saved state with the load command and therefore may not currently exist or, if
  319. it is a relative path, may not be relative to the current working directory.
  320.  
  321. ---
  322.  
  323. Copyright 2021-2024 Tritonio (www.inshame.com)
  324.  
  325. This program is free software: you can redistribute it and/or modify it under
  326. the terms of the GNU General Public License as published by the Free Software
  327. Foundation, either version 3 of the License, or (at your option) any later
  328. version.
  329.  
  330. This program is distributed in the hope that it will be useful, but WITHOUT ANY
  331. WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
  332. PARTICULAR PURPOSE. See the GNU General Public License for more details.
  333.  
  334. You should have received a copy of the GNU General Public License
  335. along with this program. If not, see https://www.gnu.org/licenses/
  336.  