- Aperito - Duplicate file management - v1.5.4 by Tritonio (www.inshame.com)
- Aperito is a somewhat scriptable duplicate file manager. Cleaning up duplicate
- files from a directory is as simple as running:
- aperito scancleanup MyDirectory
- But Aperito allows you to perform much more complex deduplication. For example,
- if you want to delete all files under Dir3 that also exist under Dir1 or Dir2
- but not touch any files under Dir1 and Dir2 nor deduplicate any files which show
- up multiple times within Dir3, you could run this:
- aperito scan Dir1 scan Dir2 cleanup Dir3
- If you wanted to do the same but also deduplicate the files that show up more
- than once inside Dir3, you only need to change the command slightly:
- aperito scan Dir1 scan Dir2 scancleanup Dir3
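The difference between scan, cleanup and scancleanup can be modelled with a set of content hashes. The sketch below is a toy Python model, not Aperito's code. It makes several assumptions: files are identified by a SHA-256 hash of their contents (the hash Aperito actually uses is not stated here), each directory tree is a dict mapping path to bytes, and scancleanup keeps the most deeply nested copy, as the default "keep deep" rule does. The class name `SeenModel` is hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash Aperito really uses.
    return hashlib.sha256(data).hexdigest()


class SeenModel:
    """Toy model of Aperito's "seen" state. A tree is a dict: path -> file contents."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def scan(self, tree: dict[str, bytes]) -> None:
        # Mark every file as seen; nothing is deduplicated.
        self.seen.update(digest(data) for data in tree.values())

    def cleanup(self, tree: dict[str, bytes]) -> list[str]:
        # Deduplicate only files seen by earlier commands; this tree is NOT marked as seen.
        return sorted(p for p, data in tree.items() if digest(data) in self.seen)

    def scancleanup(self, tree: dict[str, bytes]) -> list[str]:
        # Group this tree's files by hash so repeats inside the tree are detected too.
        groups: dict[str, list[str]] = {}
        for path, data in tree.items():
            groups.setdefault(digest(data), []).append(path)
        moved = []
        for h, paths in groups.items():
            if h in self.seen:
                moved.extend(paths)  # seen before: every copy is moved away
            else:
                # "keep deep": keep the most nested copy (ties broken by name, an assumption)
                keeper = min(paths, key=lambda p: (-p.count("/"), p))
                moved.extend(p for p in paths if p != keeper)
                self.seen.add(h)
        return sorted(moved)
```

In this model, `scan Dir1 scan Dir2 cleanup Dir3` only returns the Dir3 files that also exist in Dir1 or Dir2. The scancleanup variant returns those files and additionally one of any pair of identical files inside Dir3.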
- Or, let's assume Dir1 is actually an external drive that you don't keep mounted
- all the time. In that case you could scan Dir1 when it's mounted with:
- aperito scan Dir1 save dir1-files.asd
- which would create a file that can later be used like this:
- aperito load dir1-files.asd scan Dir2 scancleanup Dir3
- You could also ignore files that don't match certain criteria. For example to
- only consider files in Dir1, modified between 2 and 3 days ago, which have "A"
- in their full filename (including the path) but don't have "B" in it, you could
- run this:
- aperito include A exclude B onlybefore 2d onlyafter 3d scancleanup Dir1
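The following is a rough Python model of how these filters could combine for a single file. It illustrates the documented rules under stated assumptions and is not Aperito's code. Python's `re` stands in for whatever regex dialect Aperito uses, and the patterns are searched anywhere in the full path, which gives the substring match described for include and exclude. Nd/Nh/Nm are converted relative to a caller-supplied `now`. The helper names `parse_time` and `passes` are hypothetical.

```python
import re
import time


def parse_time(spec: str, now=None) -> float:
    """Convert '7d', '12h', '30m' or a plain unix timestamp to a unix timestamp."""
    now = time.time() if now is None else now
    units = {"d": 86400, "h": 3600, "m": 60}
    if spec[-1:] in units:
        return now - float(spec[:-1]) * units[spec[-1]]  # N units in the past
    return float(spec)


def passes(path, mtime, include=None, exclude=None, before=None, after=None) -> bool:
    if include is not None and not re.search(include, path):
        return False  # include: the full path must contain a match
    if exclude is not None and re.search(exclude, path):
        return False  # exclude is applied after include
    if before is not None and not mtime < before:
        return False  # onlybefore: strictly before the timestamp
    if after is not None and not mtime >= after:
        return False  # onlyafter: at or after the timestamp
    return True
```

Note the direction of each filter. `onlybefore 2d` keeps files older than two days and `onlyafter 3d` keeps files newer than three days, so together they select the window between 3 and 2 days ago.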
- Aperito will try to parallelize operations to some extent if it thinks that the
- results will be predictable. For example, in the earlier command
- "aperito load dir1-files.asd scan Dir2 scancleanup Dir3", it will load the
- savefile and scan Dir2 at the same time.
- Aperito never deletes duplicate files; instead, it creates a new directory and
- moves them there. For example, if you run:
- aperito scancleanup Dir4
- any duplicate files like Dir4/subdir/filename.ext will be moved to:
- Dir4-Aperito-duplicates/subdir/filename.ext. That way, if you want to revert the
- deduplication you can simply move the contents of Dir4-Aperito-duplicates to
- Dir4 and let your OS handle the merges. Aperito will apply the same permissions
- and modification dates to directories created under the "-Aperito-duplicates"
- hierarchy as those that they had in their original places. Owner, group and
- other metadata are not guaranteed to be copied over for the directories. The
- files are moved using the standard OS facilities so they should retain all the
- metadata that is retained when files are moved.
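The path mapping described above can be expressed in a few lines of Python. This sketch covers only the documented naming rule and does not move any files. The function name `duplicates_path` is hypothetical. It assumes POSIX-style paths and a root that has a final path component, so `/` and `.` are not valid roots.

```python
from pathlib import PurePosixPath


def duplicates_path(root: str, file_path: str) -> PurePosixPath:
    """Map root/sub/file.ext to root-Aperito-duplicates/sub/file.ext."""
    root_p = PurePosixPath(root)
    # relative_to raises ValueError if file_path is not under root.
    rel = PurePosixPath(file_path).relative_to(root_p)
    # Sibling directory named after the root, with the suffix appended.
    dup_root = root_p.with_name(root_p.name + "-Aperito-duplicates")
    return dup_root / rel
```

Reverting uses the inverse mapping. Each path relative to `Dir4-Aperito-duplicates` is moved back to the same relative location under `Dir4`.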
- Aperito can also copy files that it has never seen before to a new location. For
- example, you can incrementally back up new files from your home directory to an
- external drive with something like this:
- aperito load backed.asd scancopynew ~ /mnt/external_hdd save backed.asd
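The rule for which files get copied can be modelled with a set of content hashes. This toy Python model is not Aperito's code. It assumes SHA-256 content hashes and a source tree given as a dict mapping path to bytes. "First" means first in sorted path order. The model only returns the paths that would be copied and does not copy anything. The function name `new_files` and its flags are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    # Assumption: SHA-256 stands in for Aperito's actual hash.
    return hashlib.sha256(data).hexdigest()


def new_files(tree: dict[str, bytes], seen: set[str],
              all_copies: bool = False, mark_seen: bool = False) -> list[str]:
    """Paths that copynew (all_copies=False) or copyallnew (True) would copy.

    mark_seen=True models the scan* variants, which also add the tree to `seen`.
    """
    new_hashes: set[str] = set()
    to_copy = []
    for path in sorted(tree):
        h = digest(tree[path])
        if h in seen:
            continue  # seen before this command: never copied
        if h in new_hashes and not all_copies:
            continue  # copynew copies only the first copy of a new file
        new_hashes.add(h)
        to_copy.append(path)
    if mark_seen:
        seen.update(digest(data) for data in tree.values())
    return to_copy
```

The backup example saves the state after copying. Because of that, a later run copies nothing that was already seen. In the model, a second call with the same `seen` set returns an empty list.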
- Aperito starts up with an empty internal state, assuming that no files have been
- seen and starts reading commands from its command line in sequence. These
- commands may add files into Aperito's internal state as "seen" or they may
- deduplicate (move away to a separate directory, as described above) files that
- have been "seen" more than one time.
- The available commands are:
- scan "directory" Scans a directory tree and adds all the files
- in it to the internal state of Aperito as
- "seen". It will not deduplicate anything
- though.
- cleanup "directory" Scans a directory tree and deduplicates all
- the files that have already been seen. It will
- not add the scanned files into the internal
- state as "seen" though, therefore, if a file
- shows up twice in this directory tree, it will
- not be deduplicated. To be deduplicated a file
- under this tree needs to have been "seen"
- before the cleanup command was run.
- scancleanup "directory" Like the cleanup command but this time it will
- not only deduplicate "seen" files, but it will
- also add all the files into the internal state
- of Aperito as "seen". Therefore if a file shows
- up twice (or more times) under this directory
- tree, it will be deduplicated. Of course, files
- "seen" before this command is run will also
- be deduplicated, even the first time they are
- encountered within this directory.
- copynew "dir1" "dir2" Like the cleanup command but instead of moving
- seen files from dir1 to the duplicates
- directory it will instead copy unseen (new)
- files into dir2, maintaining the directory
- structure. Any files it sees will not be added
- to the list of "seen" files.
- scancopynew "dir1" "dir2" Like the scancleanup command but instead of
- moving seen files from dir1 to the duplicates
- directory it will instead copy unseen (new)
- files into dir2, maintaining the directory
- structure.
- copyallnew "dir1" "dir2" Like copynew but if a never seen before file
- shows up multiple times under dir1, all of them
- will be copied over to dir2, not just the first
- one.
- scancopyallnew "dir1" "dir2" Like scancopynew but if a never seen before
- file shows up multiple times under dir1, all of
- them will be copied over to dir2, not just the
- first one.
- save "savefile.asd" Saves the internal state of Aperito to a file
- so that you can load it some other time. Useful
- for scanning external drives once and then
- being able to deduplicate files from other
- drives as if the "saved" drive was present. Can
- also be used to speed up scanning of
- directories that you know to be unchanged.
- load "savefile.asd" Loads a saved state. The saved state is merged
- with the current internal state of Aperito so
- you can write this command multiple times to
- load multiple files.
- reset Resets the internal state of Aperito. All
- "seen" files will be forgotten after this
- command.
- keep shallow These two commands affect the behavior of any
- keep deep scancleanup commands that follow. "Keep
- shallow" will cause scancleanup to keep the
- file which is closest to the root when one or
- more duplicates are found while "keep deep"
- does the opposite and keeps the most deeply
- nested file (this is the default behavior).
- ask Similar to the previous two commands but this
- time it will make Aperito ask you which file
- you want to keep. You will also be given the
- choice to select any parent directory of each
- file so that all files under that directory
- will be kept. If you select two directories so
- that all files under them will be kept, and
- then a duplicate file which exists under both
- of them is found, it will be kept in both
- directories.
- wait Waits for all previous commands to finish before
- proceeding to the next command(s) even if they
- could be run in parallel.
- threads n Number of threads that will be used to hash the
- contents of files per command that runs in
- parallel. By default n=2. Affects commands
- after it only.
- compare "savefile.asd" Compares the currently "seen" files with the
- hashes stored in the given saved state file. It
- will print out the hashes (and one location for
- each hash) that exist in only one of the two.
- Useful for checking if two locations have the
- same data, without comparing the actual
- directory tree structure.
- include "regex" Include files whose path and filename contain
- a substring that matches the given regular
- expression. This command affects commands that
- follow it. Loading a saved state is not
- affected by inclusions. Exclusions (see below)
- will be applied after inclusions. Inclusions
- cannot be stacked, using this command a second
- time will override the previous use. If you
- need combined inclusions use the "|" character
- in your regex.
- exclude "regex" Exclude files whose path and filename contain a
- substring that matches the given regular
- expression. Matching files will not be scanned
- at all. This command affects any commands that
- follow it. Loading a saved state is not
- affected by exclusions. Exclusions are applied
- after inclusions. Exclusions cannot be stacked,
- using this command a second time will override
- the previous use. If you need combined
- exclusions use the "|" character in your regex.
- onlybefore timestamp Will only include files that were modified
- strictly before the given unix timestamp. As an
- exception you may also type Nd, where N is any
- positive number, which will be converted to a
- timestamp that is N days in the past. E.g. 7d
- will include only files modified more than 7
- days ago. Similarly you can use Nh for hours and Nm
- for minutes. Using this command a second time
- will override the previous use.
- onlyafter timestamp Will include only files that were modified
- after or at the given unix timestamp. As an
- exception you may also type Nd, where N is any
- positive number, which will be converted to a
- timestamp that is N days in the past. E.g. 7d
- will include only files that were modified in
- the last 7 days. Similarly you can use Nh for
- hours and Nm for minutes. Using this command
- a second time will override the previous use.
- noinclude If you have used the include command, noinclude
- can be used to remove all inclusions for all
- the commands that follow it.
- noexclude If you have used the exclude command, noexclude
- can be used to remove all exclusions for all
- the commands that follow it.
- dry Any commands that follow this command will not
- alter the filesystem. Make sure you use this
- command as the very first one, unless you know
- what you are doing.
- nonstop Ignore errors while moving duplicate
- files or copying new files instead of halting
- the whole process. This will affect only
- commands that come after it. Be cautious when
- using this command with scancopyallnew or
- copyallnew while also using the "and" command
- to combine multiple source directories: if a
- file with the same relative path exists in
- several source directories with different data,
- only one instance will be copied but all
- instances will become "seen", so the others
- will never be copied in the future either.
- Some errors will still stop the process.
- dostop Reverts the effect of "nonstop" for commands
- that follow. This is the default behavior.
- and Not exactly a command by itself but can be used
- right after the directory paths of scan,
- cleanup and scancleanup as well as after the
- first directory path of the copy commands to
- instruct those commands to operate on multiple
- paths as if they were one. The difference
- between using "and" and simply using the
- command twice, once for each directory, for the
- scan and cleanup is a minor one: When using
- "and", the configured number of threads will be used to
- scan these directories as if they were a single
- directory, while using the commands multiple
- times will allow Aperito to run the multiple
- scan or cleanup commands in parallel,
- multiplying the number of threads used. On the
- other hand, the effect on the scancleanup
- command is more pronounced: Using "and" instead
- of two scancleanup commands will cause any
- files that are duplicated in these two
- directories to be deduplicated properly
- according to the rules (deepest, shallowest or
- by asking the user), while using two
- scancleanup commands (one for each directory)
- will cause files that exist in both directories
- to be deduplicated-away from the second
- directory even if, for example, you have
- elected to keep the deepest duplicate and the
- duplicate in the second directory is the
- deepest. The reason for this behavior is that
- scancleanup commands do not run in parallel and
- they behave like a regular cleanup command with
- regards to files seen by previous commands (so
- files seen by the first scancleanup command
- will be always removed if seen by following
- scancleanup commands, regardless of rules). You
- may use "and" as many times as you like. E.g.:
- "aperito scancleanup dir1 and dir2 and dir3"
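A toy Python model can reproduce the difference between joining directories with "and" and issuing separate scancleanup commands. It is not Aperito's implementation. It assumes SHA-256 content hashes and dict-of-bytes trees, measures depth as the number of path components, and breaks ties alphabetically. `pick_keeper` and `scancleanup` are hypothetical names.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()  # assumed hash


def pick_keeper(paths: list[str], mode: str = "deep") -> str:
    # "keep deep" prefers the most nested path, "keep shallow" the least nested.
    def depth(p: str) -> int:
        return len(p.strip("/").split("/"))
    sign = -1 if mode == "deep" else 1
    return min(paths, key=lambda p: (sign * depth(p), p))


def scancleanup(trees: list[dict[str, bytes]], seen: set[str], mode: str = "deep") -> list[str]:
    """One scancleanup command; `trees` holds the directories joined with "and"."""
    groups: dict[str, list[str]] = {}
    for tree in trees:
        for path, data in tree.items():
            groups.setdefault(digest(data), []).append(path)
    moved = []
    for h, paths in groups.items():
        if h in seen:
            moved.extend(paths)  # seen by an earlier command: removed regardless of rules
        else:
            keeper = pick_keeper(paths, mode)
            moved.extend(p for p in paths if p != keeper)
            seen.add(h)
    return sorted(moved)
```

With "and", keep deep preserves the nested copy under d2 and moves the one in d1. With two separate commands, the copy under d2 is moved instead, because the first command has already marked the file as seen.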
- Remember that the internal state (which files have been "seen") is not preserved
- between runs unless you use the save command to save it to a file and then load
- it with the load command during the next run.
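A dict from content hash to one path is enough to model the save/load/compare cycle. The JSON text used below is a stand-in chosen for illustration, since the real .asd format is not described here. The names `save_state`, `load_state` and `compare` are hypothetical, and writing the text to disk is left to the caller.

```python
import json


def save_state(state: dict[str, str]) -> str:
    """Serialize hash -> path pairs (hypothetical format, not the real .asd layout)."""
    return json.dumps(state, sort_keys=True)


def load_state(state: dict[str, str], saved: str) -> None:
    # load merges into the current state instead of replacing it,
    # so it can be repeated to combine several saved files.
    for h, path in json.loads(saved).items():
        state.setdefault(h, path)


def compare(state: dict[str, str], saved: str) -> tuple[dict[str, str], dict[str, str]]:
    """Hashes (with one location each) present on only one side."""
    other = json.loads(saved)
    only_here = {h: state[h] for h in state.keys() - other.keys()}
    only_saved = {h: other[h] for h in other.keys() - state.keys()}
    return only_here, only_saved
```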
- Commands that can be run in parallel if they appear sequentially are:
- * Save(s), cleanup(s), copynew(s), copyallnew(s) and compare(s).
- * Load(s) and scan(s).
- Reset, wait, scancopynew, scancopyallnew and scancleanup are never run in
- parallel with other commands. Keep and ask will wait for any pending scancleanup
- to finish before being run.
- Any other command not mentioned (e.g. "dry") will be run immediately but will
- affect only commands that follow it in the script. Any commands that were before
- it in the script, even if they are still running, will not be affected.
- If you have 3 scan commands one after the other and use the default number of
- threads (i.e. 2), that will give you 2*3=6 threads processing file contents in
- parallel. If all three directories you are scanning are on the same disk, and
- the disk is rotational rather than an SSD, this may add overhead due to seek
- time, so you should consider either reducing threads per scan (threads 1) or
- putting wait commands between the scan commands.
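A small Python function can split a command sequence into batches according to the grouping rules above. One plausible reading, which is an assumption and not stated by the author, is that save, cleanup, copynew, copyallnew and compare only read the "seen" state, while load and scan only add to it. Under that reading, neighbours of the same kind cannot interfere with each other. The sketch takes bare command names and leaves out setting commands such as include or dry. Which commands count as hashing file contents is also an assumption. `batches` and `hashing_threads` are hypothetical names.

```python
# Commands that may share a batch with neighbours of the same kind.
KIND = {
    "save": "reads", "cleanup": "reads", "copynew": "reads",
    "copyallnew": "reads", "compare": "reads",
    "load": "adds", "scan": "adds",
}

# Assumption: commands that hash file contents (load, save, compare do not).
HASHING = {"scan", "cleanup", "scancleanup", "copynew", "copyallnew",
           "scancopynew", "scancopyallnew"}


def batches(commands: list[str]) -> list[list[str]]:
    """Split a command sequence into groups that may run in parallel."""
    groups: list[list[str]] = []
    prev = None
    for cmd in commands:
        k = KIND.get(cmd)  # None: reset, wait, scancleanup, ... never share a batch
        if k is not None and k == prev:
            groups[-1].append(cmd)
        else:
            groups.append([cmd])
        prev = k
    return groups


def hashing_threads(batch: list[str], threads: int = 2) -> int:
    # Each hashing command in a batch gets its own `threads` workers.
    return threads * sum(cmd in HASHING for cmd in batch)
```

Three consecutive scans form one batch, which reproduces the 2*3=6 threads mentioned above. With `threads 1` the same batch would use 3 threads, and a wait between the scans splits them into separate batches.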
- When Aperito explains why it's moving a file to the duplicates directory, the
- second path may start with [?] which means that this is a path loaded from a
- saved state with the load command and therefore may not currently exist or, if
- it is a relative path, may not be relative to the current working directory.
- ---
- Copyright 2021-2023 Tritonio (www.inshame.com)
- This program is free software: you can redistribute it and/or modify it under
- the terms of the GNU General Public License as published by the Free Software
- Foundation, either version 3 of the License, or (at your option) any later
- version.
- This program is distributed in the hope that it will be useful, but WITHOUT ANY
- WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
- PARTICULAR PURPOSE. See the GNU General Public License for more details.
- You should have received a copy of the GNU General Public License
- along with this program. If not, see https://www.gnu.org/licenses/