1 | Data Hoarding General /dhg/ (sauce - https://github.com/simon987/awesome-datahoarding) | |
2 | ||
3 | ### Web Archiving | |
4 | * Collect - https://github.com/xarantolus/Collect: A server to collect & archive websites that also supports video downloads | |
5 | * grab-site - https://github.com/ludios/grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns | |
6 | * Heritrix - https://github.com/internetarchive/heritrix3: Extensible, web-scale, archival-quality web crawler | |
7 | * HTTrack - https://www.httrack.com/: Download a website from the Internet to a local directory | |
8 | * wail - https://github.com/machawk1/wail: Web Archiving Integration Layer: One-Click User Instigated Preservation | |
9 | * wikiteam - https://github.com/WikiTeam/wikiteam: set of tools for archiving wikis | |
10 | ||
11 | ### General | |
12 | * annie - https://github.com/iawia002/annie: Youtube-DL alternative written in Golang | |
13 | * aria2 - https://aria2.github.io/: A lightweight multi-protocol & multi-source command-line download utility | |
14 | * CrowLeer - https://github.com/ERap320/CrowLeer: Powerful C++ web crawler based on libcurl | |
15 | * curl - https://github.com/curl/curl: Tool and library for transferring data with URL syntax, supporting many protocols | |
16 | * Plowshare - https://github.com/mcrapet/plowshare: Command-line tool to manage file-sharing sites | |
17 | * Rclone - https://github.com/ncw/rclone: A command line program to sync files and directories to and from various cloud storage providers | |
18 | * wget - https://savannah.gnu.org/git/?group=wget: Utility for non-interactive download of files from the Web (HTTP & FTP) | |
19 | * you-get - https://github.com/soimort/you-get: Dumb downloader that scrapes the web | |
20 | * Youtube-DL - https://github.com/rg3/youtube-dl: A command-line program to download videos from YouTube and a few hundred more sites | |
21 | ||
22 | ### Application-specific | |
23 | * ChanThreadWatch - https://github.com/SuperGouge/ChanThreadWatch: Saves threads from \*chan-style boards and checks for updates until the thread dies | |
24 | * floatplane_ripper - https://gist.github.com/simon987/0756c378ca2dfb0003931e26ff7fe270: Script to rip all videos from https://floatplane.rip/ | |
25 | * gallery-dl - https://github.com/mikf/gallery-dl: Download image galleries and collections from pixiv, exhentai, danbooru and more | |
26 | * dzi-dl - https://github.com/ryanfb/dzi-dl: Deep Zoom Image Downloader | |
27 | * FanFicFare - https://github.com/JimmXinu/FanFicFare: Tool for making eBooks from stories on fanfiction and other web sites | |
28 | * FicSave - https://github.com/waylaidwanderer/FicSave: Online fanfiction downloader | |
29 | * Google Images Download - https://github.com/hardikvasa/google-images-download: Python script for downloading images | |
30 | * iiif-dl - https://github.com/ryanfb/iiif-dl: Command-line tile downloader/assembler for IIIF endpoints/manifests | |
31 | * Instagram Scraper - https://github.com/dankmemes/instagram-scraper: Instagram-scraper is a command-line application written in Python that scrapes and downloads an Instagram user's photos and videos. Use responsibly. | |
32 | * PyInstaLive - https://github.com/notcammy/PyInstaLive: Instagram live stream downloader. | |
33 | * RedditDownloader - https://github.com/shadowmoose/RedditDownloader: Scrapes Reddit to download media of your choice | |
34 | * Scribd-Downloader - https://github.com/ritiek/scribd-downloader: Allows downloading of Scribd documents | |
35 | * RipMe - https://github.com/RipMeApp/ripme: RipMe is an album ripper for various websites. Runs on your computer. Requires Java 8. | |
36 | * yt-mango - https://github.com/terorie/yt-mango: Youtube metadata archiver | |
38 | * Youtube-MA - https://github.com/CorentinB/YouTube-MA: Youtube metadata archiver | |
39 | ||
40 | ### Download automation | |
41 | * bazarr - https://github.com/morpheus65535/bazarr: Companion application to Sonarr and Radarr for downloading subtitles | |
42 | * FlexGet - https://github.com/Flexget/Flexget: Multipurpose automation tool for content like torrents, nzbs, podcasts, comics, series, movies, etc | |
43 | * Jackett - https://github.com/Jackett/Jackett: API support for torrent trackers (works with Sonarr, Radarr and others) | |
44 | * Lidarr - https://github.com/lidarr/Lidarr: Music collection manager for Usenet and BitTorrent users | |
45 | * Mylar - https://github.com/evilhero/mylar: An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents | |
46 | * Sick-Beard - https://github.com/midgetspy/Sick-Beard: PVR for newsgroup users (with limited torrent support) | |
47 | * Radarr - https://github.com/Radarr/Radarr: A fork of Sonarr to work with movies à la Couchpotato | |
48 | * Sonarr - https://github.com/Sonarr/Sonarr: PVR for Usenet and BitTorrent users | |
49 | ||
50 | ## Handling Data Rot and Corruption | |
51 | * md5deep - http://md5deep.sourceforge.net/: md5deep is a set of programs to compute MD5, SHA-1, SHA-256, Tiger, or Whirlpool message digests on an arbitrary number of files. | |
52 | ||
53 | ## Compression | |
54 | * KGB Archiver - https://github.com/RandallFlagg/kgbarchiver: compression tool with unbelievably high compression rate | |
55 | * peazip - http://www.peazip.org/: File archiver utility | |
56 | * PIGZ - https://zlib.net/pigz/: Multi-threaded gzip | |
57 | * WinRAR - https://www.rarlab.com/download.htm: Can decompress RAR and zip files. | |
58 | ||
59 | ## Network | |
60 | * NetLimiter - https://www.netlimiter.com/: Internet traffic control and monitoring tool for Windows | |
61 | ||
62 | ## File systems | |
63 | * httpdirfs - https://github.com/fangfufu/httpdirfs/: A filesystem which allows you to mount HTTP directory listings | |
64 | * mergerfs - https://github.com/trapexit/mergerfs: a featureful union filesystem | |
65 | * NTFS drivers for MacOS - https://www.seagate.com/ca/en/support/downloads/item/ntfs-driver-for-mac-os-master-dl/ | |
66 | ||
67 | ## File conversion | |
68 | * AAXtoMP3 - https://github.com/KrumpetPirate/AAXtoMP3: convert AAX files to common MP3, M4A, M4B, flac and ogg formats through a basic bash script frontend to FFMPEG | |
69 | * html2warc - https://github.com/steffenfritz/html2warc: Convert web resources to a single warc file | |
70 | ||
71 | ||
72 | ## Utility Scripts | |
73 | * Backblaze B2 sync backup script - https://gist.github.com/AlexanderProd/cb645cf858fd5c89780e7df267226b80: Script to sync multiple directories with Backblaze B2 | |
74 | * Misc download scripts - https://github.com/simon987/Misc-Download-Scripts: Scripts for downloading content from various websites | |
75 | * rclone_dirsize - https://gist.github.com/simon987/7aff5ca3e9ae6c755055ca7b350ef9f8: Get size of http directory listing with rclone | |
76 | * rm_empty_subdir - https://gist.github.com/simon987/f5c2cd7602898615ac9bc8c762d9fe1d: Remove empty sub-directories on Windows | |
77 | * void-cat-uploader - https://github.com/takky1154/void-cat-uploader: This script automatically uploads all files inside a directory to https://void.cat. | |
78 | * youtube-dl_soundcloud - https://gist.github.com/simon987/2dd7c57d65a741c93f5791bc984b97d1: snippet for using youtube-dl to download soundcloud playlists | |
79 | ||
80 | ## Content sharing | |
81 | * h5ai - https://github.com/lrsjng/h5ai: HTTP web server index for Apache httpd, lighttpd, nginx and Cherokee | |
82 | * ipfs - https://ipfs.io/: Protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system | |
83 | * opds - https://opds.io/: Easy to use, Open & Decentralized Content Distribution | |
84 | ||
85 | ## Data curation | |
86 | * baobab - https://github.com/GNOME/baobab: Graphical disk usage analyzer | |
87 | * beets - https://github.com/beetbox/beets: Music library manager and MusicBrainz tagger | |
88 | * Calibre - https://github.com/kovidgoyal/calibre: Ebook manager | |
89 | * DeepSort - https://github.com/CorentinB/DeepSort/: AI powered image tagger backed by DeepDetect | |
90 | * diskover - https://github.com/shirosaidev/diskover: File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch | |
91 | * Everything - https://www.voidtools.com/: Locate files and folders by name instantly (Windows) | |
92 | * FileBot - https://www.filebot.net/: FileBot is the ultimate tool for organizing and renaming your Movies, TV Shows and Anime | |
93 | * fucking-weeb - https://github.com/cosarara/fucking-weeb: A library manager for animu (and TV shows, and whatever). | |
94 | * grepWin - https://github.com/stefankueng/grepWin: A powerful and fast search tool using regular expressions (Windows) | |
95 | * jdupes - https://github.com/jbruchon/jdupes: Powerful duplicate file finder | |
96 | * MediaElch - https://github.com/komet/mediaelch: Media manager for Kodi | |
97 | * MediaInfo - https://github.com/MediaArea/MediaInfo: Convenient unified display of the most relevant technical and tag data for video and audio files | |
98 | * Mp3tag - https://www.mp3tag.de: Powerful and easy-to-use tool to edit metadata of audio files (Windows/Mac) | |
99 | * phockup - https://github.com/ivandokov/phockup: Media sorting tool to organize photos and videos from your camera | |
100 | * picard - https://github.com/metabrainz/picard: MusicBrainz tagger | |
101 | * TeraCopy - https://www.codesector.com/downloads: Copy your files faster and more securely | |
102 | * tree - http://mama.indstate.edu/users/ice/tree/: 'tree' command for Linux | |
103 | * WinDirStat - https://windirstat.net/: Disk usage statistics viewer and cleanup tool for Windows | |
104 | * SyncToy - https://www.microsoft.com/en-us/download/details.aspx?id=15155: Microsoft Windows tool for keeping files in sync across locations | |
105 | * DupeGuru - https://dupeguru.voltaicideas.net/: finds duplicate files | |
106 | ||
107 | ## File Utilities | |
108 | * __Batch Renamer__: GPRename - http://gprename.sourceforge.net/ -> qmv (renameutils) - http://www.nongnu.org/renameutils/ | |
109 | * __File Archiver__: PeaZip -> Xarchiver -> Atool - http://www.nongnu.org/atool/ | |
110 | * __File Search__: DocFetcher - http://docfetcher.sourceforge.net/en/index.html -> ANGRYsearch - https://github.com/DoTheEvo/ANGRYsearch -> Puggle - http://puggle.sourceforge.net/ -> regain - http://regain.sourceforge.net/index.php -> find | |
111 | * __File Synchronization__: Unison - https://github.com/bcpierce00/unison -> git-annex - https://git-annex.branchable.com/ -> Rsync | |
112 | * __Image Organizer__: hydrus network -> Shotwell -> GTKRawGallery -> digiKam -> gThumb (+ gphoto) -> Mapivi - http://mapivi.sourceforge.net/mapivi.shtml -> BASH-Booru - https://github.com/ChristianSilvermoon/BASH-Booru | |
113 | * __RegEx Builder__: regexxer - https://directory.fsf.org/wiki/Regexxer -> Visual REGEXP - http://laurent.riesterer.free.fr/regexp/ -> txt2regex - https://github.com/aureliojargas/txt2regex | |
114 | ||
115 | ## Filesharing | |
116 | * __Direct Connect__: LinuxDC++ -> ncdc - https://github.com/srijan/ncdc -> microdc2 - http://corsair626.no-ip.org/microdc/ | |
117 | * __Download Manager__: giFT - https://sourceforge.net/projects/gift/ + giFTcurs - http://www.nongnu.org/giftcurs/ -> aria2 - https://aria2.github.io/ -> cURL -> Wget | |
118 | * __File Scraper__: megatools -> JDownloader - https://github.com/Bobmk/JDownloader -> Plowshare - https://github.com/mcrapet/plowshare | |
119 | * __FTP Client__: FileZilla -> lftp - https://github.com/lavv17/lftp | |
120 | * __LAN Sharing__: NitroShare -> Dukto | |
121 | * __Media Center__: Plex -> Emby -> Popcorn Time -> Kodi ("XBMC", + Sonarr) | |
122 | * __Media Miner__: FlexGet -> Sonarr - https://github.com/Sonarr/Sonarr | |
123 | * __Offline Reader__: Kiwix - http://www.kiwix.org/ -> Darcy Ripper -> HTTrack -> Wget | |
124 | * __Soulseek__: Nicotine Plus -> Museek (mucous) - https://museek-plus.org/ | |
125 | * __Stream Catcher__: Streamripper -> youtube-dl -> cclive - https://github.com/legatvs/cclive -> youtube-pl - http://ronja.twibright.com/youtube-pl.php -> quvi - https://github.com/mogaal/quvi, RTMPDump - https://github.com/mstorsjo/rtmpdump | |
126 | * __Torrent Client__: qBittorrent -> RTorrent -> transmission-daemon (comes with a web interface - https://github.com/transmission/transmission/wiki/Web-Interface by default, but other frontends - https://github.com/fagga/transmission-remote-cli exist) | |
127 | * __Torrent Tracker Scraper__: Torrtux - https://github.com/l333k0/torrtux -> Torrench - https://github.com/kryptxy/torrench -> Jackett - https://github.com/Jackett/Jackett | |
128 | * __Usenet (File Grabber)__: LottaNZB -> SABnzbd -> NZBGet - https://github.com/nzbget/nzbget -> nzb - https://directory.fsf.org/wiki/Nzb -> nzbperl - https://github.com/eghm/nzbperl | |
129 | ||
130 | ## Command Line Tools | |
131 | * __Command Line Cheatsheet__: CLI Companion - https://launchpad.net/clicompanion -> xman -> cheat / howdoi / clf / fu / bro -> cheat.sh - https://github.com/chubin/cheat.sh | |
132 | * __Directory Browsing__: fasd - https://github.com/clvv/fasd, xd - https://github.com/fbb-git/xd, fzy - https://github.com/jhawthorn/fzy | |
133 | * __Framebuffer Environment__: Fbterm - https://code.google.com/archive/p/fbterm/ -> yaft (because sixel) - https://github.com/uobikiemukot/yaft -> hterm (because regis) - https://github.com/new299/hackterm | |
134 | * __Hacker Culture__: ddate, fortune, The Hacker Test, The Jargon File | |
135 | * __Multiplexer__: Tmux -> Byobu -> GNU Screen (+ sixel patch - https://gist.github.com/saitoha/7546579 ) | |
136 | * __Progress Viewers__: progress - https://github.com/Xfennec/progress -> pv - Pipe Viewer - https://github.com/icetee/pv -> Advanced Copy - https://github.com/atdt/advcpmv | |
137 | ||
138 | ## Disk Tools | |
139 | * __CD-DVD Burn and Copy (Backends)__: cdrtools -> cdrkit -> cdrskin - https://dev.lovelyhq.com/libburnia/web/wikis/cdrskin | |
140 | * __CD-DVD Burn and Copy (Frontends)__: K3b -> Brasero -> cdw - http://cdw.sourceforge.net/ | |
141 | * __CD-DVD Ripping__: Sound Juicer -> fre:ac -> cdparanoia - https://www.xiph.org/paranoia/ (+ ABCDE - http://lly.org/~rcw/abcde/page/ ) | |
142 | * __Custom Install CD__: Respin -> Remastersys -> Distroshare -> PinguyBuilder -> Customizer -> Ubuntu Customization Kit -> Mklivecd | |
143 | * __Device Management__: Udisks (+ udevil) -> pmount -> bashmount - https://github.com/jamielinux/bashmount/blob/master/INSTALL | |
144 | * __Disk Cloning and Writing__: dd -> dcfldd -> dc3dd - https://sourceforge.net/projects/dc3dd/ | |
145 | * __Live USB__: UNetbootin -> MultiCD - https://multicd.us/ | |
146 | * __Partitioning__: Gparted -> cfdisk -> GNU Parted -> fdisk / sfdisk | |
147 | * __System Backup__: Systemback - https://sourceforge.net/projects/systemback/ -> Bacula - https://blog.bacula.org/ -> FSArchiver - http://www.fsarchiver.org/ -> CYA - https://www.cyberws.com/bash/cya/ | |
148 | ||
149 | ## APIs & Online tools | |
150 | * iqdb - https://iqdb.org/: Multi-service reverse image search | |
151 | * thetvdb - https://www.thetvdb.com/: TV shows metadata (used by plex) | |
152 | ||
153 | ## Hardware / Monitoring | |
154 | * CrystalDiskInfo - https://crystalmark.info/en/software/crystaldiskinfo/: HDD/SSD utility that supports some USB, Intel RAID and NVMe devices. | |
155 | * Hard Drive Sentinel - https://www.hdsentinel.com/: Multi-OS SSD and HDD monitoring and analysis software | |
156 | * smartmontools - https://www.smartmontools.org/: Control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most modern ATA/SATA, SCSI/SAS and NVMe disks | |
157 | ||
158 | ## Data recovery | |
159 | * PhotoRec - https://www.cgsecurity.org/wiki/PhotoRec: Powerful FOSS GUI data recovery tool. | |
160 | * TestDisk - https://www.cgsecurity.org/wiki/TestDisk_Download: Another FOSS tool by the author of PhotoRec, but this one is for the CLI