treyhunner

Find duplicate files with pathlib

Nov 20th, 2018
from collections import defaultdict
from pathlib import Path
from pprint import pprint
from hashlib import md5


def find_files(directory):
    """Yield every regular file under directory, recursively."""
    for path in directory.rglob('*'):
        if path.is_file():
            yield path


# Group paths by the MD5 digest of their contents;
# files with identical contents end up in the same list.
hashes = defaultdict(list)
for path in find_files(Path.cwd()):
    key = md5(path.read_bytes()).hexdigest()
    hashes[key].append(path)

pprint(hashes)
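
The script above prints every digest, including those that belong to a single file, and reads each file fully into memory. A possible variation, sketched below, reports only digests shared by more than one path and hashes files in chunks. The names `file_digest`, `find_duplicates`, and the `chunk_size` default are hypothetical additions for illustration, not part of the original paste.

```python
from collections import defaultdict
from hashlib import md5
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = md5()
    with path.open('rb') as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Map each digest to its paths, keeping only digests seen more than once."""
    hashes = defaultdict(list)
    for path in Path(directory).rglob('*'):
        if path.is_file():
            hashes[file_digest(path)].append(path)
    return {key: paths for key, paths in hashes.items() if len(paths) > 1}
```

Calling `find_duplicates(Path.cwd())` and passing the result to `pprint` should give output in the same shape as the original script, minus the files that have no duplicate. MD5 is kept here to match the paste; it is adequate for spotting accidental duplicates but is not collision-resistant against deliberately crafted files.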