Guest User

convertpdfs.py

a guest
Feb 10th, 2022
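import os
import subprocess
from multiprocessing import Pool
from tqdm import tqdm


def list_pdfs():
    # Numeric stems of pdfs/<n>.pdf, sorted numerically; non-PDF files are skipped.
    # Assumes every PDF is named with an integer, e.g. pdfs/17.pdf.
    return sorted(int(f[:-4]) for f in os.listdir('pdfs') if f.endswith('.pdf'))


def process(pdf):
    # Convert pdfs/<pdf>.pdf to txts/<pdf>.txt with the pdftotext command-line tool.
    command = ['pdftotext', f'pdfs/{pdf}.pdf', f'txts/{pdf}.txt']
    subprocess.call(command)


if __name__ == '__main__':
    nproc = 30  # number of worker processes
    os.makedirs('txts', exist_ok=True)
    # List the inputs here rather than at import time, so worker processes
    # started with the "spawn" method do not rescan the directory.
    pdfs = list_pdfs()
    with Pool(nproc) as p:
        # imap yields results in input order; tqdm shows progress as they arrive.
        list(tqdm(p.imap(process, pdfs), total=len(pdfs)))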
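
The `process` function above discards the exit status of `pdftotext`, so a failed conversion goes unnoticed. A minimal sketch of one way to surface failures follows. The names `run_conversion` and `failed_ids` and the optional `command` parameter are hypothetical additions for illustration, not part of the original script.

```python
import subprocess
import sys


def run_conversion(pdf, command=None):
    """Run one conversion and return (pdf, ok), where ok is True on exit status 0.

    `command` is a hypothetical override used only to make the function easy
    to try without pdftotext installed; by default the same pdftotext command
    as in the script is used.
    """
    if command is None:
        command = ['pdftotext', f'pdfs/{pdf}.pdf', f'txts/{pdf}.txt']
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The executable itself is missing (e.g. pdftotext is not installed).
        return pdf, False
    return pdf, result.returncode == 0


def failed_ids(results):
    """Collect the ids whose conversion did not succeed."""
    return [pdf for pdf, ok in results if not ok]
```

In the main block this could replace `process` in the `p.imap(...)` call: keep the list that `list(tqdm(...))` returns and pass it to `failed_ids` to print the ids that need a retry.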