Untitled

a guest
Aug 23rd, 2019
import os.path as osp
import subprocess


def pdfinfo(infile):
    """
    Wraps command line utility pdfinfo to extract the PDF meta information.
    Returns metainfo in a dictionary.
    Requires poppler-utils: sudo apt-get install poppler-utils

    This function parses the text output that looks like this:
        Title: PUBLIC MEETING AGENDA
        Author: Customer Support
        Creator: Microsoft Word 2010
        Producer: Microsoft Word 2010
        CreationDate: Thu Dec 20 14:44:56 2012
        ModDate: Thu Dec 20 14:44:56 2012
        Tagged: yes
        Pages: 2
        Encrypted: no
        Page size: 612 x 792 pts (letter)
        File size: 104739 bytes
        Optimized: no
        PDF version: 1.5
    """
    cmd = '/usr/bin/pdfinfo'
    if not osp.exists(cmd):
        raise RuntimeError('System command not found: %s' % cmd)

    if not osp.exists(infile):
        raise RuntimeError('Provided input file not found: %s' % infile)

    def _extract(row):
        """Extracts the right hand value from a : delimited row"""
        return row.split(':', 1)[1].strip()

    output = {}

    labels = ['Title', 'Author', 'Creator', 'Producer', 'CreationDate',
              'ModDate', 'Tagged', 'Pages', 'Encrypted', 'Page size',
              'File size', 'Optimized', 'PDF version']

    # check_output returns bytes on Python 3; decode so the str labels
    # can be compared against each line.
    cmd_output = subprocess.check_output([cmd, infile]).decode('utf-8', 'replace')
    for line in cmd_output.splitlines():
        for label in labels:
            # Match the label only at the start of the line, so a value that
            # merely contains a label name (e.g. a title with "Pages" in it)
            # does not overwrite another field.
            if line.startswith(label + ':'):
                output[label] = _extract(line)

    return output
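
The parsing step described in the docstring can be tried without poppler-utils installed. The following is a minimal sketch, not part of the original paste: the names parse_pdfinfo_text and SAMPLE are hypothetical, and the sample text is copied from the docstring above. It assumes every line of interest has the form "Label: value" and keeps all labels rather than filtering against a fixed list.

```python
def parse_pdfinfo_text(text):
    """Parse 'Label: value' lines (as printed by pdfinfo) into a dict.

    Hypothetical helper for illustration; it does not run pdfinfo itself.
    """
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps
        # ("14:44:56") keep their own colons.
        label, sep, value = line.partition(':')
        if sep:
            info[label.strip()] = value.strip()
    return info


# Sample text taken from the docstring of pdfinfo() above.
SAMPLE = """\
Title: PUBLIC MEETING AGENDA
CreationDate: Thu Dec 20 14:44:56 2012
Pages: 2
Page size: 612 x 792 pts (letter)
"""

info = parse_pdfinfo_text(SAMPLE)
print(info['CreationDate'])
print(int(info['Pages']))
```

All values come back as strings, so numeric fields such as Pages need an explicit int() conversion by the caller.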