
Automated procedure to TL;DR text reports in batch

a guest
May 22nd, 2019
General aim:
How to pick out possibly salient information from long, tedious text reports without reading them line by line.
a.k.a. How to TL;DR in batch.
An alternative to hurling the text into a web-based TagCloud.

Linux, bash.
Uses pdftotext from poppler-utils; version info:
pdftotext version 0.62.0
Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

How to install (Debian/Ubuntu):
sudo apt-get install poppler-utils

(o) Fetch the raw PDF data
Download all the Rep. Doug Collins PDF files in batch.
The list and a one-line shell script are here:
https://pastebin.com/KMiAAUkE

(o) Convert the PDFs to flat text files
sed 's/^/pdftotext /' list > t
chmod 755 t
./t

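The convert step can also be done without writing the temporary script t. A minimal sketch, assuming list holds one local PDF path per line; the function name convert_list is made up for illustration. Quoting "$pdf" keeps file names with spaces intact, which the sed-generated script does not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every PDF named in a list file to text.
# Assumes one local PDF path per line in the list file (default: "list").
convert_list() {
    local listfile=${1:-list}
    local pdf
    # "|| [ -n "$pdf" ]" also handles a last line with no trailing newline
    while IFS= read -r pdf || [ -n "$pdf" ]; do
        [ -z "$pdf" ] && continue              # skip blank lines
        # With no output name given, pdftotext writes file.pdf -> file.txt
        pdftotext "$pdf" || echo "failed: $pdf" >&2
    done < "$listfile"
}
```

Usage: convert_list list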
(o) Make a list of the .txt files to process
sed 's/\.pdf$/.txt/' list > list2

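A sketch of a slightly more defensive version of the list2 step, assuming the .txt files sit next to the PDFs (pdftotext's default when no output name is given). make_txt_list is a hypothetical helper name; it keeps only text files that were actually produced and are non-empty, so the check loop never runs on a missing file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the .txt list from the PDF list,
# keeping only text files that exist and are non-empty.
make_txt_list() {
    local listfile=${1:-list} out=${2:-list2}
    local pdf txt
    : > "$out"                                 # start with an empty output list
    while IFS= read -r pdf || [ -n "$pdf" ]; do
        [ -z "$pdf" ] && continue
        txt=${pdf%.pdf}.txt                    # strip trailing ".pdf", append ".txt"
        if [ -s "$txt" ]; then                 # exists and is non-empty
            printf '%s\n' "$txt" >> "$out"
        else
            echo "no text for: $pdf" >&2
        fi
    done < "$listfile"
}
```

Usage: make_txt_list list list2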
(o) Process those text files with the bash shell script "check",
available here:
https://pastebin.com/H9LSEAFp

chmod 755 check
while IFS= read -r f; do echo "$f"; ./check "$f"; done < list2

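The contents of "check" are only at the pastebin link above, so the following is NOT that script. It is a hypothetical stand-in showing one common way to TL;DR a text in batch: a crude word-frequency count, i.e. a text-mode tag cloud. The function name wordfreq_check, the .out naming, the 4-letter minimum and the default of 40 words are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "check" (NOT the author's script).
# Usage: wordfreq_check file.txt [N]  -> writes the top N words to file.out
wordfreq_check() {
    local txt=$1 n=${2:-40}
    local out=${txt%.txt}.out
    tr -cs '[:alpha:]' '\n' < "$txt" |    # one word per line, drop digits/punctuation
        tr '[:upper:]' '[:lower:]' |       # fold case so "Page" and "page" merge
        awk 'length($0) > 3' |             # drop short words (assumed cutoff)
        sort | uniq -c | sort -rn |        # count each word, rank by frequency
        head -n "$n" > "$out"              # keep only the top N
}
```

Usage: while IFS= read -r f; do echo "$f"; wordfreq_check "$f"; done < list2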
(o) List all the ".out" result files from the above and dig in if necessary
more *.out

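To dig into a term that stands out in the .out files, a sketch that greps the original .txt reports for it, with line numbers and two lines of context. dig_term is a hypothetical helper; it assumes list2 holds one .txt path per line, as produced earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show where a term occurs in the original text reports.
# Usage: dig_term TERM [listfile]   (listfile defaults to "list2")
dig_term() {
    local term=$1 listfile=${2:-list2} f
    while IFS= read -r f || [ -n "$f" ]; do
        [ -z "$f" ] && continue
        # -i case-insensitive, -n line numbers, -H file name prefix,
        # -w whole words only, -C 2 two lines of context around each match
        grep -inHw -C 2 -- "$term" "$f"
    done < "$listfile"
}
```

Usage: dig_term subpoena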
TL;DR:
What you get:
https://pastebin.com/qTbHPbVG