- General aim:
- How to pick out potentially salient info from long, tedious text reports without reading them line by line.
- aka How to TL;DR in batch
- Alternative to hurling the text into a web-based TagCloud
- linux bash:
- uses
- pdftotext
- pdftotext version 0.62.0
- Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
- Copyright 1996-2011 Glyph & Cog, LLC
- How to install:
- sudo apt-get install poppler-utils
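- A small sketch for confirming that pdftotext is on the PATH before running the steps below. The helper name have_tool is made up for illustration; only `command -v` is standard shell.

```shell
# have_tool is a hypothetical helper name, not part of poppler-utils.
# It succeeds if the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool pdftotext; then
    echo "pdftotext found: $(command -v pdftotext)"
else
    echo "pdftotext not found; try: sudo apt-get install poppler-utils" >&2
fi
```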
- (o) Fetch the raw PDF data
- Download all of Rep. Doug Collins' PDF files in batch
- The file list and a one-line shell script are here:
- https://pastebin.com/KMiAAUkE
- (o) Convert the pdfs to flat text files
- sed 's/^/pdftotext /g' list > t
- chmod 755 t
- ./t
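- The conversion step can also be sketched as a loop that reads "list" directly, which keeps file names with spaces intact and skips the generated script. The function name convert_list and its optional second argument are assumptions made for illustration. With a single argument, pdftotext writes file.txt next to file.pdf.

```shell
# convert_list is a hypothetical helper, not from the original paste.
# $1: file with one PDF name per line
# $2: converter command (defaults to pdftotext)
convert_list() {
    list_file=$1
    converter=${2:-pdftotext}
    while IFS= read -r pdf; do
        # Skip blank lines in the list.
        [ -n "$pdf" ] || continue
        # Report failures but keep going with the rest of the list.
        "$converter" "$pdf" || echo "failed: $pdf" >&2
    done < "$list_file"
}
```

- Usage: convert_list list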
- (o) Make list of txt files to process
- sed 's/\.pdf$/.txt/' list > list2
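- A sketch of the renaming step that rewrites only a trailing ".pdf" extension, so a name that contains "pdf" elsewhere (for example pdf-report.pdf) is left otherwise unchanged. It assumes each line of "list" ends in lowercase ".pdf" with no trailing whitespace; the function name pdf_to_txt_names is hypothetical.

```shell
# pdf_to_txt_names is a hypothetical helper, not from the original paste.
# Prints each line of the list file with a trailing ".pdf" turned into ".txt".
pdf_to_txt_names() {
    sed 's/\.pdf$/.txt/' "$1"
}
```

- Usage: pdf_to_txt_names list > list2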
- (o) Process those text files using bash shell script "check"
- available here:
- https://pastebin.com/H9LSEAFp
- while IFS= read -r f; do echo "$f"; ./check "$f"; done < list2
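- The processing loop can be sketched as a function that also skips text files that were never produced (for example when a PDF failed to convert). The name process_list and the optional checker argument are assumptions for illustration; what "check" itself does is defined only in the linked paste.

```shell
# process_list is a hypothetical wrapper, not from the original paste.
# $1: file with one text-file name per line (list2)
# $2: command to run on each file (defaults to ./check)
process_list() {
    list_file=$1
    checker=${2:-./check}
    while IFS= read -r f; do
        # Skip blank lines in the list.
        [ -n "$f" ] || continue
        # Skip entries whose text file does not exist.
        if [ ! -f "$f" ]; then
            echo "skip (missing): $f" >&2
            continue
        fi
        echo "$f"
        "$checker" "$f"
    done < "$list_file"
}
```

- Usage: process_list list2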
- (o) List all ".out" result files from the above and dig deeper if necessary
- more *.out
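- Before paging through every result with more, a quick overview can help decide where to dig. The sketch below prints the line count of each non-empty ".out" file in a directory. It assumes an empty ".out" file means nothing was found, which the source does not state; the function name summarize_outs is hypothetical.

```shell
# summarize_outs is a hypothetical helper, not from the original paste.
# $1: directory holding the .out files (defaults to the current directory)
# Prints "<line count><TAB><file>" for every non-empty .out file.
summarize_outs() {
    dir=${1:-.}
    for f in "$dir"/*.out; do
        # -s is true only for files that exist and are not empty.
        if [ -s "$f" ]; then
            printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
        fi
    done
}
```

- Usage: summarize_outs . | sort -n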
- TL;DR:
- What you get:
- https://pastebin.com/qTbHPbVG