Extract filled-in fields from a PDF form and save them as CSV

TringaliLuca, Mar 15th, 2017 (edited)
#!/bin/bash
# Extract the filled-in fields of a PDF form with pdftk and write them
# as a semicolon-separated table.
filename=$1

if [[ -z $1 ]]; then
    echo "USAGE: ./pdf-to-csv.sh compiledform.pdf outputtable.csv H"
    echo "The last argument can be either V or H. If it is V (or not specified), the table is vertical; if it is H, the table is horizontal. The letter O is an alias for H; both mean horizontal."
    echo "If the second argument is not specified, the program just prints a vertical table on screen."
    echo "Please note that this script works if you build your PDF form with LibreOffice Writer and the user fills it in with LibreOffice Writer. I did not try other tools; they might work as well, but I can't guarantee it."
    echo "Script by Luca Tringali, tringalinvent [at] libero.it"
else

    # With no "output" argument, pdftk writes the field dump to stdout,
    # so no temporary file is needed.
    fields=$(pdftk "$filename" dump_data_fields) || exit 1

    # Keep everything after the first ": " so values containing colons survive.
    names=$(grep "^FieldName:" <<<"$fields" | cut -d ":" -f 2- | cut -c 2-)
    values=$(grep "^FieldValue:" <<<"$fields" | cut -d ":" -f 2- | cut -c 2-)

    case $3 in
        ""|V)
            # Vertical: one "name;value" row per field.
            header="FieldName;FieldValue"
            data=$(paste -d ';' <(printf '%s\n' "$names") <(printf '%s\n' "$values"))
            ;;
        H|O)
            # Horizontal: all names on the first row, all values on the second.
            header=$(paste -s -d ';' - <<<"$names")
            data=$(paste -s -d ';' - <<<"$values")
            ;;
        *)
            echo "Unknown table orientation '$3': use V, H or O." >&2
            exit 1
            ;;
    esac

    # printf with %s prints the values literally; echo -e would expand
    # any backslash sequences they contain.
    if [[ -z $2 ]]; then
        printf '%s\n%s\n' "$header" "$data"
    else
        printf '%s\n%s\n' "$header" "$data" > "$2"
    fi

fi
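
The script pairs names and values by position, so it relies on every field having exactly one FieldValue line. If a dump omits FieldValue for a field that was left empty, the two columns drift out of alignment. The sketch below is one possible way around that, not part of the original script. It reads the dump record by record, assuming records are separated by lines starting with "---", and also quotes cells that contain the delimiter or a double quote. The function name fields_to_rows is hypothetical.

```bash
#!/bin/bash
# Sketch: convert `pdftk dump_data_fields` output (read from stdin) into
# "name;value" rows, one row per field. Assumes records are separated by
# lines starting with "---". fields_to_rows is a hypothetical helper name.
fields_to_rows() {
    awk '
        # Quote a cell if it contains the delimiter or a double quote,
        # doubling any embedded double quotes.
        function cell(s) {
            if (s ~ /[;"]/) {
                gsub(/"/, "\"\"", s)
                s = "\"" s "\""
            }
            return s
        }
        # Print the record collected so far (if it had a name), then reset.
        function emit() {
            if (have_name) print cell(name) ";" cell(value)
            name = ""; value = ""; have_name = 0
        }
        /^---/          { emit(); next }
        /^FieldName: /  { name = substr($0, 12); have_name = 1; next }
        /^FieldValue: / { value = substr($0, 13); next }
        END             { emit() }
    '
}

# Possible usage (requires pdftk and a real form):
#   pdftk compiledform.pdf dump_data_fields | fields_to_rows
```

A field without a FieldValue line then produces a row with an empty second cell instead of shifting the values that follow it.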