
Linux bash script: pick out unusual words and rank them

a guest
May 22nd, 2019
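#!/bin/bash
#
# check
#
# Step 1 (pdftotext) doesn't seem to work in batch mode,
# so do it separately before running this script.
# This script takes it from step 2:
#   2. Use spell to list unusual English words.
#   3. Keep only the words that start with a capital letter.
#   4. Count each word's occurrences, sort, and list the top 15.
#
# Pass the text file as the only argument.
# Usage: ./check '081618 Toscas Transcript_Redacted.txt'
# Writes <name>.lst (word list), <name>.dat (counts) and <name>.out (top 15).

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 FILE.txt" >&2
    exit 1
fi

base="${1%.*}"
echo "$1"

# Unusual words, de-duplicated, capitalized, without possessives ('s).
spell "$1" | sort -u | grep '^[A-Z]' | grep -v "'s" > "$base.lst"

# Count whole-word occurrences of each listed word in the text.
# Quoting "$y" and using -F keeps grep from treating the word as a regex.
while IFS= read -r y; do
    number=$(grep -Fow -- "$y" "$1" | wc -l)
    printf '%s %d\n' "$y" "$number"
done < "$base.lst" > "$base.dat"

# Sort by count (column 2), highest first, and keep the top 15.
sort -k2,2nr "$base.dat" | head -15 > "$base.out"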
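The header says pdftotext doesn't seem to work in batch, so the PDFs get converted before the script runs. pdftotext converts one PDF per invocation and, when no output name is given, writes file.txt next to file.pdf, so a plain loop covers a whole folder. A minimal sketch, assuming poppler's pdftotext is on the PATH; the function name convert_all_pdfs is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: convert every *.pdf in DIR (default: current dir)
# to text, one pdftotext call per file. Assumes poppler's pdftotext,
# which writes foo.txt next to foo.pdf when no output name is given.
convert_all_pdfs() {
    local dir=${1:-.} pdf
    for pdf in "$dir"/*.pdf; do
        # No PDFs at all: the glob stays literal, so skip it.
        [ -e "$pdf" ] || continue
        # Already converted on an earlier run: skip it.
        [ -e "${pdf%.pdf}.txt" ] && continue
        pdftotext "$pdf" || echo "pdftotext failed on: $pdf" >&2
    done
}
```

Run it as `convert_all_pdfs /path/to/transcripts`, then run `check` on each resulting .txt file.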
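The counting loop runs grep over the whole transcript once per listed word, which gets slow as the word list grows. An alternative (not the author's method) is to split the text into words once, tally them with `sort | uniq -c`, and let awk keep only the listed words. This sketch assumes a word is a run of letters (`[[:alpha:]]+`), which can count slightly differently from grep's `-w` word boundaries (digits and underscores); `count_listed_words` and `top_words` are hypothetical names:

```bash
#!/bin/bash
# Hypothetical helper: count_listed_words LIST TEXT
# Prints "word count" for each word in LIST (one per line), counting
# case-sensitive occurrences in TEXT. Assumption: a word is a run of
# letters, so "Tosca's" counts as "Tosca".
count_listed_words() {
    local list=$1 text=$2
    # One pass over TEXT: one word per line, then tally each distinct word.
    # uniq -c lines look like "   3 Tosca" (count first, then word).
    grep -oE -- '[[:alpha:]]+' "$text" | sort | uniq -c |
        # First input (LIST): remember wanted words.
        # Second input (stdin, "-"): print wanted words as "word count".
        awk 'NR == FNR { want[$1]; next } ($2 in want) { print $2, $1 }' "$list" -
}

# Hypothetical helper: rank "word count" lines on stdin,
# highest count first, and keep the top 15.
top_words() {
    sort -k2,2nr | head -15
}
```

Inside the script, the counting loop and the final sort could then become `count_listed_words "${1%.*}.lst" "$1" > "${1%.*}.dat"` and `top_words < "${1%.*}.dat" > "${1%.*}.out"`.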