HTML cleanup script

a guest
Mar 15th, 2011
#!/bin/bash
# HTML clean-up script using Dave Raggett's HTML Tidy and sed.
# http://tidy.sourceforge.net  --  http://www.gnu.org/software/sed
# This work was assembled from several loose answers collected over many
# different Internet forums --- I do not claim authorship. Public Domain.
# José Geraldo Gouvêa -- jggouvea at gmail.com

tab=$'\t'   # literal tab character, used by the quote rules further down

for file in post.txt; do
    # Stages of the pipeline, in order:
    #   1. turn <br /> line breaks into <p> so Tidy can build paragraphs;
    #   2. let Tidy normalize the markup (it reads stdin when no file is named);
    #   3. drop the first 7 lines of Tidy's output (assumed to be the doctype,
    #      <head> block and opening <body> that Tidy wraps around the text);
    #   4. drop the last 2 lines (assumed to be the closing </body> and </html>);
    #   5. join everything that is left into a single line;
    #   6. replace " - ", " -- " and "..." with &ndash;, &mdash; and &hellip;.
    sed 's/<br \/>/<p>/g' "$file" |
        tidy --char-encoding utf8 --wrap 0 --logical-emphasis true \
             --enclose-block-text true --drop-empty-paras true |
        sed '1,7d' |
        sed 'N;$!P;$!D;$d' |
        sed -e :a -e '$b;N;s/\n//;ba' |
        sed -e 's/ - / \&ndash; /g' \
            -e 's/ -- /\&mdash;/g' \
            -e 's/\.\.\./\&hellip;/g' > "$file.new" &&
        mv "$file.new" "$file"
done

for f in post.txt; do
    # Turn straight double quotes into &ldquo; / &rdquo; according to the
    # character next to them. The rules run in the same order as before,
    # but in a single sed process instead of one pass per rule.
    sed -e 's/^"/\&ldquo;/' \
        -e 's/"$/\&rdquo;/' \
        -e 's/ "/ \&ldquo;/g' \
        -e 's/" /\&rdquo; /g' \
        -e "s/${tab}\"/${tab}\&ldquo;/g" \
        -e "s/\"${tab}/\&rdquo;${tab}/g" \
        -e 's/")/\&rdquo;)/g' \
        -e 's/("/(\&ldquo;/g' \
        -e 's/";/\&rdquo;;/g' \
        -e 's/":/\&rdquo;:/g' \
        -e 's/,"/,\&rdquo;/g' \
        -e 's/",/\&rdquo;,/g' \
        -e 's/\."/.\&rdquo;/g' \
        -e 's/"\./\&rdquo;./g' \
        "$f" > "$f.2" && mv "$f.2" "$f"
done
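
The typographic rules in the script above can also be tried on their own, without Tidy or any files on disk. What follows is only a rough sketch under a few assumptions: the function name smarten is made up for illustration and is not part of the original paste, the tab character is produced with printf, and the per-character quote rules are condensed into bracket expressions, so the result may differ from the original rule order in unusual cases such as a quote sitting between two commas.

```bash
# Hypothetical helper (not in the original paste): read text on stdin and
# write it to stdout with dashes, ellipses and double quotes turned into
# HTML entities.
smarten() {
    # A literal tab, so the bracket expressions below can match it.
    tab=$(printf '\t')
    sed -e 's/ - / \&ndash; /g' \
        -e 's/ -- /\&mdash;/g' \
        -e 's/\.\.\./\&hellip;/g' \
        -e 's/^"/\&ldquo;/' \
        -e 's/"$/\&rdquo;/' \
        -e "s/\([ ${tab}(]\)\"/\1\&ldquo;/g" \
        -e "s/\"\([ ${tab}).,;:]\)/\&rdquo;\1/g" \
        -e 's/\([.,]\)"/\1\&rdquo;/g'
}

# Example: feed one line through the filter.
printf '%s\n' '"Wait," she said - "now..."' | smarten
```

Because the function is a plain stdin-to-stdout filter, it can be placed at the end of a pipeline or tested on a single line before being pointed at a real post.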