andybuckley

Script to remove line-breaks from plain text paragraphs

Apr 8th, 2015
#! /usr/bin/env python

"""\
%prog <infile>

Remove line breaks from block paragraphs of plain text, but retain multiple
consecutive newlines used for e.g. LaTeX, Markdown, or plain text formats.

TODO:
* Provide a -o flag to write out to file
* Read in from stdin if no files provided
"""

import optparse

op = optparse.OptionParser(usage=__doc__)
opts, args = op.parse_args()
if not args:
    # Print the usage message and exit instead of failing with an IndexError
    op.error("no input file given")

infile = args[0]
with open(infile, "r") as f:
    lines = f.readlines()

newtext = ""
inpara = False
for l in lines:
    l2 = l.rstrip()  # lose newline and trailing whitespace
    if not inpara:
        if l2:
            # First line of a new paragraph (leading indentation is kept)
            newtext += l2
            inpara = True
        else:
            # Blank line outside a paragraph: keep it as-is
            newtext += l
    else:  # in para
        if l2:
            # Continuation line: join to the paragraph with a single space
            newtext += " " + l2.lstrip()
        else:
            # Blank line: end the paragraph and keep the separator
            newtext += "\n" + l
            inpara = False

# Parenthesised form works under both Python 2 and Python 3
print(newtext)
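
The two TODO items in the docstring could be handled roughly as follows. This is only a sketch under stated assumptions, not the author's implementation: the helper name `unwrap_paragraphs`, the `main` function, the switch from `optparse` to `argparse`, and the `-o/--output` option are all illustrative choices. The paragraph-joining logic is intended to behave the same as the loop in the script above.

```python
import argparse
import sys


def unwrap_paragraphs(lines):
    """Join the lines of each paragraph with single spaces.

    Hypothetical helper: blank lines between paragraphs are kept, so
    multiple consecutive newlines survive.
    """
    out = []
    inpara = False
    for line in lines:
        stripped = line.rstrip()  # lose newline and trailing whitespace
        if not inpara:
            if stripped:
                out.append(stripped)  # first line of a new paragraph
                inpara = True
            else:
                out.append(line)  # blank line outside a paragraph
        elif stripped:
            out.append(" " + stripped.lstrip())  # continuation line
        else:
            out.append("\n" + line)  # blank line ends the paragraph
            inpara = False
    return "".join(out)


def main(argv=None):
    # Assumed command-line interface; the original only takes <infile>.
    ap = argparse.ArgumentParser(
        description="Remove line breaks from plain text paragraphs")
    ap.add_argument("infile", nargs="?", help="input file (default: stdin)")
    ap.add_argument("-o", "--output", help="output file (default: stdout)")
    args = ap.parse_args(argv)

    if args.infile:
        with open(args.infile, "r") as f:
            lines = f.readlines()
    else:
        lines = sys.stdin.readlines()

    text = unwrap_paragraphs(lines)
    if args.output:
        with open(args.output, "w") as f:
            f.write(text + "\n")
    else:
        print(text)
```

To use this as a command-line script, call `main()` at the bottom of the file, for example under an `if __name__ == "__main__":` guard. Keeping the joining logic in its own function also makes it easy to check on small inputs without touching any files.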