acclivity

Python Regex to split text into words and punctuation

Aug 3rd, 2021 (edited)
# Using regex to split text into words, preserving all punctuation
import re

list_of_long_texts = ["Hello Fred, what time is it?", "Too early, for me!"]

# mydict maps each word to be replaced in the text to its replacement
mydict = {"time": "hour", "early": "late", "me": "you", "Fred": "Jim"}

for text in list_of_long_texts:
    outline = ""
    # The capturing group in the pattern makes re.split keep the separators;
    # the raw string avoids an invalid-escape warning for \W
    arr = re.split(r'(\W)', text)   # Create a list containing every word and separator
    for w in arr:
        if w in mydict:             # If word exists in the dictionary ...
            w = mydict[w]           # ... replace it with the dictionary entry
        outline += w                # Append word or separator onto output line
    print(outline)

# Result:
# Hello Jim, what hour is it?
# Too late, for you!
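
The same replacement can probably be done without building the intermediate list. The sketch below is an alternative, not the method used in the paste above: it uses `re.sub` with a callback, so only runs of word characters are looked up and every separator is left exactly where it was. The helper name `replace_words` is made up for this illustration.

```python
import re

def replace_words(text, replacements):
    """Return text with each whole word swapped for its entry in replacements.

    Hypothetical helper for illustration; words not in the dictionary,
    and all punctuation and whitespace, pass through unchanged.
    """
    # \w+ matches each run of word characters; the callback looks the word up
    # and falls back to the original word when there is no entry
    return re.sub(r"\w+", lambda m: replacements.get(m.group(0), m.group(0)), text)

mydict = {"time": "hour", "early": "late", "me": "you", "Fred": "Jim"}

for text in ["Hello Fred, what time is it?", "Too early, for me!"]:
    print(replace_words(text, mydict))
```

As with the split-based loop, matching is case-sensitive and applies to whole words only, so "Time" or "timer" would not be replaced by the "time" entry.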