Untitled

a guest
Aug 18th, 2019
import re

def text_cleaner(text):
    # lowercase the text
    newString = text.lower()
    # strip the possessive 's (e.g. "dog's" -> "dog")
    newString = re.sub(r"'s\b", "", newString)
    # replace punctuation, digits and every other non-letter with a space
    newString = re.sub("[^a-zA-Z]", " ", newString)
    long_words = []
    # drop short words (fewer than 3 characters)
    for i in newString.split():
        if len(i) >= 3:
            long_words.append(i)
    return (" ".join(long_words)).strip()

# preprocess the text
# (data_text is not defined in this paste; it must already hold the raw input string)
data_new = text_cleaner(data_text)
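
To see what the cleaning steps do, here is a small self-contained sketch. It repeats text_cleaner in a more compact form with the same behavior, so the block runs on its own. The sample sentence is made up for illustration and is not part of the original paste.

```python
import re

def text_cleaner(text):
    # compact equivalent of the function above, same steps in the same order
    s = text.lower()                  # lowercase
    s = re.sub(r"'s\b", "", s)        # strip possessive 's
    s = re.sub("[^a-zA-Z]", " ", s)   # non-letters (punctuation, digits) -> space
    # keep only words with at least 3 characters
    return " ".join(w for w in s.split() if len(w) >= 3)

# hypothetical sample input
sample = "The dog's bowl is EMPTY, it's 5pm!"
cleaned = text_cleaner(sample)
print(cleaned)  # the dog bowl empty
```

Note that digits are removed along with punctuation, and that other contractions are split rather than stripped: "don't" becomes "don t", so "don" survives and "t" is dropped.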