Untitled

a guest
Nov 24th, 2014
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    clean_symbol()
  File "/home/corpus/clean_symbol.py", line 8, in clean_symbol
    good_words = symbols.sub("",words)
TypeError: expected string or buffer

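# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # keeps the pattern a text string on Python 2 too
import codecs
import re

def clean_symbol():
    # One character class listing every symbol to strip; "-", "[" and "]" are escaped.
    symbols = re.compile(r'[{}&+()"=!.?:/|»©><#«,_\-;\[\]%]', flags=re.UNICODE)
    with codecs.open("e.txt", "r", encoding="utf-8") as fileobject, \
         codecs.open("/home/corpus/Clean_tex1t.txt", "a", encoding="utf-8") as out:
        for line in fileobject:
            words = line.split()
            # sub() takes a single string, not a list (the cause of the TypeError),
            # so each word is cleaned on its own.
            cleaned = [symbols.sub("", word) for word in words]
            # Drop words that consisted only of symbols, then rebuild the line.
            good_words = " ".join(word for word in cleaned if word)
            print(good_words)
            out.write(good_words + "\n")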
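
The TypeError in the traceback comes from handing the list returned by line.split() to sub(), which accepts only a single string. Below is a minimal sketch of that failure and one possible per-word fix, with no file I/O. The name strip_symbols and the shortened symbol set are hypothetical, chosen only for illustration, and the sketch assumes Python 3.

```python
import re

# Hypothetical, shortened symbol set for illustration; not the paste's full list.
SYMBOLS = re.compile(r'[{}&+()"=!.?:,;%]')

def strip_symbols(line):
    """Return the line with symbols removed from each whitespace-separated word."""
    cleaned = (SYMBOLS.sub("", word) for word in line.split())
    # Skip words that were made up of symbols only.
    return " ".join(word for word in cleaned if word)

# Passing the list itself reproduces the kind of error shown in the traceback.
try:
    SYMBOLS.sub("", "Hello, world!".split())
except TypeError as exc:
    print("TypeError:", exc)

print(strip_symbols("Hello, world! (a+b) = 100%"))  # prints "Hello world ab 100"
```

Cleaning word by word keeps the original split() step. Calling sub() once on the whole line would also avoid the error, but it would leave the original spacing untouched.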