Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    clean_symbol()
  File "/home/corpus/clean_symbol.py", line 8, in clean_symbol
    good_words = symbols.sub("",words)
TypeError: expected string or buffer
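import codecs
import re


def clean_symbol():
    # Strip any one of these symbols. Assumes the spaces in the original pattern
    # were separators rather than characters to remove.
    symbols = re.compile(r'[{}&+()"=!.?:/|»©><#«,_\-;\[\]%]', flags=re.UNICODE)
    # Assumes e.txt is UTF-8, like the output file.
    with codecs.open("e.txt", "r", encoding="utf-8") as fileobject:
        with codecs.open("/home/corpus/Clean_tex1t.txt", "a", encoding="utf-8") as out:
            for line in fileobject:
                # sub() needs a string; passing the list from line.split()
                # is what raised "TypeError: expected string or buffer".
                good_words = symbols.sub("", line)
                print(good_words)
                out.write(good_words)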
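
The TypeError comes from handing a list (the result of line.split()) to a compiled pattern's sub(), which only accepts a string. A minimal sketch of two ways around it follows: clean the whole line, or clean each word separately. The names SYMBOLS, strip_symbols and strip_symbols_from_words are hypothetical, and the symbol set is an assumed reading of the pattern in the paste, so adjust it to the characters you actually want removed.

```python
import re

# Hypothetical symbol set, adapted from the pattern in the paste.
# '-', '[' and ']' are escaped so they are literal inside the character class.
SYMBOLS = re.compile(r'[{}&+()"=!.?:/|»©><#«,_\-;\[\]%]')


def strip_symbols(text):
    """Remove every character matched by SYMBOLS from a single string."""
    return SYMBOLS.sub("", text)


def strip_symbols_from_words(words):
    """Clean each word separately and drop words that end up empty."""
    cleaned = (strip_symbols(word) for word in words)
    return [word for word in cleaned if word]


line = 'Hello, "world"! (test) #tag'
print(strip_symbols(line))
print(strip_symbols_from_words(line.split()))

# Passing the list itself reproduces the error from the traceback
# (the exact message text differs between Python versions).
try:
    SYMBOLS.sub("", line.split())
except TypeError as exc:
    print("TypeError:", exc)
```

Cleaning the whole line keeps the original spacing and the trailing newline, which suits writing straight back to a file. The per-word version is the better fit if the split words are needed later anyway.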