jbozhich

ScratchStemmer

Dec 6th, 2017
import argparse

from nltk import word_tokenize
from nltk.stem import snowball

# Read the path of the input text file from the command line.
parser = argparse.ArgumentParser()
parser.add_argument("file")
options = parser.parse_args()

# Load the whole file into memory as a single string.
with open(options.file, 'r') as f:
    gum_text = f.read()

# Split the text into word and punctuation tokens.
# Note: word_tokenize needs NLTK's Punkt tokenizer data to be installed.
gum_tokens = word_tokenize(gum_text)

# Reduce each token to its stem with the English Snowball stemmer.
my_stemmer = snowball.SnowballStemmer("english")
stemmed = [my_stemmer.stem(token) for token in gum_tokens]

print(stemmed)