Untitled

a guest, Jan 17th, 2019
from nltk import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

class LemmaTokenizer(object):
    def __init__(self):
        self.wnl = WordNetLemmatizer()

    # __call__ must be defined at class level (not nested inside __init__),
    # otherwise instances of LemmaTokenizer are not callable and
    # TfidfVectorizer cannot use them as a tokenizer.
    def __call__(self, doc):
        return [self.wnl.lemmatize(t) for t in word_tokenize(doc)]

# corpus is assumed to be an iterable of document strings.
vectorizer = TfidfVectorizer(tokenizer=LemmaTokenizer(),
                             analyzer='word',
                             max_df=0.7,       # drop terms in >70% of documents
                             min_df=50,        # keep terms in at least 50 documents
                             stop_words='english')
vectorizer.fit(corpus)
corpus_tf_idf = vectorizer.transform(corpus)
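The key pattern above is a tokenizer object made callable via a class-level `__call__`; in the original paste `__call__` was indented inside `__init__`, which makes instances non-callable. A minimal stdlib-only sketch of the same pattern (hypothetical `SimpleTokenizer`, regex in place of NLTK so it runs without downloads):

import re

class SimpleTokenizer(object):
    def __init__(self):
        # compile once per instance, reuse for every document
        self.pattern = re.compile(r"\b\w+\b")

    # class-level __call__ makes instances usable wherever a
    # tokenizer callable is expected, e.g. TfidfVectorizer(tokenizer=...)
    def __call__(self, doc):
        return [t.lower() for t in self.pattern.findall(doc)]

tok = SimpleTokenizer()
print(tok("The Quick Brown Fox"))  # -> ['the', 'quick', 'brown', 'fox']

Because the instance itself is callable, `SimpleTokenizer()` can be passed directly as the `tokenizer=` argument, just like `LemmaTokenizer()` above.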