Untitled

a guest
Mar 21st, 2018
from sklearn.feature_extraction.text import CountVectorizer

# three sample headlines used as the training corpus
sample_train_data = ['Dispute delays National Assembly formation process',
                     'Country is looking to encourage entrepreneurs and startup process',
                     'Airline fuel surcharges to go up from Tuesday']

# instantiate the vectorizer with its default settings
vec = CountVectorizer()

# learn the vocabulary of the training data
vec.fit(sample_train_data)

# Interpreter output (repr of the fitted vectorizer):
# CountVectorizer(analyzer='word', binary=False, decode_error='strict',
#         dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
#         lowercase=True, max_df=1.0, max_features=None, min_df=1,
#         ngram_range=(1, 1), preprocessor=None, stop_words=None,
#         strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
#         tokenizer=None, vocabulary=None)
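
The repr above shows the defaults that drive fit(): lowercase=True, ngram_range=(1, 1), and token_pattern='(?u)\\b\\w\\w+\\b'. As a rough illustration of what "learning the vocabulary" amounts to under those defaults, here is a minimal pure-Python sketch. It is not scikit-learn's implementation; the helper names tokenize, build_vocabulary, and count_vector are hypothetical and exist only for this example.

```python
import re
from collections import Counter

sample_train_data = ['Dispute delays National Assembly formation process',
                     'Country is looking to encourage entrepreneurs and startup process',
                     'Airline fuel surcharges to go up from Tuesday']

# Same pattern as in the repr: runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase the text and split it into tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(doc.lower())


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(terms)}


def count_vector(doc, vocabulary):
    """Count occurrences of known tokens; unknown tokens are ignored."""
    row = [0] * len(vocabulary)
    for tok, n in Counter(tokenize(doc)).items():
        if tok in vocabulary:
            row[vocabulary[tok]] = n
    return row


vocabulary = build_vocabulary(sample_train_data)
print(len(vocabulary))        # 21 distinct tokens across the three headlines
print(vocabulary['process'])  # 15
print(count_vector('Fuel process delays process', vocabulary))
```

With the fitted scikit-learn object itself, the learned mapping is exposed as vec.vocabulary_, and vec.transform(...) returns the counts as a sparse matrix rather than a Python list.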