
Untitled

a guest
Jul 23rd, 2019
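import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# bz2 compression is inferred from the file extension. Because chunksize is
# set, read_csv returns an iterator of DataFrames, not a single DataFrame.
chunks = pd.read_csv(
    "data.csv.bz2",
    usecols=["comment"],  # only the text column is needed
    chunksize=1_000_000,
    nrows=120_000_000,
)

print(type(chunks))  # <class 'pandas.io.parsers.TextFileReader'> (module path varies by pandas version)

# data_train was never defined, and a TextFileReader has no .comment column.
# CountVectorizer accepts any iterable of strings, so stream the comments from
# every chunk through a generator (missing comments become empty strings).
# The vocabulary and the sparse count matrix must still fit in memory.
comments = (
    text
    for chunk in chunks
    for text in chunk["comment"].fillna("").astype(str)
)

count_vectorizer = CountVectorizer()
X_train_counts = count_vectorizer.fit_transform(comments)

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)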
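If even the CountVectorizer vocabulary is too large for memory, one common out-of-core alternative (a swapped-in technique, not what the paste uses) is scikit-learn's HashingVectorizer: it keeps no vocabulary, so each chunk can be transformed on its own and the sparse blocks stacked before applying TfidfTransformer. The sketch below assumes a "comment" column as in the paste; CSV_TEXT and hashed_counts are hypothetical stand-ins so the example runs without data.csv.bz2. The trade-offs: columns cannot be mapped back to words, and distinct words may occasionally collide in the same hash bucket.

```python
import io

import numpy as np
import pandas as pd
import scipy.sparse as sp
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

# Hypothetical stand-in for data.csv.bz2; row 4 has a missing comment.
CSV_TEXT = """id,comment
1,the cat sat
2,the dog sat
3,a cat and a dog
4,
5,bird
"""


def hashed_counts(source, chunksize=2, n_features=2**18):
    """Hypothetical helper: hash the "comment" column one chunk at a time."""
    # alternate_sign=False and norm=None keep raw, non-negative term counts,
    # which is the input TfidfTransformer expects.
    vectorizer = HashingVectorizer(
        n_features=n_features, alternate_sign=False, norm=None
    )
    blocks = []
    for chunk in pd.read_csv(source, usecols=["comment"], chunksize=chunksize):
        texts = chunk["comment"].fillna("").astype(str)
        blocks.append(vectorizer.transform(texts))  # stateless: no fit needed
    return sp.vstack(blocks).tocsr()


X_counts = hashed_counts(io.StringIO(CSV_TEXT))
X_tfidf = TfidfTransformer().fit_transform(X_counts)

# Tokens per row; the default token pattern drops one-letter words like "a".
row_totals = np.asarray(X_counts.sum(axis=1)).ravel()
print(X_tfidf.shape)  # (5, 262144)
print(row_totals)
```

With the real file, pass "data.csv.bz2" and a larger chunksize instead of the StringIO object. Note that TfidfTransformer is still fitted on the full stacked matrix, so only the vocabulary, not the matrix itself, is avoided.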