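- import numpy as np
- from sklearn.feature_extraction.text import CountVectorizer
- from sklearn.cross_validation import train_test_split  # moved to sklearn.model_selection in scikit-learn 0.18+
- from sklearn.naive_bayes import MultinomialNB
- # df is assumed to be a pandas DataFrame with 'quote' and 'fresh' columns, loaded earlier
- vectorizer = CountVectorizer()
- X = vectorizer.fit_transform(df.quote)  # learn the vocabulary and build the document-term matrix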
- X = X.tocsc()
- Y = (df.fresh == 'fresh').values.astype(int)  # 1 = fresh, 0 = not fresh; np.int was removed from newer NumPy
- xtrain, xtest, ytrain, ytest = train_test_split(X, Y)
- clf = MultinomialNB().fit(xtrain, ytrain)
- new_review = ['this is a new review, movie was awesome']
- new_review = vectorizer.transform(new_review)  # reuse the fitted vocabulary; fit_transform would refit it on this one review
- print df.quote[15]
- print(clf.predict(df.quote[10])) #predict existing review in dataframe; raises the TypeError below because a raw string, not a feature vector, is passed
- print(clf.predict(new_review)) #predict new review
- Technically, Toy Story is nearly flawless.
- ---------------------------------------------------------------------------
- TypeError Traceback (most recent call last)
- <ipython-input-91-27a0698bbd1f> in <module>()
- 15
- 16 print df.quote[15]
- ---> 17 print(clf.predict(df.quote[10])) #predict existing quote in dataframe
- 18 print(clf.predict(new_review)) #predict new review
- //anaconda/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in predict(self, X)
- 60 Predicted target values for X
- 61 """
- ---> 62 jll = self._joint_log_likelihood(X)
- 63 return self.classes_[np.argmax(jll, axis=1)]
- 64
- //anaconda/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in _joint_log_likelihood(self, X)
- 439 """Calculate the posterior log probability of the samples X"""
- 440 X = atleast2d_or_csr(X)
- --> 441 return (safe_sparse_dot(X, self.feature_log_prob_.T)
- 442 + self.class_log_prior_)
- 443
- //anaconda/lib/python2.7/site-packages/sklearn/utils/extmath.pyc in safe_sparse_dot(a, b, dense_output)
- 178 return ret
- 179 else:
- --> 180 return fast_dot(a, b)
- 181
- 182
- TypeError: Cannot cast array data from dtype('float64') to dtype('S32') according to the rule 'safe'
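A possible fix, sketched under assumptions: the TypeError appears to come from handing predict() a raw string (df.quote[10]) instead of a row of word counts, so the text should go through the same fitted vectorizer first. The quotes and labels below are made-up toy data standing in for df.quote and df.fresh, and the imports assume a current scikit-learn rather than the version in the traceback.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for df.quote and df.fresh (not the original dataset).
quotes = [
    "a flawless and awesome movie",
    "awesome acting, great fun",
    "dull and awful plot",
    "awful, boring, a waste of time",
]
labels = [1, 1, 0, 0]  # 1 = fresh, 0 = not fresh

# Fit the vocabulary once, on the training text only.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(quotes)
clf = MultinomialNB().fit(X, labels)

# predict() expects feature vectors, so raw text is converted with the
# already-fitted vectorizer. transform() takes an iterable of documents,
# which is why a single review is wrapped in a list.
existing_features = vectorizer.transform([quotes[0]])
new_features = vectorizer.transform(["this is a new review, movie was awesome"])

print(clf.predict(existing_features))  # prediction for a review seen in training
print(clf.predict(new_features))       # prediction for an unseen review
```

Calling transform() rather than fit_transform() at prediction time keeps the column layout identical to the training matrix; words the vectorizer never saw during fitting are simply ignored.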