Guest User

Untitled

a guest
Jan 23rd, 2019
95
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 0.86 KB | None | 0 0
  1. #Vectorization for Data Visualization
  2. def vectorization(table):
  3. #CountVectorizer will convert a collection of text documents to a matrix of token counts
  4. #Produces a sparse representation of the counts
  5. #Initialize
  6. vector = CountVectorizer()
  7. #We fit and transform the vector created
  8. frequency_matrix = vector.fit_transform(table.tweet)
  9. #Sum all the frequencies for each word
  10. sum_frequencies = np.sum(frequency_matrix, axis=0)
  11. #Now we use squeeze to remove single-dimensional entries from the shape of an array that we got from applying np.asarray to
  12. #the sum of frequencies.
  13. frequency = np.squeeze(np.asarray(sum_frequencies))
  14. #Now we get into a dataframe all the frequencies and the words that they correspond to
  15. frequency_df = pd.DataFrame([frequency], columns=vector.get_feature_names()).transpose()
  16. return frequency_df
Add Comment
Please, Sign In to add comment