a guest Jan 23rd, 2019 60 Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. #Vectorization for Data Visualization
  2. def vectorization(table):
  3.     #CountVectorizer will convert a collection of text documents to a matrix of token counts
  4.     #Produces a sparse representation of the counts
  5.     #Initialize
  6.     vector = CountVectorizer()
  7.     #We fit and transform the vector created
  8.     frequency_matrix = vector.fit_transform(table.tweet)
  9.     #Sum all the frequencies for each word
  10.     sum_frequencies = np.sum(frequency_matrix, axis=0)
  11.     #Now we use squeeze to remove single-dimensional entries from the shape of an array that we got from applying np.asarray to
  12.     #the sum of frequencies.
  13.     frequency = np.squeeze(np.asarray(sum_frequencies))
  14.     #Now we get into a dataframe all the frequencies and the words that they correspond to
  15.     frequency_df = pd.DataFrame([frequency], columns=vector.get_feature_names()).transpose()
  16.     return frequency_df
RAW Paste Data
We use cookies for various purposes including analytics. By continuing to use Pastebin, you agree to our use of cookies as described in the Cookies Policy. OK, I Understand