Untitled

a guest
Jul 18th, 2019

import pandas
import ast

df = pandas.read_csv("twitter_cleanedsample.csv")
all_words = []
for raw_stopwords in df["Tweet_stopped"]:
    # raw_stopwords is a string that _looks_ like a list of strings, for example:
    # "['micosapiens', 'faqstv', 'hannahbcn', 'joancbaez', 'tvcat']"
    # This looks like a list with a length of 5, but if you called len on it,
    # you would actually get 60, because that's how many characters it has,
    # counting the brackets, commas, quote marks and spaces.
    # That is useless to us. To get sensible length data, we need to convert it to an actual list.
    # ast.literal_eval is an effective way of turning list-looking strings into actual lists
    # without the security problems of eval(). So let's use that.
    stopwords = ast.literal_eval(raw_stopwords)

    # Now add the words to the list of all words.
    all_words.extend(stopwords)

print("Found {} words.".format(len(all_words)))
# Result:
# Found 7489 words.