Untitled

a guest
Oct 19th, 2018

# Imports this snippet needs; `data` is a DataFrame loaded earlier with
# at least the 'date', 'tweet_size', 'mention' and 'sentiment' columns
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Drop columns we don't need
data = data.drop(['date', 'tweet_size', 'mention'], axis=1)

# Subset the data set for faster training:
# first separate all positive and all negative samples
positive = data[data['sentiment'] == 1]
negative = data[data['sentiment'] == 0]

# Keep a random 5% of the positives and 5% of the negatives
positive = positive.sample(frac=0.05)
negative = negative.sample(frac=0.05)

# Merge both subsets
reduced_set = pd.concat([positive, negative])

# Shuffle the rows (after concat, all positives come before all negatives)
reduced_set = reduced_set.reindex(np.random.permutation(reduced_set.index))
reduced_set.head(5)  # preview the first rows (only displayed in a notebook)

# Separate features from labels, then split into train and test sets
X, y = reduced_set.drop(['sentiment'], axis=1), reduced_set['sentiment']

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.33,
                                                    random_state=0)
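The per-class subsampling, merge and shuffle steps above can also be written more compactly. This is a minimal sketch, assuming pandas >= 1.1 (for `GroupBy.sample`); the helper `subsample_per_class`, its `seed` parameter and the toy DataFrame are hypothetical stand-ins, not part of the original paste.

```python
import pandas as pd


def subsample_per_class(df, label_col, frac, seed=0):
    """Hypothetical helper: keep `frac` of the rows of every class, then shuffle.

    Same idea as sampling the positive and negative subsets separately,
    concatenating them, and permuting the rows.
    """
    # GroupBy.sample draws the same fraction independently from each class
    sampled = df.groupby(label_col).sample(frac=frac, random_state=seed)
    # sample(frac=1) returns every row once, in random order, i.e. a shuffle
    return sampled.sample(frac=1, random_state=seed)


# Hypothetical toy data standing in for `data`: 60 positive and 40 negative tweets
toy = pd.DataFrame({
    "text": [f"tweet {i}" for i in range(100)],
    "sentiment": [1] * 60 + [0] * 40,
})

# 5% of each class: about 3 positives and 2 negatives
reduced = subsample_per_class(toy, "sentiment", frac=0.05)
print(reduced["sentiment"].value_counts().sort_index())
```

Passing a fixed seed makes the subset reproducible between runs, which the original `sample(frac=0.05)` calls (no `random_state`) do not guarantee.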
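Because the reduced set is a random sample, a plain 67/33 split can give the test set a slightly different positive/negative ratio than the training set. One option, not used in the original snippet, is scikit-learn's `stratify` argument to `train_test_split`, sketched here on a hypothetical toy stand-in for `reduced_set`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy stand-in for `reduced_set`: one feature column plus labels
reduced_set = pd.DataFrame({
    "text": [f"tweet {i}" for i in range(30)],
    "sentiment": [1] * 15 + [0] * 15,
})

# Separate features from the label column
X, y = reduced_set.drop(["sentiment"], axis=1), reduced_set["sentiment"]

# stratify=y keeps the class proportions of y in both the train and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y
)

print(len(X_train), len(X_test))
print(y_test.value_counts().sort_index())
```

With 30 rows and `test_size=0.33`, scikit-learn rounds the test size up to 10 rows, and stratification splits them evenly between the two classes.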