Untitled

a guest
Jan 18th, 2018
import numpy as np
import pandas as pd

# An .npz archive must be read with np.load; the built-in open() returns a
# plain file object that cannot be indexed by array name.
# Arrays saved without explicit names are stored as arr_0, arr_1, ...
with np.load('./holographic.npz') as d:
    indices = d['arr_0']
    X_train = d['arr_1']
    X_val = d['arr_2']
    y_train = d['arr_3']
    y_val = d['arr_4']

"""
When the data is loaded into the DataFrame, some lines are read incorrectly: several tweets are merged into a single record, so the tweet length exceeds 140. I removed these records as follows:
"""
data = pd.read_csv("./datasets/train/SemEval2018-T3-train-taskA_emoji.txt", sep="\t")
# Keep only the records whose tweet text is at most 140 characters long.
data = data[data['Tweet text'].map(len) <= 140]

# Now you can apply "indices" to the columns of this filtered DataFrame.
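
The last comment does not say how "indices" is meant to be applied, so the following is only a minimal sketch under stated assumptions: that "indices" holds positional row numbers into the filtered data, and that the file has a "Label" column next to "Tweet text". The small DataFrame and the index array below are hypothetical stand-ins, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the filtered tweet data (not the real file).
# The "Label" column name is an assumption.
data = pd.DataFrame({
    "Label": [1, 0, 1, 0],
    "Tweet text": ["tweet a", "tweet b", "tweet c", "tweet d"],
})

# Hypothetical stand-in for d['arr_0']; assumed to hold positional row numbers.
indices = np.array([2, 0, 3, 1])

# Boolean-mask filtering keeps the original index labels, so reset them to
# 0..n-1 to make positions and labels agree.
data = data.reset_index(drop=True)

# Select rows by position in the order given by "indices".
tweets = data["Tweet text"].to_numpy()[indices]
labels = data["Label"].to_numpy()[indices]
```

If "indices" instead refers to the original row labels of the unfiltered file, `data.loc[indices]` without the `reset_index` call would be the closer match; the paste does not settle which is intended.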