Untitled

a guest
Mar 25th, 2017
Python
import pandas as pd
from sklearn.utils import shuffle

census = pd.read_csv('hw5_census_dist/train_data.csv')
census = census.drop('fnlwgt', axis=1)

# Impute '?' placeholders: integer mean for numeric columns, most frequent value otherwise.
for category in census.columns:
    if pd.api.types.is_numeric_dtype(census[category]):
        replace = int(census[category].mean())
    else:
        # Exclude '?' itself so it can never be picked as the fill value.
        replace = census.loc[census[category] != '?', category].mode()[0]
    census[category] = census[category].replace('?', replace)

# Shuffle the rows, then hold out the first 20% for validation.
census = shuffle(census)
length = int(0.2 * len(census))
features = census.drop('label', axis=1)
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
census_training_data = features[length:].to_numpy()
census_validation_data = features[:length].to_numpy()
census_training_labels = census['label'][length:].to_numpy()
census_validation_labels = census['label'][:length].to_numpy()
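
The same impute-then-split steps can be tried on a small in-memory frame without the CSV file. This is only a sketch under assumptions: `impute_and_split`, `toy`, its column values, and the `seed` parameter are hypothetical names introduced for illustration, and `DataFrame.sample(frac=1)` stands in for `sklearn.utils.shuffle`. Numeric columns are skipped because a column that pandas parsed as numeric cannot hold the string '?'.

```python
import pandas as pd


def impute_and_split(frame, label_col="label", missing="?", holdout=0.2, seed=0):
    """Fill `missing` markers with each column's mode, shuffle, and split."""
    frame = frame.copy()
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            continue  # a numeric column cannot contain the string marker
        # Compute the mode over known values only, so the marker is never chosen.
        known = frame.loc[frame[col] != missing, col]
        frame[col] = frame[col].replace(missing, known.mode()[0])
    # sample(frac=1) returns every row in random order; the seed makes it repeatable.
    frame = frame.sample(frac=1, random_state=seed).reset_index(drop=True)
    cut = int(holdout * len(frame))
    features = frame.drop(label_col, axis=1)
    labels = frame[label_col]
    return (features[cut:].to_numpy(), labels[cut:].to_numpy(),
            features[:cut].to_numpy(), labels[:cut].to_numpy())


# Hypothetical toy data with one missing 'workclass' entry.
toy = pd.DataFrame({
    "age": [25, 38, 52, 41, 30],
    "workclass": ["Private", "?", "Private", "State-gov", "Private"],
    "label": [0, 1, 1, 0, 0],
})
train_x, train_y, valid_x, valid_y = impute_and_split(toy)
```

With five rows and a 0.2 holdout, one row goes to validation and four to training. Passing a fixed `seed` keeps the split reproducible between runs, which the unseeded shuffle in the paste does not guarantee.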