Untitled

a guest
Feb 9th, 2016
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVC

'''
   Load and process the data
       > Separate positive and negative examples
       > Cut the two new sets in half
       > Concatenate to make two sets of equal pos/neg distributions
       > Standardize the features based on the mean and std deviation of the train set
       > Shuffle the training and test sets
'''
print('Loading data...')
data = np.loadtxt('spambase.txt', delimiter=',')
print('Processing data...')
# Column 57 holds the class label (1 = positive, 0 = negative).
pos = data[data[:, 57] == 1]
neg = data[data[:, 57] == 0]
# Floor division keeps the slice indices integers on Python 3 as well.
pos1 = pos[:len(pos) // 2]
pos2 = pos[len(pos) // 2:]
neg1 = neg[:len(neg) // 2]
neg2 = neg[len(neg) // 2:]
train_set = np.vstack((pos1, neg1))
test_set = np.vstack((pos2, neg2))
# Fit the scaler on the training features only (columns 0-56), so the
# label column stays 0/1 instead of being standardized with the features.
scaler = preprocessing.StandardScaler().fit(train_set[:, :57])
train_set[:, :57] = scaler.transform(train_set[:, :57])
test_set[:, :57] = scaler.transform(test_set[:, :57])
# Shuffle rows in place; each label stays attached to its feature row.
np.random.shuffle(train_set)
np.random.shuffle(test_set)
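
The paste imports SVC but ends before using it. As one possible continuation (not the author's code), the sketch below separates the features from the label column, standardizes with training statistics only, and fits a classifier. Synthetic data stands in for spambase.txt so the example runs without the file; the names X_train, y_train, X_test, y_test and clf, as well as the kernel and C settings, are assumptions made for illustration.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVC

# Synthetic stand-in for spambase.txt (an assumption, so the sketch runs
# without the data file): 200 rows, 57 feature columns, label in column 57.
rng = np.random.RandomState(0)
labels = np.repeat([0.0, 1.0], 100)
features = rng.randn(200, 57) + labels[:, None]  # shift the positive class
data = np.column_stack((features, labels))

# Same per-class half/half split as in the paste.
pos = data[data[:, 57] == 1]
neg = data[data[:, 57] == 0]
train_set = np.vstack((pos[:len(pos) // 2], neg[:len(neg) // 2]))
test_set = np.vstack((pos[len(pos) // 2:], neg[len(neg) // 2:]))
rng.shuffle(train_set)  # shuffles whole rows, labels stay with features
rng.shuffle(test_set)

# Split features from labels, then standardize using training statistics only.
X_train, y_train = train_set[:, :57], train_set[:, 57].astype(int)
X_test, y_test = test_set[:, :57], test_set[:, 57].astype(int)
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# RBF-kernel SVC; the kernel and C value are illustrative choices.
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print('Test accuracy: %.3f' % accuracy)
```

Keeping the labels out of the scaler matters here: SVC expects discrete class labels, and a standardized label column would no longer contain 0 and 1.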