Untitled (a guest, Mar 1st, 2017)
Today, we're going to:

1. Discuss train vs test data and the idea of "Cross-Validation"
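
The train/test idea in item 1 can be sketched in a few lines of plain Python. The 'dataset' below is a made-up toy list (not the pima data), and the 80/20 split ratio is just the conventional default:

```python
import random

# Toy dataset: 20 made-up (feature, label) pairs standing in for real rows.
data = [(i, i % 2) for i in range(20)]

# --- Train/test split: hold out 20% so we can measure out-of-sample accuracy ---
random.seed(0)
shuffled = data[:]
random.shuffle(shuffled)
split = int(0.8 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]
print(len(train), len(test))  # 16 4

# --- k-fold cross-validation: every row gets exactly one turn in the test fold ---
def k_folds(rows, k):
    """Yield (train_fold, test_fold) pairs; each row is tested exactly once."""
    for i in range(k):
        test_fold = rows[i::k]  # every k-th row, offset by i
        train_fold = [r for j, r in enumerate(rows) if j % k != i]
        yield train_fold, test_fold

for tr, te in k_folds(shuffled, 5):
    assert len(tr) + len(te) == len(data)
```

Cross-validation goes one step further than a single split: no row is 'wasted', because each one is used for testing exactly once across the k folds.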

2. Look at a 'real' dataset, the pima-indian-diabetes.csv

3. Look at the 'iris' flower dataset. Use that to classify a 'hypothetical' flower.

4. Look at how to use other languages, like R, to load a dataset (the 'Credit Card Default' dataset), which is the homework

5. Discuss the "hill climbing" / "rolling ball" analogies for back propagation

5a. Discuss the 'activation function' (https://keras.io/activations/):

- We obviously cannot choose a linear activation, because linear combinations of linear functions are just more linear functions ( http://stackoverflow.com/questions/9782071/why-must-a-nonlinear-activation-function-be-used-in-a-backpropagation-neural-net )
- sigmoid ("logistic") is biased towards positive outputs, while tanh is zero-centered, so we prefer tanh over sigmoid (except occasionally at the output layer, when we want values between 0 and 1, but even then it's trivial to shift)
- ReLU is preferred due to a) its constant gradient, b) sparsity, c) the fact that we can still construct any function
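
The claims above can be checked numerically with toy hand-rolled activations (these are my own minimal definitions, not Keras code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

xs = [-2.0, -0.5, 0.5, 2.0]

# sigmoid outputs live in (0, 1): always positive, hence the "bias"
assert all(0.0 < sigmoid(x) < 1.0 for x in xs)

# tanh is zero-centered: negative inputs map to negative outputs
assert math.tanh(-0.5) < 0 < math.tanh(0.5)

# ReLU: gradient is a constant 1 wherever the unit is active, exactly 0
# otherwise -- which is also what produces sparse activations
assert relu(-2.0) == 0.0 and relu(2.0) == 2.0

# and shifting tanh into (0, 1) is indeed trivial:
assert (math.tanh(0.0) + 1) / 2 == 0.5
```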

5b. Discuss the 'loss' function ("Objective", https://keras.io/objectives/):

- MSE/MAE vs Cross-Entropy ( https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/ )
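
The argument in the linked post can be reproduced with toy numbers: when the true label is 1, cross-entropy punishes a confidently wrong prediction much harder than MSE does. A hand-rolled sketch, not Keras's loss implementations:

```python
import math

def mse(y, p):
    return (y - p) ** 2

def cross_entropy(y, p):
    # binary cross-entropy for a single (label, prediction) pair
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label is 1; compare a mildly wrong vs a confidently wrong prediction
mild, confident = 0.4, 0.01

# MSE grows only quadratically as the prediction gets worse...
ratio_mse = mse(1, confident) / mse(1, mild)
# ...while cross-entropy blows up as the wrong answer grows more confident
ratio_ce = cross_entropy(1, confident) / cross_entropy(1, mild)

assert ratio_ce > ratio_mse
```

That stronger penalty on confident mistakes gives larger, more useful gradients during training, which is the main reason cross-entropy is preferred for classifiers.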

5c. Discuss the 'optimizer' (https://keras.io/optimizers/):

- SGD (stochastic gradient descent): the standard "hill climbing" method
- Momentum ( https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM )

More advanced methods: http://sebastianruder.com/content/images/2016/09/saddle_point_evaluation_optimizers.gif

'Adam' is the current state of the art.
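
The two basic update rules can be sketched on a 1-D toy problem, f(x) = x^2. This is a minimal hand-written sketch, not Keras's optimizer code; Adam additionally keeps per-parameter adaptive learning rates, which are omitted here:

```python
# Minimize f(x) = x**2, whose gradient is 2*x.
def grad(x):
    return 2.0 * x

def sgd(x, lr=0.1, steps=100):
    # Plain gradient descent: step straight downhill each iteration
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def sgd_momentum(x, lr=0.1, beta=0.9, steps=100):
    # The "rolling ball": keep a velocity that remembers past gradients
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x += v
    return x

# Both end up near the minimum at 0; momentum overshoots and oscillates
# on the way (like a ball rolling past the valley floor), but settles.
assert abs(sgd(5.0)) < 1e-6
assert abs(sgd_momentum(5.0)) < 0.5
```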

6. If we have time, discuss regularization and dropout
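
As a preview, 'inverted' dropout can be sketched in a few lines: during training each unit is zeroed with probability `rate`, and the survivors are scaled up so the expected activation is unchanged. A toy sketch, not Keras's implementation:

```python
import random

def dropout(activations, rate, training):
    # Inverted dropout: zero each unit with probability `rate` during
    # training and scale survivors by 1/(1-rate), so the expected
    # activation is unchanged; at test time it is the identity.
    if not training:
        return activations[:]
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)
acts = [1.0] * 10000
dropped = dropout(acts, rate=0.5, training=True)
zeroed = sum(1 for a in dropped if a == 0.0)
mean = sum(dropped) / len(dropped)
# Roughly half the units are zeroed, but the mean activation stays near 1.0
assert 4500 < zeroed < 5500
assert 0.9 < mean < 1.1
```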

Homework

Summary: We will predict the probability of "Credit Card Default" on a simulated dataset of 10,000 customers using a "Dense Neural Network". The data comes from:

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R, www.StatLearning.com, Springer-Verlag, New York

<< In class section >>
0. Recall the code used for the iris flower dataset; we will be adapting it for this problem
1. Use library(ISLR) in R to load the "Default" dataset
2. Write the Default variable to a csv
3. Copy the csv to the k05 server (or use linux ssh/winscp to copy the dataset into your own directory)

<< After class >>

0. Use a Jupyter notebook to prototype your code, but run it on k05 (om.ktbyte.com:60122) for GPU-accelerated performance

1. Predict the probability of loan default using the other 3 columns. Try playing around with these hyper-parameters to increase your in-sample and out-of-sample accuracy. Make a list of each parameter that you tried, along with the two accuracy figures:

a. The number of epochs
b. The batch size
c. The depth of the neural network (number of layers)
d. The number of nodes in each layer

You may also want to adjust the activation function, the optimizer, and the loss function.
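
One way to organize the sweep is to loop over hyper-parameter combinations and record both accuracies for each. In the sketch below, `train_and_score` is a made-up placeholder that returns fake numbers so the loop runs; swap in your real Keras fit/evaluate code there. The candidate values are arbitrary examples, not recommendations:

```python
import itertools

def train_and_score(epochs, batch_size, layers, nodes):
    """Placeholder -- replace with your real Keras train/evaluate call.
    Returns fake (in_sample, out_of_sample) accuracies so the loop runs."""
    acc_in = min(0.99, 0.80 + 0.01 * layers + 0.001 * nodes)
    return acc_in, acc_in - 0.02

results = []
for epochs, batch, layers, nodes in itertools.product(
        [10, 50],      # a. number of epochs
        [32, 128],     # b. batch size
        [1, 2, 3],     # c. depth (number of layers)
        [8, 32]):      # d. nodes per layer
    acc_in, acc_out = train_and_score(epochs, batch, layers, nodes)
    results.append(((epochs, batch, layers, nodes), acc_in, acc_out))

# Sort so the best out-of-sample accuracy comes first
results.sort(key=lambda r: -r[2])
print(results[0])
```

The sorted `results` list doubles as the table of parameters-plus-accuracies that the homework asks you to hand in.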

We will be discussing the effects of dropout for regularization next class.

Now, using your best model, answer the remaining questions:

2. How accurate can you get if you only use 1 of the 3 columns? (provide 3 answers)
3. How accurate can you get if you only use 2 of the 3 columns? (provide 3 answers)
4. Come up with a creative way to communicate your findings, such as:
- Which hyper-parameters were important? How important? E.g. does setting some hyper-parameters to certain values make the algorithm 'not learn' or 'learn'?
- What was the 'best' you could do on the 20% test validation score?