Untitled (a guest, Mar 1st, 2017)
Today, we're going to:

1. Discuss train vs test data and the idea of "Cross-Validation"
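
The train/test idea in item 1 can be sketched in a few lines of plain Python. The 'dataset' below is a made-up toy list (not the pima data), and the 80/20 split ratio is just the conventional default:

```python
import random

# Toy dataset: 20 made-up (feature, label) pairs standing in for real rows.
data = [(i, i % 2) for i in range(20)]

# --- Train/test split: hold out 20% so we can measure out-of-sample accuracy ---
random.seed(0)
shuffled = data[:]
random.shuffle(shuffled)
split = int(0.8 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]
print(len(train), len(test))  # 16 4

# --- k-fold cross-validation: every row gets exactly one turn in the test fold ---
def k_folds(rows, k):
    """Yield (train_fold, test_fold) pairs; each row is tested exactly once."""
    for i in range(k):
        test_fold = rows[i::k]  # every k-th row, offset by i
        train_fold = [r for j, r in enumerate(rows) if j % k != i]
        yield train_fold, test_fold

for tr, te in k_folds(shuffled, 5):
    assert len(tr) + len(te) == len(data)
```

Cross-validation goes one step further than a single split: no row is 'wasted', because each one is used for testing exactly once across the k folds.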

2. Look at a 'real' dataset, the pima-indian-diabetes.csv

3. Look at the 'iris' flower dataset. Use that to classify a 'hypothetical' flower.

4. Look at how to use other languages, like R, to load a dataset (the 'Credit Card Default' dataset), which is the homework

5. Discuss the "hill climbing" / "rolling ball" analogies for back propagation

5a. Discuss the 'activation function' (https://keras.io/activations/):

- We obviously cannot choose a linear activation, because linear combinations of linear functions are just more linear functions ( http://stackoverflow.com/questions/9782071/why-must-a-nonlinear-activation-function-be-used-in-a-backpropagation-neural-net )
- sigmoid ("logistic") is biased towards positive outputs, while tanh is zero-centered, so we prefer tanh over sigmoid (except occasionally at the output layer, when we want values between 0 and 1, but even then it's trivial to shift)
- ReLU is preferred due to a) its constant gradient, b) sparsity, c) the fact that we can still construct any function
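
The claims above can be checked numerically with toy hand-rolled activations (these are my own minimal definitions, not Keras code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

xs = [-2.0, -0.5, 0.5, 2.0]

# sigmoid outputs live in (0, 1): always positive, hence the "bias"
assert all(0.0 < sigmoid(x) < 1.0 for x in xs)

# tanh is zero-centered: negative inputs map to negative outputs
assert math.tanh(-0.5) < 0 < math.tanh(0.5)

# ReLU: gradient is a constant 1 wherever the unit is active, exactly 0
# otherwise -- which is also what produces sparse activations
assert relu(-2.0) == 0.0 and relu(2.0) == 2.0

# and shifting tanh into (0, 1) is indeed trivial:
assert (math.tanh(0.0) + 1) / 2 == 0.5
```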

5b. Discuss the 'loss' function ("Objective", https://keras.io/objectives/):

- MSE/MAE vs Cross-Entropy ( https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/ )
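
The argument in the linked post can be reproduced with toy numbers: when the true label is 1, cross-entropy punishes a confidently wrong prediction much harder than MSE does. A hand-rolled sketch, not Keras's loss implementations:

```python
import math

def mse(y, p):
    return (y - p) ** 2

def cross_entropy(y, p):
    # binary cross-entropy for a single (label, prediction) pair
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label is 1; compare a mildly wrong vs a confidently wrong prediction
mild, confident = 0.4, 0.01

# MSE grows only quadratically as the prediction gets worse...
ratio_mse = mse(1, confident) / mse(1, mild)
# ...while cross-entropy blows up as the wrong answer grows more confident
ratio_ce = cross_entropy(1, confident) / cross_entropy(1, mild)

assert ratio_ce > ratio_mse
```

That stronger penalty on confident mistakes gives larger, more useful gradients during training, which is the main reason cross-entropy is preferred for classifiers.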

5c. Discuss the 'optimizer' (https://keras.io/optimizers/):

- SGD (stochastic gradient descent): the standard "hill climbing" method
- Momentum ( https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM )

More advanced methods: http://sebastianruder.com/content/images/2016/09/saddle_point_evaluation_optimizers.gif

'Adam' is the current state of the art.
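
The two basic update rules can be sketched on a 1-D toy problem, f(x) = x^2. This is a minimal hand-written sketch, not Keras's optimizer code; Adam additionally keeps per-parameter adaptive learning rates, which are omitted here:

```python
# Minimize f(x) = x**2, whose gradient is 2*x.
def grad(x):
    return 2.0 * x

def sgd(x, lr=0.1, steps=100):
    # Plain gradient descent: step straight downhill each iteration
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def sgd_momentum(x, lr=0.1, beta=0.9, steps=100):
    # The "rolling ball": keep a velocity that remembers past gradients
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x += v
    return x

# Both end up near the minimum at 0; momentum overshoots and oscillates
# on the way (like a ball rolling past the valley floor), but settles.
assert abs(sgd(5.0)) < 1e-6
assert abs(sgd_momentum(5.0)) < 0.5
```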

6. If we have time, discuss regularization and dropout
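
As a preview, 'inverted' dropout can be sketched in a few lines: during training each unit is zeroed with probability `rate`, and the survivors are scaled up so the expected activation is unchanged. A toy sketch, not Keras's implementation:

```python
import random

def dropout(activations, rate, training):
    # Inverted dropout: zero each unit with probability `rate` during
    # training and scale survivors by 1/(1-rate), so the expected
    # activation is unchanged; at test time it is the identity.
    if not training:
        return activations[:]
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)
acts = [1.0] * 10000
dropped = dropout(acts, rate=0.5, training=True)
zeroed = sum(1 for a in dropped if a == 0.0)
mean = sum(dropped) / len(dropped)
# Roughly half the units are zeroed, but the mean activation stays near 1.0
assert 4500 < zeroed < 5500
assert 0.9 < mean < 1.1
```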

Homework

Summary: We will predict the probability of "Credit Card Default" on a simulated dataset of 10,000 customers using a "Dense Neural Network". The data comes from:

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R, www.StatLearning.com, Springer-Verlag, New York

<< In class section >>
0. Recall the code used for the iris flower dataset; we will be adapting it for this problem
1. Use library(ISLR) in R to load the "Default" dataset
2. Write the Default variable to a csv
3. Copy the csv to the k05 server (or use linux ssh/winscp to copy the dataset into your own directory)

<< After class >>

0. Use a Jupyter notebook to prototype your code, but run it on k05 (om.ktbyte.com:60122) for GPU-accelerated performance

1. Predict the probability of loan default using the other 3 columns. Try playing around with these hyper-parameters to increase your in-sample and out-of-sample accuracy. Make a list of each parameter that you tried, along with the two accuracy figures:

a. The number of epochs
b. The batch size
c. The depth of the neural network (number of layers)
d. The number of nodes in each layer

You may also want to adjust the activation function, the optimizer, and the loss function.
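
One way to organize the sweep is to loop over hyper-parameter combinations and record both accuracies for each. In the sketch below, `train_and_score` is a made-up placeholder that returns fake numbers so the loop runs; swap in your real Keras fit/evaluate code there. The candidate values are arbitrary examples, not recommendations:

```python
import itertools

def train_and_score(epochs, batch_size, layers, nodes):
    """Placeholder -- replace with your real Keras train/evaluate call.
    Returns fake (in_sample, out_of_sample) accuracies so the loop runs."""
    acc_in = min(0.99, 0.80 + 0.01 * layers + 0.001 * nodes)
    return acc_in, acc_in - 0.02

results = []
for epochs, batch, layers, nodes in itertools.product(
        [10, 50],      # a. number of epochs
        [32, 128],     # b. batch size
        [1, 2, 3],     # c. depth (number of layers)
        [8, 32]):      # d. nodes per layer
    acc_in, acc_out = train_and_score(epochs, batch, layers, nodes)
    results.append(((epochs, batch, layers, nodes), acc_in, acc_out))

# Sort so the best out-of-sample accuracy comes first
results.sort(key=lambda r: -r[2])
print(results[0])
```

The sorted `results` list doubles as the table of parameters-plus-accuracies that the homework asks you to hand in.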

We will be discussing the effects of dropout for regularization next class.

Now, using your best model, answer the remaining questions:

2. How accurate can you get if you only use 1 of the 3 columns? (provide 3 answers)
3. How accurate can you get if you only use 2 of the 3 columns? (provide 3 answers)
4. Come up with a creative way to communicate your findings, such as:
- Which hyper-parameters were important? How important? E.g. does setting some hyper-parameters to certain values make the algorithm 'not learn' or 'learn'?
- What was the 'best' you could do on the 20% test validation score?