Credit Card Default

Summary: We will predict the probability of credit card default on a simulated dataset of 10,000 customers using a dense neural network. The data comes from:

    James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with applications in R, www.StatLearning.com, Springer-Verlag, New York


<< In class section >>

0. Recall the code used for the Iris flower dataset; we will be adapting it for this problem

1. In R, run library(ISLR) to load the ISLR package, which provides the Default dataset

2. Write the Default data frame to a CSV file

3. Copy the CSV to the k05 server (or use scp over SSH on Linux, or WinSCP on Windows, to copy the dataset into your own directory)
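
Once the CSV is on the server, reading it in Python might look like the sketch below. It assumes the file has the header default,student,balance,income with no row-name column, and that default and student are stored as Yes/No. The sample rows are made up for illustration and are not values from the real dataset, and the file name Default.csv in the comment is only a placeholder.

```python
import csv
import io

# Made-up rows in the layout of the Default table; illustrative only,
# not taken from the real dataset.
SAMPLE_CSV = """default,student,balance,income
No,No,700.0,40000.0
No,Yes,800.0,12000.0
Yes,No,2000.0,30000.0
"""


def load_default(handle):
    """Return (features, labels) with Yes/No encoded as 1.0/0.0."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        features.append([
            1.0 if row["student"] == "Yes" else 0.0,
            float(row["balance"]),
            float(row["income"]),
        ])
        labels.append(1.0 if row["default"] == "Yes" else 0.0)
    return features, labels


# For the real file, pass open("Default.csv", newline="") instead
# (the file name is a placeholder for whatever you chose in step 2).
X, y = load_default(io.StringIO(SAMPLE_CSV))
print(X[0], y)
```

If your CSV was written with R's row names included, the first column will have an empty header and can simply be ignored, since the function only reads the four named columns.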


<< After class >>

0. Use a Jupyter notebook to prototype your code, but run it on k05 (om.ktbyte.com:60122) for GPU-accelerated performance

1. Predict the probability of default using the other 3 columns (student, balance, and income). Try varying these hyperparameters to increase your in-sample and out-of-sample accuracy. Make a list of each setting that you tried along with the two accuracy figures:

    a. The number of epochs

    b. The batch size

    c. The depth of neural network (number of layers)

    d. The number of nodes in each layer

    e. The activation function

    f. The layer initialization

    g. Adding dropout layers for regularization

    h. The optimizer and loss functions
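
One way to keep items a through h organized is to put them in a single configuration and build the network from it. The sketch below assumes the Keras API bundled with TensorFlow (tensorflow.keras), which may differ from the setup used for the Iris code. The names config, layer_plan, build_model, and log_result are hypothetical, and every value in config is only a starting point to vary, not a recommendation.

```python
# One hypothetical configuration covering items a-h.
config = {
    "epochs": 20,                      # a
    "batch_size": 64,                  # b
    "hidden_layers": [16, 8],          # c (depth) and d (nodes per layer)
    "activation": "relu",              # e
    "initializer": "glorot_uniform",   # f
    "dropout": 0.2,                    # g (0 disables dropout)
    "optimizer": "adam",               # h
    "loss": "binary_crossentropy",     # h
}


def layer_plan(cfg):
    """Describe the network as (kind, setting) pairs, without needing Keras."""
    plan = []
    for units in cfg["hidden_layers"]:
        plan.append(("dense", units))
        if cfg["dropout"] > 0:
            plan.append(("dropout", cfg["dropout"]))
    plan.append(("output", 1))  # one sigmoid unit: probability of default
    return plan


def build_model(cfg, n_features=3):
    """Turn the plan into a compiled Keras model (requires TensorFlow)."""
    from tensorflow import keras  # imported here so the plan works without it

    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for kind, setting in layer_plan(cfg):
        if kind == "dense":
            model.add(keras.layers.Dense(
                setting,
                activation=cfg["activation"],
                kernel_initializer=cfg["initializer"],
            ))
        elif kind == "dropout":
            model.add(keras.layers.Dropout(setting))
        else:
            model.add(keras.layers.Dense(setting, activation="sigmoid"))
    model.compile(optimizer=cfg["optimizer"], loss=cfg["loss"],
                  metrics=["accuracy"])
    return model


results = []  # one entry per experiment


def log_result(cfg, in_sample_acc, out_of_sample_acc):
    """Store the settings next to the two accuracy figures."""
    results.append({**cfg, "in_sample": in_sample_acc,
                    "out_of_sample": out_of_sample_acc})


print(layer_plan(config))
```

After building a model, pass config["epochs"] and config["batch_size"] to model.fit on the training split, then call model.evaluate on both the training and held-out splits to get the two figures for log_result.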

Now, using your best model, answer the remaining questions:

2. How accurate can you get if you only use 1 of the 3 columns? (provide 3 answers, one per column)

3. How accurate can you get if you only use 2 of the 3 columns? (provide 3 answers, one per pair of columns)
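
For questions 2 and 3, the column subsets can be enumerated rather than typed by hand. This is a minimal sketch that assumes each feature row is ordered as [student, balance, income]; the helper names feature_subsets and select_columns are hypothetical.

```python
from itertools import combinations

FEATURES = ["student", "balance", "income"]


def feature_subsets(size):
    """All ways of choosing `size` of the three feature columns."""
    return [list(c) for c in combinations(FEATURES, size)]


def select_columns(rows, names):
    """Keep only the named columns from rows shaped [student, balance, income]."""
    idx = [FEATURES.index(n) for n in names]
    return [[row[i] for i in idx] for row in rows]


for size in (1, 2):
    for names in feature_subsets(size):
        print(size, names)
```

Each subset changes the number of input features, so the input shape of the network has to be set to len(names) before refitting.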

4. In the linear regression context, we would sometimes apply a transform (such as a logarithm) to our dataset before performing a linear fit. For linear regression, the impact can be very significant. That is not as necessary here, but let's see if it makes a difference. Plot a histogram of balance with dataframe['balance'].plot.hist(), and do the same for the 'income' column. Try transforming the data to make it more normally distributed (for example, by shortening the tail of balance). Do one or two fits on the new data and record your results.
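
Alongside the histograms, a skewness number gives a quick check of whether a transform helped. The sketch below compares a square root and a log(1 + x) transform; the log(1 + x) form is used as an assumption so that a value of exactly 0 stays defined. The sample values are made up and are not from the dataset, so run it on your own columns to see which transform, if any, brings skewness closer to 0.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: 0 for symmetric data, positive for a long right tail."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


def log_transform(values):
    """log(1 + x), which stays defined when a value is exactly 0."""
    return [math.log1p(v) for v in values]


def sqrt_transform(values):
    """Square root, a milder way to shorten a right tail."""
    return [math.sqrt(v) for v in values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 before feeding the network."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Made-up values with a long right tail; illustrative only.
sample = [0.0, 200.0, 400.0, 500.0, 600.0, 800.0, 1000.0, 2600.0]
for name, fn in [("raw", list), ("sqrt", sqrt_transform), ("log1p", log_transform)]:
    print(name, round(skewness(fn(sample)), 3))
```

Standardizing after the transform keeps balance and income on comparable scales, which tends to matter more for a neural network than the exact shape of the distribution.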