ktbyte

Credit Card Default

Nov 2nd, 2016
Summary: We will predict the probability of "Credit Card Default" on a simulated dataset of 10,000 customers using a "Dense Neural Network". The data comes from:

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R, www.StatLearning.com, Springer-Verlag, New York


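As a rough illustration of what "predicting a probability with a dense network" means, here is a minimal sketch of a forward pass in plain Python. The layer sizes and every weight value are hypothetical placeholders chosen by hand (nothing here is trained), and the inputs are assumed to be already scaled. In practice a library would learn the weights for you.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each output node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1), so it can be read as a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, hand-picked weights for illustration only (not trained values).
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]  # 2 hidden nodes x 3 inputs
HIDDEN_B = [0.0, 0.1]
OUTPUT_W = [[1.2, -0.7]]                          # 1 output node x 2 hidden nodes
OUTPUT_B = [-0.5]

def predict_default_probability(features):
    """features: three already-scaled inputs (student flag, balance, income)."""
    hidden = dense(features, HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]

p = predict_default_probability([1.0, 0.3, -0.5])
print(f"predicted probability of default: {p:.3f}")
```

The sigmoid on the last layer is what turns the network's raw output into a value between 0 and 1.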
<< In class section >>

0. Recall the code used for the Iris flower dataset; we will adapt it for this problem.

1. Use library(ISLR) in R to load the Default dataset from the "ISLR" package.

2. Write the Default data frame to a CSV file.

3. Copy the CSV file to the k05 server (or use Linux ssh/WinSCP to copy the dataset into your own directory).


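Once the CSV is on the server, it has to be turned into numbers before a network can use it. The sketch below shows one way to do that with only the standard library. The column names (default, student, balance, income) are those of the ISLR Default data frame; the two sample rows are made-up placeholders, and the function name load_default_rows is hypothetical.

```python
import csv
import io

# Stand-in for the exported file. The column names match the ISLR Default
# data frame; the two rows are made-up placeholders, not real records.
SAMPLE_CSV = """default,student,balance,income
No,No,500.0,40000.0
Yes,Yes,2000.0,15000.0
"""

def load_default_rows(handle):
    """Parse the CSV into (features, label) pairs with Yes/No mapped to 1/0."""
    rows = []
    for record in csv.DictReader(handle):
        features = [
            1.0 if record["student"] == "Yes" else 0.0,
            float(record["balance"]),
            float(record["income"]),
        ]
        label = 1 if record["default"] == "Yes" else 0
        rows.append((features, label))
    return rows

rows = load_default_rows(io.StringIO(SAMPLE_CSV))
print(rows)
```

For a real file, pass an open file handle instead, for example open("Default.csv", newline="") where the file name is whatever you chose in step 2. If R wrote row names into the file, there will be an extra unnamed first column; reading fields by name as above simply ignores it.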
<< After class >>

0. Use a Jupyter notebook to prototype your code, but run it on k05 (om.ktbyte.com:60122) for GPU-accelerated performance.

1. Predict the probability of credit card default using the other 3 columns. Try adjusting the hyperparameters below to increase your in-sample and out-of-sample accuracy. Make a list of each parameter setting that you tried, along with the two accuracy figures:

a. The number of epochs

b. The batch size

c. The depth of the neural network (number of layers)

d. The number of nodes in each layer

e. The activation function

f. The layer initialization

g. Adding dropout layers for regularization

h. The optimizer and loss functions

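One possible way to keep the list of tried settings organized is a small experiment log. This is only a sketch: the option values in SEARCH_SPACE are arbitrary examples, and train_and_score is a hypothetical function that you would write yourself to build and fit a model with the given settings and return the two accuracy figures.

```python
import itertools

# Hypothetical search space: the option values are examples, not recommendations.
SEARCH_SPACE = {
    "epochs": [10, 50],
    "batch_size": [32, 128],
    "hidden_layers": [1, 2],
    "nodes_per_layer": [8, 32],
    "activation": ["relu", "tanh"],
}

def run_experiments(search_space, train_and_score):
    """Try every combination and log the settings with both accuracy figures.

    train_and_score is a hypothetical callable you supply. It should build and
    fit a model with the given settings and return (train_acc, test_acc).
    """
    names = list(search_space)
    log = []
    for values in itertools.product(*(search_space[n] for n in names)):
        settings = dict(zip(names, values))
        train_acc, test_acc = train_and_score(**settings)
        log.append({**settings, "in_sample": train_acc, "out_of_sample": test_acc})
    # Rank by out-of-sample accuracy, the figure measured on held-out data.
    return sorted(log, key=lambda row: row["out_of_sample"], reverse=True)
```

The remaining items (initialization, dropout, optimizer, loss) can be added as further keys in the same dictionary. A full grid grows quickly, so it may be more practical to vary one or two settings at a time.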
Now, using your best model, answer the remaining questions:

2. How accurate can you get if you only use 1 of the 3 columns? (Provide 3 answers, one per column.)

3. How accurate can you get if you only use 2 of the 3 columns? (Provide 3 answers, one per pair of columns.)

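Questions 2 and 3 each have three answers because there are three ways to pick 1 column out of 3 and three ways to pick 2 out of 3. A short sketch that enumerates the cases, assuming the predictor columns are named as in the ISLR Default data (the helper name column_subsets is hypothetical):

```python
from itertools import combinations

# Predictor columns of the Default data; "default" itself is the target.
PREDICTORS = ["student", "balance", "income"]

def column_subsets(columns, size):
    """All ways to choose `size` predictor columns, as lists."""
    return [list(combo) for combo in combinations(columns, size)]

for size in (1, 2):
    for subset in column_subsets(PREDICTORS, size):
        # Placeholder: fit your best model on just these columns and
        # record its in-sample and out-of-sample accuracy.
        print(size, subset)
```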
4. In the linear regression context, we would sometimes apply a transform (such as a logarithm) to our dataset before performing a linear fit. For linear regressions, the impact can be very significant. That is not as necessary here, but let's see if it makes a difference. Plot dataframe['balance'].plot.hist() and also plot the 'income' column. Try transforming the data to make it more normally distributed (for example, reducing the tail size on balance). Do one or two fits on the new data and record your results.
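To check whether a transform actually shortens the tail, you can compare skewness before and after. The sketch below uses synthetic right-skewed numbers as a stand-in, not the real 'balance' column, and the helper names skewness and log_transform are hypothetical. It uses log(1 + x) rather than log(x) so that a value of exactly 0 does not cause an error.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: mean cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

def log_transform(values):
    """log(1 + x), which stays defined when a value is exactly 0."""
    return [math.log1p(v) for v in values]

# Synthetic right-skewed stand-in data, NOT the real 'balance' column.
rng = random.Random(0)
raw = [rng.lognormvariate(6.0, 1.0) for _ in range(2000)]
transformed = log_transform(raw)

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

A skewness close to 0 suggests a roughly symmetric distribution. On the real data, the same idea applies to a pandas column, for example with numpy.log1p(dataframe['balance']); whether a logarithm is the right transform for each column is something to judge from the histograms.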