JustCaused

IS - Script 4

Jun 4th, 2023

##########################
# Classification trees
##########################

# load ISLR package
# install.packages('ISLR')
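# A minimal sketch of this step (assumes ISLR is already installed;
# otherwise run the commented install.packages() line first)
library(ISLR)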

# get the Carseats dataset docs
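# Sketch: open the dataset's help page; passing package = "ISLR" means
# this works whether or not ISLR has been attached with library()
help("Carseats", package = "ISLR")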

# examine dataset structure
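# Sketch: str() lists every column with its type and first values;
# ISLR::Carseats is used so the line does not depend on library(ISLR)
str(ISLR::Carseats)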

# examine Sales variable distribution
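# Sketch: a numeric summary (quartiles and mean) and a histogram of Sales,
# which the ISLR docs describe as unit sales in thousands
summary(ISLR::Carseats$Sales)
hist(ISLR::Carseats$Sales, main = "Distribution of Sales", xlab = "Sales (thousands of units)")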

# get the 3rd quartile of the Sales variable
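# Sketch: the 3rd quartile is the 0.75 quantile; sales.q3 is a
# hypothetical name for this threshold
sales.q3 <- quantile(ISLR::Carseats$Sales, probs = 0.75)
sales.q3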

# create a new variable HighSales based on the value of the Sales variable

# check the type of the HighSales variable

# convert HighSales into a factor variable

# get the distribution of the HighSales variable
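# Sketch of the four steps above, assuming "high" means Sales strictly above
# the 3rd quartile; carseats (a working copy of ISLR::Carseats) and sales.q3
# are hypothetical names
carseats <- ISLR::Carseats
sales.q3 <- quantile(carseats$Sales, probs = 0.75)
carseats$HighSales <- ifelse(carseats$Sales > sales.q3, "Yes", "No")
class(carseats$HighSales)                         # "character" before conversion
carseats$HighSales <- factor(carseats$HighSales)  # levels sorted: "No", "Yes"
table(carseats$HighSales)                         # counts per class
prop.table(table(carseats$HighSales))             # proportions per class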

##################################
# Create train and test datasets
##################################

# remove the Sales variable

# load caret package

# create train and test datasets

# print the distribution of the outcome variable in the train and test datasets
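# Sketch: a stratified split with caret::createDataPartition(), which keeps the
# Yes/No ratio similar in both sets. The 80/20 ratio, the seed and the names
# carseats, train.idx, train.data and test.data are assumptions. HighSales is
# rebuilt first so the chunk runs on its own.
library(caret)
carseats <- ISLR::Carseats
carseats$HighSales <- factor(ifelse(carseats$Sales > quantile(carseats$Sales, 0.75), "Yes", "No"))
carseats$Sales <- NULL   # HighSales is derived from Sales, so keeping it would leak the outcome
set.seed(1)              # assumption: any fixed seed makes the split reproducible
train.idx <- createDataPartition(carseats$HighSales, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data <- carseats[-train.idx, ]
prop.table(table(train.data$HighSales))
prop.table(table(test.data$HighSales))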

#######################################################
# Create a prediction model using Classification Trees
#######################################################

# load rpart library

# rpart runs an internal cross-validation with random fold assignment,
# so we have to set the seed before calling the function

# build the model

# print the model

# load rpart.plot library
# install.packages("rpart.plot")

# plot the tree
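# Sketch: a classification tree with rpart's default control settings. The
# first block repeats the data preparation and split (hypothetical seed, 80/20
# ratio and object names) so the chunk runs on its own.
library(caret)
library(rpart)
library(rpart.plot)
carseats <- ISLR::Carseats
carseats$HighSales <- factor(ifelse(carseats$Sales > quantile(carseats$Sales, 0.75), "Yes", "No"))
carseats$Sales <- NULL
set.seed(1)
train.idx <- createDataPartition(carseats$HighSales, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data <- carseats[-train.idx, ]

set.seed(1)  # rpart's internal cross-validation (xval = 10 by default) assigns folds at random
tree1 <- rpart(HighSales ~ ., data = train.data, method = "class")
print(tree1)
rpart.plot(tree1)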

# make the predictions with tree1 on the test dataset

# print several predictions

# create the confusion matrix

# function for computing evaluation measures

# compute the evaluation metrics
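# Sketch: predict classes for the test set, cross-tabulate them against the
# true labels and derive accuracy, precision, recall and F1. The preparation
# block, seeds and the helper name getEvaluationMetrics() are assumptions,
# repeated here so the chunk runs on its own; "Yes" is the positive class.
library(caret)
library(rpart)
carseats <- ISLR::Carseats
carseats$HighSales <- factor(ifelse(carseats$Sales > quantile(carseats$Sales, 0.75), "Yes", "No"))
carseats$Sales <- NULL
set.seed(1)
train.idx <- createDataPartition(carseats$HighSales, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data <- carseats[-train.idx, ]
set.seed(1)
tree1 <- rpart(HighSales ~ ., data = train.data, method = "class")

tree1.pred <- predict(tree1, newdata = test.data, type = "class")
head(tree1.pred)
tree1.cm <- table(true = test.data$HighSales, predicted = tree1.pred)
tree1.cm

# hypothetical helper: rows of cm = true class, columns = predicted class,
# "Yes" (the 2nd factor level) is treated as the positive class
getEvaluationMetrics <- function(cm) {
  precision <- cm[2, 2] / sum(cm[, 2])   # TP / (TP + FP)
  recall <- cm[2, 2] / sum(cm[2, ])      # TP / (TP + FN)
  c(Accuracy = sum(diag(cm)) / sum(cm),
    Precision = precision,
    Recall = recall,
    F1 = 2 * precision * recall / (precision + recall))
}
tree1.eval <- getEvaluationMetrics(tree1.cm)
tree1.eval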

# get the docs for the rpart.control function
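# Sketch: the help page documents minsplit, minbucket, cp, maxdepth, xval and
# the other tree-growing options; args() prints their default values
help("rpart.control", package = "rpart")
args(rpart::rpart.control)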

# build the second model with minsplit = 10 and cp = 0.001

# print the model

# plot tree2

# make the predictions with tree2 on the test dataset

# create the confusion matrix for tree2 predictions

# compute the evaluation metrics

# compare the evaluation metrics for tree1 and tree2
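# Sketch: tree2 uses minsplit = 10 and cp = 0.001, so it can grow much deeper
# than tree1. Data preparation, seeds and the getEvaluationMetrics() helper
# are assumptions, repeated so the chunk runs on its own.
library(caret)
library(rpart)
library(rpart.plot)
carseats <- ISLR::Carseats
carseats$HighSales <- factor(ifelse(carseats$Sales > quantile(carseats$Sales, 0.75), "Yes", "No"))
carseats$Sales <- NULL
set.seed(1)
train.idx <- createDataPartition(carseats$HighSales, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data <- carseats[-train.idx, ]

# hypothetical helper: rows = true class, columns = predicted class, "Yes" = positive
getEvaluationMetrics <- function(cm) {
  precision <- cm[2, 2] / sum(cm[, 2])
  recall <- cm[2, 2] / sum(cm[2, ])
  c(Accuracy = sum(diag(cm)) / sum(cm), Precision = precision, Recall = recall,
    F1 = 2 * precision * recall / (precision + recall))
}

set.seed(1)
tree1 <- rpart(HighSales ~ ., data = train.data, method = "class")
set.seed(1)
tree2 <- rpart(HighSales ~ ., data = train.data, method = "class",
               control = rpart.control(minsplit = 10, cp = 0.001))
print(tree2)
rpart.plot(tree2)

tree2.pred <- predict(tree2, newdata = test.data, type = "class")
tree2.cm <- table(true = test.data$HighSales, predicted = tree2.pred)
tree2.cm
tree2.eval <- getEvaluationMetrics(tree2.cm)

tree1.pred <- predict(tree1, newdata = test.data, type = "class")
tree1.eval <- getEvaluationMetrics(table(true = test.data$HighSales, predicted = tree1.pred))
comparison <- rbind(tree1 = tree1.eval, tree2 = tree2.eval)
round(comparison, 3)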

# load e1071 library
# install.packages('e1071')

# define cross-validation (cv) parameters; we'll do 10-fold cross-validation

# then define the range of cp values to examine in the cross-validation

# since cross-validation is a probabilistic process, we need to set the seed
# so that the results can be replicated

# run the cross-validation

# plot the cross-validation results
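# Sketch: 10-fold cross-validation over a grid of cp values with
# caret::train(method = "rpart"). The grid (0.001 to 0.05 in steps of 0.0025),
# the seeds and all object names are assumptions; the data preparation is
# repeated so the chunk runs on its own.
library(caret)
library(e1071)
library(rpart)
carseats <- ISLR::Carseats
carseats$HighSales <- factor(ifelse(carseats$Sales > quantile(carseats$Sales, 0.75), "Yes", "No"))
carseats$Sales <- NULL
set.seed(1)
train.idx <- createDataPartition(carseats$HighSales, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data <- carseats[-train.idx, ]

numFolds <- trainControl(method = "cv", number = 10)
cpGrid <- expand.grid(cp = seq(from = 0.001, to = 0.05, by = 0.0025))
set.seed(10)
crossvalidation <- train(x = train.data[, names(train.data) != "HighSales"],
                         y = train.data$HighSales,
                         method = "rpart",
                         trControl = numFolds,
                         tuneGrid = cpGrid)
crossvalidation
plot(crossvalidation)          # cross-validated accuracy for each cp value
crossvalidation$bestTune$cp    # cp with the highest cross-validated accuracy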

# create tree3 using the cp value selected by cross-validation

# plot the new tree

# make the predictions with tree3 on the test dataset

# create the confusion matrix for tree3 predictions

# compute the evaluation metrics

# compare the evaluation metrics for tree1, tree2 and tree3
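# Sketch: tree3 is grown with the cp that caret's 10-fold cross-validation
# selected, then the three trees are compared on the test set. The cp grid,
# seeds, split and helper names (getEvaluationMetrics, evalTree) are
# assumptions; the earlier steps are repeated so the chunk runs on its own.
library(caret)
library(e1071)
library(rpart)
library(rpart.plot)
carseats <- ISLR::Carseats
carseats$HighSales <- factor(ifelse(carseats$Sales > quantile(carseats$Sales, 0.75), "Yes", "No"))
carseats$Sales <- NULL
set.seed(1)
train.idx <- createDataPartition(carseats$HighSales, p = 0.8, list = FALSE)
train.data <- carseats[train.idx, ]
test.data <- carseats[-train.idx, ]

# hypothetical helper: rows = true class, columns = predicted class, "Yes" = positive
getEvaluationMetrics <- function(cm) {
  precision <- cm[2, 2] / sum(cm[, 2])
  recall <- cm[2, 2] / sum(cm[2, ])
  c(Accuracy = sum(diag(cm)) / sum(cm), Precision = precision, Recall = recall,
    F1 = 2 * precision * recall / (precision + recall))
}

set.seed(10)
crossvalidation <- train(x = train.data[, names(train.data) != "HighSales"],
                         y = train.data$HighSales,
                         method = "rpart",
                         trControl = trainControl(method = "cv", number = 10),
                         tuneGrid = expand.grid(cp = seq(0.001, 0.05, by = 0.0025)))
best.cp <- crossvalidation$bestTune$cp

set.seed(1)
tree1 <- rpart(HighSales ~ ., data = train.data, method = "class")
set.seed(1)
tree2 <- rpart(HighSales ~ ., data = train.data, method = "class",
               control = rpart.control(minsplit = 10, cp = 0.001))
set.seed(1)
tree3 <- rpart(HighSales ~ ., data = train.data, method = "class",
               control = rpart.control(cp = best.cp))
rpart.plot(tree3)

tree3.pred <- predict(tree3, newdata = test.data, type = "class")
tree3.cm <- table(true = test.data$HighSales, predicted = tree3.pred)
tree3.cm
tree3.eval <- getEvaluationMetrics(tree3.cm)

# hypothetical helper: test-set metrics for an already fitted tree
evalTree <- function(tree) {
  pred <- predict(tree, newdata = test.data, type = "class")
  getEvaluationMetrics(table(true = test.data$HighSales, predicted = pred))
}
comparison <- rbind(tree1 = evalTree(tree1), tree2 = evalTree(tree2), tree3 = tree3.eval)
round(comparison, 3)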