JustCaused

IS - Skripta 3

Jun 3rd, 2023 (edited)
##########################
# Linear Regression
##########################

# load MASS, corrplot and ggplot2
# install.packages('corrplot')

# examine the structure of the Boston dataset

# open the documentation for the dataset

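# A minimal sketch of the loading and inspection steps above. It assumes the
# corrplot and ggplot2 packages are already installed; MASS ships with
# standard R distributions and provides the Boston data frame.

```r
# load the packages named in the comments above
library(MASS)      # provides the Boston data frame
library(corrplot)  # correlation plots
library(ggplot2)   # scatterplots

# examine the structure of the Boston dataset
str(Boston)

# open the help page for the dataset (only useful in an interactive session)
if (interactive()) help("Boston", package = "MASS")
```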
# compute the correlation matrix

# one option for plotting correlations: using colors to represent the extent of correlation

# another option, with both colors and exact correlation scores

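# One possible way to carry out the correlation steps above. The name
# corr.matrix and the plotting arguments (type, tl.cex, number.cex) are
# illustrative choices, not taken from the original script.

```r
library(MASS)
library(corrplot)

# compute the correlation matrix (all 14 Boston columns are numeric)
corr.matrix <- cor(Boston)

# option 1: colors represent the extent of correlation
corrplot(corr.matrix, method = "color", type = "upper")

# option 2: colors in one triangle, exact correlation scores in the other
corrplot.mixed(corr.matrix, tl.cex = 0.75, number.cex = 0.75)
```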
# plot *lstat* against the response variable

# plot *rm* against the response variable

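# A sketch of the two scatterplots, assuming medv is the response variable
# (as in the model formulas later in the script). The object names p.lstat
# and p.rm are hypothetical.

```r
library(MASS)
library(ggplot2)

# lstat against the response variable
p.lstat <- ggplot(Boston, aes(x = lstat, y = medv)) + geom_point()
print(p.lstat)

# rm against the response variable
p.rm <- ggplot(Boston, aes(x = rm, y = medv)) + geom_point()
print(p.rm)
```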
#############################################
# Split the data into training and test sets
#############################################

# install.packages('caret')

# ensure the results are reproducible by setting the seed

# generate indices of the observations to be selected for the training set
# select observations at the positions defined by the train.indices vector
# select observations at the positions that are NOT in the train.indices vector


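# A sketch of the split using caret's createDataPartition(). The seed (123),
# the 80/20 proportion, and the names train.boston / test.boston are
# assumptions; only train.indices is named in the comments above.

```r
library(MASS)
library(caret)

# set the seed so the random split can be reproduced (value is an assumption)
set.seed(123)

# indices of the observations selected for the training set (80% is an assumption)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)

# observations at the positions defined by train.indices
train.boston <- Boston[train.indices, ]

# observations at the positions NOT in train.indices
test.boston <- Boston[-train.indices, ]
```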
##########################
# Simple Linear Regression
##########################

# build an lm model with a formula: medv ~ lstat
# print the model summary

# print all attributes stored in the fitted model

# print the coefficients

# print the coefficients with the coef() function

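# A sketch of the simple regression steps above. The model name lm1 follows
# the lm2 / lm5 naming used later in the script but is itself an assumption,
# as are the seed and split proportion used to recreate the training set.

```r
library(MASS)
library(caret)

# recreate the train/test split (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]

# build an lm model with the formula medv ~ lstat
lm1 <- lm(medv ~ lstat, data = train.boston)

# print the model summary
print(summary(lm1))

# print all attributes stored in the fitted model
print(names(lm1))

# print the coefficients, first directly, then with coef()
print(lm1$coefficients)
print(coef(lm1))
```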
# compute the RSS
# compute 95% confidence interval

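# A sketch of the RSS and confidence interval computations. RSS is the sum of
# squared residuals of the fitted model; confint() returns intervals for the
# coefficients. The names lm1, lm1.rss and lm1.ci are assumptions.

```r
library(MASS)
library(caret)

# recreate the split and the simple model (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]
lm1 <- lm(medv ~ lstat, data = train.boston)

# residual sum of squares on the training data
lm1.rss <- sum(residuals(lm1)^2)
print(lm1.rss)

# 95% confidence intervals for the intercept and the lstat coefficient
lm1.ci <- confint(lm1, level = 0.95)
print(lm1.ci)
```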
# plot the data points and the regression line

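# One way to draw the points with the fitted line: geom_smooth(method = "lm")
# refits medv ~ lstat on the plotted data, so on the training set it draws
# the same line as the simple model. The name p.fit is hypothetical.

```r
library(MASS)
library(caret)
library(ggplot2)

# recreate the training set (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]

# data points plus the least-squares regression line
p.fit <- ggplot(train.boston, aes(x = lstat, y = medv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
print(p.fit)
```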
##########################
## Making predictions
##########################

# calculate the predictions with the fitted model over the test data

# calculate the predictions with the fitted model over the test data, including the confidence interval

# calculate the predictions with the fitted model over the test data, including the prediction interval

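# A sketch of the three prediction calls using predict() with its interval
# argument. The object names (lm1, lm1.pred, ...) and the split settings are
# assumptions.

```r
library(MASS)
library(caret)

# recreate the split and the simple model (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]
test.boston <- Boston[-train.indices, ]
lm1 <- lm(medv ~ lstat, data = train.boston)

# point predictions over the test data
lm1.pred <- predict(lm1, newdata = test.boston)
print(head(lm1.pred))

# predictions with the confidence interval (uncertainty of the mean response)
lm1.pred.ci <- predict(lm1, newdata = test.boston, interval = "confidence")
print(head(lm1.pred.ci))

# predictions with the prediction interval (uncertainty of a single new observation)
lm1.pred.pi <- predict(lm1, newdata = test.boston, interval = "prediction")
print(head(lm1.pred.pi))
```

# The prediction interval is always wider than the confidence interval,
# because it also accounts for the noise of an individual observation.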
##########################
## Diagnostic Plots
##########################

# split the plotting area into 4 cells

# print the diagnostic plots
# reset the plotting area
# compute the leverage statistic

# calculate the number of high leverage points

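# A sketch of the diagnostic steps. The original does not state which cutoff
# defines a "high leverage" point; the threshold 2 * (p + 1) / n used here
# (twice the average leverage) is an assumption, as are all object names.

```r
library(MASS)
library(caret)

# recreate the split and the simple model (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]
lm1 <- lm(medv ~ lstat, data = train.boston)

# split the plotting area into 4 cells
par(mfrow = c(2, 2))

# print the four standard diagnostic plots
plot(lm1)

# reset the plotting area
par(mfrow = c(1, 1))

# compute the leverage statistic (diagonal of the hat matrix)
lm1.leverage <- hatvalues(lm1)

# count the high leverage points (cutoff is an assumption, see above)
n <- nrow(train.boston)
p <- 1  # number of predictors in lm1
leverage.cutoff <- 2 * (p + 1) / n
n.high.leverage <- sum(lm1.leverage > leverage.cutoff)
print(n.high.leverage)
```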
###############################
## Multiple Linear Regression
###############################

# generate the scatterplots for variables medv, lstat, rm, ptratio

# build an lm model with a train dataset using the formula: medv ~ lstat + rm + ptratio

# print the model summary
# calculate the predictions with the lm2 model over the test data

# print out a few predictions

# combine the test set with the predictions

# plot actual (medv) vs. predicted values

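# A sketch of the multiple regression steps above. lm2 is the name used in
# the comments; lm2.predict, test.boston.lm2, the pred column and the
# scatterplot layout for actual vs. predicted values are assumptions.

```r
library(MASS)
library(caret)
library(ggplot2)

# recreate the split (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]
test.boston <- Boston[-train.indices, ]

# scatterplot matrix for medv, lstat, rm and ptratio
pairs(~ medv + lstat + rm + ptratio, data = Boston)

# build the model on the training set and print its summary
lm2 <- lm(medv ~ lstat + rm + ptratio, data = train.boston)
print(summary(lm2))

# predictions over the test data, and a few of them printed
lm2.predict <- predict(lm2, newdata = test.boston)
print(head(lm2.predict))

# combine the test set with the predictions
test.boston.lm2 <- cbind(test.boston, pred = lm2.predict)

# actual (medv) vs. predicted values; points on the diagonal are perfect predictions
p.lm2 <- ggplot(test.boston.lm2, aes(x = medv, y = pred)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1)
print(p.lm2)
```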
# calculate RSS

# calculate TSS

# calculate R-squared on the test data

# calculate RMSE

# compare medv mean to the RMSE

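# A sketch of the test-set metrics: RSS = sum((y - y_hat)^2),
# TSS = sum((y - y_bar)^2), R-squared = 1 - RSS / TSS, and
# RMSE = sqrt(RSS / n). Taking y_bar from the training set is an assumption
# (the test-set mean is the other common choice); all names are assumptions.

```r
library(MASS)
library(caret)

# recreate the split and the lm2 model (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]
test.boston <- Boston[-train.indices, ]
lm2 <- lm(medv ~ lstat + rm + ptratio, data = train.boston)
lm2.predict <- predict(lm2, newdata = test.boston)

# residual sum of squares on the test data
lm2.test.rss <- sum((test.boston$medv - lm2.predict)^2)

# total sum of squares, relative to the training-set mean (assumption)
lm2.test.tss <- sum((test.boston$medv - mean(train.boston$medv))^2)

# R-squared on the test data
lm2.test.r2 <- 1 - lm2.test.rss / lm2.test.tss
print(lm2.test.r2)

# root mean squared error, in the same units as medv
lm2.test.rmse <- sqrt(lm2.test.rss / nrow(test.boston))
print(lm2.test.rmse)

# compare the medv mean to the RMSE (error as a fraction of the typical value)
print(lm2.test.rmse / mean(test.boston$medv))
```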
# build an lm model with the training set using all of the variables except chas
# note the use of '.' to mean all variables and the use of '-' to exclude the chas variable

# print the model summary

# check for multicollinearity using the vif function (from the 'car' package)
# calculate vif

# calculate square root of the VIF

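# A sketch of the full model and the VIF check, assuming the car package is
# installed. The names lm3 and lm3.vif are assumptions. A common rule of
# thumb treats sqrt(VIF) > 2 as a sign of problematic multicollinearity.

```r
library(MASS)
library(caret)
library(car)

# recreate the training set (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]

# all variables ('.') except chas ('- chas')
lm3 <- lm(medv ~ . - chas, data = train.boston)
print(summary(lm3))

# variance inflation factor for each predictor
lm3.vif <- vif(lm3)
print(lm3.vif)

# square root of the VIF, sorted so the largest values come first
print(sort(sqrt(lm3.vif), decreasing = TRUE))
```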
# build an lm model with the training set using all of the variables except chas and tax
# (multicollinearity was detected for 'tax')

# check the VIF scores again

# next, we will exclude *nox* and build a new model (lm5):
# The *dis* variable is the edge case

# The summary of lm5 indicated that *dis* should be excluded

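# A sketch of the step-by-step variable exclusion described above. lm5 is
# named in the comments; lm4 and lm6 are assumed names for the models before
# and after it. Whether each exclusion is warranted depends on the VIF scores
# and summaries obtained with the actual split.

```r
library(MASS)
library(caret)
library(car)

# recreate the training set (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]

# all variables except chas and tax
lm4 <- lm(medv ~ . - chas - tax, data = train.boston)

# check the VIF scores again
print(sort(sqrt(vif(lm4)), decreasing = TRUE))

# additionally exclude nox
lm5 <- lm(medv ~ . - chas - tax - nox, data = train.boston)
print(sort(sqrt(vif(lm5)), decreasing = TRUE))
print(summary(lm5))

# additionally exclude dis
lm6 <- lm(medv ~ . - chas - tax - nox - dis, data = train.boston)
print(summary(lm6))
```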
# calculate the predictions with the new model over the test data
# print out a few predictions

# combine the test set with the predictions

# plot actual (medv) vs. predicted values

# calculate RSS

# calculate R-squared on the test data
# calculate RMSE


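# A sketch of the final evaluation, assuming the "new model" is the one with
# chas, tax, nox and dis excluded (called lm6 here, an assumed name). The
# TSS baseline (training-set mean) and all other names are also assumptions.

```r
library(MASS)
library(caret)
library(ggplot2)

# recreate the split and the reduced model (seed and proportion are assumptions)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.boston <- Boston[train.indices, ]
test.boston <- Boston[-train.indices, ]
lm6 <- lm(medv ~ . - chas - tax - nox - dis, data = train.boston)

# predictions over the test data, and a few of them printed
lm6.predict <- predict(lm6, newdata = test.boston)
print(head(lm6.predict))

# combine the test set with the predictions
test.boston.lm6 <- cbind(test.boston, pred = lm6.predict)

# actual (medv) vs. predicted values
p.lm6 <- ggplot(test.boston.lm6, aes(x = medv, y = pred)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1)
print(p.lm6)

# RSS, R-squared and RMSE on the test data
lm6.test.rss <- sum((test.boston$medv - lm6.predict)^2)
lm6.test.tss <- sum((test.boston$medv - mean(train.boston$medv))^2)
lm6.test.r2 <- 1 - lm6.test.rss / lm6.test.tss
lm6.test.rmse <- sqrt(lm6.test.rss / nrow(test.boston))
print(c(RSS = lm6.test.rss, R2 = lm6.test.r2, RMSE = lm6.test.rmse))
```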