# IS - Skripta 3

Jun 3rd, 2023 (edited)
##########################
# Linear Regression
##########################

# load MASS, corrplot and ggplot2
# install.packages('corrplot')

# examine the structure of the Boston dataset

# open the documentation for the dataset

# compute the correlation matrix

# one option for plotting correlations: using colors to represent the strength of correlation

# another option, with both colors and exact correlation coefficients
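# A minimal sketch of the exploration steps above, assuming the MASS,
# corrplot and ggplot2 packages are installed. The object name corr.matrix
# and the plot settings are illustrative, not taken from the original script.
library(MASS)      # provides the Boston data frame
library(corrplot)  # correlation plots
library(ggplot2)   # used for scatterplots later in the script

str(Boston)        # 506 observations of 14 variables
# ?Boston          # uncomment in an interactive session to open the help page

# pairwise Pearson correlations between all variables
corr.matrix <- cor(Boston)

# colored cells only; type = "upper" hides the redundant lower triangle
corrplot(corr.matrix, method = "color", type = "upper")

# circles above the diagonal, exact coefficients below it
corrplot.mixed(corr.matrix, tl.cex = 0.75, number.cex = 0.75)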

# plot *lstat* against the response variable (medv)

# plot *rm* against the response variable (medv)
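# One way to draw the two scatterplots above with ggplot2, assuming the
# response variable is medv, as in the models further down. The plot object
# names are illustrative.
library(MASS)
library(ggplot2)

# lstat (% lower-status population) vs. medv
plot.lstat <- ggplot(Boston, aes(x = lstat, y = medv)) + geom_point()
print(plot.lstat)

# rm (average number of rooms per dwelling) vs. medv
plot.rm <- ggplot(Boston, aes(x = rm, y = medv)) + geom_point()
print(plot.rm)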

#############################################
# Split the data into training and test sets
#############################################

# install.packages('caret')

# ensure the reproducibility of the results by setting the seed

# generate indices of the observations to be selected for the training set
# select observations at the positions defined by the train.indices vector
# select observations at the positions that are NOT in the train.indices vector
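# A sketch of the split described above, assuming caret's
# createDataPartition(). The seed value (123) and the 80/20 proportion are
# assumptions; the original values are not shown in this script.
library(MASS)
library(caret)

set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]    # rows listed in train.indices
test  <- Boston[-train.indices, ]   # all remaining rows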


##########################
# Simple Linear Regression
##########################

# build an lm model with a formula: medv ~ lstat
# print the model summary

# print all attributes stored in the fitted model

# print the coefficients

# print the coefficients with the coef() function

# compute the 95% confidence intervals for the coefficients
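# A sketch of the simple regression steps above. It repeats an assumed
# 80/20 caret split (seed 123) so that it runs on its own; lm1 is an
# assumed model name.
library(MASS)
library(caret)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]
test  <- Boston[-train.indices, ]

lm1 <- lm(medv ~ lstat, data = train)
summary(lm1)                 # coefficients, standard errors, R-squared, F-statistic

names(lm1)                   # components stored in the fitted model object
lm1$coefficients             # direct access to the coefficient vector
coef(lm1)                    # the same values via the accessor function

confint(lm1, level = 0.95)   # 95% confidence intervals for both coefficients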

# plot the data points and the regression line
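# One possible version of the plot above, assuming ggplot2 and an assumed
# 80/20 caret split (seed 123). geom_smooth(method = "lm") refits
# medv ~ lstat on the plotted data, so the line matches a model fitted on
# the same training set. plot.fit is an illustrative name.
library(MASS)
library(caret)
library(ggplot2)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]

plot.fit <- ggplot(train, aes(x = lstat, y = medv)) +
  geom_point(shape = 1) +                       # hollow points
  geom_smooth(method = "lm", formula = y ~ x)   # least-squares line with a confidence band
print(plot.fit)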

##########################
## Making predictions
##########################

# calculate the predictions with the fitted model over the test data

# calculate the predictions with the fitted model over the test data, including the confidence interval

# calculate the predictions with the fitted model over the test data, including the prediction interval
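# A sketch of the three prediction calls above, using predict() on an lm
# fit. The split (seed 123, 80/20) and all object names are assumptions.
library(MASS)
library(caret)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]
test  <- Boston[-train.indices, ]
lm1 <- lm(medv ~ lstat, data = train)

# point predictions only
lm1.predict <- predict(lm1, newdata = test)
head(lm1.predict)

# fit plus bounds for the *mean* response at each lstat value
lm1.predict.conf <- predict(lm1, newdata = test, interval = "confidence")
head(lm1.predict.conf)

# fit plus bounds for an *individual* new observation (always wider)
lm1.predict.pred <- predict(lm1, newdata = test, interval = "prediction")
head(lm1.predict.pred)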

##########################
## Diagnostic Plots
##########################

# split the plotting area into 4 cells

# print the diagnostic plots
# reset the plotting area
# compute the leverage statistic

# calculate the number of high leverage points
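# A sketch of the diagnostic steps above in base R. The split (seed 123,
# 80/20), the object names and the leverage cutoff are assumptions; the
# cutoff 2 * (p + 1) / n is one common rule of thumb, not necessarily the
# one used in the original script.
library(MASS)
library(caret)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]
lm1 <- lm(medv ~ lstat, data = train)

par(mfrow = c(2, 2))   # 2 x 2 grid of plots
plot(lm1)              # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1))   # back to a single plot per page

# leverage statistic: diagonal of the hat matrix
lm1.leverage <- hatvalues(lm1)
plot(lm1.leverage)

n <- nrow(train)
p <- 1                              # number of predictors in lm1
cutoff <- 2 * (p + 1) / n           # twice the average leverage
n.high.leverage <- sum(lm1.leverage > cutoff)
n.high.leverage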

###############################
## Multiple Linear Regression
###############################

# generate the scatterplots for variables medv, lstat, rm, ptratio

# build an lm model on the training set using the formula: medv ~ lstat + rm + ptratio

# print the model summary
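# A sketch of the steps above. Base R's pairs() stands in for whichever
# scatterplot-matrix function the original used; the split (seed 123,
# 80/20) is also an assumption. The model name lm2 comes from the script.
library(MASS)
library(caret)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]

# scatterplot matrix of the response and the three candidate predictors
pairs(~ medv + lstat + rm + ptratio, data = Boston)

lm2 <- lm(medv ~ lstat + rm + ptratio, data = train)
summary(lm2)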
# calculate the predictions with the lm2 model over the test data

# print out a few predictions

# combine the test set with the predictions

# plot actual (medv) vs. predicted values
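# A sketch of the prediction and plotting steps above. The split (seed 123,
# 80/20), the names lm2.predict, test.lm2 and plot.lm2, and the choice of a
# scatterplot with a y = x reference line are assumptions.
library(MASS)
library(caret)
library(ggplot2)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]
test  <- Boston[-train.indices, ]
lm2 <- lm(medv ~ lstat + rm + ptratio, data = train)

lm2.predict <- predict(lm2, newdata = test)
head(lm2.predict)

# add the predictions as a new column next to the actual values
test.lm2 <- cbind(test, pred = lm2.predict)

# points on the dashed line are predicted exactly
plot.lm2 <- ggplot(test.lm2, aes(x = medv, y = pred)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")
print(plot.lm2)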


# calculate TSS (total sum of squares)

# calculate R-squared on the test data

# calculate RMSE (root mean squared error)

# compare medv mean to the RMSE
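# A sketch of the evaluation metrics above. R-squared and RMSE both need
# the residual sum of squares (RSS), so it is computed first. The split
# (seed 123, 80/20), the object names, and the use of the training mean as
# the TSS baseline are assumptions.
library(MASS)
library(caret)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]
test  <- Boston[-train.indices, ]
lm2 <- lm(medv ~ lstat + rm + ptratio, data = train)
lm2.predict <- predict(lm2, newdata = test)

# RSS: squared prediction errors on the test set
lm2.test.RSS <- sum((lm2.predict - test$medv)^2)

# TSS: squared deviations of the test values from the training mean
lm2.test.TSS <- sum((mean(train$medv) - test$medv)^2)

lm2.test.R2 <- 1 - lm2.test.RSS / lm2.test.TSS
lm2.test.R2

lm2.test.RMSE <- sqrt(lm2.test.RSS / nrow(test))
lm2.test.RMSE

# RMSE as a fraction of the average medv in the test set
lm2.test.RMSE / mean(test$medv)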

# build an lm model with the training set using all of the variables except chas
# note the use of '.' to mean all variables and the use of '-' to exclude the chas variable

# print the model summary

# check for multicollinearity using the vif function (from the 'car' package)
# calculate vif

# calculate square root of the VIF
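# A sketch of the full-model and VIF steps above, assuming the car package
# is installed. The split (seed 123, 80/20) and the names lm3 and lm3.vif
# are assumptions.
library(MASS)
library(caret)
library(car)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]

# '.' expands to every column except medv; '- chas' drops chas again
lm3 <- lm(medv ~ . - chas, data = train)
summary(lm3)

# variance inflation factor of each predictor
lm3.vif <- vif(lm3)
lm3.vif

# a square root above 2 is a commonly used warning sign of multicollinearity
sqrt(lm3.vif)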

# build an lm model with the training set using all of the variables except chas and tax
# (multicollinearity was detected for 'tax')

# check the VIF scores again

# next, we will exclude *nox* and build a new model (lm5):
# The *dis* variable is the edge case

# The summary of lm5 indicated that *dis* should be excluded
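# A sketch of the stepwise exclusions described above. The name lm5 comes
# from the script; lm4 and lm6, the split (seed 123, 80/20) and the exact
# formulas are assumptions.
library(MASS)
library(caret)
library(car)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]

# all variables except chas and tax
lm4 <- lm(medv ~ . - (chas + tax), data = train)
sqrt(vif(lm4))

# additionally exclude nox
lm5 <- lm(medv ~ . - (chas + tax + nox), data = train)
sqrt(vif(lm5))
summary(lm5)

# additionally exclude dis
lm6 <- lm(medv ~ . - (chas + tax + nox + dis), data = train)
summary(lm6)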

# calculate the predictions with the new model over the test data
# print out a few predictions

# combine the test set with the predictions

# plot actual (medv) vs. predicted values
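# A sketch of the final prediction steps above, assuming the "new model" is
# the one without chas, tax, nox and dis (called lm6 here, a hypothetical
# name). The split (seed 123, 80/20) and the scatterplot layout are also
# assumptions.
library(MASS)
library(caret)
library(ggplot2)
set.seed(123)
train.indices <- createDataPartition(Boston$medv, p = 0.80, list = FALSE)
train <- Boston[train.indices, ]
test  <- Boston[-train.indices, ]
lm6 <- lm(medv ~ . - (chas + tax + nox + dis), data = train)

lm6.predict <- predict(lm6, newdata = test)
head(lm6.predict)

# add the predictions as a new column next to the actual values
test.lm6 <- cbind(test, pred = lm6.predict)

# points on the dashed line are predicted exactly
plot.lm6 <- ggplot(test.lm6, aes(x = medv, y = pred)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")
print(plot.lm6)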