Untitled

a guest
Jan 22nd, 2018
---
title: "Homework 1"
author: "Nick Meyer"
output:
  word_document: default
  pdf_document: default
---

```{r setup, include=TRUE, echo=FALSE}
knitr::opts_chunk$set(echo = TRUE)
suppressMessages({
  library(smooth)
  library(tidyverse)
  library(car)
  library(leaps)
  library(bestglm)
  attach("./../Regression.Rdata", name="Regression")
})
```

## King County Housing Prices

```{r DataAcqFmt, include=TRUE}
# Training data: treat waterfront and renovated as factors, drop the ID column
king <- read.csv("./../data/KingCountyHomes_train.csv")
king$waterfront <- king$waterfront %>% as.factor
king$renovated <- king$renovated %>% as.factor
king <- subset(king, select=-ID)

# Test data: same formatting as the training data
king.test <- read.csv("./../data/KingCountyHomes_test.csv")
king.test$waterfront <- king.test$waterfront %>% as.factor
king.test$renovated <- king.test$renovated %>% as.factor
king.test <- subset(king.test, select=-ID)

summary(king)
ggplot(king, aes(x=price)) +
  geom_density()
```

## Task 1
### Part A: Naive OLS

```{r NaiveOLS, include=TRUE}
king.baseOLS <- lm(price~., data=king)

print(summary(king.baseOLS))

par(mfrow=c(2,2))
plot(king.baseOLS)
par(mfrow=c(1,1))

#print(VIF(king.baseOLS))
```
Our model does not account for collinearity. In fact, when I try to run `VIF` on it, it fails because of singularities. The culprit is that `sqft_living = sqft_above + sqft_basement` exactly, so the design matrix is rank-deficient: `lm` cannot estimate separate coefficients for all three square-footage variables (one of them comes back as `NA`), and their coefficients cannot be interpreted independently.

$R^2_{adj} = 0.6981$, which isn't terrible, but is certainly not great.
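To see where the singularity comes from, here is a small sketch on made-up data that builds in the same identity. The data frame `demo` and all of its values are hypothetical, not the King County file. Because `sqft_basement` is exactly `sqft_living - sqft_above`, `lm` can only estimate two of the three slopes and reports the third as `NA`, and `alias()` spells out the dependence. A VIF needs every coefficient to be estimable, which should be why a VIF routine (for example `car::vif`) stops on a model like this.

```{r aliasDemo}
# Hypothetical synthetic data mimicking sqft_living = sqft_above + sqft_basement
set.seed(1)
n.demo   <- 200
above    <- runif(n.demo, 800, 3000)
basement <- ifelse(runif(n.demo) < 0.4, runif(n.demo, 200, 1200), 0)  # some zero basements
demo <- data.frame(sqft_living   = above + basement,  # exact identity
                   sqft_above    = above,
                   sqft_basement = basement)
demo$price <- 50000 + 150 * demo$sqft_living + rnorm(n.demo, sd = 20000)

demo.fit <- lm(price ~ ., data = demo)
coef(demo.fit)            # sqft_basement is NA: it is aliased with the other two
alias(demo.fit)$Complete  # sqft_basement = 1 * sqft_living - 1 * sqft_above
```

Dropping any one of the three columns removes the `NA`.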

Let's find which variables are not linearly independent, i.e., which columns can be written as a linear combination of other columns:
```{r linCombs}
# findLinearCombos() comes from the caret package, which is not loaded in setup
lcs <- caret::findLinearCombos(data.matrix(king))

# Name the columns involved in each linear combination
combo.cols <- sapply(lcs$linearCombos, function(idx) paste(names(king)[idx], collapse=", "))
sprintf("There is a linear combination among the following columns: %s", combo.cols)

# Which column(s) should we get rid of?
sprintf("Column(s) to remove: %s (index %i)", names(king)[lcs$remove], lcs$remove)

king.fixed <- king %>% select(-all_of(lcs$remove))
```
So the `sqft_basement` column is redundant, and we will eliminate it in Task 2.
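If caret isn't available, base R can flag the same kind of redundancy. Below is a minimal sketch, assuming a numeric matrix like `data.matrix(king)`. The helper `redundant_columns()` is hypothetical (my own name, not a package function) and relies on base R's default `qr()`, which moves columns that are numerically linear combinations of earlier columns to the end of its pivot, so every pivot position past `rank` is redundant. This is not necessarily how `findLinearCombos` works internally, and with a different column order it may flag a different member of the combination.

```{r qrDemo}
# Hypothetical helper: names of columns of X that are exact linear
# combinations of earlier columns, via base R's pivoted QR decomposition
redundant_columns <- function(X, tol = 1e-7) {
  q <- qr(X, tol = tol)
  if (q$rank == ncol(X)) return(character(0))
  # qr() moves rank-deficient columns to the end of the pivot
  colnames(X)[q$pivot[(q$rank + 1):ncol(X)]]
}

# Synthetic example mirroring sqft_living = sqft_above + sqft_basement
set.seed(2)
above    <- runif(50, 800, 3000)
basement <- runif(50, 0, 1000)
X <- cbind(sqft_living   = above + basement,
           sqft_above    = above,
           sqft_basement = basement,
           bedrooms      = sample(1:5, 50, replace = TRUE))
redundant_columns(X)  # flags "sqft_basement"
```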

### Part B: Stepwise Models

```{r stepOLS, include=TRUE}
# Best-subset search (regsubsets' default method is "exhaustive")
king.full <- regsubsets(price ~ ., data=king, nvmax=length(names(king)))

king.full.summary <- summary(king.full)
king.full.summary
names(king.full.summary)

# Model size selected by each criterion
which.max(king.full.summary$adjr2)
which.min(king.full.summary$cp)
which.min(king.full.summary$bic)

#par(mfrow=c(2,2))
#plot(king.full, scale='r2')
#plot(king.full, scale='adjr2')
#plot(king.full, scale='Cp')
#plot(king.full, scale='bic')
#par(mfrow=c(1,1))
```
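The three criteria above can disagree, so it helps to see what each one measures. Here is a sketch that computes them by hand for a few nested `lm` fits on simulated data; the data frame `d` and the candidate formulas are hypothetical stand-ins for the best model of each size. With $n$ observations, $p$ fitted parameters (including the intercept) and $\hat\sigma^2$ estimated from the largest model, it uses the textbook forms $R^2_{adj} = 1 - \frac{RSS/(n-p)}{TSS/(n-1)}$, $C_p = RSS/\hat\sigma^2 - n + 2p$ and $BIC = n\log(RSS/n) + p\log n$. The constants `leaps` uses may differ from these; as long as $\hat\sigma^2$ comes from the same full model, shifts and positive rescalings like that should not change which size gets selected.

```{r criteriaDemo}
# Hypothetical simulated data: y depends on x1 and x2 only
set.seed(3)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + rnorm(n)

# Nested candidates, standing in for the best model of each size
forms <- list(y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3, y ~ x1 + x2 + x3 + x4)
fits  <- lapply(forms, lm, data = d)

rss <- sapply(fits, function(f) sum(resid(f)^2))
p   <- sapply(fits, function(f) length(coef(f)))       # parameters incl. intercept
tss <- sum((d$y - mean(d$y))^2)
sigma2.full <- rss[length(rss)] / (n - p[length(p)])   # error variance from largest model

adjr2 <- 1 - (rss / (n - p)) / (tss / (n - 1))
cp    <- rss / sigma2.full - n + 2 * p                 # Mallows' Cp
bic   <- n * log(rss / n) + p * log(n)                 # BIC up to an additive constant

data.frame(size = p - 1, adjr2, cp, bic)
c(adjr2 = which.max(adjr2), cp = which.min(cp), bic = which.min(bic))
```

For the largest model, $C_p$ equals $p$ by construction, which makes a quick sanity check.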
```{r bestOLS, include=TRUE}
#king.xs <- subset(king, select=-price)
#king.bestOLS <- bestglm(cbind(king.xs, king$price))
#attributes(king.bestOLS)
```