---
title: "Homework 1"
author: "Nick Meyer"
output:
  word_document: default
  pdf_document: default
---
```{r setup, include=TRUE, echo=FALSE}
knitr::opts_chunk$set(echo = TRUE)
suppressMessages({
  library(smooth)
  library(tidyverse)
  library(car)
  library(leaps)
  library(bestglm)
  attach("./../Regression.Rdata", name = "Regression")
})
```
## King County Housing Prices
```{r DataAcqFmt, include=TRUE}
# Training data: treat waterfront and renovated as factors, drop the ID column
king <- read.csv("./../data/KingCountyHomes_train.csv")
king$waterfront <- king$waterfront %>% as.factor
king$renovated <- king$renovated %>% as.factor
king <- subset(king, select = -ID)

# Test data: identical preprocessing
king.test <- read.csv("./../data/KingCountyHomes_test.csv")
king.test$waterfront <- king.test$waterfront %>% as.factor
king.test$renovated <- king.test$renovated %>% as.factor
king.test <- subset(king.test, select = -ID)

summary(king)

# Distribution of the response
ggplot(king, aes(x = price)) +
  geom_density()
```
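The training and test files get exactly the same preprocessing, so one way to keep them in sync is a small helper applied to both. This is only a sketch: `prep_king` is a hypothetical name, and it assumes both CSVs contain `ID`, `waterfront`, and `renovated` columns, as the chunk above does.

```r
# Hypothetical helper: apply the same preprocessing to any King County data frame
prep_king <- function(df) {
  df$waterfront <- as.factor(df$waterfront)
  df$renovated  <- as.factor(df$renovated)
  df[, setdiff(names(df), "ID"), drop = FALSE]  # drop the ID column
}

# Usage on the real files would be:
# king      <- prep_king(read.csv("./../data/KingCountyHomes_train.csv"))
# king.test <- prep_king(read.csv("./../data/KingCountyHomes_test.csv"))

# Tiny made-up data frame to show the effect
toy <- data.frame(ID = 1:3, price = c(3e5, 4e5, 5e5),
                  waterfront = c(0, 1, 0), renovated = c(1, 0, 0))
str(prep_king(toy))
```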
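## Task 1
### Part A: Naive OLS
```{r NaiveOLS, include=TRUE}
# Baseline: regress price on every other column
king.baseOLS <- lm(price ~ ., data = king)
print(summary(king.baseOLS))

# Standard 2x2 grid of residual diagnostics
par(mfrow = c(2, 2))
plot(king.baseOLS)
par(mfrow = c(1, 1))

# VIF() fails on this model because of singularities (see below),
# so it is left commented out
#print(VIF(king.baseOLS))
```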
Our model does not account for collinearity. In fact, when I try to run `VIF`, it fails outright because the model is singular: it ignores the fact that `sqft_living = sqft_above + sqft_basement`, so `lm` cannot estimate all three of those coefficients and reports one of them as `NA` ("not defined because of singularities" in the summary). This exact dependency is probably distorting the model even more.
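To see why an exact identity like `sqft_living = sqft_above + sqft_basement` breaks the fit, here is a minimal sketch on simulated data (the numbers are made up, not taken from the King County file). `lm` still runs but reports the aliased coefficient as `NA`, and base R's `alias()` spells out the dependency. A model with an `NA` coefficient is presumably also what trips up `VIF`.

```r
set.seed(42)
n <- 100
above    <- runif(n, 800, 3000)                          # simulated above-ground sqft
basement <- sample(c(0, 0, 500, 900), n, replace = TRUE)  # many homes have no basement
living   <- above + basement                             # exact identity, as in the data
price    <- 150 * living + rnorm(n, sd = 20000)

fit <- lm(price ~ living + above + basement)
coef(fit)   # basement is NA: it adds nothing beyond living - above
alias(fit)  # expresses basement in terms of living and above
```

Which of the three columns ends up `NA` depends only on the order they enter the formula; the last one listed is the one `lm` gives up on.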
$R^2_{adj} = 0.6981$, which isn't terrible but is certainly not great.
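As a reminder of what that number measures: adjusted $R^2$ penalizes ordinary $R^2$ for the number of slope coefficients $p$, via $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, so it only rises when a new term improves the fit more than chance would. The sketch below recomputes it by hand on simulated data and compares it with `summary()`. The helper name `adj_r2` is made up, and it assumes a model with an intercept.

```r
# Adjusted R^2 by hand, assuming an intercept is in the model.
# fit$df.residual equals n - p - 1 (it skips any aliased coefficients).
adj_r2 <- function(fit) {
  y   <- model.response(model.frame(fit))
  rss <- sum(residuals(fit)^2)
  tss <- sum((y - mean(y))^2)
  r2  <- 1 - rss / tss
  n   <- length(y)
  1 - (1 - r2) * (n - 1) / fit$df.residual
}

set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 2 * d$x1 - d$x2 + rnorm(50)
fit <- lm(y ~ x1 + x2, data = d)
c(by_hand = adj_r2(fit), from_summary = summary(fit)$adj.r.squared)
```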
Let's find which variables are linearly dependent (i.e., find an exact linear combination of columns):
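```{r linCombs}
# findLinearCombos() comes from the caret package, which the setup chunk does not load
lcs <- caret::findLinearCombos(data.matrix(king))
# Report each dependency by column name instead of by index
combos <- vapply(lcs$linearCombos,
                 function(idx) paste(names(king)[idx], collapse = ", "),
                 character(1))
sprintf("There is a linear combination between the following columns: %s", combos)
# Which column(s) are we getting rid of?
sprintf("Column(s) to remove: %s (index %i)", names(king)[lcs$remove], lcs$remove)
# Drop the redundant column(s) by name
king.fixed <- king %>% select(-all_of(names(king)[lcs$remove]))
```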
So the `sqft_basement` column is redundant, and we will eliminate it in Task 2.
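`findLinearCombos()` is documented as working from a QR decomposition. The rough base-R sketch below shows the idea: pivoted QR pushes columns that add no new direction to the end, and those are the redundant ones. The helper `find_dependent_cols` and the simulated columns are assumptions for illustration; caret's function also reports which columns take part in each combination.

```r
# Hypothetical helper: indices of columns that are (numerically) exact linear
# combinations of earlier columns, using base R's pivoted (LINPACK) QR.
find_dependent_cols <- function(X, tol = 1e-7) {
  q <- qr(X, tol = tol)                 # dependent columns get pivoted to the end
  if (q$rank == ncol(X)) return(integer(0))
  q$pivot[(q$rank + 1):ncol(X)]         # the columns QR could not use
}

set.seed(3)
above    <- runif(50, 800, 3000)
basement <- sample(c(0, 400, 800), 50, replace = TRUE)
living   <- above + basement            # exact identity, as in the King County data

X <- cbind(living = living, above = above, basement = basement)
colnames(X)[find_dependent_cols(X)]

# Same columns in a different order
X2 <- X[, c("basement", "above", "living")]
colnames(X2)[find_dependent_cols(X2)]
```

The flagged column depends on column order: with `living` first, `basement` is flagged; with `basement` first, `living` is. "Redundant" therefore means redundant given the columns before it, and dropping `sqft_basement` rather than `sqft_living` or `sqft_above` is a modeling choice.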
### Part B: Stepwise Models
```{r stepOLS, include=TRUE}
# Subset search over all model sizes (regsubsets defaults to method = "exhaustive")
king.full <- regsubsets(price ~ ., data = king, nvmax = length(names(king)))
king.full.summary <- summary(king.full)
king.full.summary
names(king.full.summary)

# Model size preferred by each criterion
which.max(king.full.summary$adjr2)  # largest adjusted R^2
which.min(king.full.summary$cp)     # smallest Mallows' Cp
which.min(king.full.summary$bic)    # smallest BIC

#par(mfrow=c(2,2))
#plot(king.full, scale='r2')
#plot(king.full, scale='adjr2')
#plot(king.full, scale='Cp')
#plot(king.full, scale='bic')
#par(mfrow=c(1,1))
```
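Note that `regsubsets()` performs an exhaustive best-subset search unless `method` is set to `"forward"`, `"backward"`, or `"seqrep"`. For comparison, a classical stepwise search is available in base R through `step()`, which adds or drops one term at a time by AIC, or by BIC when `k = log(n)`. The sketch below runs it on simulated data (only `x1` and `x2` matter; `x3` and `x4` are pure noise), not on `king`.

```r
set.seed(7)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 3 * d$x1 - 2 * d$x2 + rnorm(n)   # x3 and x4 are noise

full <- lm(y ~ ., data = d)
# Stepwise search in both directions, scored by BIC (k = log(n))
best <- step(full, direction = "both", k = log(n), trace = 0)
formula(best)
```

For this assignment the starting point would be `king.baseOLS` instead of `full` (not run here).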
```{r bestOLS, include=TRUE}
# bestglm() expects a data frame whose last column is the response (disabled for now)
#king.xs <- subset(king, select=-price)
#king.bestOLS <- bestglm(cbind(king.xs, king$price))
#attributes(king.bestOLS)
```
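Conceptually, an exhaustive best-subset search like the one `bestglm` and `regsubsets` run is just the loop below: fit `lm` on every combination of predictors from `combn()` and keep the one with the lowest `BIC()`. This brute-force sketch uses simulated data with only four predictors (`x1` and `x3` carry signal). The number of candidate models is $2^p - 1$, so it stops being practical as $p$ grows.

```r
set.seed(11)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 + 1.5 * d$x3 + rnorm(n)

predictors <- setdiff(names(d), "y")

# Every non-empty subset of predictors, as a flat list of character vectors
subsets <- unlist(lapply(seq_along(predictors),
                         function(k) combn(predictors, k, simplify = FALSE)),
                  recursive = FALSE)

# Fit each candidate model and score it by BIC
scores <- vapply(subsets, function(vars) {
  BIC(lm(reformulate(vars, response = "y"), data = d))
}, numeric(1))

best_vars <- subsets[[which.min(scores)]]
best_vars
```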