```{r, results='hide', message=FALSE, warning=FALSE}
library(mdsr)
library(mosaic)
library(tidyverse)
library(gapminder)
library(rmarkdown)
library(dplyr)
library(class)
library(e1071)
library(ROCR)
library(NHANES)
library(rpart)
library(rpart.plot)
library(tidyr)
```
1. As the class notes show, the NHANES dataset has 76 variables. In this homework, you will use the filter command to select a subset of the data for a specific group of people and then analyze that group. For example, to look at married people, use:
```{r, results='hide', message=FALSE, warning=FALSE}
Married1 <- filter(NHANES, MaritalStatus == "Married")
Married1
```
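One detail worth checking when filtering on a column that contains missing values: `filter()` drops rows where the condition evaluates to `NA`, which is the same behaviour as base R's `subset()`, whereas plain bracket indexing keeps an all-`NA` row for each of them. The sketch below illustrates this in base R on a small hypothetical data frame (`toy` and its values are made up for illustration and are not NHANES records).

```r
# Hypothetical toy rows; not taken from NHANES.
toy <- data.frame(
  ID = 1:5,
  MaritalStatus = c("Married", NA, "Divorced", "Married", NA)
)

# Bracket indexing keeps one all-NA row for every NA in the condition.
by_bracket <- toy[toy$MaritalStatus == "Married", ]

# subset() drops rows where the condition is NA, as dplyr::filter() does.
by_subset <- subset(toy, MaritalStatus == "Married")

print(nrow(by_bracket))
print(nrow(by_subset))
```

If the row counts of a filtered subset look larger than expected, rows with a missing marital status are a likely cause.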
2. Compute the percentage of married persons who have diabetes.
```{r, results='hide', message=FALSE, warning=FALSE}
# Use the married subset created in step 1
tally(~ Diabetes, data = Married1, format = "percent")
```
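As a cross-check on the percentages, the same calculation can be sketched in base R with `table()` and `prop.table()`. The example uses a small hypothetical vector (`toy`, values made up) rather than the NHANES data, and it shows that the answer depends on whether missing values are counted as their own category.

```r
# Hypothetical toy data standing in for the married subset; not real NHANES values.
toy <- data.frame(
  Diabetes = factor(c("No", "No", "Yes", "No", NA, "Yes", "No", "No"),
                    levels = c("No", "Yes"))
)

# Percentages among all rows, with missing values as their own category.
pct_all <- 100 * prop.table(table(toy$Diabetes, useNA = "ifany"))

# Percentages among rows with a recorded diabetes status only.
pct_known <- 100 * prop.table(table(toy$Diabetes))

print(round(pct_all, 1))
print(round(pct_known, 1))
```

When reporting the percentage of married persons with diabetes, it helps to state which of the two denominators was used.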
3. From the married people's data, select the columns Age, Diabetes, BMI, TotChol, and PhysActive.
```{r, results='hide', message=FALSE, warning=FALSE}
# Start from the married subset (not the full NHANES data) and keep the result as `people`
people <- Married1 %>%
  select(Age, Diabetes, BMI, TotChol, PhysActive)
```
4. Use rpart to generate a decision tree classification model using:
```{r, results='hide', message=FALSE, warning=FALSE}
whoIsDiabetic <- rpart(Diabetes ~ Age + BMI + TotChol + PhysActive,
                       data = people,
                       control = rpart.control(cp = 0.005, minbucket = 30))
rpart.plot(whoIsDiabetic, extra = 4)
whoIsDiabetic
```
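When comparing trees, it can help to see how the `cp` setting in `rpart.control()` changes the size of the fitted tree and how the fitted classes line up with the observed ones. The sketch below is only an illustration: it uses the `kyphosis` data that ships with rpart as a stand-in for the NHANES subset, and the helper `count_leaves` and the specific `cp` and `minbucket` values are assumptions chosen for the example.

```r
library(rpart)

# kyphosis ships with rpart; it stands in here for the NHANES subset.
# count_leaves is a hypothetical helper: terminal nodes are marked "<leaf>" in fit$frame.
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Same formula and minbucket, two complexity parameters.
coarse <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                method = "class",
                control = rpart.control(cp = 0.1, minbucket = 5))
fine <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              method = "class",
              control = rpart.control(cp = 0.001, minbucket = 5))

leaves_coarse <- count_leaves(coarse)
leaves_fine <- count_leaves(fine)
cat("leaves at cp = 0.1:  ", leaves_coarse, "\n")
cat("leaves at cp = 0.001:", leaves_fine, "\n")

# Confusion matrix of fitted classes against observed classes (training data only).
confusion <- table(observed = kyphosis$Kyphosis,
                   predicted = predict(fine, type = "class"))
print(confusion)
```

A smaller `cp` allows splits that improve the fit only slightly, so the tree can only stay the same size or grow. The confusion matrix here is computed on the training data, so it tends to overstate how well the tree would classify new cases.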
6. Discuss any differences between this decision tree and the first decision tree shown in the lecture notes.