# filter

![](slides/images/filter.png)

We use `filter` to choose a subset of cases.

Use `filter` to keep only respondents who are divorced. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status == "Divorced") %>%
  select(marital_status)
```

Use `filter` to keep only respondents who are **not** divorced. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status != "Divorced") %>%
  select(marital_status)
```
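
One thing worth checking with `!=`: `filter` keeps only rows where the condition is `TRUE`, so rows where `marital_status` is `NA` are dropped along with the divorced respondents. Here is a minimal sketch of that behavior. It uses a small made-up tibble called `toy` (an assumption for illustration, not the real `nhanes` data).

```{r}
library(dplyr)

# Hypothetical toy data standing in for nhanes
toy <- tibble(marital_status = c("Divorced", "Married", NA, "Widowed"))

# NA != "Divorced" evaluates to NA, so the NA row is not kept
not_divorced <- toy %>%
  filter(marital_status != "Divorced")

not_divorced

# To keep the missing values too, ask for them explicitly
not_divorced_or_na <- toy %>%
  filter(marital_status != "Divorced" | is.na(marital_status))

not_divorced_or_na
```

Whether the `NA` rows should stay depends on the question you are asking, so it helps to decide this on purpose.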

Use `filter` to keep only respondents who are divorced or separated. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status == "Divorced" | marital_status == "Separated") %>%
  select(marital_status)
```

Use `%in%` within the `filter` function to keep only those who are divorced, separated, or widowed. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed")) %>%
  select(marital_status)
```
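
To see why `%in%` is handy, compare it with the same condition written out with `|`. The sketch below uses a made-up tibble called `toy` (hypothetical, not the real `nhanes` data) and checks that both versions return the same rows.

```{r}
library(dplyr)

# Hypothetical toy data standing in for nhanes
toy <- tibble(
  marital_status = c("Divorced", "Separated", "Widowed", "Married", "NeverMarried")
)

# Long form: one comparison per category, joined with |
with_or <- toy %>%
  filter(marital_status == "Divorced" |
           marital_status == "Separated" |
           marital_status == "Widowed")

# Short form: one %in% test against a vector of categories
with_in <- toy %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed"))

# Compare the two results
all.equal(with_or, with_in)
```

The `%in%` version stays the same length no matter how many categories you add to the vector.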

We can chain together multiple `filter` functions. Doing it this way, we don't have to create complex logic in one line.

Create a chain that keeps only those who are college grads (the first `filter` line). Then, `filter` to keep only those who are divorced, separated, or widowed. Finally, use `select` to show only the `education` and `marital_status` variables.

```{r}
nhanes %>%
  filter(education == "College Grad") %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed")) %>%
  select(education, marital_status)
```
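
Chaining two `filter` calls keeps the rows that pass both conditions. As a rough comparison, the sketch below shows the same result from a single `filter` with the conditions separated by a comma, which `dplyr` combines with "and". The `toy` tibble is made up for illustration and is not the real `nhanes` data.

```{r}
library(dplyr)

# Hypothetical toy data standing in for nhanes
toy <- tibble(
  education      = c("College Grad", "College Grad", "High School", "College Grad"),
  marital_status = c("Divorced", "Married", "Widowed", "Separated")
)

# Two filter steps, one condition each
chained <- toy %>%
  filter(education == "College Grad") %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed"))

# One filter step, conditions separated by a comma (treated as "and")
combined <- toy %>%
  filter(education == "College Grad",
         marital_status %in% c("Divorced", "Separated", "Widowed"))

# Compare the two results
all.equal(chained, combined)
```

Which form to use is mostly a matter of readability. The chained version makes it easy to comment out one step while exploring.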

We can use `<`, `>`, `<=`, and `>=` for numeric data.

Use `filter` to show only those who reported at least 5 days of physical activity in the last 30 days (this is the `phys_active_days` variable). Then, use `select` to keep only the `phys_active_days` and the `days_phys_hlth_bad` variables.

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  select(phys_active_days, days_phys_hlth_bad)
```

We can drop `NA`s with `!is.na`.

Do the same thing as above, but drop rows that don't have a response for `days_phys_hlth_bad`. Then, use `select` to keep only the `phys_active_days` and the `days_phys_hlth_bad` variables.

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  filter(!is.na(days_phys_hlth_bad)) %>%
  select(phys_active_days, days_phys_hlth_bad)
```

You can also drop `NA`s with `drop_na`.

Do the same thing as above, but use `drop_na` instead of `!is.na`. Make sure you get the same result!

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  drop_na(days_phys_hlth_bad) %>%
  select(phys_active_days, days_phys_hlth_bad)
```
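
One way to confirm that the two approaches agree is to save each result and compare them. This is a minimal sketch on a made-up tibble called `toy` (hypothetical, not the real `nhanes` data). It assumes `tidyr` is installed, since that is where `drop_na` lives.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for nhanes
toy <- tibble(
  phys_active_days   = c(3, 5, 7, 10, NA),
  days_phys_hlth_bad = c(0, NA, 2, NA, 1)
)

# Version 1: drop missing values with filter() and !is.na()
with_is_na <- toy %>%
  filter(phys_active_days >= 5) %>%
  filter(!is.na(days_phys_hlth_bad)) %>%
  select(phys_active_days, days_phys_hlth_bad)

# Version 2: drop missing values with drop_na()
with_drop_na <- toy %>%
  filter(phys_active_days >= 5) %>%
  drop_na(days_phys_hlth_bad) %>%
  select(phys_active_days, days_phys_hlth_bad)

# Compare the two results
all.equal(with_is_na, with_drop_na)
```

The same check works on `nhanes` by swapping it in for `toy`.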