Apr 19th, 2019
# filter

![](slides/images/filter.png)

We use `filter` to keep only the cases (rows) that meet a condition.

Use `filter` to keep only respondents who are divorced. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status == "Divorced") %>%
  select(marital_status)
```

Use `filter` to keep only respondents who are **not** divorced. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status != "Divorced") %>%
  select(marital_status)
```
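
One thing worth checking when using `!=`: `filter` keeps only rows where the condition is `TRUE`, so rows where the variable is `NA` are dropped as well. Here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not taken from `nhanes`):

```{r}
library(dplyr)

# Hypothetical toy data, not the real nhanes values
toy <- data.frame(
  marital_status = c("Divorced", "Married", NA, "Widowed"),
  stringsAsFactors = FALSE
)

# NA != "Divorced" evaluates to NA, so filter() drops that row too
not_divorced <- toy %>%
  filter(marital_status != "Divorced")

# To keep the missing values, ask for them explicitly
not_divorced_or_missing <- toy %>%
  filter(marital_status != "Divorced" | is.na(marital_status))

nrow(not_divorced)
nrow(not_divorced_or_missing)
```

If the row counts after a `!=` filter look smaller than expected, missing values are a likely reason.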

Use `filter` to keep only respondents who are divorced or separated. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status == "Divorced" | marital_status == "Separated") %>%
  select(marital_status)
```

Use `%in%` within the `filter` function to keep only those who are divorced, separated, or widowed. Then, use `select` to show only the `marital_status` variable.

```{r}
nhanes %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed")) %>%
  select(marital_status)
```
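
Inside `filter`, `%in%` behaves like a series of `==` comparisons joined with `|`. The sketch below compares the two forms on a made-up data frame (`toy` is hypothetical, not the real `nhanes` data):

```{r}
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  marital_status = c("Divorced", "Married", "Separated", NA, "Widowed"),
  stringsAsFactors = FALSE
)

# Long form: one == comparison per value, joined with |
with_or <- toy %>%
  filter(marital_status == "Divorced" |
           marital_status == "Separated" |
           marital_status == "Widowed")

# Short form: a single %in% against a vector of values
with_in <- toy %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed"))

# Both should keep the same values
identical(with_or$marital_status, with_in$marital_status)
```

The `%in%` version is easier to extend: adding a fourth category means adding one string to the vector.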

We can chain together multiple `filter` functions. Doing it this way, we don't have to create complex logic in one line.

Create a chain that first keeps only those who are college grads (the first `filter` line). Then, `filter` to keep only those who are divorced, separated, or widowed. Finally, use `select` to show only the `education` and `marital_status` variables.

```{r}
nhanes %>%
  filter(education == "College Grad") %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed")) %>%
  select(education, marital_status)
```
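
Chaining two `filter` calls keeps the rows that pass both conditions. The same result can be had by giving both conditions to a single `filter`, separated by a comma. A small sketch with made-up data (`toy` is hypothetical) to compare the two:

```{r}
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  education = c("College Grad", "High School", "College Grad", "College Grad"),
  marital_status = c("Divorced", "Widowed", "Married", "Separated"),
  stringsAsFactors = FALSE
)

# Two filter() calls, one condition each
chained <- toy %>%
  filter(education == "College Grad") %>%
  filter(marital_status %in% c("Divorced", "Separated", "Widowed"))

# One filter() call, conditions separated by a comma (both must be TRUE)
combined <- toy %>%
  filter(
    education == "College Grad",
    marital_status %in% c("Divorced", "Separated", "Widowed")
  )

nrow(chained)
nrow(combined)
```

Which form to use is mostly a matter of readability. The chained version makes it easy to comment out one step while exploring.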

We can use `<`, `>`, `<=`, and `>=` for numeric data.

Use `filter` to show only those who reported at least 5 days of physical activity in the last 30 days (this is the `phys_active_days` variable). Then, use `select` to keep only the `phys_active_days` and the `days_phys_hlth_bad` variables.

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  select(phys_active_days, days_phys_hlth_bad)
```
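
The difference between `>=` and `>` matters at the boundary: "at least 5" includes 5, while "more than 5" does not. A minimal sketch on made-up numbers (`toy` and its values are hypothetical):

```{r}
library(dplyr)

# Hypothetical toy data
toy <- data.frame(phys_active_days = c(2, 5, 6, 7, NA))

# "At least 5" keeps the 5
at_least_5 <- toy %>%
  filter(phys_active_days >= 5)

# "More than 5" drops the 5
more_than_5 <- toy %>%
  filter(phys_active_days > 5)

# Two comparisons together give a range (5 through 6 here)
five_to_six <- toy %>%
  filter(phys_active_days >= 5, phys_active_days <= 6)

at_least_5$phys_active_days
more_than_5$phys_active_days
five_to_six$phys_active_days
```

Note that the `NA` row is dropped by every comparison, because comparing `NA` to a number gives `NA` rather than `TRUE`.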

We can drop `NA`s with `!is.na()`.

Do the same thing as above, but drop respondents who don't have a response for `days_phys_hlth_bad`. Then, use `select` to keep only the `phys_active_days` and the `days_phys_hlth_bad` variables.

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  filter(!is.na(days_phys_hlth_bad)) %>%
  select(phys_active_days, days_phys_hlth_bad)
```

You can also drop `NA`s with `drop_na` (from the tidyr package).

Do the same thing as above, but use `drop_na` instead of `!is.na`. Make sure you get the same result!

```{r}
nhanes %>%
  filter(phys_active_days >= 5) %>%
  drop_na(days_phys_hlth_bad) %>%
  select(phys_active_days, days_phys_hlth_bad)
```
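
One way to check that the two approaches agree is to save each result and compare them. This is a sketch on made-up data (`toy` is hypothetical). With the real data you would start from `nhanes` instead.

```{r}
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr

# Hypothetical toy data
toy <- data.frame(
  phys_active_days = c(5, 7, 2, 6),
  days_phys_hlth_bad = c(0, NA, 3, 10)
)

# Approach 1: filter() with !is.na()
with_is_na <- toy %>%
  filter(phys_active_days >= 5) %>%
  filter(!is.na(days_phys_hlth_bad))

# Approach 2: drop_na() on the same column
with_drop_na <- toy %>%
  filter(phys_active_days >= 5) %>%
  drop_na(days_phys_hlth_bad)

# Compare the two results
all.equal(with_is_na, with_drop_na)
```

Calling `drop_na()` with no column names drops rows with an `NA` in any column, so naming the column keeps the two approaches comparable.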