kmishra9

Prevalence RTutor Problemset

Jun 28th, 2018
# Problemset Prevalence

title: "Prevalence Tutorial"
author: "Jade Benjamin-Chung"
date: "6/28/2018"

#< ignore
```{r setup, include=FALSE}
library(dplyr)
library(RTutor)
library(here)

knitr::opts_chunk$set(echo = FALSE)
df = data.frame(x = rep(c("Treatment", "Control"), 5), y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1))

setwd(here("prevalenceRTutor"))

ps.name = "prevalence"
sol.file = paste0(ps.name, "_sol.Rmd")

libs = c("dplyr", "here") # character vector of all packages you load in the problem set
#name.rmd.chunks(sol.file) # set auto chunk names in this file

create.ps(sol.file = sol.file, ps.name = ps.name, user.name = NULL,
          libs = libs, extra.code.file = NULL, var.txt.file = NULL)

# When you want to solve in the browser
show.ps(ps.name, launch.browser = TRUE, load.sav = FALSE,
        sample.solution = FALSE, is.solved = FALSE)
```
#>

## Exercise 1 -- Calculate prevalence

### Epidemiology prerequisites
Before completing this tutorial, we recommend that you become familiar with 2x2 tables and with calculating prevalence.

### Tools used in this tutorial
In this tutorial, we will use the `dplyr` package. We will:

- Use the pipe operator (`%>%`) to chain operations on a dataframe
- Use the `group_by` command to group the data so that calculations are performed within each group
- Use the `filter` command to subset the data to rows with particular values
- Use the `summarise` command to count rows and to estimate the mean and other quantities

In this tutorial, you'll learn how to make a 2x2 table in R. Then you'll learn how to calculate prevalence. We'll be using a simulated dataset to make these calculations.

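To make the pipe operator concrete, here is a minimal sketch that writes the same calculation twice, once as a nested call and once as a pipe. It assumes the simulated data frame from the setup chunk (recreated here so the block runs on its own); the names `nested`, `piped`, and `n_rows` are illustrative and not part of the tutorial.

```r
library(dplyr)

# Copy of the simulated data defined in the setup chunk
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# Nested form: read from the inside out
nested <- summarise(filter(df, x == "Treatment"), n_rows = n())

# Piped form: read left to right; each result is passed on
# as the first argument of the next function
piped <- df %>%
  filter(x == "Treatment") %>%
  summarise(n_rows = n())

# Both forms should give the same one-row data frame
all.equal(nested, piped)
```

The piped form is the one used in the steps that follow, since it lists the operations in the order they are applied.
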
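### Step 1: View the data

Create the example data frame, then view its first few rows using the `head` command.

```{r "1 a)"}
#< task_notest
# Simulated data: x is the study arm, y is disease status (1 = disease, 0 = no disease)
df = data.frame(x = rep(c("Treatment", "Control"), 5), y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1))
head(df)
#>
```
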
### Step 2: Obtain the number with and without the disease ($y$)
To count the number of people with the disease, the code below uses the pipe operator `%>%` to chain two operations on the dataframe `df`. First, we filter the dataframe to keep only the rows where `y==1`. The second pipe then passes the filtered dataset to `summarise`, where `summarise(n())` counts its rows. `summarise` can be used for a variety of calculations, such as the mean; `n()` is the helper used inside `summarise` to count rows.

```{r "1 b)"}
#< task_notest
df %>% filter(y==1) %>% summarise(n())
#>
```

Now modify the code above to count the number of observations with `y==0`. Write your code below the comment in the chunk.

```{r "1 c)"}
#< task_notest
# Number of observations with y==0
#>
df %>% filter(y==0) %>% summarise(n())

#< hint
display("Try changing the filter condition so that you're looking at all rows where y==0")
#>
```

#< quiz "Counting"
question: How many observations were there with y==0?
sc:
- 1
- 3
- 5*
- 7
- 10
success: Great, your answer is correct!
failure: Try again.
#>

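The counts from Step 2 are the column totals of a 2x2 table of study arm by disease status. As a sketch of one way to build that table, the block below uses base R's `table` rather than `dplyr`, which is a swapped-in approach and not part of the tutorial's own code. It recreates the simulated data from the setup chunk; the name `two_by_two` and the labels `group` and `disease` are illustrative.

```r
# Copy of the simulated data defined in the setup chunk
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# Cross-tabulate study arm (rows) against disease status (columns)
two_by_two <- table(group = df$x, disease = df$y)
two_by_two

# Add row and column totals; the column totals should match
# the counts obtained with filter() and summarise(n())
addmargins(two_by_two)
```
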
### Step 3: Calculate the prevalence in the whole sample

Next, we calculate the prevalence in the whole sample, pooling across the treatment and control groups. The following code uses the pipe operator (`%>%`) to perform the `summarise` command on the dataframe `df`. `prevalence` is the name we gave to the result. `mean(y)` is the mean of `y`, which in this case is a proportion because `y` is binary (coded 0 or 1).

```{r "1 d)"}
#< task_notest
df %>% summarise(prevalence=mean(y))
#>
```

#< quiz "Calculating Prevalence"
question: What is the prevalence?
sc:
- 0.2
- 0.5*
- 0.7
success: Great, your answer is correct!
failure: Try again.
#>

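The pooled calculation extends to each study arm by adding `group_by` before `summarise`. The following is a minimal sketch of that idea, assuming the simulated data from the setup chunk (recreated here so the block runs on its own); the names `by_group`, `n_obs`, and `cases` are illustrative and do not appear in the tutorial.

```r
library(dplyr)

# Copy of the simulated data defined in the setup chunk
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# group_by(x) makes summarise() return one row per study arm
by_group <- df %>%
  group_by(x) %>%
  summarise(
    n_obs      = n(),      # observations in the arm
    cases      = sum(y),   # observations with y == 1
    prevalence = mean(y)   # cases / n_obs, because y is coded 0/1
  )

by_group
```

Within each arm, `prevalence` should equal `cases / n_obs`, which is a quick way to check the result by hand.
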
#< award "Prevalence Wizard"
You finished the tutorial!
#>