# Problemset Prevalence
title: "Prevalence Tutorial"
author: "Jade Benjamin-Chung"
date: "6/28/2018"
#< ignore
```{r setup, include=FALSE}
library(dplyr)
library(RTutor)
library(here)
knitr::opts_chunk$set(echo = FALSE)
df=data.frame(x=rep(c("Treatment","Control"),5),y=c(0,0,0,1,1,0,1,1,0,1))
setwd(here("prevalenceRTutor"))
ps.name = "prevalence"
sol.file = paste0(ps.name,"_sol.Rmd")
libs = c("dplyr", "here") # character vector of all packages you load in the problem set
#name.rmd.chunks(sol.file) # set auto chunk names in this file
create.ps(sol.file=sol.file, ps.name=ps.name, user.name=NULL,
          libs=libs, extra.code.file=NULL, var.txt.file=NULL)
# When you want to solve in the browser
show.ps(ps.name, launch.browser=TRUE, load.sav=FALSE,
        sample.solution=FALSE, is.solved=FALSE)
```
#>
## Exercise 1 -- Calculate prevalence
### Epidemiology prerequisites
Before completing this tutorial, we recommend that you become familiar with 2x2 tables and with calculating prevalence.
### Tools used in this tutorial
In this tutorial, we will use the `dplyr` package. We will:
- Use the pipe operator (`%>%`) to pass a data frame from one operation to the next
- Use the `group_by` command to group rows so that later calculations are done within each group
- Use the `filter` command to keep only the rows with particular values
- Use the `summarise` command to count rows and to compute the mean and other summary statistics
In this tutorial, you'll learn how to make a 2x2 table in R and then how to calculate prevalence. We'll use a simulated dataset for these calculations.
### Step 1: View the data
Create the example data frame, then view its first few rows using the `head` command.
```{r "1 a)"}
#< task_notest
df = data.frame(x = rep(c("Treatment", "Control"), 5),
                y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1))
head(df)
#>
```
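To check the simulated data outside RTutor, you can use the minimal base-R sketch below. It is not part of the original problem set. It rebuilds the same `df` and checks the total number of rows and the number of rows in each arm, and it needs no packages.

```r
# Rebuild the simulated data set used in this tutorial
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),   # arm alternates Treatment/Control
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)      # 1 = has the disease, 0 = does not
)

head(df, 3)   # first three rows
nrow(df)      # total number of observations
table(df$x)   # number of observations in each arm
```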
### Step 2: Count the number of people with and without the disease ($y$)
The code below counts the number of people with the disease ($y = 1$). The pipe operator `%>%` passes the data frame `df` to the `filter` command, which keeps only the rows where `y == 1`. The second pipe passes this filtered data frame to `summarise`, where `n()` counts its rows. Besides counting rows, `summarise` can compute many other quantities, such as the mean.
```{r "1 b)"}
#< task_notest
df %>% filter(y==1) %>% summarise(n())
#>
```
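The dplyr chain above simply counts the rows where `y` equals 1. You can cross-check this count in base R. The snippet rebuilds `df` so it runs on its own. The names `has_disease`, `n_cases`, and `n_cases_rows` are illustrative only.

```r
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# Logical vector: TRUE where the person has the disease
has_disease <- df$y == 1

n_cases <- sum(has_disease)               # TRUE counts as 1, FALSE as 0
n_cases_rows <- nrow(df[has_disease, ])   # same count, by subsetting the rows
n_cases
```

Summing a logical vector works because R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic.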
Now modify the code above to count the number of observations with `y == 0`. Write your code in the chunk below, replacing any placeholder text.
```{r "1 c)"}
#< task_notest
# Number of observations with y==0
#>
df %>% filter(y==0) %>% summarise(n())
#< hint
display("Try changing the filter condition so that you're looking at all rows where y==0")
#>
```
#< quiz "Counting"
question: How many observations were there with y==0?
sc:
- 1
- 3
- 5*
- 7
- 10
success: Great, your answer is correct!
failure: Try again.
#>
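The introduction mentions making a 2x2 table. One way to build one in base R is with `table()`, which cross-tabulates treatment arm (`x`) against disease status (`y`). `addmargins()` from the `stats` package then appends row and column totals. This sketch uses the same simulated data and is not code from the original problem set.

```r
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# Rows: arm; columns: disease status (0 = no disease, 1 = disease)
tab <- table(arm = df$x, disease = df$y)
tab

# Same table with row and column totals
addmargins(tab)
```

In this dataset, the control arm has 3 people with the disease and 2 without. The treatment arm has 2 with the disease and 3 without. The column totals (5 and 5) match the counts from Step 2.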
### Step 3: Calculate the prevalence in the whole sample
Now calculate the prevalence in the whole sample, pooling the treatment and control groups. The code below uses the pipe operator (`%>%`) to pass `df` to the `summarise` command. `prevalence` is the name we give to the result, and `mean(y)` is the mean of `y`. Because `y` is binary (1 = disease, 0 = no disease), this mean is the proportion of people with the disease, which is the prevalence.
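Written out, this calculation is the sample mean of the 0/1 outcome. Let $y_i = 1$ if person $i$ has the disease and $0$ otherwise. The mean then equals the number of people with the disease divided by the total number of people. The simulated data has $n = 10$ observations, 5 of them with $y_i = 1$:

$$
\text{prevalence} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{\text{number of people with } y_i = 1}{n} = \frac{5}{10} = 0.5
$$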
```{r "1 d)"}
#< task_notest
df %>% summarise(prevalence=mean(y))
#>
```
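The tools list mentions `group_by`, although this exercise only calculates the pooled prevalence. The base-R sketch below computes prevalence within each arm. `tapply()` splits `y` by arm and takes the mean within each group. This gives the same numbers as `df %>% group_by(x) %>% summarise(prevalence = mean(y))` in dplyr.

```r
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# Pooled prevalence: the mean of a 0/1 variable is the proportion with the disease
prevalence_all <- mean(df$y)

# Prevalence within each arm: split y by x, then take the mean in each group
prevalence_by_arm <- tapply(df$y, df$x, mean)

prevalence_all
prevalence_by_arm
```

For this simulated data, prevalence is 0.6 in the control arm (3 of 5) and 0.4 in the treatment arm (2 of 5). These average to the pooled 0.5 because the two arms are the same size.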
#< quiz "Calculating Prevalence"
question: What is the prevalence?
sc:
- 0.2
- 0.5*
- 0.7
success: Great, your answer is correct!
failure: Try again.
#>
#< award "Prevalence Wizard"
You finished the tutorial!
#>