Prevalence RTutor Problemset

# Problemset Prevalence

title: "Prevalence Tutorial"
author: "Jade Benjamin-Chung"
date: "6/28/2018"

#< ignore
```{r setup, include=FALSE}
library(dplyr)
library(RTutor)
library(here)

knitr::opts_chunk$set(echo = FALSE)
df=data.frame(x=rep(c("Treatment","Control"),5),y=c(0,0,0,1,1,0,1,1,0,1))

setwd(here("prevalenceRTutor"))

ps.name = "prevalence"
sol.file = paste0(ps.name,"_sol.Rmd")

libs = c("dplyr", "here") # character vector of all packages you load in the problem set
#name.rmd.chunks(sol.file) # set auto chunk names in this file

create.ps(sol.file=sol.file, ps.name=ps.name, user.name=NULL,
          libs=libs, extra.code.file=NULL, var.txt.file=NULL)

# When you want to solve in the browser
show.ps(ps.name,launch.browser=TRUE, load.sav=FALSE,
        sample.solution=FALSE, is.solved=FALSE)
```
#>

## Exercise 1 -- Calculate prevalence

### Epidemiology prerequisites
Before completing this tutorial, we recommend that you become familiar with 2x2 tables and calculating prevalence.

### Tools used in this tutorial
In this tutorial, we will use the `dplyr` package. We will:

- Use the pipe operator (`%>%`) to perform operations on a dataframe
- Use the `group_by` command to group rows so that later calculations are performed within each group
- Use the `filter` command to subset the data by particular values
- Use the `summarise` command to count values and estimate the mean and other quantities
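
As a rough illustration of how these pieces fit together, here is a minimal sketch on a small made-up data frame. The data frame `toy` and its columns `arm` and `sick` are hypothetical and are not part of the tutorial data.

```r
library(dplyr)

# Hypothetical toy data (not the tutorial data): 6 people in two arms
toy <- data.frame(
  arm  = c("A", "A", "A", "B", "B", "B"),
  sick = c(1, 0, 1, 0, 0, 1)
)

# filter() keeps only the rows that meet a condition;
# summarise() then collapses those rows into a single summary row
n_sick <- toy %>%
  filter(sick == 1) %>%
  summarise(n = n())

# group_by() splits the rows by arm, so summarise() returns one row per arm
by_arm <- toy %>%
  group_by(arm) %>%
  summarise(n = n(), mean_sick = mean(sick))

n_sick
by_arm
```

Each `%>%` passes the result on its left to the function on its right as the first argument, so the steps read from top to bottom in the order they are carried out.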

In this tutorial, you'll learn how to make a 2x2 table in R. Then you'll learn how to calculate prevalence. We'll be using a simulated dataset to make these calculations.

### Step 1: View the data

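The example data frame is called `df`. Each row is one person: `x` is the study group (`"Treatment"` or `"Control"`) and `y` is disease status (1 = has the disease, 0 = does not). View the first few rows of the data frame using the `head` command.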

```{r "1 a)"}
#< task_notest
head(df)
#>
```

### Step 2: Obtain the number with and without the disease ($y$)
To count the number of people with the disease, the code below uses the pipe operator `%>%` to chain two operations on the dataframe `df`. First, `filter` keeps only the rows where `y==1`. The second pipe then passes the filtered data to `summarise`, and `summarise(n())` counts its rows. `n()` is a helper used inside `summarise` that returns the number of rows; `summarise` can also compute other quantities, such as the mean.

```{r "1 b)"}
#< task_notest
df %>% filter(y==1) %>% summarise(n())
#>
```


Now modify the code above to count the number of observations with `y==0`. Write your code below the comment in the chunk.
```{r "1 c)"}
#< task_notest
# Number of observations with y==0
#>
df %>% filter(y==0) %>% summarise(n())

#< hint
display("Try changing the filter condition so that you're looking at all rows where y==0")
#>
```

#< quiz "Counting"
question: How many observations were there with y==0?
sc:
    - 1
    - 3
    - 5*
    - 7
    - 10
success: Great, your answer is correct!
failure: Try again.
#>
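
One possible way to see both counts at once, and to lay them out as the 2x2 table mentioned in the introduction, is sketched below. Using `count()` and base R's `table()` here is our assumption about a convenient approach, not something the steps above require. The data frame is re-created inside the block only so that it runs on its own.

```r
library(dplyr)

# Same simulated data as in the setup chunk of this problem set
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# count(y) is roughly equivalent to group_by(y) %>% summarise(n = n()):
# it returns one row per value of y with the number of rows in column n
counts_y <- df %>% count(y)

# Base R cross-tabulation: rows are study groups, columns are values of y
tab_2x2 <- table(group = df$x, disease = df$y)

counts_y
tab_2x2
```

The rows of `counts_y` should match the two `filter` and `summarise` results from this step, which makes it a quick check on your answers.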

### Step 3: Calculate the prevalence in the whole sample

Next, we calculate prevalence in the whole sample, pooling across the treatment and control groups. The following code uses the pipe operator (`%>%`) to apply the `summarise` command to the dataframe `df`. `prevalence` is the name we gave to the result. `mean(y)` is the mean of `y`; because `y` is binary (0 or 1), this mean equals the proportion of observations with `y==1`, which is the prevalence.

```{r "1 d)"}
#< task_notest
df %>% summarise(prevalence=mean(y))
#>
```

#< quiz "Calculating Prevalence"
question: What is the prevalence in the whole sample?
sc:
    - 0.2
    - 0.5*
    - 0.7
success: Great, your answer is correct!
failure: Try again.
#>
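
The same calculation can be extended to each study group with `group_by`, which is listed among the tools above. The following is a minimal sketch of one way to do this; the result name `prev_by_group` and the extra columns `n` and `cases` are our own illustrative choices, and the data frame is re-created only so that the block runs on its own.

```r
library(dplyr)

# Same simulated data as in the setup chunk of this problem set
df <- data.frame(
  x = rep(c("Treatment", "Control"), 5),
  y = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 1)
)

# group_by(x) makes summarise() work within each study group:
# n is the group size, cases is the number with y == 1,
# and prevalence is cases / n (the mean of the binary y)
prev_by_group <- df %>%
  group_by(x) %>%
  summarise(n = n(), cases = sum(y), prevalence = mean(y))

prev_by_group
```

Because both groups have the same number of observations in this dataset, the pooled prevalence from Step 3 is the simple average of the two group-specific values; with unequal group sizes it would be a weighted average.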

#< award "Prevalence Wizard"
You finished the tutorial!
#>