- ---
- title: "Homework 2"
- author: "Joshua Kim"
- date: "10/19/2019"
- output: pdf_document
- ---
- ## Question 1
- **1. A laboratory is estimating the rate of tumorigenesis (the formation of tumors) in two strains of mice, A and B. They have tumor count data for 10 mice in strain A and 13 mice in strain B. Type A mice have been well studied, and information from other laboratories suggests that type A mice have tumor counts that are approximately Poisson-distributed. Tumor count rates for type B mice are unknown, but type B mice are related to type A mice. Assume a Poisson sampling distribution for each group, with rates $\theta_A$ and $\theta_B$. Based on previous research, you settle on the following prior distributions:**\newline
- \centerline {$\theta_A \sim \text{gamma}(120, 10)$, \quad $\theta_B \sim \text{gamma}(12, 1)$}
- **(a) Before seeing any data, which group do you expect to have a higher average incidence of cancer? Which group are you more certain about a priori? Your answers should be based on the priors specified above.**
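- Both priors have the same mean, $120/10 = 12/1 = 12$, so before seeing any data I expect the two groups to have the same average incidence of cancer. I am more certain about group A a priori: its prior variance is $120/10^2 = 1.2$, compared with $12/1^2 = 12$ for group B.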
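The prior comparison in part (a) can be checked numerically. The chunk below is a minimal sketch that assumes the gamma priors use the shape/rate parameterization (mean = shape/rate, variance = shape/rate^2); the `prior` data frame is an illustrative name, not part of the assignment.

```{r}
# Gamma(shape, rate) prior parameters taken from the problem statement
prior = data.frame(group = c("A", "B"),
                   shape = c(120, 12),
                   rate  = c(10, 1))

# Moments of a gamma distribution under the shape/rate parameterization
prior$mean     = prior$shape / prior$rate
prior$variance = prior$shape / prior$rate^2
prior$sd       = sqrt(prior$variance)

print(prior)
```

Equal means with a much smaller variance for group A is what the priors imply: the same expected rate, but more prior certainty about strain A.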
- **(b) After you complete the experiment, you observe the following tumor counts for the two populations:**
- \centerline {$y_A$ = (12,9,12,14,13,13,15,8,15,6)}
- \centerline {$y_B$ = (11,11,10,9,9,8,7,10,6,8,8,9,7)}
- ```{r}
- y_a = c(12,9,12,14,13,13,15,8,15,6)
- sum_y_a = sum(y_a) #117
- y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)
- sum_y_b = sum(y_b) #113
- ```
- $$\sum_{i=1}^{10} y_{A,i} = 117, \quad \sum_{i=1}^{13} y_{B,i} = 113 $$
- **Write down the posterior distributions, posterior means, posterior variances and 95% quantile-based credible intervals for $\theta_A$ and $\theta_B$**
- Posterior distribution of $\theta_A$, with prior $\theta_A \sim \text{Gamma}(120, 10)$:
- $$\text{Posterior} \propto \text{Poisson}(y_A | \theta_A) \times \text{Gamma}(\theta_A)$$
- $$\text{Poisson}(y_A | \theta_A) \propto \theta_A^{\sum_{i=1}^{10} y_{A,i}} e^{-10\theta_A}$$
- $$\text{Gamma}(\theta_A) \propto \theta_A^{120-1} e^{-10\theta_A}$$
- $$\text{Posterior} \propto [\theta_A^{\sum_{i=1}^{10} y_{A,i}} e^{-10\theta_A}] \times [\theta_A^{119} e^{-10\theta_A}]$$
- $$\propto \theta_A^{\sum_{i=1}^{10} y_{A,i} + 119} e^{-20\theta_A}$$
- This is the kernel of a Gamma distribution (the Gamma prior is conjugate to the Poisson likelihood) with
- $\alpha_A - 1 = \sum_{i=1}^{10} y_{A,i} + 119$, \space $\beta_A = 20$
- Therefore, the posterior distribution of $\theta_A$ is
- $$\text{Gamma}\left(\sum_{i=1}^{10} y_{A,i} + 120, \; 20\right) = \text{Gamma}(237, 20)$$
- E($\theta_A$) for a Gamma Distribution is $\frac {\alpha} {\beta}$. Therefore
- $$E(\theta_A | y_A) = \frac {(117+120)} {20} = 11.85 $$
- Var($\theta_A$ ) for a Gamma Distribution is $\frac {\alpha} {\beta^2}$. Therefore
- $$Var(\theta_A| y_A) = \frac {(117+120)} {400} = 0.5925 $$
- Posterior distribution of $\theta_B$, with prior $\theta_B \sim \text{Gamma}(12, 1)$:
- $$\text{Posterior} \propto \text{Poisson}(y_B | \theta_B) \times \text{Gamma}(\theta_B)$$
- $$\text{Poisson}(y_B | \theta_B) \propto \theta_B^{\sum_{i=1}^{13} y_{B,i}} e^{-13\theta_B}$$
- $$\text{Gamma}(\theta_B) \propto \theta_B^{12-1} e^{-\theta_B}$$
- $$\text{Posterior} \propto [\theta_B^{\sum_{i=1}^{13} y_{B,i}} e^{-13\theta_B}] \times [\theta_B^{11} e^{-\theta_B}]$$
- $$\propto \theta_B^{\sum_{i=1}^{13} y_{B,i} + 11} e^{-14\theta_B}$$
- This is the kernel of a Gamma distribution (the Gamma prior is conjugate to the Poisson likelihood) with
- $\alpha_B - 1 = \sum_{i=1}^{13} y_{B,i} + 11$, \space $\beta_B = 14$
- Therefore, the posterior distribution of $\theta_B$ is
- $$\text{Gamma}\left(\sum_{i=1}^{13} y_{B,i} + 12, \; 14\right) = \text{Gamma}(125, 14)$$
- E($\theta_B$) for a Gamma Distribution is $\frac {\alpha} {\beta}$. Therefore
- $$E(\theta_B | y_B) = \frac {(113+12)} {14} =8.929 $$
- Var($\theta_B$) for a Gamma Distribution is $\frac {\alpha} {\beta^2}$. Therefore
- $$Var(\theta_B | y_B) = \frac {(113+12)} {196} = 0.638 $$
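The hand calculations above follow one conjugate update rule: a Gamma(a, b) prior combined with n Poisson counts gives a Gamma(a + sum(y), b + n) posterior. The sketch below wraps that rule in a helper so the posterior means and variances can be cross-checked; `gamma_poisson_posterior` is a hypothetical name introduced for illustration, and the shape/rate parameterization is assumed.

```{r}
# Hypothetical helper: conjugate Gamma-Poisson update (shape/rate parameterization)
gamma_poisson_posterior = function(y, a, b) {
  a_post = a + sum(y)      # posterior shape = prior shape + total count
  b_post = b + length(y)   # posterior rate  = prior rate + sample size
  c(shape = a_post, rate = b_post,
    mean = a_post / b_post, variance = a_post / b_post^2)
}

y_a = c(12,9,12,14,13,13,15,8,15,6)
y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)

post_A = gamma_poisson_posterior(y_a, a = 120, b = 10)
post_B = gamma_poisson_posterior(y_b, a = 12,  b = 1)

print(round(rbind(A = post_A, B = post_B), 4))
```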
- ```{r}
- y_a = c(12,9,12,14,13,13,15,8,15,6)
- sum_y_a = sum(y_a) #117
- a_post_A = sum_y_a +120
- b_post_A = 20
- alpha = 1 - 0.95
- low_A = qgamma(alpha/2, a_post_A, b_post_A)
- high_A = qgamma(1 - alpha/2, a_post_A, b_post_A)
- y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)
- sum_y_b = sum(y_b) #113
- a_post_B = sum_y_b +12
- b_post_B = 14
- alpha = 1 - 0.95
- low_B = qgamma(alpha/2, a_post_B, b_post_B)
- high_B = qgamma(1 - alpha/2, a_post_B, b_post_B)
- print(c(low_A, high_A))
- print(c(low_B, high_B))
- ```
- 95% quantile-based credible interval for $\theta_A$: approximately (10.39, 13.41)
- 95% quantile-based credible interval for $\theta_B$: approximately (7.43, 10.56)
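A quantile-based interval from `qgamma` can be sanity-checked by simulation: draw from the posterior and take empirical quantiles. The chunk below is a rough sketch of that check using the Gamma(237, 20) and Gamma(125, 14) posteriors derived above; the seed and the number of draws are arbitrary choices, not part of the assignment.

```{r}
set.seed(1)        # arbitrary seed, only for reproducibility
n_draws = 100000   # arbitrary number of Monte Carlo draws
probs = c(0.025, 0.975)

# Exact quantile-based intervals from the posterior distributions
exact_A = qgamma(probs, shape = 237, rate = 20)
exact_B = qgamma(probs, shape = 125, rate = 14)

# Monte Carlo approximations of the same intervals
mc_A = quantile(rgamma(n_draws, shape = 237, rate = 20), probs)
mc_B = quantile(rgamma(n_draws, shape = 125, rate = 14), probs)

print(rbind(exact_A, mc_A, exact_B, mc_B))
```

Each interval should also contain its posterior mean (11.85 for $\theta_A$, about 8.93 for $\theta_B$), which is a quick way to spot a copy-paste error in reported bounds.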
- **(c) Compute and plot the posterior expectation of $\theta_B$ given $y_B$ under the prior distribution gamma($12 \times n_0$, $n_0$) for each value of $n_0 \in \{1, 2, \ldots, 50\}$. As a reminder, $n_0$ can be thought of as the number of prior observations (or pseudo-counts).**
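- ```{r}
- n_0 = 1:50
- y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)
- a_post = 12*n_0 + sum(y_b) # posterior shape: prior shape + total count
- b_post = n_0 + length(y_b) # posterior rate: prior rate + sample size
- posterior_exp = a_post / b_post # posterior mean of theta_B for each n_0
- posterior_exp
- plot(n_0, posterior_exp,
- main="Posterior Expectation of theta_B", xlab="n_0",
- ylab="E(theta_B | y_B)", col='blue', type="l")
- ```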
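For reference, the posterior expectation in part (c) has a closed form. Applying the same conjugate update as in part (b) to the gamma($12 n_0$, $n_0$) prior, with $\sum_{i=1}^{13} y_{B,i} = 113$, gives a weighted average of the prior mean and the sample mean $\bar{y}_B$:

$$E(\theta_B | y_B) = \frac{12 n_0 + 113}{n_0 + 13} = \frac{n_0}{n_0 + 13} \times 12 + \frac{13}{n_0 + 13} \times \bar{y}_B, \quad \bar{y}_B = \frac{113}{13} \approx 8.69$$

As $n_0$ grows, the weight on the prior mean of 12 increases, so the posterior expectation rises from $125/14 \approx 8.93$ at $n_0 = 1$ to $713/63 \approx 11.32$ at $n_0 = 50$.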
- **(d) Should knowledge about population A tell us anything about population B? Discuss whether or not it makes sense to have p($\theta_A$ , $\theta_B$ ) = p($\theta_A$) × p($\theta_B$ ).**