---
title: "Homework 2"
author: "Joshua Kim"
date: "10/19/2019"
output: pdf_document
---
## Question 1
**1. A laboratory is estimating the rate of tumorigenesis (the formation of tumors) in two strains of mice, A and B. They have tumor count data for 10 mice in strain A and 13 mice in strain B. Type A mice have been well studied, and information from other laboratories suggests that type A mice have tumor counts that are approximately Poisson-distributed. Tumor count rates for type B mice are unknown, but type B mice are related to type A mice. Assume a Poisson sampling distribution for each group with rates $\theta_A$ and $\theta_B$. Based on previous research, you settle on the following prior distribution:**\newline

\centerline{$\theta_A \sim \mathrm{gamma}(120, 10), \quad \theta_B \sim \mathrm{gamma}(12, 1)$}


**(a) Before seeing any data, which group do you expect to have a higher average incidence of cancer? Which group are you more certain about a priori? Your answers should be based on the priors specified above.**

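Neither group has a higher expected incidence a priori: both priors have the same mean, $E(\theta_A) = 120/10 = 12$ and $E(\theta_B) = 12/1 = 12$. I am more certain about group A, since its prior variance is much smaller: $Var(\theta_A) = 120/10^2 = 1.2$ versus $Var(\theta_B) = 12/1^2 = 12$.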
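
As a quick check of the prior summaries, the sketch below computes the mean $a/b$ and variance $a/b^2$ of each gamma prior in the shape-rate parameterization. The variable names (`prior_shape`, `prior_rate`, and so on) are illustrative and not part of the assignment.

```{r}
# Prior parameters (shape, rate) for strains A and B, as given in the problem
prior_shape = c(A = 120, B = 12)
prior_rate  = c(A = 10,  B = 1)

prior_mean = prior_shape / prior_rate     # a / b
prior_var  = prior_shape / prior_rate^2   # a / b^2
prior_sd   = sqrt(prior_var)

# One column per strain, one row per summary
print(rbind(mean = prior_mean, var = prior_var, sd = prior_sd))
```

Both priors are centered at 12, while the prior variance for strain B is ten times that of strain A.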


**(b) After you complete the experiment, you observe the following tumor counts for the two populations:**

\centerline {$y_A$ = (12,9,12,14,13,13,15,8,15,6)}
\centerline {$y_B$ = (11,11,10,9,9,8,7,10,6,8,8,9,7)}
```{r}
y_a = c(12,9,12,14,13,13,15,8,15,6)
sum_y_a = sum(y_a)  #117

y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)
sum_y_b = sum(y_b)  #113

```
$$\sum_{i=1}^{10} y_{A,i} = 117, \quad \sum_{i=1}^{13} y_{B,i} = 113 $$

**Write down the posterior distributions, posterior means, posterior variances and 95% quantile-based credible intervals for $\theta_A$ and $\theta_B$.**

Posterior distribution of $\theta_A$ under the prior $\theta_A \sim \mathrm{Gamma}(120, 10)$:

$$p(\theta_A \mid y_A) \propto p(y_A \mid \theta_A) \times p(\theta_A)$$
$$p(y_A \mid \theta_A) \propto \theta_A^{\sum_{i=1}^{10} y_{A,i}} e^{-10\theta_A}$$
$$p(\theta_A) \propto \theta_A^{120-1} e^{-10\theta_A}$$

$$p(\theta_A \mid y_A) \propto \left[\theta_A^{\sum_{i=1}^{10} y_{A,i}} e^{-10\theta_A}\right] \times \left[\theta_A^{119} e^{-10\theta_A}\right]$$
$$\propto \theta_A^{\sum_{i=1}^{10} y_{A,i} + 119} e^{-20\theta_A}$$

This is the kernel of a Gamma distribution (the Gamma prior is conjugate to the Poisson likelihood) with

$\alpha_A - 1 = \sum_{i=1}^{10} y_{A,i} + 119$, \space $\beta_A = 20$

Therefore, the posterior distribution of $\theta_A$ is
$$\mathrm{Gamma}\left(\sum_{i=1}^{10} y_{A,i} + 120,\ 20\right) = \mathrm{Gamma}(237,\ 20)$$

The mean of a Gamma($\alpha$, $\beta$) distribution is $\frac{\alpha}{\beta}$. Therefore

$$E(\theta_A \mid y_A) = \frac{117+120}{20} = 11.85$$

The variance of a Gamma($\alpha$, $\beta$) distribution is $\frac{\alpha}{\beta^2}$. Therefore

$$Var(\theta_A \mid y_A) = \frac{117+120}{400} = 0.5925$$

Posterior distribution of $\theta_B$ under the prior $\theta_B \sim \mathrm{Gamma}(12, 1)$:

$$p(\theta_B \mid y_B) \propto p(y_B \mid \theta_B) \times p(\theta_B)$$
$$p(y_B \mid \theta_B) \propto \theta_B^{\sum_{i=1}^{13} y_{B,i}} e^{-13\theta_B}$$
$$p(\theta_B) \propto \theta_B^{12-1} e^{-\theta_B}$$

$$p(\theta_B \mid y_B) \propto \left[\theta_B^{\sum_{i=1}^{13} y_{B,i}} e^{-13\theta_B}\right] \times \left[\theta_B^{11} e^{-\theta_B}\right]$$
$$\propto \theta_B^{\sum_{i=1}^{13} y_{B,i} + 11} e^{-14\theta_B}$$

This is the kernel of a Gamma distribution (the Gamma prior is conjugate to the Poisson likelihood) with

$\alpha_B - 1 = \sum_{i=1}^{13} y_{B,i} + 11$, \space $\beta_B = 14$

Therefore, the posterior distribution of $\theta_B$ is
$$\mathrm{Gamma}\left(\sum_{i=1}^{13} y_{B,i} + 12,\ 14\right) = \mathrm{Gamma}(125,\ 14)$$

The mean of a Gamma($\alpha$, $\beta$) distribution is $\frac{\alpha}{\beta}$. Therefore

$$E(\theta_B \mid y_B) = \frac{113+12}{14} \approx 8.929$$

The variance of a Gamma($\alpha$, $\beta$) distribution is $\frac{\alpha}{\beta^2}$. Therefore

$$Var(\theta_B \mid y_B) = \frac{113+12}{196} \approx 0.638$$
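
The two derivations above follow the same pattern: the posterior shape is the prior shape plus the total count, and the posterior rate is the prior rate plus the number of mice. A minimal sketch of that update is given below; `gamma_poisson_update` is a hypothetical helper written for this check, not a function from any package.

```{r}
# Hypothetical helper: gamma(a, b) prior + Poisson counts y -> gamma posterior
gamma_poisson_update = function(y, a, b) {
  a_post = a + sum(y)      # shape gains the total count
  b_post = b + length(y)   # rate gains the number of observations
  list(shape = a_post, rate = b_post,
       mean = a_post / b_post, var = a_post / b_post^2)
}

y_a = c(12,9,12,14,13,13,15,8,15,6)
y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)

post_A = gamma_poisson_update(y_a, a = 120, b = 10)
post_B = gamma_poisson_update(y_b, a = 12,  b = 1)
str(post_A)
str(post_B)
```

The printed shape, rate, mean and variance can be compared directly with the hand-derived values for each strain.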

```{r}
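# Strain A: posterior is gamma(120 + sum(y_a), 10 + 10)
y_a = c(12,9,12,14,13,13,15,8,15,6)
sum_y_a = sum(y_a)  # 117
a_post_A = sum_y_a + 120  # posterior shape
b_post_A = 20             # posterior rate: prior rate 10 + 10 mice
alpha = 1 - 0.95          # total probability left outside the interval
low_A = qgamma(alpha/2, a_post_A, b_post_A)
high_A = qgamma(1 - alpha/2, a_post_A, b_post_A)

# Strain B: posterior is gamma(12 + sum(y_b), 1 + 13)
y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)
sum_y_b = sum(y_b)  # 113
a_post_B = sum_y_b + 12   # posterior shape
b_post_B = 14             # posterior rate: prior rate 1 + 13 mice
low_B = qgamma(alpha/2, a_post_B, b_post_B)
high_B = qgamma(1 - alpha/2, a_post_B, b_post_B)

# Equal-tailed (quantile-based) 95% intervals
print(c(low_A, high_A))
print(c(low_B, high_B))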
```

95% quantile-based credible interval for $\theta_A$: approximately (10.39, 13.41)

95% quantile-based credible interval for $\theta_B$: approximately (7.43, 10.56)
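
A reported interval can be sanity-checked in two ways: it should contain the posterior mean, and the posterior probability between its endpoints should be 0.95. The sketch below performs both checks, assuming the posteriors Gamma(237, 20) and Gamma(125, 14) from part (b); the names `ci_A`, `cover_A` and so on are illustrative.

```{r}
# Equal-tailed 95% intervals from the posterior quantile function
ci_A = qgamma(c(0.025, 0.975), shape = 237, rate = 20)
ci_B = qgamma(c(0.025, 0.975), shape = 125, rate = 14)

# Posterior mass between the endpoints; both should be 0.95
cover_A = diff(pgamma(ci_A, shape = 237, rate = 20))
cover_B = diff(pgamma(ci_B, shape = 125, rate = 14))
print(c(cover_A, cover_B))

# Each posterior mean should fall inside its own interval
print(c(ci_A[1] < 237/20 & 237/20 < ci_A[2],
        ci_B[1] < 125/14 & 125/14 < ci_B[2]))
```

An interval that lies entirely below 1, for example, would fail the second check, since the posterior means are about 11.85 and 8.93.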

**(c) Compute and plot the posterior expectation of $\theta_B$ given $y_B$ under the prior distribution gamma($12 \times n_0$, $n_0$) for each value of $n_0 \in \{1, 2, \ldots, 50\}$. As a reminder, $n_0$ can be thought of as the number of prior observations (or pseudo-counts).**

```{r}
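y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)
n_0 = 1:50
a_post = 12*n_0 + sum(y_b)      # posterior shape: prior shape + total count
b_post = n_0 + length(y_b)      # posterior rate: prior rate + sample size
posterior_exp = a_post/b_post   # posterior mean E(theta_B | y_B)
posterior_exp
plot(n_0, posterior_exp,
    main="Posterior Expectation of theta_B",
    xlab="n_0", ylab="E(theta_B | y_B)", col='blue', type="l")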
```
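
One way to read the quantity asked for in (c) is as a weighted average. With prior gamma($12 n_0$, $n_0$) and $n = 13$ observations, the posterior mean $(12 n_0 + \sum y_B)/(n_0 + n)$ equals $w \cdot 12 + (1 - w) \cdot \bar{y}_B$ with $w = n_0/(n_0 + n)$. The sketch below checks this identity numerically; the names `w`, `post_mean` and `post_mean_w` are illustrative.

```{r}
y_b = c(11,11,10,9,9,8,7,10,6,8,8,9,7)
n   = length(y_b)
n_0 = 1:50

# Direct form: posterior shape divided by posterior rate
post_mean = (12 * n_0 + sum(y_b)) / (n_0 + n)

# Weighted-average form: prior mean 12 and sample mean of y_b
w = n_0 / (n_0 + n)                         # weight on the prior mean
post_mean_w = w * 12 + (1 - w) * mean(y_b)

print(all.equal(post_mean, post_mean_w))    # the two forms should agree
print(round(post_mean[c(1, 13, 50)], 3))    # n_0 = 1, 13, 50
```

As $n_0$ grows, the weight on the prior mean grows, so the posterior expectation moves from near the sample mean of $y_B$ toward 12.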

**(d) Should knowledge about population A tell us anything about population B? Discuss whether or not it makes sense to have p($\theta_A$ , $\theta_B$ ) = p($\theta_A$) × p($\theta_B$ ).**