---
title: "Assignment 1"
author: "Jaymon Veldkamp"
date: "15 May 2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Part 1: Properties of the estimators of the standard error of the difference in means

## 1
The population standard error of the difference in means, $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

## 2
### a
Because $s_1^2$ and $s_2^2$ are unbiased estimators of $\sigma_1^2$ and $\sigma_2^2$, the population value of each standard error is obtained by substituting $\sigma_1^2$ and $\sigma_2^2$ for them.

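Standard error used by Student's t-test (pooled variance):
```{r}
# Pooled standard deviation with n1 = 10, n2 = 200 and both variances equal to 1
Sp <- sqrt(((10-1)*1+(200-1)*1)/(10+200-2))

Sp * sqrt(1/10+1/200)
```


Standard error used by Welch's t-test:
```{r}
# sqrt(sigma1^2/n1 + sigma2^2/n2) with both variances equal to 1
sqrt(1/10 + 1/200)
```

Both expressions evaluate to the same value, so the population parameter in the condition with $n_1=10$ and $\sigma_2^2 = 1$ is 0.324037.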
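
As a small check of why the two values coincide here, the sketch below evaluates both formulas with the population variances plugged in. The helper names `pop_se_welch` and `pop_se_student` are illustrative and not part of the assignment. The second pair of calls uses $\sigma_2^2 = 10$ to show that the pooled formula stops targeting the same quantity once the variances differ.

```{r}
# Hypothetical helpers: the two standard errors with population variances plugged in
pop_se_welch <- function(n1, n2, var1, var2) {
  sqrt(var1 / n1 + var2 / n2)
}
pop_se_student <- function(n1, n2, var1, var2) {
  pooled_var <- ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
  sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
}

# Equal variances: both formulas give the same value
pop_se_welch(10, 200, 1, 1)
pop_se_student(10, 200, 1, 1)

# Unequal variances: the pooled formula gives a different (here larger) value
pop_se_welch(10, 200, 1, 10)
pop_se_student(10, 200, 1, 10)
```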

### b

MC_ttest_welch($S$, $n_1$, $n_2$, $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$)

Input:

- $S$: integer defining the number of independent datasets to generate
- $n_1$: the sample size of dataset 1
- $n_2$: the sample size of dataset 2
- $\mu_1$, $\sigma_1^2$: values of the population parameters of population 1
- $\mu_2$, $\sigma_2^2$: values of the population parameters of population 2

Output: the bias, variance and MSE of the standard error estimators of Welch's t-test and Student's t-test, and the relative efficiency (RE) of the two.

1. Initialize the outputs SE_welch and SE_student, both as vectors of length $S$.
2. For s in 1:S
    A.  Generate $data1$ with sample size $n_1$ from $N(\mu_1,\sigma_1^2)$
    B.  Generate $data2$ with sample size $n_2$ from $N(\mu_2, \sigma_2^2)$
    C.  Compute the estimated standard error sample_welch using Welch's formula
    D.  Compute the estimated standard error sample_student using Student's pooled formula
    E.  Store the estimated standard errors:
        a. SE_welch[s] = sample_welch
        b. SE_student[s] = sample_student
3. Obtain the bias of SE_welch and SE_student relative to the population standard error and store them as welch_bias and student_bias
4. Obtain the variances of SE_welch and SE_student and store them as welch_var and student_var
5. Obtain the MSE of SE_welch and SE_student and store them as welch_MSE and student_MSE
6. Obtain the relative efficiency of SE_welch to SE_student (the ratio of their MSEs) and store it as RE
7. Return welch_bias, student_bias, welch_var, student_var, welch_MSE, student_MSE and RE.
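
For reference, steps 3 to 6 can be written out as follows, assuming the conventional definitions. Here $\widehat{SE}_s$ is the estimate from dataset $s$, $\overline{\widehat{SE}}$ is the mean of the $S$ estimates, and $SE$ is the population standard error of the condition. These symbols are introduced for illustration only. The sign convention chosen for the bias does not affect the MSE, because the bias enters it squared.

$$
\begin{aligned}
\text{bias} &= \overline{\widehat{SE}} - SE, \qquad \overline{\widehat{SE}} = \frac{1}{S}\sum_{s=1}^{S}\widehat{SE}_s \\
\text{variance} &= \frac{1}{S-1}\sum_{s=1}^{S}\left(\widehat{SE}_s - \overline{\widehat{SE}}\right)^2 \\
\text{MSE} &= \text{bias}^2 + \text{variance} \\
RE &= \frac{\text{MSE}_{\text{Welch}}}{\text{MSE}_{\text{Student}}}
\end{aligned}
$$

With this definition, an RE below 1 means the Welch estimator has the smaller MSE, and an RE above 1 means the Student estimator does.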

### c
```{r}
set.seed(200)

# Monte Carlo comparison of the Welch and Student (pooled) standard error estimators.
# Returns the bias, variance and MSE of both estimators and their relative efficiency.
MC_ttest_welch <- function(S=10000, n1, n2=200, mu1=0, mu2=1, var1=1, var2){

  #1: preallocate the output vectors instead of growing them inside the loop
  SE_welch <- numeric(S)
  SE_student <- numeric(S)

  #2
  for (s in seq_len(S)){

    #A
    data1 <- rnorm(n1, mu1, sqrt(var1))

    #B
    data2 <- rnorm(n2, mu2, sqrt(var2))

    #C: Welch standard error
    sample_welch <- sqrt(var(data1)/n1 + var(data2)/n2)

    #D: Student standard error based on the pooled standard deviation
    Sp <- sqrt(((n1-1)*var(data1)+(n2-1)*var(data2))/(n1+n2-2))
    sample_student <- Sp * sqrt(1/n1 + 1/n2)

    #E
    #a
    SE_welch[s] <- sample_welch
    #b
    SE_student[s] <- sample_student
  }

  #3: bias = mean estimate minus the population standard error of this condition
  true_SE <- sqrt(var1/n1 + var2/n2)
  welch_bias <- mean(SE_welch) - true_SE
  student_bias <- mean(SE_student) - true_SE

  #4
  welch_var <- var(SE_welch)
  student_var <- var(SE_student)

  #5
  welch_MSE <- welch_bias^2 + welch_var
  student_MSE <- student_bias^2 + student_var

  #6: relative efficiency of Welch to Student (below 1 favours Welch)
  RE <- welch_MSE/student_MSE

  #7
  return(c(welch_bias, student_bias, welch_var, student_var, welch_MSE, student_MSE, RE))
}

# One row per condition; named `results` to avoid masking base::matrix
results <- rbind(
      MC_ttest_welch(n1=10, var2=1), MC_ttest_welch(n1=100, var2=1), MC_ttest_welch(n1=200, var2=1),
      MC_ttest_welch(n1=10, var2=2), MC_ttest_welch(n1=100, var2=2), MC_ttest_welch(n1=200, var2=2),
      MC_ttest_welch(n1=10, var2=10), MC_ttest_welch(n1=100, var2=10), MC_ttest_welch(n1=200, var2=10))

dimnames(results) <- list(c("n1=10 var2=1", "n1=100 var2=1", "n1=200 var2=1",
                            "n1=10 var2=2","n1=100 var2=2","n1=200 var2=2",
                            "n1=10 var2=10","n1=100 var2=10","n1=200 var2=10"),
                          c("bias welch", "bias student", "variance welch", "variance student", "MSE welch", "MSE student", "RE"))

results

```

### d
The assumption that the variances of the two populations are equal (homogeneity of variance), on which the pooled standard error of Student's t-test relies.
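
In base R this assumption corresponds to the `var.equal` argument of `t.test()`: `var.equal = TRUE` uses the pooled standard error, while the default `var.equal = FALSE` uses Welch's. The sketch below uses arbitrary illustrative data and an arbitrary seed. It recovers the standard error from each fitted test as the mean difference divided by the t statistic and compares it with the hand-written formulas from 2c. The helper `se_from_ttest` is a hypothetical name introduced here.

```{r}
set.seed(1)  # arbitrary seed, for illustration only
x <- rnorm(10, mean = 0, sd = 1)
y <- rnorm(200, mean = 1, sd = sqrt(10))

# Standard error implied by a t.test() result: (mean of x - mean of y) / t
se_from_ttest <- function(fit) {
  unname((fit$estimate[1] - fit$estimate[2]) / fit$statistic)
}

se_student_builtin <- se_from_ttest(t.test(x, y, var.equal = TRUE))
se_welch_builtin   <- se_from_ttest(t.test(x, y, var.equal = FALSE))

# Hand-written formulas, as used in the simulation
n1 <- length(x)
n2 <- length(y)
Sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
se_student_manual <- Sp * sqrt(1 / n1 + 1 / n2)
se_welch_manual   <- sqrt(var(x) / n1 + var(y) / n2)

# Both comparisons should agree up to numerical tolerance
all.equal(se_student_builtin, se_student_manual)
all.equal(se_welch_builtin, se_welch_manual)
```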

## 3
1.
- Small sample size, unequal variances: the Student standard error is (more) biased.
- Small sample size, equal variances: the Welch standard error is (more) biased.
- Large sample size (relative to the variance): no difference between the Welch and Student standard errors.

2.
- Larger sample sizes decrease the variance of both standard error estimators.
- A larger difference between the population variances (larger var2) increases the variance of both estimators.

3.
Use the Student t-test when the sample size is small and the variances are equal, and the Welch test when the sample size is small and the variances are unequal. When the sample size is large ($n_1 \geq 200$) there is no difference.