---
title: "Statistics2, lab 1: 1-way ANOVA"
author: "Gijs Danoe s3494888"
date: "April 12th, 2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## 1. Load libraries and data

```{r}
# If you don't have these packages yet, uncomment the two install.packages() lines below, run them once and
# comment them out again. Otherwise R will try to reinstall the packages every time you knit the file.
#install.packages("foreign")
#install.packages("car")

library(foreign)
library(car)
mydata <- read.spss("lab1_reading.sav", to.data.frame = TRUE)
# If lab1_reading.sav is in another folder and you are using Windows, the path looks something like this:
#mydata <- read.spss("C:/.../lab1_reading.sav", to.data.frame = TRUE) # replace "..." with the folder that contains the file.
# read.spss() may print a warning here; it can be ignored.

```


## 2. Investigate variable POST3

```{r post3-boxplot}
boxplot(POST3 ~ Group, data = mydata, xlab = "Group", ylab = "POST3")

```

## 3. Hypotheses
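$H_0$: $\mu_1 = \mu_2 = \mu_3$

$H_a$: not all of the $\mu_i$ are equal,

where $\mu_1$, $\mu_2$ and $\mu_3$ are the population mean POST3 scores of the Basal, DRTA and Strat groups.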
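The F test in section 7 decides between these hypotheses by splitting the total variation in POST3 into a between-group part (SSG) and a within-group part (SSE), the same quantities used in section 8. With $k = 3$ groups, $n_i = 22$ scores per group, $N = 66$ scores in total, $y_{ij}$ the score of student $j$ in group $i$, $\bar{y}_i$ the group mean and $\bar{y}$ the grand mean:

$$
SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2
= \underbrace{\sum_{i=1}^{k} n_i(\bar{y}_i-\bar{y})^2}_{SSG}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2}_{SSE},
\qquad
F = \frac{SSG/(k-1)}{SSE/(N-k)}
$$

If $H_0$ holds (and the residuals are normal with equal group variances, which sections 4 to 6 check), $F$ follows an $F$ distribution with $k - 1 = 2$ and $N - k = 63$ degrees of freedom, so a large $F$ (small p value) is evidence against $H_0$.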

## 4. Test normality

```{r}
mydata.aov <- aov(POST3 ~ Group, data = mydata)
mydata.aov.res <- residuals( object = mydata.aov ) # extract the residuals
qqnorm( y = mydata.aov.res ); qqline( y = mydata.aov.res)

aggregate(POST3~Group, data=mydata, function(x) shapiro.test(x)$p.value)

```

In the Q-Q plot the residuals lie close to the reference line, so they look roughly normally distributed. The Shapiro-Wilk test is not significant in any of the three groups (all p values > 0.05), so there is no evidence against normality.
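In a one-way ANOVA each residual is simply a score minus its own group mean, and the Shapiro-Wilk statistic does not change when a sample is shifted. Testing each group's raw scores (as above) is therefore the same as testing that group's residuals, and the pooled residuals can also be tested in one go. A minimal sketch that checks this, assuming the 66 POST3 scores typed out in section 8 are ordered as 22 Basal, 22 DRTA and 22 Strat scores (their group sums reproduce the group means used for SSG); `post3` and `group` are illustrative stand-ins for `mydata$POST3` and `mydata$Group`:

```r
# POST3 scores as typed out in the SST line of section 8.
# ASSUMPTION: ordered as 22 Basal, 22 DRTA and 22 Strat scores.
post3 <- c(41, 41, 43, 46, 46, 45, 45, 32, 33, 39, 42, 45, 39, 44, 36, 49, 40, 35, 36, 40, 54, 32,
           31, 40, 48, 30, 42, 48, 49, 53, 48, 43, 55, 55, 57, 53, 37, 50, 54, 41, 49, 47, 49, 49,
           53, 47, 41, 49, 43, 45, 50, 48, 49, 42, 38, 42, 34, 48, 51, 33, 44, 48, 49, 33, 45, 42)
group <- factor(rep(c("Basal", "DRTA", "Strat"), each = 22))

fit <- aov(post3 ~ group)
res <- residuals(fit)

# residual = score minus the mean of its own group
manual_res <- post3 - ave(post3, group)

# Shapiro-Wilk p values per group: raw scores vs residuals (should be identical)
p_raw <- tapply(post3, group, function(x) shapiro.test(x)$p.value)
p_res <- tapply(res, group, function(x) shapiro.test(x)$p.value)
print(rbind(raw = p_raw, residuals = p_res))

# one test on all 66 pooled residuals
shapiro.test(res)
```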

## 5. Test variance

```{r}
leveneTest(mydata.aov)

```

Levene's test is not significant (p > 0.05), so there is no evidence against homogeneity of variance and the assumption is met.
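By default `car::leveneTest()` uses `center = median` (the Brown-Forsythe variant): it runs a one-way ANOVA on the absolute deviations of each score from its group median. A sketch of that computation by hand, again assuming the section 8 scores are ordered Basal, DRTA, Strat; the cross-check against `car` only runs if the package is installed:

```r
# POST3 scores as typed out in section 8.
# ASSUMPTION: ordered as 22 Basal, 22 DRTA and 22 Strat scores.
post3 <- c(41, 41, 43, 46, 46, 45, 45, 32, 33, 39, 42, 45, 39, 44, 36, 49, 40, 35, 36, 40, 54, 32,
           31, 40, 48, 30, 42, 48, 49, 53, 48, 43, 55, 55, 57, 53, 37, 50, 54, 41, 49, 47, 49, 49,
           53, 47, 41, 49, 43, 45, 50, 48, 49, 42, 38, 42, 34, 48, 51, 33, 44, 48, 49, 33, 45, 42)
group <- factor(rep(c("Basal", "DRTA", "Strat"), each = 22))

# absolute deviation of every score from the median of its own group
abs_dev <- abs(post3 - ave(post3, group, FUN = median))

# the Levene (Brown-Forsythe) statistic is the one-way ANOVA F on those deviations
bf_table <- anova(lm(abs_dev ~ group))
bf_F <- bf_table[["F value"]][1]
bf_p <- bf_table[["Pr(>F)"]][1]
c(F = bf_F, p = bf_p)

# cross-check with car::leveneTest() when car is available
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(post3 ~ group))
}
```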

## 6. Test variance with Hartley's test, by hand

```{r}
group_vars <- tapply(mydata$POST3, mydata$Group, var)  # variance of POST3 in each group
group_vars
# Hartley's Fmax: largest group variance divided by the smallest (here DRTA / Basal)
max(group_vars) / min(group_vars)
```

With k = 3 groups and n - 1 = 22 - 1 = 21 degrees of freedom per group, the Fmax table for $\alpha$ = 0.05 gives a critical value of 2.95. The observed Fmax is lower (1.72 < 2.95), so Hartley's test is not significant and the assumption of homogeneous variances is met.
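The critical value 2.95 comes from a printed table. As a rough sanity check, the same quantity can be approximated by Monte Carlo simulation: draw k = 3 normal samples of n = 22 with equal variances, record max(variance) / min(variance), and take the 95th percentile. This is a simulation under the normality and equal-n assumptions of Hartley's test, not an exact table value; the seed and the number of replications are arbitrary choices, and `fmax_obs` is the 1.72 reported above.

```r
set.seed(2019)   # arbitrary seed, for reproducibility
k <- 3           # number of groups
n <- 22          # observations per group
n_sim <- 20000   # Monte Carlo replications

# Fmax under H0: largest / smallest sample variance of k equal-variance normal groups
fmax_null <- replicate(n_sim, {
  v <- vapply(seq_len(k), function(i) var(rnorm(n)), numeric(1))
  max(v) / min(v)
})

# simulated 5% critical value
fmax_crit <- unname(quantile(fmax_null, 0.95))
fmax_crit

# observed Fmax for POST3, taken from the text above
fmax_obs <- 1.72
fmax_obs > fmax_crit   # FALSE: not significant, no evidence against equal variances
```

The estimate moves a little with the seed and `n_sim`; it is only meant as a cross-check on the table lookup.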

## 7. 1-way ANOVA

```{r}
summary(mydata.aov)
```

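The p value (F(2, 63) ≈ 4.48, p ≈ 0.015) is lower than $\alpha$ = 0.05, so the test is significant: we reject $H_0$ in favour of $H_a$, i.e. not all of the $\mu_i$ are equal. The ANOVA does not say which groups differ; the post-hoc tests in sections 9 and 10 address that.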
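To make the F value in the `summary()` table less of a black box, here is a sketch that rebuilds it from the sums of squares and checks it against `aov()`. It uses the scores typed out in section 8 and assumes they are ordered as 22 Basal, 22 DRTA and 22 Strat scores; `post3` and `group` are illustrative names.

```r
# POST3 scores as typed out in section 8.
# ASSUMPTION: ordered as 22 Basal, 22 DRTA and 22 Strat scores.
post3 <- c(41, 41, 43, 46, 46, 45, 45, 32, 33, 39, 42, 45, 39, 44, 36, 49, 40, 35, 36, 40, 54, 32,
           31, 40, 48, 30, 42, 48, 49, 53, 48, 43, 55, 55, 57, 53, 37, 50, 54, 41, 49, 47, 49, 49,
           53, 47, 41, 49, 43, 45, 50, 48, 49, 42, 38, 42, 34, 48, 51, 33, 44, 48, 49, 33, 45, 42)
group <- factor(rep(c("Basal", "DRTA", "Strat"), each = 22))

N <- length(post3)    # 66 students
k <- nlevels(group)   # 3 groups
grand_mean  <- mean(post3)
group_means <- tapply(post3, group, mean)
group_n     <- tapply(post3, group, length)

SSG <- sum(group_n * (group_means - grand_mean)^2)   # between-group sum of squares
SSE <- sum((post3 - ave(post3, group))^2)            # within-group sum of squares
F_stat <- (SSG / (k - 1)) / (SSE / (N - k))          # MSG / MSE
p_val  <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
c(F = F_stat, p = p_val)

# built-in ANOVA table for comparison
aov_table <- summary(aov(post3 ~ group))[[1]]
aov_table
```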

## 8. Effect size

```{r}
# regular R^2 (eta squared): SSG / SST
mean_data <- mean(mydata$POST3)
SST <- sum((mydata$POST3 - mean_data)^2)  # total sum of squares

# between-group sum of squares: n_i * (group mean - grand mean)^2, summed over the groups
group_means <- tapply(mydata$POST3, mydata$Group, mean)
group_n <- tapply(mydata$POST3, mydata$Group, length)
(SSG <- sum(group_n * (group_means - mean_data)^2))

SSG/SST

# adjusted R^2 of the equivalent regression model
summary(lm(POST3 ~ Group, mydata))$adj.r.squared

```

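SSG/SST ($\eta^2$) is about 0.12 and the adjusted $R^2$ about 0.10. By Cohen's benchmarks for $\eta^2$ (0.01 small, 0.06 medium, 0.14 large) this is a medium effect: group membership accounts for roughly 12% of the variance in POST3.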
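The two numbers printed in this section are $\eta^2$ (SSG/SST, identical to the ordinary $R^2$ of `lm(POST3 ~ Group)`) and the adjusted $R^2$, which shrinks it for the number of groups: $1 - (1 - R^2)(N - 1)/(N - k)$. A sketch of both formulas, checked against `summary(lm())`, under the same assumption that the section 8 scores are ordered Basal, DRTA, Strat:

```r
# POST3 scores as typed out in section 8.
# ASSUMPTION: ordered as 22 Basal, 22 DRTA and 22 Strat scores.
post3 <- c(41, 41, 43, 46, 46, 45, 45, 32, 33, 39, 42, 45, 39, 44, 36, 49, 40, 35, 36, 40, 54, 32,
           31, 40, 48, 30, 42, 48, 49, 53, 48, 43, 55, 55, 57, 53, 37, 50, 54, 41, 49, 47, 49, 49,
           53, 47, 41, 49, 43, 45, 50, 48, 49, 42, 38, 42, 34, 48, 51, 33, 44, 48, 49, 33, 45, 42)
group <- factor(rep(c("Basal", "DRTA", "Strat"), each = 22))
N <- length(post3)
k <- nlevels(group)

SST <- sum((post3 - mean(post3))^2)
SSG <- sum(tapply(post3, group, length) * (tapply(post3, group, mean) - mean(post3))^2)

eta_sq <- SSG / SST                              # eta squared = R^2
adj_r2 <- 1 - (1 - eta_sq) * (N - 1) / (N - k)   # adjusted R^2

fit_lm <- summary(lm(post3 ~ group))
c(eta_sq = eta_sq, r.squared = fit_lm$r.squared,
  adj_r2 = adj_r2, adj.r.squared = fit_lm$adj.r.squared)
```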

## 9. Post-hoc test Bonferroni

```{r}
pairwise.t.test(x = mydata$POST3, g = mydata$Group, p.adjust.method = "bonferroni")
```

After Bonferroni correction, only the difference between DRTA and Basal is significant at $\alpha$ = 0.05; the other two comparisons are not.
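With unpaired data, `pairwise.t.test()` pools the standard deviation over all three groups by default, runs a two-sided t test for each pair on N - k degrees of freedom, and the Bonferroni method then multiplies each p value by the number of comparisons (3), capping at 1. A sketch that reproduces the adjusted p values, under the same assumption about the order of the section 8 scores (`level_pairs`, `p_raw` and `p_bonf` are illustrative names):

```r
# POST3 scores as typed out in section 8.
# ASSUMPTION: ordered as 22 Basal, 22 DRTA and 22 Strat scores.
post3 <- c(41, 41, 43, 46, 46, 45, 45, 32, 33, 39, 42, 45, 39, 44, 36, 49, 40, 35, 36, 40, 54, 32,
           31, 40, 48, 30, 42, 48, 49, 53, 48, 43, 55, 55, 57, 53, 37, 50, 54, 41, 49, 47, 49, 49,
           53, 47, 41, 49, 43, 45, 50, 48, 49, 42, 38, 42, 34, 48, 51, 33, 44, 48, 49, 33, 45, 42)
group <- factor(rep(c("Basal", "DRTA", "Strat"), each = 22))

by_group <- split(post3, group)
means <- sapply(by_group, mean)
ns    <- sapply(by_group, length)
df_w  <- sum(ns - 1)                                       # within-group df, N - k = 63
sd_pooled <- sqrt(sum(sapply(by_group, var) * (ns - 1)) / df_w)

level_pairs <- combn(levels(group), 2)                     # one column per pair of groups
p_raw <- apply(level_pairs, 2, function(pr) {
  se    <- sd_pooled * sqrt(1 / ns[[pr[1]]] + 1 / ns[[pr[2]]])
  t_val <- (means[[pr[2]]] - means[[pr[1]]]) / se
  2 * pt(-abs(t_val), df_w)                                # unadjusted two-sided p value
})
names(p_raw) <- paste(level_pairs[2, ], level_pairs[1, ], sep = " vs ")

# Bonferroni: multiply by the number of comparisons, cap at 1
p_bonf <- pmin(1, p_raw * length(p_raw))
round(p_bonf, 4)

# built-in version for comparison
pw <- pairwise.t.test(post3, group, p.adjust.method = "bonferroni")
pw$p.value
```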

## 10. Post-hoc test TukeyHSD

```{r}
TukeyHSD(mydata.aov)
```

Tukey's HSD agrees with the Bonferroni result from exercise 9: the only significant difference (p < 0.05) is between DRTA and Basal, which is also the pair with the largest difference in means.
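Instead of adjusting separate t tests, `TukeyHSD()` compares each difference in group means with the studentized range distribution, using the MSE from the ANOVA. A sketch of that computation with `ptukey()`, checked against `TukeyHSD()`, again assuming the section 8 scores are ordered Basal, DRTA, Strat (`tukey_p` and `level_pairs` are illustrative names):

```r
# POST3 scores as typed out in section 8.
# ASSUMPTION: ordered as 22 Basal, 22 DRTA and 22 Strat scores.
post3 <- c(41, 41, 43, 46, 46, 45, 45, 32, 33, 39, 42, 45, 39, 44, 36, 49, 40, 35, 36, 40, 54, 32,
           31, 40, 48, 30, 42, 48, 49, 53, 48, 43, 55, 55, 57, 53, 37, 50, 54, 41, 49, 47, 49, 49,
           53, 47, 41, 49, 43, 45, 50, 48, 49, 42, 38, 42, 34, 48, 51, 33, 44, 48, 49, 33, 45, 42)
group <- factor(rep(c("Basal", "DRTA", "Strat"), each = 22))

fit   <- aov(post3 ~ group)
mse   <- sum(residuals(fit)^2) / df.residual(fit)   # within-group mean square
by_group <- split(post3, group)
means <- sapply(by_group, mean)
ns    <- sapply(by_group, length)
k     <- nlevels(group)

level_pairs <- combn(levels(group), 2)
tukey_p <- apply(level_pairs, 2, function(pr) {
  d <- means[[pr[2]]] - means[[pr[1]]]
  # studentized range statistic for this pair
  q <- abs(d) / sqrt(mse / 2 * (1 / ns[[pr[1]]] + 1 / ns[[pr[2]]]))
  ptukey(q, nmeans = k, df = df.residual(fit), lower.tail = FALSE)
})
names(tukey_p) <- paste(level_pairs[2, ], level_pairs[1, ], sep = "-")  # same labels as TukeyHSD()
round(tukey_p, 4)

TukeyHSD(fit)$group[, "p adj"]
```

With equal group sizes the denominator reduces to sqrt(MSE / n), the textbook form of Tukey's q.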