---
title: "One-way Analysis of Variance"
author: "Data Analysis in Sociology"
date: "3/24-25/2022"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

## Problem 1

Is the level of alcohol sales the same across the Federal Districts of Russia?

The data come from <https://rosstat.gov.ru/> and <https://www.kaggle.com/dwdkills/alcohol-consumption-in-russia>

Data file: `alcohol_districts.csv`

Create an index showing the average sales of all the types of alcohol per region. Use `rowMeans()` to calculate the mean across several columns.

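```{r}
library(dplyr)

# Read the data (assumes the file is in the working directory)
alcohol_districts <- read.csv("alcohol_districts.csv")

# Average sales across the five types of alcohol for each row
alcohol_districts$index <- rowMeans(
  alcohol_districts[c("wine", "beer", "vodka", "champagne", "brandy")]
)
```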
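As a minimal sketch of how `rowMeans()` behaves, the chunk below uses a small made-up data frame (`toy` and its values are hypothetical and are not taken from `alcohol_districts.csv`). It also shows the `na.rm` argument, which may matter if some sales figures are missing.

```{r}
# Toy data frame with hypothetical values, for illustration only
toy <- data.frame(
  region = c("A", "B", "C"),
  wine   = c(2, 4, 6),
  beer   = c(10, 20, 30),
  vodka  = c(3, NA, 9)
)

# Row-wise mean over the selected numeric columns only
toy$index <- rowMeans(toy[c("wine", "beer", "vodka")])

# With na.rm = TRUE a missing value is skipped instead of producing NA
toy$index_narm <- rowMeans(toy[c("wine", "beer", "vodka")], na.rm = TRUE)

toy
```

Selecting the columns by name keeps non-numeric columns such as `region` out of the calculation.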

Keep only the observations for 2016:

```{r}
alc <- alcohol_districts %>%
  filter(year == 2016)
```

Now try a new way of visualising pairwise comparisons. Read the example for the `ggbetweenstats()` function here and adapt it to our data, using the non-parametric test: <https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html>.

```{r}
# install.packages("ggstatsplot")
library(ggstatsplot)

ggbetweenstats(
  data = alc,
  x = district,
  y = index,
  type = "nonparametric",
  plot.type = "box"
)
```


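```{r}
# install.packages("car")
library(car)

# Levene's test for homogeneity of variance across districts
leveneTest(index ~ district, data = alc)
```

```{r}
library(ggplot2)

ggplot(alc, aes(district, index)) + geom_boxplot()
```

```{r}
# Fit the one-way ANOVA and inspect the distribution of its residuals
aov_out <- aov(index ~ district, data = alc)

plot(density(residuals(aov_out)))
```

```{r}
# Non-parametric alternative to one-way ANOVA
kruskal.test(index ~ district, data = alc)
```

```{r}
# install.packages("dunn.test")
library(dunn.test)

# Pairwise post hoc comparisons with Holm adjustment
dunn.test(alc$index, alc$district, method = "holm")
```

```{r}
# install.packages("rstatix")
library(rstatix)

# Effect size for the Kruskal-Wallis test
kruskal_effsize(alc, index ~ district)
```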
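To see what the Kruskal-Wallis effect size measures, it may help to compute it by hand. The sketch below assumes the eta-squared estimate based on the H statistic, `(H - k + 1) / (n - k)`, where `k` is the number of groups and `n` the total number of observations; this is the formula given in the `rstatix` documentation for `kruskal_effsize()`. The data are simulated and purely illustrative (`toy`, `group`, and `value` are hypothetical names).

```{r}
# Simulated example data (hypothetical, for illustration only)
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 10)),
  value = c(rnorm(10, mean = 0), rnorm(10, mean = 1), rnorm(10, mean = 2))
)

kw <- kruskal.test(value ~ group, data = toy)

H <- unname(kw$statistic)   # Kruskal-Wallis H statistic
k <- nlevels(toy$group)     # number of groups
n <- nrow(toy)              # total number of observations

# Eta-squared based on H
eta2_H <- (H - k + 1) / (n - k)
eta2_H
```

The same documentation suggests reading values from 0.01 to below 0.06 as a small effect, from 0.06 to below 0.14 as moderate, and 0.14 or above as large.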

## Problem 2

For Problems 2-3, use the file `so2.csv` from the data folder. The data set contains the results of a survey of software developers who use the StackOverflow website.

How is the subjective level of competence in programming related to coding experience?

Do the most experienced developers feel they are the most proficient?

Examine this relationship visually, then run a suitable statistical test, and show the effect size.

Variables: `ImpSyn`, `YearsCodePro`.


```{r}
# put your code here
```


Now try a new way of visualising the distribution of a numeric variable across groups: a ridgeline plot.

Read the example for the `geom_density_ridges()` function here and adapt the code to our data: <https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html> (go to the "Density ridgeline plots" section).

```{r}
# install.packages("ggridges")
library(ggridges)
# put your code here
```
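For orientation, here is a minimal sketch of a density ridgeline plot on the built-in `iris` data rather than on `so2.csv`, so the variable names below are not the ones needed for this problem. It assumes the `ggplot2` and `ggridges` packages are installed.

```{r}
library(ggplot2)
library(ggridges)

# One density curve per level of the grouping variable on the y axis
p <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges()

p
```

The numeric variable goes on the x axis and the grouping variable on the y axis; the grouping variable should be a factor or a character column.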


## Problem 3

Are more experienced programmers substantially older?

Compare the age of respondents by their coding experience.

First, visualise the relationship, then run a suitable statistical test, and show the effect size.

```{r}
# put your code here
```


Finally, create a visualization that shows all the test results on the plot. Use `ggbetweenstats()`.

Variables: `ImpSyn`, `Age`.

```{r}
library(ggstatsplot)
# put your code here
```