---
title: "One-way Analysis of Variance"
author: "Data Analysis in Sociology"
date: "3/24-25/2022"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

## Problem 1

Is the level of alcohol sales the same across the Federal Districts of Russia?

The data come from <https://rosstat.gov.ru/> and <https://www.kaggle.com/dwdkills/alcohol-consumption-in-russia>.

Data file: `alcohol_districts.csv`

Create an index of average alcohol sales for each region by averaging across all types of alcohol. Use `rowMeans()` to calculate the mean across several columns.

```{r}
library(dplyr)
# Assumes the data file sits in the working directory; adjust the path otherwise
alcohol_districts <- read.csv("alcohol_districts.csv")
alcohol_districts$index <- rowMeans(alcohol_districts[c("wine", "beer", "vodka", "champagne", "brandy")])
```

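As a quick illustration of what `rowMeans()` does here, the sketch below builds a tiny data frame with made-up numbers (the region labels and values are hypothetical, not taken from `alcohol_districts.csv`) and averages three columns row by row. Note that `rowMeans()` returns `NA` for any row containing a missing value unless `na.rm = TRUE` is set.

```r
# Hypothetical sales figures: column names mirror the worksheet, numbers are invented
toy <- data.frame(
  region = c("A", "B"),
  wine   = c(2, 4),
  beer   = c(60, 80),
  vodka  = c(10, 12)
)

# Columns that enter the index
drinks <- c("wine", "beer", "vodka")

# One mean per row, computed across the selected columns only
toy$index <- rowMeans(toy[drinks], na.rm = TRUE)
toy$index
# → 24 32
```
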
Keep only the observations for 2016.

```{r}
alc <- alcohol_districts %>%
  filter(year == 2016)
```

Now try a new way of visualising pairwise comparisons. Read the example for the `ggbetweenstats()` function and adapt it to use the non-parametric test: <https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html>.

```{r}
# install.packages("ggstatsplot")  # run once if the package is missing
library(ggstatsplot)
ggbetweenstats(
  data = alc,
  x = district,
  y = index,
  type = "nonparametric",
  plot.type = "box"
)
```

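```{r}
# install.packages("car")  # run once if the package is missing
library(car)
# Levene's test for homogeneity of variance across districts
leveneTest(index ~ factor(district), data = alc)
```
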
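To see what Levene's test is doing, it can be reproduced in base R as a one-way ANOVA on absolute deviations from the group medians (the median-centred variant, which is the default centring in `car::leveneTest()`). This is only a sketch: the built-in `PlantGrowth` data set stands in for `alc`, and the name `levene_by_hand` is made up for the illustration.

```r
# PlantGrowth ships with R (datasets package); it stands in for `alc` here
pg <- PlantGrowth

# Absolute deviation of each observation from its own group's median
pg$abs_dev <- abs(pg$weight - ave(pg$weight, pg$group, FUN = median))

# One-way ANOVA on the deviations: its F test is the median-centred Levene test
levene_by_hand <- anova(lm(abs_dev ~ group, data = pg))
levene_by_hand
```

A small p-value in the `group` row would suggest that the spread differs between groups.
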
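```{r}
library(ggplot2)
# Distribution of the index within each district
ggplot(alc, aes(district, index)) + geom_boxplot()
```

```{r}
# Fit the one-way ANOVA and inspect the residual distribution for normality
aov_out <- aov(index ~ district, data = alc)
plot(density(residuals(aov_out)))
```
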
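A density plot of the residuals can be hard to judge by eye, so one possible complement is a normal Q-Q plot together with the Shapiro-Wilk test. The sketch below uses base R only and the built-in `PlantGrowth` data as a stand-in for `alc`; the object names `fit`, `res` and `sw` are chosen for the illustration.

```r
# Stand-in model on a built-in data set (not the alcohol data)
fit <- aov(weight ~ group, data = PlantGrowth)
res <- residuals(fit)

# Points close to the reference line suggest approximately normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(res)
sw
```
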
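```{r}
# Non-parametric alternative to one-way ANOVA
kruskal.test(index ~ district, data = alc)
```

```{r}
# install.packages("dunn.test")  # run once if the package is missing
library(dunn.test)
# Dunn's post hoc pairwise comparisons with the Holm adjustment
dunn.test(alc$index, alc$district, method = "holm")
```
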
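The Holm adjustment requested above can be illustrated on its own with base R's `p.adjust()`. The three p-values below are hypothetical and serve only to show the mechanics: the smallest p-value is multiplied by the number of tests, the next by one fewer, and so on, with the results forced to be non-decreasing in that order.

```r
# Hypothetical raw p-values from three pairwise comparisons
p_raw <- c(0.01, 0.04, 0.03)

# Holm adjustment via base R
p_holm <- p.adjust(p_raw, method = "holm")

# The same adjustment by hand
ord <- order(p_raw)                    # positions from smallest to largest p-value
m <- length(p_raw)
stepped <- p_raw[ord] * (m - seq_len(m) + 1)
by_hand <- numeric(m)
by_hand[ord] <- pmin(1, cummax(stepped))

p_holm
# → 0.03 0.06 0.06
```
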
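```{r}
# install.packages("rstatix")  # run once if the package is missing
library(rstatix)
# Effect size (eta-squared based on the H statistic) for the Kruskal-Wallis test
kruskal_effsize(alc, index ~ district)
```
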
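The effect size reported by `kruskal_effsize()` is, per the rstatix documentation, eta-squared based on the H statistic: `(H - k + 1) / (n - k)`, where `H` is the Kruskal-Wallis statistic, `k` the number of groups and `n` the total sample size. A minimal base R sketch of that formula, using the built-in `PlantGrowth` data as a stand-in for `alc`:

```r
# Kruskal-Wallis test on a built-in data set (not the alcohol data)
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

H <- unname(kw$statistic)          # Kruskal-Wallis H statistic
k <- nlevels(PlantGrowth$group)    # number of groups
n <- nrow(PlantGrowth)             # total number of observations

# Eta-squared based on H
eta2_H <- (H - k + 1) / (n - k)
eta2_H
```
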
## Problem 2

For Problems 2-3, use the file `so2.csv` from the data folder. The data set contains the results of a survey of software developers who use the StackOverflow website.

How is the subjective level of programming competence related to coding experience?
Do the most experienced developers feel they are the most proficient?

Examine this relationship visually, then run a suitable statistical test and report the effect size.

Variables: `ImpSyn`, `YearsCodePro`.

```{r}
# put your code here
```

Now try a new way of visualising the distribution of a numeric variable across groups: a ridgeline plot.

Read the example for the `geom_density_ridges()` function and adapt the code to our data: <https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html> (go to the "Density ridgeline plots" section).

```{r}
# install.packages("ggridges")
library(ggridges)
# put your code here
```

## Problem 3

Are the more experienced programmers substantially older?
Compare the age of respondents by their coding experience.

First, visualise the relationship, then run a suitable statistical test and report the effect size.

```{r}
# put your code here
```

Finally, create a visualisation in which all of these effects are shown in one picture. Use `ggbetweenstats()`.

Variables: `ImpSyn`, `Age`.

```{r}
library(ggstatsplot)
# put your code here
```