---
title: "One-way Analysis of Variance"
author: "Data Analysis in Sociology"
date: "3/24-25/2022"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

## Problem 1

Is the level of alcohol sales the same across the Federal Districts of Russia?

The data come from <https://rosstat.gov.ru/> and <https://www.kaggle.com/dwdkills/alcohol-consumption-in-russia>.

Data file: `alcohol_districts.csv`

Create an index showing the average sales of all the types of alcohol per region. Use `rowMeans()` to calculate the mean across several columns.

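As a minimal sketch of how `rowMeans()` builds such an index, the example below uses a small made-up data frame. The `sales` object, its values, and the `index_narm` column are hypothetical and are not part of the assignment data.

```r
# Toy data with hypothetical values; only the column layout mirrors the task
sales <- data.frame(
  region = c("A", "B", "C"),
  wine   = c(2, 4, 6),
  beer   = c(10, 20, NA),
  vodka  = c(3, 6, 9)
)

# Select the numeric columns by name and average across them, row by row
sales$index <- rowMeans(sales[c("wine", "beer", "vodka")])

# A missing value makes the whole row mean NA unless na.rm = TRUE is set
sales$index_narm <- rowMeans(sales[c("wine", "beer", "vodka")], na.rm = TRUE)

print(sales)
```

Whether `na.rm = TRUE` is appropriate depends on how missing sales figures should be treated, so check the data for `NA` values before deciding.
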
```{r}
library(dplyr)

# Assumes the CSV file is in the working directory
alcohol_districts <- read.csv("alcohol_districts.csv")

# Index: mean sales across the five types of alcohol in each row
alcohol_districts$index <- rowMeans(alcohol_districts[c("wine", "beer", "vodka", "champagne", "brandy")])
```

Keep only the data for 2016.

```{r}
alc <- alcohol_districts %>%
  filter(year == 2016)
```

Now try a new way of visualising pairwise comparisons. Read the example for the `ggbetweenstats()` function here and adapt the non-parametric test used in this example: <https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html>.

```{r}
# install.packages("ggstatsplot")
library(ggstatsplot)

# Non-parametric comparison of the index across districts
ggbetweenstats(
  data = alc,
  x = district,
  y = index,
  type = "nonparametric",
  plot.type = "box"
)
```

```{r}
# install.packages("car")
library(car)

# Levene's test for homogeneity of variance across districts
leveneTest(index ~ district, data = alc)
```

```{r}
library(ggplot2)

# Boxplots of the index by district
ggplot(alc, aes(district, index)) + geom_boxplot()
```

```{r}
# Fit the one-way ANOVA and inspect the distribution of its residuals
aov_out <- aov(index ~ district, data = alc)

plot(density(residuals(aov_out)))
```

```{r}
# Kruskal-Wallis rank sum test
kruskal.test(index ~ district, data = alc)
```

```{r}
# install.packages("dunn.test")
library(dunn.test)

# Dunn's pairwise comparisons with the Holm adjustment
dunn.test(alc$index, alc$district, method = "holm")
```

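To make the Holm adjustment named in the `method` argument less opaque, here is a small sketch of the step-down procedure on made-up p-values, checked against base R's `p.adjust()`. The values in `p_raw` are hypothetical, and the sketch does not reproduce how `dunn.test()` itself reports its adjusted values.

```r
# Hypothetical raw p-values from three pairwise comparisons
p_raw <- c(0.01, 0.04, 0.03)

# Holm: sort ascending, multiply the i-th smallest by (m - i + 1),
# then enforce monotonicity with a running maximum (capped at 1)
m <- length(p_raw)
ord <- order(p_raw)
p_manual <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[ord]))[order(ord)]

# Base R's built-in implementation gives the same values
p_holm <- p.adjust(p_raw, method = "holm")

print(p_holm)
```

The smallest p-value is penalised most heavily (multiplied by the number of comparisons), which is why Holm is never less powerful than a plain Bonferroni correction.
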
```{r}
# install.packages("rstatix")
library(rstatix)

# Effect size for the Kruskal-Wallis test (eta-squared based on H)
kruskal_effsize(alc, index ~ district)
```

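The effect size above can also be computed by hand, which helps when interpreting it. The sketch below uses the built-in `PlantGrowth` data rather than the alcohol data and applies the H-based eta-squared formula, `(H - k + 1) / (n - k)`, where `k` is the number of groups and `n` the number of observations. The object names are illustrative.

```r
# Built-in example data: plant weight in three treatment groups
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

H <- unname(kw$statistic)        # Kruskal-Wallis H statistic
k <- nlevels(PlantGrowth$group)  # number of groups
n <- nrow(PlantGrowth)           # total number of observations

# Eta-squared based on H
eta2_h <- (H - k + 1) / (n - k)

print(eta2_h)
```
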
## Problem 2

For tasks 2-3, use the file `so2.csv` from the data folder. The data set contains the results of a survey of software developers who use the Stack Overflow website.

How is the subjective level of competence in programming related to coding experience?

Do the most experienced developers feel they are the most proficient?

Examine this relationship visually, then run a suitable statistical test, and show the effect size.

Variables: `ImpSyn`, `YearsCodePro`.

```{r}
# put your code here
```

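One possible shape for this three-step analysis is sketched below on simulated data, using base R only. The data frame `dev`, its column names, the group labels, and the simulated values are all assumptions made for illustration; they are not taken from `so2.csv`, and the choice of a rank-based test is a judgment call that depends on what the real distributions look like.

```r
set.seed(1)

# Simulated stand-in for the survey; group labels and values are made up
levels_comp <- c("Below average", "Average", "Above average")
dev <- data.frame(
  competence = factor(rep(levels_comp, each = 40), levels = levels_comp),
  years_pro  = c(rexp(40, 1 / 3), rexp(40, 1 / 6), rexp(40, 1 / 10))
)

# 1. Look at the distributions
boxplot(years_pro ~ competence, data = dev,
        xlab = "Self-rated competence", ylab = "Years of professional coding")

# 2. Skewed outcome, so a rank-based test across the groups
kw <- kruskal.test(years_pro ~ competence, data = dev)
print(kw)

# 3. Effect size: eta-squared based on the H statistic
k <- nlevels(dev$competence)
n <- nrow(dev)
eta2_h <- (unname(kw$statistic) - k + 1) / (n - k)
print(eta2_h)
```

Setting the factor levels explicitly keeps the groups in a meaningful order on the plot instead of the default alphabetical order.
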
Now try a new way of visualising the distribution of a numeric variable across groups: a ridgeline plot.

Read the example for the `geom_density_ridges()` function here and adapt the code to our data: <https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html> (go to the "Density ridgeline plots" section).

```{r}
# install.packages("ggridges")
library(ggridges)
# put your code here
```

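For orientation, a density ridgeline plot on the built-in `iris` data looks roughly like this. It is only a sketch of the geom's basic usage: the numeric variable goes on `x` and the grouping variable on `y`, and the survey variables still need to be substituted in.

```r
library(ggplot2)
library(ggridges)

# One density ridge per species; iris is a built-in stand-in for the survey data
p <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges()

print(p)
```
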
## Problem 3

Are the more experienced programmers significantly older?

Compare the age of respondents by their coding experience.

First, visualise the relationship, then run a suitable statistical test, and show the effect size.

```{r}
# put your code here
```

Finally, create a visualisation with all the effects shown in the plot. Use `ggbetweenstats()`.

Variables: `ImpSyn`, `Age`.

```{r}
library(ggstatsplot)
# put your code here
```