spendy2129

Nov 8th, 2019
### Exercise 1:

```{r}
## Compute population p directly
ames %>% select(Central.Air) %>% table() %>% prop.table()
```

### Exercise 2:

```{r}
## New variable: 1 if the home has central air, 0 otherwise; extract it as air
ames <- ames %>% mutate(air = as.numeric(Central.Air == 'Y'))
air <- ames$air
## Compute population p in a new way; save it
pop_p <- sum(air) / nrow(ames)
pop_p
```

I counted the homes with central air (the sum of `air`) and divided by the total number of homes in `ames`, then saved the result as a separate value, `pop_p`.

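As a quick cross-check on this calculation, the proportion of a 0/1 indicator is just its mean. The sketch below uses a small made-up data frame (`toy`, hypothetical, not the real `ames` data) to show that dividing the sum by the row count and taking the mean give the same number.

```r
# Hypothetical toy data standing in for ames (not the real dataset)
toy <- data.frame(Central.Air = c("Y", "Y", "N", "Y", "Y"))

# Same construction as above: 1 if the home has central air, 0 otherwise
toy_air <- as.numeric(toy$Central.Air == "Y")

# Two equivalent ways to get the proportion of 1s
p_sum  <- sum(toy_air) / nrow(toy)
p_mean <- mean(toy_air)

p_sum
p_mean
```

Using `mean()` avoids hard-coding the denominator, which helps when the sample size changes.
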
### Exercise 3:

```{r}
## Compute pop sd in two different ways
sd(air, na.rm = TRUE)
sqrt(pop_p * (1 - pop_p))
```

**written answer**
No, they are not exactly the same, but they agree to four significant figures.

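A likely reason for the small gap is that `sd()` divides by n - 1, while `sqrt(p * (1 - p))` corresponds to dividing by n. The sketch below checks this on a tiny made-up 0/1 vector (`x` is hypothetical, not the `air` variable): rescaling `sd()` by `sqrt((n - 1) / n)` should recover the formula value.

```r
# Hypothetical 0/1 population: 14 ones and 1 zero (not the real ames data)
x <- c(rep(1, 14), 0)
n <- length(x)
p <- mean(x)

sd_sample  <- sd(x)               # divides by n - 1
sd_formula <- sqrt(p * (1 - p))   # equivalent to dividing by n

# Rescaling sd() by sqrt((n - 1) / n) recovers the formula value
sd_rescaled <- sd_sample * sqrt((n - 1) / n)

c(sd_sample = sd_sample, sd_formula = sd_formula, sd_rescaled = sd_rescaled)
```

With thousands of homes instead of 15 values, the factor `sqrt((n - 1) / n)` is very close to 1, which is consistent with the two results agreeing to several significant figures.
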
### Exercise 4:

```{r}
## Draw sample
samp <- sample(air, size = 50)
## Compute p_hat
p_hat <- sum(samp) / 50
p_hat
```

When I run the code over and over, the value changes, generally falling anywhere from 0.86 to 0.98.

### Exercise 5:

After running the line of code over 20 times, most of the answers were larger than pop_p (0.933). The answers ranged from 0.86 to 0.98, but most of the time they were 0.94 or above. However, if I were to run the line of code several more times and calculate the mean, I wouldn't be surprised if it was very close to pop_p.

### Exercise 6:

```{r}
### Try out samples of all different sizes
samp1 <- sample(air, size = 20)
p_hat1 <- sum(samp1) / 20
samp2 <- sample(air, size = 50)
p_hat2 <- sum(samp2) / 50
samp3 <- sample(air, size = 200)
p_hat3 <- sum(samp3) / 200
p_hat1
p_hat2
p_hat3
```

The sample that tends to be closest to the truth is samp3 (sample size 200). samp1 (sample size 20) shows the most variability.

### Exercise 7:

```{r}
## Make a plot
set.seed(111)
phats_20 <- replicate(100000, mean(sample(air, size = 20)))
ggplot(data = NULL, aes(x = phats_20)) + geom_histogram()
```

The plot is left-skewed, with its center at about 0.95. The values range from 0.7 to 1.0, with most of the data between 0.9 and 1.0.

### Exercise 8:

```{r}
mean(phats_20)
```

This value is much closer to the true population proportion (0.9331058) than a single sample's p_hat, and it matches to four decimal places. It is a slight overestimate but very close.

### Exercise 9:

```{r}
## Compute SD
sd(phats_20)
```

The two standard errors are very far apart. When I calculated the standard error a different way, based on the mean, it was very close to the true standard error.

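One way to make this comparison concrete is to set the simulated SD of the sample proportions next to the formula `sqrt(p * (1 - p) / n)`. The sketch below is an illustration under stated assumptions: `pop` is a made-up 0/1 population with 93% ones (not the real `air` variable), and it uses 10,000 replicates rather than 100,000 to keep it quick.

```r
set.seed(111)

# Hypothetical 0/1 population with 93% ones (not the real ames data)
pop <- rep(c(1, 0), times = c(2790, 210))
p   <- mean(pop)
n   <- 20

# Simulated sampling distribution of p_hat for samples of size n
phats <- replicate(10000, mean(sample(pop, size = n)))

sd_pop       <- sqrt(p * (1 - p))   # SD of the individual 0/1 values
se_formula   <- sd_pop / sqrt(n)    # theoretical standard error of p_hat
se_simulated <- sd(phats)           # SD of the simulated p_hats

c(sd_pop = sd_pop, se_formula = se_formula, se_simulated = se_simulated)
```

Under these assumptions the population SD is several times larger than the spread of the p_hats, while the formula value and the simulated value should nearly agree. Because `sample()` draws without replacement, the true spread is very slightly smaller than the formula suggests.
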
### Exercise 10:

```{r}
set.seed(111)
phats_20 <- replicate(100000, mean(sample(air, size = 20)))
### Fill in for size 50.
set.seed(111)
phats_50 <- replicate(100000, mean(sample(air, size = 50)))
### Fill in for size 200.
set.seed(111)
phats_200 <- replicate(100000, mean(sample(air, size = 200)))
```

### Exercise 11:

```{r}
## Two histograms
ggplot(data = NULL, aes(x = phats_50)) + geom_histogram()
ggplot(data = NULL, aes(x = phats_200)) + geom_histogram()
```

As sample size increases, the shape of the distribution becomes more normal, and the center starts to approach the true proportion value. There is also a decrease in the spread of the data as sample size increases.

### Exercise 12:

The distribution of phats_20 is not normal because of its left skew. phats_50 is approaching normal but still has a left skew, although not as strong as in phats_20. phats_200 is the closest to a normal distribution of the three, with only a slight left skew.

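One way to put numbers on the shrinking left skew is the skewness of a sample proportion under a binomial model, `(1 - 2p) / sqrt(n * p * (1 - p))`. The sketch below assumes that model (independent draws, which only approximates `sample()` drawing without replacement) and plugs in the population proportion 0.9331058 quoted in Exercise 8.

```r
# Skewness of a sample proportion under a binomial model (an assumption here):
# (1 - 2p) / sqrt(n * p * (1 - p))
p <- 0.9331058            # population proportion quoted in Exercise 8
n <- c(20, 50, 200)

skew <- (1 - 2 * p) / sqrt(n * p * (1 - p))
data.frame(n = n, skewness = round(skew, 3))
```

All three values are negative (left skew) and shrink toward zero as n grows, in line with the histograms.
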
### Exercise 13:

The first and third conditions are met for all three sample sizes. The second, the success-failure condition, is not met for the smaller sample sizes (20 and 50), where the expected number of homes without central air, n(1 - p), falls below 10. With 200 homes it is about 13, so the condition is met.

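The success-failure check can be tabulated directly. This sketch assumes the usual rule of thumb of at least 10 expected successes and 10 expected failures (the lab may state a different threshold) and uses the population proportion 0.9331058 quoted in Exercise 8.

```r
p <- 0.9331058            # population proportion quoted in Exercise 8
n <- c(20, 50, 200)

check <- data.frame(
  n         = n,
  successes = n * p,          # expected homes with central air
  failures  = n * (1 - p)     # expected homes without central air
)

# Assumed threshold of 10 (a common rule of thumb; the lab may use another)
check$condition_met <- check$successes >= 10 & check$failures >= 10
check
```

The expected failures are the binding constraint here because p is close to 1.
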
### Exercise 14:

```{r}
## Include the histogram here
## after_stat() needs ggplot2 >= 3.3.0 (older versions used ..density..)
ggplot(data = NULL, aes(x = phats_200)) +
  geom_blank() +
  geom_histogram(bins = 30, aes(y = after_stat(density))) +
  stat_function(fun = dnorm, args = list(mean = pop_p, sd = sqrt(pop_p * (1 - pop_p) / 200)), col = "tomato")
```

The empirical and theoretical distributions are very close but don't quite match. The peaks of the distributions are slightly off.

### Exercise 15:

```{r}
## And calculations
## Population SD of the 0/1 indicator (pop_sd must exist before the bounds)
pop_sd <- sqrt(pop_p * (1 - pop_p))
lower <- pop_p - 1.96 * pop_sd / sqrt(200)
upper <- pop_p + 1.96 * pop_sd / sqrt(200)
## Proportion of the simulated p_hats strictly inside the bounds
mean(phats_200 > lower & phats_200 < upper)
```

The proportion of my samples that fell between the bounds was 0.96.
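
For context, a normal curve puts about 0.95 of its area within 1.96 standard errors, but p_hat can only take the values k/200, so the attainable proportion is not exactly 0.95. The sketch below computes that proportion under a binomial model, which is an assumption: the simulation above samples without replacement, so this is only an approximation. It uses the population proportion 0.9331058 quoted in Exercise 8.

```r
p  <- 0.9331058           # population proportion quoted in Exercise 8
n  <- 200
se <- sqrt(p * (1 - p) / n)

lower <- p - 1.96 * se
upper <- p + 1.96 * se

# p_hat can only equal k / n, so find the counts k strictly inside the bounds
k      <- 0:n
inside <- (k / n > lower) & (k / n < upper)

# Probability of landing inside the bounds under a binomial model (assumed)
coverage <- sum(dbinom(k[inside], size = n, prob = p))
coverage
```

Comparing `coverage` with the simulated proportion gives a sense of how much of the gap from 0.95 comes from discreteness rather than simulation noise.
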