ElenaR1

week 10-11

May 25th, 2018
data("ToothGrowth")

head(ToothGrowth)

my_data <- ToothGrowth
# Source: http://www.sthda.com/english/wiki/qq-plots-quantile-quantile-plots-r-base-graphs
# The R base functions qqnorm() and qqplot() can be used to produce quantile-quantile plots.

# qqnorm(): produces a normal QQ plot of the variable. It assesses whether or not the data set is
# approximately normally distributed. The data are plotted against a theoretical normal distribution
# in such a way that the points should form an approximate straight line.
# Departures from the straight line indicate departures from normality.

# qqline(): adds a reference line (the straight line mentioned above); it can also be drawn after qqplot(x, y).


qqnorm(my_data$len, pch = 1, frame = FALSE)
qqline(my_data$len, col = "steelblue", lwd = 2)
# As all the points fall approximately along this reference line, we can assume normality.
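
# A minimal sketch (an addition to these notes) of what qqnorm() plots: the sorted sample
# against theoretical standard normal quantiles taken at the probabilities from ppoints().
# The names len, theo and qq are illustrative only.
len  <- ToothGrowth$len
theo <- qnorm(ppoints(length(len)))     # theoretical N(0, 1) quantiles
qq   <- qqnorm(len, plot.it = FALSE)    # the coordinates qqnorm() would draw
all.equal(sort(qq$x), theo)             # compares the theoretical quantiles
all.equal(sort(qq$y), sort(len))        # compares the sample quantiles
plot(theo, sort(len), xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")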
y <- rnorm(20) * 4
y
qqnorm(y); qqline(y, col = 2, lwd = 2, lty = 2)

# x is not defined anywhere above; a standard normal sample is assumed here
# as the reference sample for qqplot().
x <- rnorm(2000)

y <- rbinom(2000, size = 10, prob = 1/10)
qqplot(x, y); qqline(y, col = 2, lwd = 2, lty = 2)

y <- rbinom(2000, size = 10, prob = 1/2)
qqplot(x, y); qqline(y, col = 2, lwd = 2, lty = 2)

############################# prop.test()

###########################
# Suppose that I have two approaches to a particular problem. Approach A is observed to succeed
# 685 times out of 1347 attempts. Approach B is observed to succeed 2100 times out of 3748 attempts.
# I want to see if Approach B is preferable to Approach A.
#
# In R I run:

prop.test(c(2100, 685), c(3748, 1347), alternative = "greater", correct = FALSE)
# and I get:
#
# data:  c(2100, 685) out of c(3748, 1347)
# X-squared = 10.7124, df = 1, p-value = 0.0005321
# alternative hypothesis: greater
# 95 percent confidence interval:
#  0.02568765 1.00000000
# sample estimates:
#    prop 1    prop 2
# 0.5602988 0.5085375
#
# Does the line about the confidence interval mean that the "true" value of pB - pA lies in
# the interval (0.02568765, 1)? Yes: it is the one-sided 95% confidence interval for pB - pA,
# so with 95% confidence the true difference is at least 0.0257.
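
# A minimal sketch (an addition to these notes) of where the lower bound 0.02568765 comes from,
# assuming the usual normal-approximation interval without continuity correction.
# The names pB, pA, se, lower and res_ab are illustrative only.
pB <- 2100 / 3748
pA <- 685 / 1347
se <- sqrt(pB * (1 - pB) / 3748 + pA * (1 - pA) / 1347)   # unpooled standard error of pB - pA
lower <- (pB - pA) - qnorm(0.95) * se                     # one-sided 95% lower bound
lower
res_ab <- prop.test(c(2100, 685), c(3748, 1347), alternative = "greater", correct = FALSE)
all.equal(lower, res_ab$conf.int[1])                      # compares with prop.test()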


# EXAMPLE 1
# In this case we fail to reject the null hypothesis.
prop.test(c(15, 25), c(100, 100))
# 2-sample test for equality of proportions with continuity correction
#
# data:  c(15, 25) out of c(100, 100)
# X-squared = 2.5312, df = 1, p-value = 0.1116   # > 0.05: no significant evidence that the two proportions differ
# alternative hypothesis: two.sided
# 95 percent confidence interval:
#  -0.22000271  0.02000271   # zero lies within the confidence interval -> a difference of zero is plausible
# sample estimates:
# prop 1 prop 2
#   0.15   0.25


# EXAMPLE 2 https://www.youtube.com/watch?v=L9YDB1LRK5I
# In this case we reject the null hypothesis.
prop.test(c(45, 66), c(100, 110))  # 45 successes out of 100 and 66 successes out of 110; here 45 out of 100 is statistically different from 66 out of 110

# 2-sample test for equality of proportions with continuity correction
#
# data:  c(45, 66) out of c(100, 110)
# X-squared = 4.1469, df = 1, p-value = 0.04171   # < 0.05 -> the proportions are not equal
# alternative hypothesis: two.sided
# 95 percent confidence interval:
#  -0.293295128 -0.006704872   # zero does not lie in the interval -> the two proportions are not equal
# sample estimates:
# prop 1 prop 2
#   0.45   0.60
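
# A minimal sketch (an addition to these notes) showing that the two-sample prop.test() in
# EXAMPLE 2 gives the same statistic and p-value as chisq.test() on the 2x2 table of
# successes and failures. The names tab, pt_res and ct_res are illustrative only.
tab <- matrix(c(45, 66, 100 - 45, 110 - 66), ncol = 2,
              dimnames = list(c("group 1", "group 2"), c("success", "failure")))
pt_res <- prop.test(c(45, 66), c(100, 110))   # continuity correction is on by default
ct_res <- chisq.test(tab)                     # Yates correction is on by default for 2x2 tables
c(prop.test = unname(pt_res$statistic), chisq.test = unname(ct_res$statistic))
c(prop.test = pt_res$p.value, chisq.test = ct_res$p.value)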

##############################


# A confidence interval is a range of values we are fairly sure our true value lies in.
# https://www.mathsisfun.com/data/confidence-interval.html
res <- prop.test(x = c(490, 400), n = c(500, 500))
# Printing the results
res

sexsmoke <- matrix(c(70, 120, 65, 140), ncol = 2, byrow = TRUE)
rownames(sexsmoke) <- c("male", "female")
colnames(sexsmoke) <- c("smoke", "nosmoke")
prop.test(sexsmoke)
prop.test(c(70, 65), c(190, 205))  # identical; the second vector holds the row totals (70 + 120 = 190, 65 + 140 = 205)
prop.test(c(70, 65), c(190, 205), conf.level = 0.99)  # changes alpha (99% confidence level, alpha = 0.01)
prop.test(c(70, 65), c(190, 205), p = c(0.33, 0.33))  # pre-specified (null hypothesis) proportions
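
# A minimal sketch (an addition to these notes) of where the two vectors come from: for a
# two-column matrix, prop.test() takes the first column as successes and the row sums as trials.
# The names successes, trials, from_matrix and from_vectors are illustrative only.
sexsmoke <- matrix(c(70, 120, 65, 140), ncol = 2, byrow = TRUE,
                   dimnames = list(c("male", "female"), c("smoke", "nosmoke")))
successes <- sexsmoke[, "smoke"]    # first column: 70, 65
trials    <- rowSums(sexsmoke)      # row totals: 190, 205
from_matrix  <- prop.test(sexsmoke)
from_vectors <- prop.test(successes, trials)
all.equal(from_matrix$p.value, from_vectors$p.value)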



#########################     t.test()
# https://www.youtube.com/watch?v=F2rakDKp5f4

# EXAMPLE 1
head(iris)
t.test(iris$Petal.Width[iris$Species == "setosa"], iris$Petal.Width[iris$Species == "versicolor"])
# p-value: the probability, if the two population means were equal, of a t statistic at least as extreme as the one observed
# Welch Two Sample t-test
#
# data:  iris$Petal.Width[iris$Species == "setosa"] and iris$Petal.Width[iris$Species == "versicolor"]
# t = -34.08, df = 74.755, p-value < 2.2e-16   # df: degrees of freedom
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -1.143133 -1.016867   # zero does not lie in the interval -> the two means differ significantly
# sample estimates:
# mean of x mean of y
#     0.246     1.326   # the mean of setosa is 0.246
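
# A minimal sketch (an addition to these notes) of the Welch statistic behind the output above:
# the difference in means divided by the unpooled standard error, with Welch-Satterthwaite
# degrees of freedom. The names a, b, va, vb, t_stat, df_welch and p_val are illustrative only.
a  <- iris$Petal.Width[iris$Species == "setosa"]
b  <- iris$Petal.Width[iris$Species == "versicolor"]
va <- var(a) / length(a)                        # squared standard error of mean(a)
vb <- var(b) / length(b)                        # squared standard error of mean(b)
t_stat   <- (mean(a) - mean(b)) / sqrt(va + vb)
df_welch <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
p_val    <- 2 * pt(-abs(t_stat), df_welch)      # two-sided p-value
c(t = t_stat, df = df_welch, p = p_val)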


# EXAMPLE 2

mydata <- c(5.2, 6.1, 7.3, 7.4, 7.6, 7.9, 8.1, 8.3, 8.5, 8.5, 8.7, 8.8, 8.8, 9.1,
            9.2, 9.4, 9.4, 9.8, 9.9, 10.2, 10.2, 10.8,
            11.2, 11.9, 12.1, 13)
yourdata <- c(5.3, 6.1, 6.3, 7.4, 7.6, 7.2, 8.1, 8.2, 8.5, 8.7, 8.7, 8.8, 8.9, 9.2,
              9.2, 9.4, 9.4, 9.8, 9.9, 10.2, 10.2, 10.8)
t.test(mydata, yourdata)

# Welch Two Sample t-test
#
# data:  mydata and yourdata
# t = 1.2734, df = 45.848, p-value = 0.2093   # > 0.05 -> we fail to reject the null hypothesis
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -0.342621  1.522341   # zero lies in the interval, so the alternative hypothesis (true difference in means is not equal to 0) is not supported; more data would be needed to conclude anything
# sample estimates:
# mean of x mean of y
#  9.130769  8.540909
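
# A minimal sketch (an addition to these notes) showing that the two ways of reading the output
# agree: at alpha = 0.05 the p-value rule and the "is zero inside the 95% interval" rule give the
# same decision. The names res_t, alpha, reject_by_p and reject_by_ci are illustrative only.
mydata <- c(5.2, 6.1, 7.3, 7.4, 7.6, 7.9, 8.1, 8.3, 8.5, 8.5, 8.7, 8.8, 8.8, 9.1,
            9.2, 9.4, 9.4, 9.8, 9.9, 10.2, 10.2, 10.8, 11.2, 11.9, 12.1, 13)
yourdata <- c(5.3, 6.1, 6.3, 7.4, 7.6, 7.2, 8.1, 8.2, 8.5, 8.7, 8.7, 8.8, 8.9, 9.2,
              9.2, 9.4, 9.4, 9.8, 9.9, 10.2, 10.2, 10.8)
res_t <- t.test(mydata, yourdata)
alpha <- 0.05
reject_by_p  <- res_t$p.value < alpha                            # decision from the p-value
reject_by_ci <- res_t$conf.int[1] > 0 || res_t$conf.int[2] < 0   # decision from the interval
c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)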

# Exercise 1 from the notebook, and another one?
library(MASS)
head(quine)
prop.test  # typing the bare name only prints the function definition

table(quine$Eth, quine$Sex)
# How do we make sure the test is about the females only? prop.test() treats the first column of
# the table as successes, and the first level of quine$Sex is "F". Note that
# table(quine$Eth, quine$Sex == 'F') would put FALSE (the males) in the first column.
prop.test(table(quine$Eth, quine$Sex), correct = FALSE)

# 2-sample test for equality of proportions
# without continuity correction
#
# data:  table(quine$Eth, quine$Sex)
# X-squared = 0.0041, df = 1, p-value = 0.949
# alternative hypothesis: two.sided
# 95 percent confidence interval:
#  -0.15642  0.16696
# sample estimates:
#  prop 1  prop 2
# 0.55072 0.54545
# Answer
# The 95% confidence interval estimate of the difference between the female proportion of
# Aboriginal students and the female proportion of Non-Aboriginal students is between -15.6% and 16.7%.
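
# A minimal sketch (an addition to these notes) that makes "females only" explicit by passing
# the female counts and the group sizes directly instead of relying on the column order of the
# table. The names female_counts, group_sizes, by_counts and by_table are illustrative only.
library(MASS)                                                # provides the quine data set
female_counts <- tapply(quine$Sex == "F", quine$Eth, sum)    # females per ethnic group
group_sizes   <- table(quine$Eth)                            # students per ethnic group
by_counts <- prop.test(as.vector(female_counts), as.vector(group_sizes), correct = FALSE)
by_table  <- prop.test(table(quine$Eth, quine$Sex), correct = FALSE)
all.equal(unname(by_counts$estimate), unname(by_table$estimate))
all.equal(by_counts$p.value, by_table$p.value)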



# Exercise 2: test the null hypothesis. Our null hypothesis is that mu is equal to 170.
x <- c(170, 167, 174,
       179, 179, 156, 163, 156, 187, 156, 183, 179, 174, 179, 170, 156, 187, 179,
       183, 174, 187, 167, 159, 170, 179)
# And what about one-sided alternatives? Use alternative = 'less' or alternative = 'greater'.
t.test(x, mu = 170, alternative = 'two.sided')

# One Sample t-test
#
# data:  x
# t = 1.2218, df = 24, p-value = 0.2336
# alternative hypothesis: true mean is not equal to 170
# 95 percent confidence interval:
#  168.2633 176.7767
# sample estimates:
# mean of x
#    172.52
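
# A minimal sketch (an addition to these notes) of the one-sided alternatives for Exercise 2.
# Because the t statistic is positive here, the "greater" p-value is half the two-sided one and
# the "less" p-value is its complement. The names obs, two_sided, greater and less are
# illustrative only.
obs <- c(170, 167, 174, 179, 179, 156, 163, 156, 187, 156, 183, 179, 174, 179, 170,
         156, 187, 179, 183, 174, 187, 167, 159, 170, 179)
two_sided <- t.test(obs, mu = 170, alternative = "two.sided")   # H1: mean != 170
greater   <- t.test(obs, mu = 170, alternative = "greater")     # H1: mean > 170
less      <- t.test(obs, mu = 170, alternative = "less")        # H1: mean < 170
c(two_sided = two_sided$p.value, greater = greater$p.value, less = less$p.value)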
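
# Exercise 3 ????
x <- c(7.65, 7.60, 7.65,
       7.70, 7.55, 7.55, 7.40, 7.40, 7.50, 7.50)

# mu must be a single number, so "mu < 7.5" is not valid; the direction of a hypothesis is
# set through alternative ("two.sided", "less" or "greater").
t.test(x, mu = 7.5, alternative = 'two.sided')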
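
# A minimal sketch (an addition to these notes) of the one-sample t statistic for Exercise 3,
# assuming the hypothesised mean is 7.5. The names vals, mu0, t_stat, p_two and chk are
# illustrative only.
vals <- c(7.65, 7.60, 7.65, 7.70, 7.55, 7.55, 7.40, 7.40, 7.50, 7.50)
mu0  <- 7.5
t_stat <- (mean(vals) - mu0) / (sd(vals) / sqrt(length(vals)))   # (mean - mu0) / standard error
p_two  <- 2 * pt(-abs(t_stat), df = length(vals) - 1)            # two-sided p-value
chk <- t.test(vals, mu = mu0)
all.equal(t_stat, unname(chk$statistic))
all.equal(p_two, chk$p.value)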