a guest
Oct 11th, 2016
I've done such simulations many times, and I really don't follow what results you expect to see. If you subset only the significant results, the sample size will determine the estimated effect size, but the proportion of significant results should be the same if the model is really correct... That said, a short tour through the R source code led to some interesting findings:

1) The t.test() code uses the pt() function:

pval <- pt(tstat, df)
[...]
pval <- pt(tstat, df, lower.tail = FALSE)
[...]
pval <- 2 * pt(-abs(tstat), df)

https://svn.r-project.org/R/trunk/src/library/stats/R/t.test.R
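
2) The algorithm pt() uses depends on the size of its inputs. The excerpt below includes a normal approximation for n > 4e5 (n is the degrees of freedom), although that branch sits inside #ifdef R_version_le_260 and so is only compiled when that macro is defined. The rest of the code picks between two pbeta() calls and a log-scale shortcut depending on x and n. So, at least if you are using R, your example may literally be comparing apples to oranges:
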
#ifdef R_version_le_260
    if (n > 4e5) { /*-- Fixme(?): test should depend on `n' AND `x' ! */
        /* Approx. from Abramowitz & Stegun 26.7.8 (p.949) */
        val = 1./(4.*n);
        return pnorm(x*(1. - val)/sqrt(1. + x*x*2.*val), 0.0, 1.0,
                     lower_tail, log_p);
    }
#endif

    nx = 1 + (x/n)*x;
    /* FIXME: This test is probably losing rather than gaining precision,
     * now that pbeta(*, log_p = TRUE) is much better.
     * Note however that a version of this test *is* needed for x*x > D_MAX */
    if(nx > 1e100) { /* <==> x*x > 1e100 * n */
        /* Danger of underflow. So use Abramowitz & Stegun 26.5.4
           pbeta(z, a, b) ~ z^a(1-z)^b / aB(a,b) ~ z^a / aB(a,b),
           with z = 1/nx, a = n/2, b= 1/2 :
        */
        double lval;
        lval = -0.5*n*(2*log(fabs(x)) - log(n))
            - lbeta(0.5*n, 0.5) - log(0.5*n);
        val = log_p ? lval : exp(lval);
    } else {
        val = (n > x * x)
            ? pbeta (x * x / (n + x * x), 0.5, n / 2., /*lower_tail*/0, log_p)
            : pbeta (1. / nx, n / 2., 0.5, /*lower_tail*/1, log_p);
    }

https://svn.r-project.org/R/trunk/src/nmath/pt.c

3a) The pt() function above uses either pnorm() or pbeta() to do the grunt work. The pnorm() function includes all sorts of arbitrary-looking constants (the coefficients of its numerical approximation):

const static double a[5] = {
    2.2352520354606839287,
    161.02823106855587881,
    1067.6894854603709582,
    18154.981253343561249,
    0.065682337918207449113
};
[...]
const static double q[5] = {
    1.28426009614491121,
    0.468238212480865118,
    0.0659881378689285515,
    0.00378239633202758244,
    7.29751555083966205e-5
};

https://svn.r-project.org/R/trunk/src/nmath/pnorm.c

3b) The pbeta() function calls a bratio() function, which likewise uses approximations that require various constants and switches between algorithms based on the input arguments, which depend on n (not shown... this is getting too long).

static double c0 = .0833333333333333;
static double c1 = -.00277777777760991;
static double c2 = 7.9365066682539e-4;
static double c3 = -5.9520293135187e-4;
static double c4 = 8.37308034031215e-4;
static double c5 = -.00165322962780713;

https://svn.r-project.org/R/trunk/src/nmath/pbeta.c
https://svn.r-project.org/R/trunk/src/nmath/toms708.c

So, it is possible that whatever you observed is due to numerical error, and that the software handles large datasets differently from small ones.