Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the BIRTHDAY SPACINGS TEST ::
- :: Choose m birthdays in a year of n days. List the spacings ::
- :: between the birthdays. If j is the number of values that ::
- :: occur more than once in that list, then j is asymptotically ::
- :: Poisson distributed with mean m^3/(4n). Experience shows n ::
- :: must be quite large, say n>=2^18, for comparing the results ::
- :: to the Poisson distribution with that mean. This test uses ::
- :: n=2^24 and m=2^9, so that the underlying distribution for j ::
- :: is taken to be Poisson with lambda=2^27/(2^26)=2. A sample ::
- :: of 500 j's is taken, and a chi-square goodness of fit test ::
- :: provides a p value. The first test uses bits 1-24 (counting ::
- :: from the left) from integers in the specified file. ::
- :: Then the file is closed and reopened. Next, bits 2-25 are ::
- :: used to provide birthdays, then 3-26 and so on to bits 9-32. ::
- :: Each set of bits provides a p-value, and the nine p-values ::
- :: provide a sample for a KSTEST. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: THE OVERLAPPING 5-PERMUTATION TEST ::
- :: This is the OPERM5 test. It looks at a sequence of one mill- ::
- :: ion 32-bit random integers. Each set of five consecutive ::
- :: integers can be in one of 120 states, for the 5! possible or- ::
- :: derings of five numbers. Thus the 5th, 6th, 7th,...numbers ::
- :: each provide a state. As many thousands of state transitions ::
- :: are observed, cumulative counts are made of the number of ::
- :: occurences of each state. Then the quadratic form in the ::
- :: weak inverse of the 120x120 covariance matrix yields a test ::
- :: equivalent to the likelihood ratio test that the 120 cell ::
- :: counts came from the specified (asymptotically) normal dis- ::
- :: tribution with the specified 120x120 covariance matrix (with ::
- :: rank 99). This version uses 1,000,000 integers, twice. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the BINARY RANK TEST for 31x31 matrices. The leftmost ::
- :: 31 bits of 31 random integers from the test sequence are used ::
- :: to form a 31x31 binary matrix over the field {0,1}. The rank ::
- :: is determined. That rank can be from 0 to 31, but ranks< 28 ::
- :: are rare, and their counts are pooled with those for rank 28. ::
- :: Ranks are found for 40,000 such random matrices and a chisqua-::
- :: re test is performed on counts for ranks 31,30,29 and <=28. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the BINARY RANK TEST for 32x32 matrices. A random 32x ::
- :: 32 binary matrix is formed, each row a 32-bit random integer. ::
- :: The rank is determined. That rank can be from 0 to 32, ranks ::
- :: less than 29 are rare, and their counts are pooled with those ::
- :: for rank 29. Ranks are found for 40,000 such random matrices ::
- :: and a chisquare test is performed on counts for ranks 32,31, ::
- :: 30 and <=29. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the BINARY RANK TEST for 6x8 matrices. From each of ::
- :: six random 32-bit integers from the generator under test, a ::
- :: specified byte is chosen, and the resulting six bytes form a ::
- :: 6x8 binary matrix whose rank is determined. That rank can be ::
- :: from 0 to 6, but ranks 0,1,2,3 are rare; their counts are ::
- :: pooled with those for rank 4. Ranks are found for 100,000 ::
- :: random matrices, and a chi-square test is performed on ::
- :: counts for ranks 6,5 and <=4. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: THE BITSTREAM TEST ::
- :: The file under test is viewed as a stream of bits. Call them ::
- :: b1,b2,... . Consider an alphabet with two "letters", 0 and 1 ::
- :: and think of the stream of bits as a succession of 20-letter ::
- :: "words", overlapping. Thus the first word is b1b2...b20, the ::
- :: second is b2b3...b21, and so on. The bitstream test counts ::
- :: the number of missing 20-letter (20-bit) words in a string of ::
- :: 2^21 overlapping 20-letter words. There are 2^20 possible 20 ::
- :: letter words. For a truly random string of 2^21+19 bits, the ::
- :: number of missing words j should be (very close to) normally ::
- :: distributed with mean 141,909 and sigma 428. Thus ::
- :: (j-141909)/428 should be a standard normal variate (z score) ::
- :: that leads to a uniform [0,1) p value. The test is repeated ::
- :: twenty times. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: The tests OPSO, OQSO and DNA ::
- :: OPSO means Overlapping-Pairs-Sparse-Occupancy ::
- :: The OPSO test considers 2-letter words from an alphabet of ::
- :: 1024 letters. Each letter is determined by a specified ten ::
- :: bits from a 32-bit integer in the sequence to be tested. OPSO ::
- :: generates 2^21 (overlapping) 2-letter words (from 2^21+1 ::
- :: "keystrokes") and counts the number of missing words---that ::
- :: is 2-letter words which do not appear in the entire sequence. ::
- :: That count should be very close to normally distributed with ::
- :: mean 141,909, sigma 290. Thus (missingwrds-141909)/290 should ::
- :: be a standard normal variable. The OPSO test takes 32 bits at ::
- :: a time from the test file and uses a designated set of ten ::
- :: consecutive bits. It then restarts the file for the next de- ::
- :: signated 10 bits, and so on. ::
- :: ::
- :: OQSO means Overlapping-Quadruples-Sparse-Occupancy ::
- :: The test OQSO is similar, except that it considers 4-letter ::
- :: words from an alphabet of 32 letters, each letter determined ::
- :: by a designated string of 5 consecutive bits from the test ::
- :: file, elements of which are assumed 32-bit random integers. ::
- :: The mean number of missing words in a sequence of 2^21 four- ::
- :: letter words, (2^21+3 "keystrokes"), is again 141909, with ::
- :: sigma = 295. The mean is based on theory; sigma comes from ::
- :: extensive simulation. ::
- :: ::
- :: The DNA test considers an alphabet of 4 letters:: C,G,A,T,::
- :: determined by two designated bits in the sequence of random ::
- :: integers being tested. It considers 10-letter words, so that ::
- :: as in OPSO and OQSO, there are 2^20 possible words, and the ::
- :: mean number of missing words from a string of 2^21 (over- ::
- :: lapping) 10-letter words (2^21+9 "keystrokes") is 141909. ::
- :: The standard deviation sigma=339 was determined as for OQSO ::
- :: by simulation. (Sigma for OPSO, 290, is the true value (to ::
- :: three places), not determined by simulation. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the COUNT-THE-1's TEST on a stream of bytes. ::
- :: Consider the file under test as a stream of bytes (four per ::
- :: 32 bit integer). Each byte can contain from 0 to 8 1's, ::
- :: with probabilities 1,8,28,56,70,56,28,8,1 over 256. Now let ::
- :: the stream of bytes provide a string of overlapping 5-letter ::
- :: words, each "letter" taking values A,B,C,D,E. The letters are ::
- :: determined by the number of 1's in a byte:: 0,1,or 2 yield A,::
- :: 3 yields B, 4 yields C, 5 yields D and 6,7 or 8 yield E. Thus ::
- :: we have a monkey at a typewriter hitting five keys with vari- ::
- :: ous probabilities (37,56,70,56,37 over 256). There are 5^5 ::
- :: possible 5-letter words, and from a string of 256,000 (over- ::
- :: lapping) 5-letter words, counts are made on the frequencies ::
- :: for each word. The quadratic form in the weak inverse of ::
- :: the covariance matrix of the cell counts provides a chisquare ::
- :: test:: Q5-Q4, the difference of the naive Pearson sums of ::
- :: (OBS-EXP)^2/EXP on counts for 5- and 4-letter cell counts. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the COUNT-THE-1's TEST for specific bytes. ::
- :: Consider the file under test as a stream of 32-bit integers. ::
- :: From each integer, a specific byte is chosen , say the left- ::
- :: most:: bits 1 to 8. Each byte can contain from 0 to 8 1's, ::
- :: with probabilitie 1,8,28,56,70,56,28,8,1 over 256. Now let ::
- :: the specified bytes from successive integers provide a string ::
- :: of (overlapping) 5-letter words, each "letter" taking values ::
- :: A,B,C,D,E. The letters are determined by the number of 1's, ::
- :: in that byte:: 0,1,or 2 ---> A, 3 ---> B, 4 ---> C, 5 ---> D,::
- :: and 6,7 or 8 ---> E. Thus we have a monkey at a typewriter ::
- :: hitting five keys with with various probabilities:: 37,56,70,::
- :: 56,37 over 256. There are 5^5 possible 5-letter words, and ::
- :: from a string of 256,000 (overlapping) 5-letter words, counts ::
- :: are made on the frequencies for each word. The quadratic form ::
- :: in the weak inverse of the covariance matrix of the cell ::
- :: counts provides a chisquare test:: Q5-Q4, the difference of ::
- :: the naive Pearson sums of (OBS-EXP)^2/EXP on counts for 5- ::
- :: and 4-letter cell counts. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: THIS IS A PARKING LOT TEST ::
- :: In a square of side 100, randomly "park" a car---a circle of ::
- :: radius 1. Then try to park a 2nd, a 3rd, and so on, each ::
- :: time parking "by ear". That is, if an attempt to park a car ::
- :: causes a crash with one already parked, try again at a new ::
- :: random location. (To avoid path problems, consider parking ::
- :: helicopters rather than cars.) Each attempt leads to either ::
- :: a crash or a success, the latter followed by an increment to ::
- :: the list of cars already parked. If we plot n: the number of ::
- :: attempts, versus k:: the number successfully parked, we get a::
- :: curve that should be similar to those provided by a perfect ::
- :: random number generator. Theory for the behavior of such a ::
- :: random curve seems beyond reach, and as graphics displays are ::
- :: not available for this battery of tests, a simple characteriz ::
- :: ation of the random experiment is used: k, the number of cars ::
- :: successfully parked after n=12,000 attempts. Simulation shows ::
- :: that k should average 3523 with sigma 21.9 and is very close ::
- :: to normally distributed. Thus (k-3523)/21.9 should be a st- ::
- :: andard normal variable, which, converted to a uniform varia- ::
- :: ble, provides input to a KSTEST based on a sample of 10. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: THE MINIMUM DISTANCE TEST ::
- :: It does this 100 times:: choose n=8000 random points in a ::
- :: square of side 10000. Find d, the minimum distance between ::
- :: the (n^2-n)/2 pairs of points. If the points are truly inde- ::
- :: pendent uniform, then d^2, the square of the minimum distance ::
- :: should be (very close to) exponentially distributed with mean ::
- :: .995 . Thus 1-exp(-d^2/.995) should be uniform on [0,1) and ::
- :: a KSTEST on the resulting 100 values serves as a test of uni- ::
- :: formity for random points in the square. Test numbers=0 mod 5 ::
- :: are printed but the KSTEST is based on the full set of 100 ::
- :: random choices of 8000 points in the 10000x10000 square. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: THE 3DSPHERES TEST ::
- :: Choose 4000 random points in a cube of edge 1000. At each ::
- :: point, center a sphere large enough to reach the next closest ::
- :: point. Then the volume of the smallest such sphere is (very ::
- :: close to) exponentially distributed with mean 120pi/3. Thus ::
- :: the radius cubed is exponential with mean 30. (The mean is ::
- :: obtained by extensive simulation). The 3DSPHERES test gener- ::
- :: ates 4000 such spheres 20 times. Each min radius cubed leads ::
- :: to a uniform variable by means of 1-exp(-r^3/30.), then a ::
- :: KSTEST is done on the 20 p-values. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the SQEEZE test ::
- :: Random integers are floated to get uniforms on [0,1). Start- ::
- :: ing with k=2^31=2147483647, the test finds j, the number of ::
- :: iterations necessary to reduce k to 1, using the reduction ::
- :: k=ceiling(k*U), with U provided by floating integers from ::
- :: the file being tested. Such j's are found 100,000 times, ::
- :: then counts for the number of times j was <=6,7,...,47,>=48 ::
- :: are used to provide a chi-square test for cell frequencies. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: The OVERLAPPING SUMS test ::
- :: Integers are floated to get a sequence U(1),U(2),... of uni- ::
- :: form [0,1) variables. Then overlapping sums, ::
- :: S(1)=U(1)+...+U(100), S2=U(2)+...+U(101),... are formed. ::
- :: The S's are virtually normal with a certain covariance mat- ::
- :: rix. A linear transformation of the S's converts them to a ::
- :: sequence of independent standard normals, which are converted ::
- :: to uniform variables for a KSTEST. The p-values from ten ::
- :: KSTESTs are given still another KSTEST. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the RUNS test. It counts runs up, and runs down, ::
- :: in a sequence of uniform [0,1) variables, obtained by float- ::
- :: ing the 32-bit integers in the specified file. This example ::
- :: shows how runs are counted: .123,.357,.789,.425,.224,.416,.95::
- :: contains an up-run of length 3, a down-run of length 2 and an ::
- :: up-run of (at least) 2, depending on the next values. The ::
- :: covariance matrices for the runs-up and runs-down are well ::
- :: known, leading to chisquare tests for quadratic forms in the ::
- :: weak inverses of the covariance matrices. Runs are counted ::
- :: for sequences of length 10,000. This is done ten times. Then ::
- :: repeated. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- :: This is the CRAPS TEST. It plays 200,000 games of craps, finds::
- :: the number of wins and the number of throws necessary to end ::
- :: each game. The number of wins should be (very close to) a ::
- :: normal with mean 200000p and variance 200000p(1-p), with ::
- :: p=244/495. Throws necessary to complete the game can vary ::
- :: from 1 to infinity, but counts for all>21 are lumped with 21. ::
- :: A chi-square test is made on the no.-of-throws cell counts. ::
- :: Each 32-bit integer from the test file provides the value for ::
- :: the throw of a die, by floating to [0,1), multiplying by 6 ::
- :: and taking 1 plus the integer part of the result. ::
- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- NOTE: Most of the tests in DIEHARD return a p-value, which
- should be uniform on [0,1) if the input file contains truly
- independent random bits. Those p-values are obtained by
- p=F(X), where F is the assumed distribution of the sample
- random variable X---often normal. But that assumed F is just
- an asymptotic approximation, for which the fit will be worst
- in the tails. Thus you should not be surprised with
- occasional p-values near 0 or 1, such as .0012 or .9983.
- When a bit stream really FAILS BIG, you will get p's of 0 or
- 1 to six or more places. By all means, do not, as a
- Statistician might, think that a p < .025 or p> .975 means
- that the RNG has "failed the test at the .05 level". Such
- p's happen among the hundreds that DIEHARD produces, even
- with good RNG's. So keep in mind that " p happens".
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement