Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- /*
- ABSTRACT
- I want to randomize TCP/IP port scans (like those with nmap) for large-scale
- scans of the entire Internet. I want to use a random number generator, rather
- than keeping track in memory, becuase the memory needed to scan the entire
- Internet would be too large.
- Using the "linear congruential generator" behind the typical 'rand()' function
- seems the obvious choice. Indeed, it's what many worms used (like Slammer and
- Witty) to scan the entire Internet. However, a typical 'rand()' function only
- works when the ranges are a power of 2. When using for odd sized ranges, I
- need different internal constants.
- That's what this program is for: given a range, choose the constants 'a' and
- 'c' to randomize the range. I'll practice with this code manually to figure
- things out, then copy this code into a port scanner to make the process
- automated.
- BUILD
- It's just standard C, so all you need do is compile it on your favorite system.
- Though on GNU you may need to link to the math library.
- Mac: gcc lcgtest.c -O3 -o lcgtest
- Ubuntu: gcc lcgtest.c -O3 -o lggtest -lm
- Windows: I create a VisualStudio project and just compile it
- USAGE
- lcgtest <range>
- where <range> is a number like 100 or 0xffff11223344. Since the idea is
- to scan the entire Interent and all ports, the max size for the range
- can be up to a 48 bit number.
- MATH THEORY
- One way of randomizing a range is the "full cycle" technique:
- http://en.wikipedia.org/wiki/Full_cycle
- Given a range 'range' (like 100), we enumerate the new indexes with:
- next_index = (index + c) % range;
- 1. The value of 'c' is relatively prime with 'range', meaning these two
- numbers have no prime factors in common. Thus, for a 'range' of 100
- addresses, you could choose '17' or '977', since they have no prime
- factors in common with '100'.
- This full cycle technique isn't random enough, which is why we look at
- a function like 'rand()'. This function is known as a Linear Congruential
- Generator, and is described here:
- http://en.wikipedia.org/wiki/Linear_congruential_generator
- This function looks like:
- next_index = (index * a + c) % range;
- The constants have the rules:
- 1. The value 'c' is relatively prime with 'range', as before.
- 2. The value (a - 1) is divisible by all prime factors of 'range'
- 3. The value (a - 1) is a multiple of 4 if 'range' is a multiple of 4
- In other words, it's the same as before, but we add a multiplication along
- with the addition.
- PRIME FACTORS
- So, to find the constants 'a' and 'c' for a range, we need to do prime
- factorization. How do we do that efficiently? Curiously, I couldn't find
- answers to that. Everyone is concerned with the theoretical problem of
- factoring 1000 bit numbers, but all I want to do is factor a 48 bit number.
- It turns out that the best answer for small numbers is obvious answer: just
- take a list of primes and try them all one-by-one until you've found all the
- factors.
- How to get the list of primes starting from 2, 3, 5, 7, 11, 17, 19, ...? The
- answer is to use the well-known "Sieve of Eratosthenes". This turns out to
- be the most expensive part of the code, taking 2 seconds of compute power on
- an Atom 1.6 Ghz x86 CPU and 2-megabytes of RAM. I could precompute them, but
- then I'd have to ship around a 5-megabyte file of integers along with the
- program.
- An important note is that I don't have to factor a 48 bit number, but only
- a 24 bit number. A 48 bit number can have at most one prime factor larger
- than 24 bits, and once I've found all the other factors smaller than 24 bits,
- this larger factor will be the only one left. Thus, I use the sieve to go
- through 24 bits or 16-million combinations, which can be done in 2-megabytes
- of RAM. I think a big reason the Atom is slow is that this can't fit in the
- cache.
- SO WHAT THIS PROGRAM DOES
- Given a 'range', this program does:
- 1. sieves it to find all prime numbers smaller than the sqrt(range).
- 2. finds all the prime factors of 'range'
- 3. calculates a value of 'a' and 'c' that meat the requirements for 'rand()'
- 4. for small ranges, verifies that 1-to-1 really works
- 5. prints the first few random numbers output, so that we can see how random
- they are visually.
- NOTES
- If I google harder, I might come up with a better way of factoring large
- numbers. For example, instead of the basic Sieve of Eratosthenes, I could
- instead use DJB's "primegen". On a Raspberry Pi (400 MHz ARM), primegen
- is about 10 times faster factoring small numbers, and uses less memory.
- */
- #include <ctype.h>
- #include <string.h>
- #include <stdio.h>
- #include <stdlib.h>
- #include <stdint.h>
- #include <time.h>
- #include <math.h>
- /* A 64-bit number cannot have more than 16 prime factors */
- typedef unsigned PRIMEFACTORS[16];
- #if defined(_MSC_VER)
- #define strtoll _strtoi64
- #endif
- /****************************************************************************
- * A object for creating a list of prime numbers, then using those primes
- * to find the prime factors of a number.
- *
- * To find the first primes up to 'n', we use the Sieve of Eratosthenes
- * algorithm. It's the slowest part of the whole thing.
- *
- * To find the prime factors of a number, we just brute force check every
- * prime number in the list.
- ****************************************************************************/
- struct PrimeList
- {
- unsigned char *list;
- unsigned count;
- unsigned max;
- };
- /****************************************************************************
- * When running the Sieve of Eratosthenes, mark a number as not being prime.
- * Numbers are represented as a bit mask, so clear the bit representing
- * that number.
- *
- * @param list
- * A bit mask of bits representing all numbers from 0 to 'n'.
- * @param num
- * The number we are checking to see if it's prime or not.
- ****************************************************************************/
- static __inline void
- sieve_clear(unsigned char *list, unsigned num)
- {
- unsigned bit = num & 0x7;
- unsigned index = num / 8;
- list[index] &= ~(1 << bit);
- }
- /****************************************************************************
- * Test a number to see if it's in our list of primes. Numbers are
- * represented with a bit mask, so if it's prime, the bit is set, and if
- * it's not prime, the corresponding bit is clear. Finding the next prime
- * in the list means skip the bits that are clear.
- *
- * @param list
- * A bit mask of bits representing all numbers from 0 to 'n'.
- * @param num
- * The number we are checking to see if it's prime or not.
- ****************************************************************************/
- static __inline unsigned
- sieve_is_clear(const unsigned char *list, unsigned num)
- {
- unsigned bit = num & 0x7;
- unsigned index = num / 8;
- return (list[index] & (1 << bit)) == 0;
- }
- /****************************************************************************
- * Create an object for dealing with a list of first primes.
- *
- * We do this creation as a separate step because we want to also benchmark
- * the actual seiving step. That's because seiving up to 24 bit numbers can
- * take several seconds on low-power processors.
- *
- * @param largest_number
- * Find all primes up to this number.
- ****************************************************************************/
- struct PrimeList *
- primes_create(unsigned largest_number)
- {
- struct PrimeList *primes;
- /* allocate the main object memory */
- primes = (struct PrimeList *)malloc(sizeof(*primes));
- memset(primes, 0, sizeof(*primes));
- /* Set the largest number we are going to look for */
- primes->max = largest_number;
- /* Allocate the sieve bit-mask. This uses one bit for each candidate
- * number. In other words, if we are looking for all primes up to
- * 100, then we'll need 12.5 bytes to store one bit for each number */
- primes->count = (largest_number+7)/8;
- primes->list = (unsigned char *)malloc(primes->count);
- /* Set to all 1. As we sieve the primes, we'll clear the bits for
- * numbers that aren't prime. */
- memset(primes->list, 0xFF, primes->count);
- /* 0 and 1 are not prime */
- sieve_clear(primes->list, 0);
- sieve_clear(primes->list, 1);
- /* the padding bits at the end aren't prime (well, if they are, we
- * don't care) */
- while (largest_number+1 < primes->count*8) {
- sieve_clear(primes->list, largest_number+1);
- largest_number++;
- }
- return primes;
- }
- /****************************************************************************
- * Clean up the object created by 'primes_create()'.
- ****************************************************************************/
- void
- primes_destroy(struct PrimeList *primes)
- {
- if (primes == NULL)
- return;
- if (primes->list)
- free(primes->list);
- free(primes);
- }
- /****************************************************************************
- * Run the "Sieve of Eratosthenes" to find all prime numbers up to 'n'. The
- * value of 'n' was set when we initialized the object.
- *
- * The sieving process works by doing:
- * - remove all even numbers from the list of possible primes (except for 2)
- * - remove all multiples of 3 (except for 3 itself)
- * - remove all multiples of 5 (except for 5 itself)
- * - remove all multiples of 7 (except for 7 itself)
- * - ... and so on for all prime numbers
- *
- ****************************************************************************/
- void
- primes_run_sieve(struct PrimeList *primes)
- {
- unsigned char *list = primes->list;
- unsigned max = primes->max;
- unsigned prime = 2;
- /* for all primes ... */
- for (prime = 2; prime < max; ) {
- unsigned j;
- /* clear all bits representing multiples of this prime number*/
- for (j = prime*2; j < max; j += prime)
- sieve_clear(list, j);
- /* goto next prime number */
- prime++;
- while (prime < max && sieve_is_clear(list, prime))
- prime++;
- }
- }
- /****************************************************************************
- * For enumerating the primes in the list of primes after we've done
- * a 'run'.
- *
- * @param primes
- * A list of prime numbers (that we probably created with the Sieve)
- * @param prime
- * A prime number, like 2, 5, 17, 1699999
- * @return
- * the next prime in the list. Given a prime number like '2', this
- * function returns '3'.
- ****************************************************************************/
- unsigned
- primes_next(const struct PrimeList *primes, unsigned prime)
- {
- const unsigned char *list = primes->list;
- unsigned max = primes->max;
- /* skip cleared bits representing non-primes until we get to the first
- * set bit representing a prime number */
- prime++;
- while (prime < max && sieve_is_clear(list, prime))
- prime++;
- return prime;
- }
- /****************************************************************************
- * Count the number of primes we've found. This just brute-force counts all
- * the bits set in the bit-mask.
- ****************************************************************************/
- unsigned
- primes_count(const struct PrimeList *primes)
- {
- const unsigned char *list = primes->list;
- unsigned max = primes->count;
- unsigned i;
- unsigned count = 0;
- for (i=0; i<max; i++) {
- unsigned char c = list[i];
- count += (c&1); c >>= 1;
- count += (c&1); c >>= 1;
- count += (c&1); c >>= 1;
- count += (c&1); c >>= 1;
- count += (c&1); c >>= 1;
- count += (c&1); c >>= 1;
- count += (c&1); c >>= 1;
- count += (c&1);
- }
- return count;
- }
- /****************************************************************************
- * Do the prime factorization of a number. This first runs the "Sieve of
- * Eratosthenes" to find all the small primes, then it does a brute-force
- * search of all the prime numbers in order to find the list of factors
- * of the number.
- *
- * @param m
- * The number we are factoring.
- * @param factors
- * The output of this function listing the prime factors of 'm'.
- * If 'm' is prime, then 'm' will be the only member of this list.
- * @param elapsed
- * Since the 'sieve' takes a long time to run, we can benchmark it,
- * outputing the number of seconds taken. This is optional, if NULL,
- * then benchmarking isn't done.
- ****************************************************************************/
- void
- primes_factor(uint64_t m, PRIMEFACTORS factors, double *elapsed)
- {
- uint64_t root;
- struct PrimeList *primes;
- clock_t start, stop; /* for benchmark */
- /*
- * Calculate the square-root. That's because no prime factor can be
- * larger than the square-root of a number, except for the largest
- * one. In other words, all prime factors of '14' are less than the
- * square root (3.742), except for the largest prime factor (7).
- */
- root = (unsigned)sqrt(m+1.0);
- /*
- * Create a Seive of Erastothenes that finds all primes up to the
- * root, then tries them all to see if they are prime factors.
- */
- primes = primes_create((unsigned)root);
- /*
- * Run the sieve to find primes. This will take 0.2 seconds on a
- * Nehalem+ Intel x86 CPU for the full 48 bit range
- */
- start = clock();
- primes_run_sieve(primes);
- stop = clock();
- if (elapsed)
- *elapsed = (double)(stop - start)/(double)CLOCKS_PER_SEC;
- /*
- * Now we factor our original number.
- */
- {
- unsigned temp = (unsigned)m;
- unsigned prime;
- unsigned count = 0;
- memset(factors, 0, sizeof(factors));
- /* for all primes ... */
- for (prime = 2;
- prime < root;
- prime = primes_next(primes, prime)) {
- /* see if we have a prime factor */
- if ((temp % prime) != 0)
- continue; /* not a prime factor */
- /* add this factor to our list */
- factors[count++] = prime;
- /* remove this factor from our 'temp' variable */
- while ((temp % prime) == 0)
- temp /= prime;
- }
- /* the residual may also be a factor, in which case, add it
- * to our list as well */
- if (temp != 1)
- factors[count++] = temp;
- }
- /* free the memory we allocated for sieving the primes */
- primes_destroy(primes);
- }
- /****************************************************************************
- * Do a pseudo-random 1-to-1 translation of a number within a range to
- * another number in that range.
- *
- * The constants 'a' and 'c' must be chosen to match the LCG algorithm
- * to fit 'm' (range).
- *
- * This the same as the function 'rand()', except all the constants and
- * seeds are specified as parameters.
- *
- * @param index
- * The index within the range that we are randomizing.
- * @param a
- * The 'multiplier' of the LCG algorithm.
- * @param c
- * The 'increment' of the LCG algorithm.
- * @param range
- * The 'modulus' of the LCG algorithm.
- ****************************************************************************/
- uint64_t
- lcg(uint64_t index, uint64_t a, uint64_t c, uint64_t range)
- {
- return (index * a + c) % range;
- }
- /****************************************************************************
- * Verify the LCG algorithm. You shouldn't do this for large ranges,
- * because we'll run out of memory. Therefore, this algorithm allocates
- * a buffer only up to a smaller range. We still have to traverse the
- * entire range of numbers, but we only need store values for a smaller
- * range. If 10% of the range checks out, then there's a good chance
- * it applies to the other 90% as well.
- *
- * This works by counting the results of rand(), which should be produced
- * exactly once.
- ****************************************************************************/
- unsigned
- lcg_verify(uint64_t a, uint64_t c, uint64_t range, uint64_t max)
- {
- unsigned char *list;
- uint64_t i;
- unsigned is_success = 1;
- /* Allocate a list of 1-byte counters */
- list = (unsigned char *)malloc((range<max)?range:max);
- memset(list, 0, (range<max)?range:max);
- /* For all numbers in the range, verify increment the counter for the
- * the output. */
- for (i=0; i<range; i++) {
- uint64_t x = lcg(i, a, c, range);
- if (x < max)
- list[x]++;
- }
- /* Now check the output to make sure that every counter is set exactly
- * to the value of '1'. */
- for (i=0; i<max && i<range; i++) {
- if (list[i] != 1)
- is_success = 0;
- }
- free(list);
- return is_success;
- }
- /****************************************************************************
- * Count the number of digits in a number so that we can pretty-print a
- * bunch of numbers in nice columns.
- ****************************************************************************/
- unsigned
- count_digits(uint64_t num)
- {
- unsigned result = 0;
- while (num) {
- result++;
- num /= 10;
- }
- return result;
- }
- /****************************************************************************
- * Tell whether the number has any prime factors in common with the list
- * of factors. In other words, if it's not coprime with the other number.
- * @param c
- * The number we want to see has common factors with the other number.
- * @param factors
- * The factors from the other number
- * @return
- * !is_coprime(c, factors)
- ****************************************************************************/
- unsigned
- has_factors_in_common(uint64_t c, PRIMEFACTORS factors)
- {
- unsigned i;
- for (i=0; factors[i]; i++) {
- if ((c % factors[i]) == 0)
- return factors[i]; /* found a common factor */
- }
- return 0; /* no factors in common */
- }
- /****************************************************************************
- ****************************************************************************/
- int main(int argc, char *argv[])
- {
- uint64_t m=2; /* LCG modulus aka. the IP address range size */
- uint64_t a; /* LCG multiplier */
- uint64_t c = 0; /* LCG increment */
- double elapsed = 0.0; /* Benchmark of 'sieve' algorithm */
- uint64_t i;
- PRIMEFACTORS factors; /* List of prime factors of 'm' */
- uint64_t a_offset = 1;
- memset(factors, 0, sizeof(factors));
- /* usage info if nothing specified */
- usage:
- if (argc < 2) {
- printf("usage:\n lcgtest <range> [-c num] [-a num]\n");
- printf("Finds constants for randomizing range using a typical "
- "linear congruent generator\n");
- printf("<range> may be up to 48 bits in size\n");
- printf("[-c num] suggests a 'c' constant\n");
- printf("[-a num] suggests a 'a' scaling factor\n");
- printf("Numbers may be decimal or hex\n");
- return 0;
- }
- /*
- * Get the input
- * Read in optional additional parameters, such as a starting value for
- * 'c' or a scaling factor for 'a'.
- */
- for (i=1; i<(unsigned)argc; i++) {
- if (isdigit(argv[i][0]&0xFF))
- m = strtoll(argv[1], 0, 0);
- else if (argv[i][0] == '-') {
- switch (tolower(argv[i][1]&0xFF)) {
- case 'a':
- if (argv[i][2] == '\0')
- a_offset = strtoll(argv[++i], 0, 0);
- else
- a_offset = strtoll(argv[i]+2, 0, 0);
- break;
- case 'c':
- if (argv[i][2] == '\0')
- c = strtoll(argv[++i], 0, 0);
- else
- c = strtoll(argv[i]+2, 0, 0);
- break;
- case 'h':
- case '?':
- goto usage;
- case '-':
- goto usage;
- default:
- fprintf(stderr, "unknown parm: %s\n", argv[i]);
- goto usage;
- }
- } else {
- fprintf(stderr, "unknown parm: %s\n", argv[i]);
- goto usage;
- }
- }
- /*
- * Find all the prime factors of the number. This step can take several
- * seconds for 48 bit numbers, which is why we benchmark how long it
- * takes.
- */
- primes_factor(m, factors, &elapsed);
- /*
- * Calculate the 'a-1' constant. It must share all the prime factors
- * with the range, and if the range is a multiple of 4, must also
- * be a multiple of 4
- */
- a = 1;
- for (i=0; factors[i]; i++)
- a = a * factors[i];
- if ((m % 4) == 0)
- a *= 2;
- a *= a_offset; /* maybe improve for better randomness */
- a += 1;
- /*
- * Calculate the 'c' constant. It must have no prime factors in
- * common with the range.
- */
- if (c == 0)
- c = 29752039458; /* something random */
- while (has_factors_in_common(c, factors))
- c++;
- /*
- * print the results
- */
- //printf("sizeof(int) = %llu-bits\n", (uint64_t)(sizeof(size_t)*8));
- printf("elapsed = %5.3f-seconds\n", elapsed);
- printf("factors = ");
- for (i=0; factors[i]; i++)
- printf("%u ", factors[i]);
- printf("%s\n", factors[0]?"":"(none)");
- printf("m = %-24llu (0x%llx)\n", m, m);
- printf("a = %-24llu (0x%llx)\n", a, a);
- printf("c = %-24llu (0x%llx)\n", c, c);
- printf("c%%m = %-24llu (0x%llx)\n", c%m, c%m);
- if (m < 1000000000) {
- if (lcg_verify(a, c+1, m, 280))
- printf("verify = success\n");
- else
- printf("verify = failure\n");
- } else {
- printf("verify = too big to check\n");
- }
- /*
- * Print some first numbers. We use these to visually inspect whether
- * the results are random or not.
- */
- {
- unsigned count = 0;
- uint64_t x = 0;
- unsigned digits = count_digits(m);
- for (i=0; i<100 && i < m; i++) {
- x = lcg(x, a, c, m);
- count += printf("%*llu ", digits, x);
- if (count >= 70) {
- count = 0;
- printf("\n");
- }
- }
- printf("\n");
- }
- return 0;
- }
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement