/* ABSTRACT I want to randomize TCP/IP port scans (like those with nmap) for large-scale scans of the entire Internet. I want to use a random number generator, rather than keeping track in memory, becuase the memory needed to scan the entire Internet would be too large. Using the "linear congruential generator" behind the typical 'rand()' function seems the obvious choice. Indeed, it's what many worms used (like Slammer and Witty) to scan the entire Internet. However, a typical 'rand()' function only works when the ranges are a power of 2. When using for odd sized ranges, I need different internal constants. That's what this program is for: given a range, choose the constants 'a' and 'c' to randomize the range. I'll practice with this code manually to figure things out, then copy this code into a port scanner to make the process automated. BUILD It's just standard C, so all you need do is compile it on your favorite system. Though on GNU you may need to link to the math library. Mac: gcc lcgtest.c -O3 -o lcgtest Ubuntu: gcc lcgtest.c -O3 -o lggtest -lm Windows: I create a VisualStudio project and just compile it USAGE lcgtest where is a number like 100 or 0xffff11223344. Since the idea is to scan the entire Interent and all ports, the max size for the range can be up to a 48 bit number. MATH THEORY One way of randomizing a range is the "full cycle" technique: http://en.wikipedia.org/wiki/Full_cycle Given a range 'range' (like 100), we enumerate the new indexes with: next_index = (index + c) % range; 1. The value of 'c' is relatively prime with 'range', meaning these two numbers have no prime factors in common. Thus, for a 'range' of 100 addresses, you could choose '17' or '977', since they have no prime factors in common with '100'. This full cycle technique isn't random enough, which is why we look at a function like 'rand()'. This function is known as a Linear Congruential Generator, and is described here: http://en.wikipedia.org/wiki/Linear_congruential_generator This function looks like: next_index = (index * a + c) % range; The constants have the rules: 1. The value 'c' is relatively prime with 'range', as before. 2. The value (a - 1) is divisible by all prime factors of 'range' 3. The value (a - 1) is a multiple of 4 if 'range' is a multiple of 4 In other words, it's the same as before, but we add a multiplication along with the addition. PRIME FACTORS So, to find the constants 'a' and 'c' for a range, we need to do prime factorization. How do we do that efficiently? Curiously, I couldn't find answers to that. Everyone is concerned with the theoretical problem of factoring 1000 bit numbers, but all I want to do is factor a 48 bit number. It turns out that the best answer for small numbers is obvious answer: just take a list of primes and try them all one-by-one until you've found all the factors. How to get the list of primes starting from 2, 3, 5, 7, 11, 17, 19, ...? The answer is to use the well-known "Sieve of Eratosthenes". This turns out to be the most expensive part of the code, taking 2 seconds of compute power on an Atom 1.6 Ghz x86 CPU and 2-megabytes of RAM. I could precompute them, but then I'd have to ship around a 5-megabyte file of integers along with the program. An important note is that I don't have to factor a 48 bit number, but only a 24 bit number. A 48 bit number can have at most one prime factor larger than 24 bits, and once I've found all the other factors smaller than 24 bits, this larger factor will be the only one left. Thus, I use the sieve to go through 24 bits or 16-million combinations, which can be done in 2-megabytes of RAM. I think a big reason the Atom is slow is that this can't fit in the cache. SO WHAT THIS PROGRAM DOES Given a 'range', this program does: 1. sieves it to find all prime numbers smaller than the sqrt(range). 2. finds all the prime factors of 'range' 3. calculates a value of 'a' and 'c' that meat the requirements for 'rand()' 4. for small ranges, verifies that 1-to-1 really works 5. prints the first few random numbers output, so that we can see how random they are visually. NOTES If I google harder, I might come up with a better way of factoring large numbers. For example, instead of the basic Sieve of Eratosthenes, I could instead use DJB's "primegen". On a Raspberry Pi (400 MHz ARM), primegen is about 10 times faster factoring small numbers, and uses less memory. */ #include #include #include #include #include #include #include /* A 64-bit number cannot have more than 16 prime factors */ typedef unsigned PRIMEFACTORS[16]; #if defined(_MSC_VER) #define strtoll _strtoi64 #endif /**************************************************************************** * A object for creating a list of prime numbers, then using those primes * to find the prime factors of a number. * * To find the first primes up to 'n', we use the Sieve of Eratosthenes * algorithm. It's the slowest part of the whole thing. * * To find the prime factors of a number, we just brute force check every * prime number in the list. ****************************************************************************/ struct PrimeList { unsigned char *list; unsigned count; unsigned max; }; /**************************************************************************** * When running the Sieve of Eratosthenes, mark a number as not being prime. * Numbers are represented as a bit mask, so clear the bit representing * that number. * * @param list * A bit mask of bits representing all numbers from 0 to 'n'. * @param num * The number we are checking to see if it's prime or not. ****************************************************************************/ static __inline void sieve_clear(unsigned char *list, unsigned num) { unsigned bit = num & 0x7; unsigned index = num / 8; list[index] &= ~(1 << bit); } /**************************************************************************** * Test a number to see if it's in our list of primes. Numbers are * represented with a bit mask, so if it's prime, the bit is set, and if * it's not prime, the corresponding bit is clear. Finding the next prime * in the list means skip the bits that are clear. * * @param list * A bit mask of bits representing all numbers from 0 to 'n'. * @param num * The number we are checking to see if it's prime or not. ****************************************************************************/ static __inline unsigned sieve_is_clear(const unsigned char *list, unsigned num) { unsigned bit = num & 0x7; unsigned index = num / 8; return (list[index] & (1 << bit)) == 0; } /**************************************************************************** * Create an object for dealing with a list of first primes. * * We do this creation as a separate step because we want to also benchmark * the actual seiving step. That's because seiving up to 24 bit numbers can * take several seconds on low-power processors. * * @param largest_number * Find all primes up to this number. ****************************************************************************/ struct PrimeList * primes_create(unsigned largest_number) { struct PrimeList *primes; /* allocate the main object memory */ primes = (struct PrimeList *)malloc(sizeof(*primes)); memset(primes, 0, sizeof(*primes)); /* Set the largest number we are going to look for */ primes->max = largest_number; /* Allocate the sieve bit-mask. This uses one bit for each candidate * number. In other words, if we are looking for all primes up to * 100, then we'll need 12.5 bytes to store one bit for each number */ primes->count = (largest_number+7)/8; primes->list = (unsigned char *)malloc(primes->count); /* Set to all 1. As we sieve the primes, we'll clear the bits for * numbers that aren't prime. */ memset(primes->list, 0xFF, primes->count); /* 0 and 1 are not prime */ sieve_clear(primes->list, 0); sieve_clear(primes->list, 1); /* the padding bits at the end aren't prime (well, if they are, we * don't care) */ while (largest_number+1 < primes->count*8) { sieve_clear(primes->list, largest_number+1); largest_number++; } return primes; } /**************************************************************************** * Clean up the object created by 'primes_create()'. ****************************************************************************/ void primes_destroy(struct PrimeList *primes) { if (primes == NULL) return; if (primes->list) free(primes->list); free(primes); } /**************************************************************************** * Run the "Sieve of Eratosthenes" to find all prime numbers up to 'n'. The * value of 'n' was set when we initialized the object. * * The sieving process works by doing: * - remove all even numbers from the list of possible primes (except for 2) * - remove all multiples of 3 (except for 3 itself) * - remove all multiples of 5 (except for 5 itself) * - remove all multiples of 7 (except for 7 itself) * - ... and so on for all prime numbers * ****************************************************************************/ void primes_run_sieve(struct PrimeList *primes) { unsigned char *list = primes->list; unsigned max = primes->max; unsigned prime = 2; /* for all primes ... */ for (prime = 2; prime < max; ) { unsigned j; /* clear all bits representing multiples of this prime number*/ for (j = prime*2; j < max; j += prime) sieve_clear(list, j); /* goto next prime number */ prime++; while (prime < max && sieve_is_clear(list, prime)) prime++; } } /**************************************************************************** * For enumerating the primes in the list of primes after we've done * a 'run'. * * @param primes * A list of prime numbers (that we probably created with the Sieve) * @param prime * A prime number, like 2, 5, 17, 1699999 * @return * the next prime in the list. Given a prime number like '2', this * function returns '3'. ****************************************************************************/ unsigned primes_next(const struct PrimeList *primes, unsigned prime) { const unsigned char *list = primes->list; unsigned max = primes->max; /* skip cleared bits representing non-primes until we get to the first * set bit representing a prime number */ prime++; while (prime < max && sieve_is_clear(list, prime)) prime++; return prime; } /**************************************************************************** * Count the number of primes we've found. This just brute-force counts all * the bits set in the bit-mask. ****************************************************************************/ unsigned primes_count(const struct PrimeList *primes) { const unsigned char *list = primes->list; unsigned max = primes->count; unsigned i; unsigned count = 0; for (i=0; i>= 1; count += (c&1); c >>= 1; count += (c&1); c >>= 1; count += (c&1); c >>= 1; count += (c&1); c >>= 1; count += (c&1); c >>= 1; count += (c&1); c >>= 1; count += (c&1); } return count; } /**************************************************************************** * Do the prime factorization of a number. This first runs the "Sieve of * Eratosthenes" to find all the small primes, then it does a brute-force * search of all the prime numbers in order to find the list of factors * of the number. * * @param m * The number we are factoring. * @param factors * The output of this function listing the prime factors of 'm'. * If 'm' is prime, then 'm' will be the only member of this list. * @param elapsed * Since the 'sieve' takes a long time to run, we can benchmark it, * outputing the number of seconds taken. This is optional, if NULL, * then benchmarking isn't done. ****************************************************************************/ void primes_factor(uint64_t m, PRIMEFACTORS factors, double *elapsed) { uint64_t root; struct PrimeList *primes; clock_t start, stop; /* for benchmark */ /* * Calculate the square-root. That's because no prime factor can be * larger than the square-root of a number, except for the largest * one. In other words, all prime factors of '14' are less than the * square root (3.742), except for the largest prime factor (7). */ root = (unsigned)sqrt(m+1.0); /* * Create a Seive of Erastothenes that finds all primes up to the * root, then tries them all to see if they are prime factors. */ primes = primes_create((unsigned)root); /* * Run the sieve to find primes. This will take 0.2 seconds on a * Nehalem+ Intel x86 CPU for the full 48 bit range */ start = clock(); primes_run_sieve(primes); stop = clock(); if (elapsed) *elapsed = (double)(stop - start)/(double)CLOCKS_PER_SEC; /* * Now we factor our original number. */ { unsigned temp = (unsigned)m; unsigned prime; unsigned count = 0; memset(factors, 0, sizeof(factors)); /* for all primes ... */ for (prime = 2; prime < root; prime = primes_next(primes, prime)) { /* see if we have a prime factor */ if ((temp % prime) != 0) continue; /* not a prime factor */ /* add this factor to our list */ factors[count++] = prime; /* remove this factor from our 'temp' variable */ while ((temp % prime) == 0) temp /= prime; } /* the residual may also be a factor, in which case, add it * to our list as well */ if (temp != 1) factors[count++] = temp; } /* free the memory we allocated for sieving the primes */ primes_destroy(primes); } /**************************************************************************** * Do a pseudo-random 1-to-1 translation of a number within a range to * another number in that range. * * The constants 'a' and 'c' must be chosen to match the LCG algorithm * to fit 'm' (range). * * This the same as the function 'rand()', except all the constants and * seeds are specified as parameters. * * @param index * The index within the range that we are randomizing. * @param a * The 'multiplier' of the LCG algorithm. * @param c * The 'increment' of the LCG algorithm. * @param range * The 'modulus' of the LCG algorithm. ****************************************************************************/ uint64_t lcg(uint64_t index, uint64_t a, uint64_t c, uint64_t range) { return (index * a + c) % range; } /**************************************************************************** * Verify the LCG algorithm. You shouldn't do this for large ranges, * because we'll run out of memory. Therefore, this algorithm allocates * a buffer only up to a smaller range. We still have to traverse the * entire range of numbers, but we only need store values for a smaller * range. If 10% of the range checks out, then there's a good chance * it applies to the other 90% as well. * * This works by counting the results of rand(), which should be produced * exactly once. ****************************************************************************/ unsigned lcg_verify(uint64_t a, uint64_t c, uint64_t range, uint64_t max) { unsigned char *list; uint64_t i; unsigned is_success = 1; /* Allocate a list of 1-byte counters */ list = (unsigned char *)malloc((range [-c num] [-a num]\n"); printf("Finds constants for randomizing range using a typical " "linear congruent generator\n"); printf(" may be up to 48 bits in size\n"); printf("[-c num] suggests a 'c' constant\n"); printf("[-a num] suggests a 'a' scaling factor\n"); printf("Numbers may be decimal or hex\n"); return 0; } /* * Get the input * Read in optional additional parameters, such as a starting value for * 'c' or a scaling factor for 'a'. */ for (i=1; i<(unsigned)argc; i++) { if (isdigit(argv[i][0]&0xFF)) m = strtoll(argv[1], 0, 0); else if (argv[i][0] == '-') { switch (tolower(argv[i][1]&0xFF)) { case 'a': if (argv[i][2] == '\0') a_offset = strtoll(argv[++i], 0, 0); else a_offset = strtoll(argv[i]+2, 0, 0); break; case 'c': if (argv[i][2] == '\0') c = strtoll(argv[++i], 0, 0); else c = strtoll(argv[i]+2, 0, 0); break; case 'h': case '?': goto usage; case '-': goto usage; default: fprintf(stderr, "unknown parm: %s\n", argv[i]); goto usage; } } else { fprintf(stderr, "unknown parm: %s\n", argv[i]); goto usage; } } /* * Find all the prime factors of the number. This step can take several * seconds for 48 bit numbers, which is why we benchmark how long it * takes. */ primes_factor(m, factors, &elapsed); /* * Calculate the 'a-1' constant. It must share all the prime factors * with the range, and if the range is a multiple of 4, must also * be a multiple of 4 */ a = 1; for (i=0; factors[i]; i++) a = a * factors[i]; if ((m % 4) == 0) a *= 2; a *= a_offset; /* maybe improve for better randomness */ a += 1; /* * Calculate the 'c' constant. It must have no prime factors in * common with the range. */ if (c == 0) c = 29752039458; /* something random */ while (has_factors_in_common(c, factors)) c++; /* * print the results */ //printf("sizeof(int) = %llu-bits\n", (uint64_t)(sizeof(size_t)*8)); printf("elapsed = %5.3f-seconds\n", elapsed); printf("factors = "); for (i=0; factors[i]; i++) printf("%u ", factors[i]); printf("%s\n", factors[0]?"":"(none)"); printf("m = %-24llu (0x%llx)\n", m, m); printf("a = %-24llu (0x%llx)\n", a, a); printf("c = %-24llu (0x%llx)\n", c, c); printf("c%%m = %-24llu (0x%llx)\n", c%m, c%m); if (m < 1000000000) { if (lcg_verify(a, c+1, m, 280)) printf("verify = success\n"); else printf("verify = failure\n"); } else { printf("verify = too big to check\n"); } /* * Print some first numbers. We use these to visually inspect whether * the results are random or not. */ { unsigned count = 0; uint64_t x = 0; unsigned digits = count_digits(m); for (i=0; i<100 && i < m; i++) { x = lcg(x, a, c, m); count += printf("%*llu ", digits, x); if (count >= 70) { count = 0; printf("\n"); } } printf("\n"); } return 0; }