I am trying to generate random base32 numbers that are 6 characters or less. This should give approximately 1 billion different combinations.
I have created a program to generate these “random” numbers. However, it appears that it generates a duplicate on average every 40,000 generations.
Why are these “random” numbers duplicating so often when there are over a billion different combinations?
Here is my code:
static void Main(string[] args) {     int seed = Environment.TickCount;     Random r = new Random(seed);      Dictionary<int, int> resultDictionary = new Dictionary<int, int>();     for (int x = 1; x <= 1000; x++)     {         Dictionary<string, int> dictionary = new Dictionary<string, int>();         try         {             while (true)             {                 int rand = r.Next(0, 1073741823);                 CrockfordBase32Encoding encoding = new CrockfordBase32Encoding();                 string encodedRand = encoding.Encode((ulong)rand, false);                 dictionary.Add(encodedRand, rand);             }         }         catch (Exception)         {         }         Console.WriteLine(string.Format("{0} - {1}", x, dictionary.Count));         resultDictionary.Add(x, dictionary.Count);         x++;     }      Console.WriteLine();     Console.WriteLine("Average Number Before Duplicate: " + resultDictionary.Average(x => x.Value));     Console.ReadLine(); } They are not truly random because the computer uses an algorithm based on a distribution, and are not secure because they rely on deterministic, predictable algorithms. Since a seed number can be set to replicate the “random” numbers generated, it is possible to predict the numbers if the seed is known.
For starters, it's not really random Surprise surprise, the answer is that Math. random() doesn't really generate a random number. Not exactly. It just does a really good job of simulating randomness.
Researchers typically use random numbers supplied by a computer, but these are generated by mathematical formulas – and so by definition cannot be truly random. In the 1970s, scientists discovered that a widely-used formula produced regularities in its 'random' numbers that undermined countless research studies.
But good random number generators don't have any clear pattern to their output, making finding which page of their codebook they correspond to very difficult.) There is no limit to the size of the codebook that algorithmic random number generation can support.
This is similar to the Birthday Problem.  Given a group of n people, What is the probability that two share the same birthday1?  It's higher than you'd think.  
In your case, what are the odds that randomly picking a number between 0 and 1,073,741,823 n times will give you a duplicate?
One approximation from the link above is 1-exp(-(n*n)/(2*d)).  If n=40,000 that equates to about a 52.5% probability that a duplicate is chosen, so seeing duplicates after 40,000 picks on average seems reasonable.
1assuming that birthdays are uniformly distributed universally, which is not the case in reality but is "close enough" and makes the math easier
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With