Suppose I am given a specific range (0 to 5,000,000) and I need to generate 2,500,000 unique random numbers from it. What is an efficient way to do this? I understand that it is tough to get truly random numbers.
I tried checking whether a number already exists so that I can generate a new one instead, but that takes hours to compute. Is there a better way to do this?
The reason behind this is that I have a vector of size 5,000,000 and I want to shrink it to exactly half its size, i.e. delete a random 50% of its elements.
#include <iostream>
#include <vector>
#include <stdlib.h>
#include <algorithm>

using namespace std;

#define NUMBER     2500000
#define RAND_START 0
#define RAND_END   5000000

unsigned int generate_random_number(int min, int max)
{
    return min + (rand() % (unsigned int)(max - min + 1));
}

int main(int argc, char* argv[])
{
    unsigned int count = 0, random_number;
    vector<unsigned int> rand_vector;

    do
    {
        count++;
        random_number = generate_random_number(RAND_START, RAND_END);

        // Tried to manually add a different number each time. But still not a
        // considerable improvement in performance.
        if (std::find(rand_vector.begin(), rand_vector.end(), random_number) != rand_vector.end())
        {
            if (random_number > count)
                random_number = random_number - count;
            else
                random_number = random_number + count;
        }

        rand_vector.push_back(random_number);
        sort(rand_vector.begin(), rand_vector.end());
        rand_vector.erase(unique(rand_vector.begin(), rand_vector.end()), rand_vector.end());
    } while (rand_vector.size() != NUMBER);

    for (unsigned int i = 0; i < rand_vector.size(); i++)
    {
        cout << rand_vector.at(i) << ", ";
    }
    cout << endl;

    return 0;
}
Any better approach by which I can do this?
You seem to be locked on the idea that you have to pre-generate your random numbers somehow. Why? You said that the ultimate task is to delete some random elements from a vector. For that specific problem it is not necessary to pre-generate all the random indices in advance; you can simply generate them "on the fly".
For this specific task (i.e. deleting 50% of the elements of a vector), Knuth's algorithm (selection sampling) works pretty well (see https://stackoverflow.com/a/1608585/187690).
Just iterate through the elements of the original vector from 0 to N-1 and make a random decision to delete the i-th element with probability N_to_delete / N_to_iterate, where N_to_delete is the number of elements that still have to be deleted and N_to_iterate is the number of elements not yet examined. This approach does it in one pass (if implemented smartly, e.g. by compacting the kept elements as you go, as sketched below), requires no extra memory and no trial-and-error iterations. It does exactly what you want: it deletes 50% of the vector's elements, with every element equally likely to be removed.
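A minimal sketch of that idea in C++ (the function name delete_random_elements and the use of std::mt19937 are my own choices, not something from your code): it walks the vector once, decides for each element whether to drop it, and compacts the kept elements in place:

#include <iostream>
#include <random>
#include <vector>

// Delete exactly n_to_delete elements from v, chosen uniformly at random,
// in a single pass (Knuth's selection sampling).
void delete_random_elements(std::vector<unsigned int>& v, std::size_t n_to_delete)
{
    std::mt19937 gen(std::random_device{}());
    std::size_t remaining = v.size();   // elements not yet examined
    std::size_t write = 0;              // next slot for a kept element

    for (std::size_t read = 0; read < v.size(); ++read, --remaining)
    {
        // Delete v[read] with probability n_to_delete / remaining.
        std::uniform_int_distribution<std::size_t> dist(0, remaining - 1);
        if (dist(gen) < n_to_delete)
            --n_to_delete;              // drop this element
        else
            v[write++] = v[read];       // keep it, compacting as we go
    }
    v.resize(write);                    // single truncation at the end
}

int main()
{
    std::vector<unsigned int> v(5000000);
    for (unsigned int i = 0; i < v.size(); ++i) v[i] = i;

    delete_random_elements(v, v.size() / 2);
    std::cout << "remaining elements: " << v.size() << std::endl;
    return 0;
}

The single resize at the end avoids the O(N) cost of erasing elements from the middle of the vector one at a time.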
Knuth's algorithm works best when the number of random values (M) is fairly large compared to the length of the range (N), since its complexity is tied to N. In your case, where M is 50% of N, using Knuth's algorithm is a pretty good idea.
When the number of random values is much smaller than the range (M << N), Bob Floyd's algorithm (see the link above, and the sketch below) makes more sense, since its complexity is determined by M rather than by N. It requires additional memory (a set), but still performs no trial-and-error iterations when generating the random numbers.
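For completeness, here is a rough sketch of that set-based variant (the name floyd_sample and the choice of std::unordered_set are mine): it draws exactly m distinct values from [0, n-1] in exactly m iterations, with no retries:

#include <iostream>
#include <random>
#include <unordered_set>

// Bob Floyd's algorithm: pick m distinct random values from [0, n-1].
std::unordered_set<unsigned int> floyd_sample(unsigned int n, unsigned int m)
{
    std::mt19937 gen(std::random_device{}());
    std::unordered_set<unsigned int> chosen;
    for (unsigned int j = n - m; j < n; ++j)
    {
        std::uniform_int_distribution<unsigned int> dist(0, j);
        unsigned int t = dist(gen);
        // If t was already chosen, insert j instead; j cannot have been
        // chosen in any earlier iteration because it was out of range.
        if (!chosen.insert(t).second)
            chosen.insert(j);
    }
    return chosen;
}

int main()
{
    for (unsigned int x : floyd_sample(5000000, 10))
        std::cout << x << " ";
    std::cout << std::endl;
    return 0;
}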
However, in your case you are deleting elements from a vector, and vector element deletion is dominated by N anyway, which defeats the benefits of Bob Floyd's algorithm here.