Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given an unknown length list, return a random item in it by scanning it only 1 time

Given an unknown length list, return a random item in it by scanning it only 1 time.

My idea:

A similar algorithm is Reservoir Sampling (posted by others). But, it is too complicated because it needs to run rand() and keep k nodes each iteration.

Is there a better solution? O(n) time and O(1) space?

like image 484
user1002288 Avatar asked Oct 31 '25 21:10

user1002288


2 Answers

Why are you against reservoir sampling? You happen to be doing it with k = 1. There are minor optimizations (e.g. you don't need to select 1 out of the k, since k = 1) but it's the right approach. You could try to optimize by keeping processing a fixed window at a time, do the math to figure out with equal probability if you should choose any of the items in your window instead of the one you have, etc. to minimize rand() calls at the expensive of a more complicated algorithm, but you're going to wind up back at reservoir sampling more or less anyhow.

like image 69
James Avatar answered Nov 02 '25 12:11

James


You use reservoir sampling.

This is not too complicated nor expensive; it is the minimal approach given the constraints that you have (selecting an element from a stream).

It works just fine if you want a random sample size of 1 and if all the elements have the same weight.

When you've simplified the code with a k of 1 and no explicit weighting, its still reservoir sampling.

Not all pseudo random number generators run at the same speed; pick a fast one.


Comments ask what would happen if you re-used the same random number rather than generating a new random number each step:

The Wikipedia link given shows the equivalence to the Yates-Fisher/Knuth shuffle. If you asked what would picking the same random number each step of the shuffle would be, you'd be barking.

like image 22
Will Avatar answered Nov 02 '25 11:11

Will