Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate a random, equally probable combination from a list in python

Tags:

python

Let's say I have a list like this: ['a','b','c']. I need to get a random combination from this list eg ['a','c']. However I need all combinations to have equal probability so the chances of getting ['a'] should be the exact same as the chances of getting ['b','c']. My real list is 22 elements long so enumerating every single combination is impossible. My first thought was to use random.sample however that requires you to specify the number of elements, which would have to be randomly chosen but the probability would have to be (number of elements in this combination)/(number of elements in all combinations) which are gigantic numbers. Is there any better way? This will be run thousands of times so efficient solutions are appreciated.

like image 296
trallgorm Avatar asked Oct 27 '25 06:10

trallgorm


1 Answers

There's a very efficient way to do this. The set of all combinations of a given set is known as the power set, the set of all subsets of the given set. If the set S contains m items, then there are 2**m possible combinations in total, including the empty set and S itself.

So to randomly select a combination from the power set of S we just need to choose a random number n from range(2**m) as an index into the power set and then generate the combination corresponding to n.

We can convert the index number n into a combination by looking at the binary expansion of n. There are m bits in n. We pair up those bits with the items in S. If a given bit is 1 then that item is selected for our combination, if it's a 0, we reject that item.

Here's a short demo.

from random import seed, randrange

seed(42)

def indexed_combination(seq, n):
    result = []
    for u in seq:
        if n & 1:
            result.append(u)
        n >>= 1
        if not n:
            break
    return result

print('Testing indexed_combination')
seq = 'abc'
for i in range(1 << len(seq)):
    print(i, ''.join(indexed_combination(seq, i)))
print()

def random_combination(seq):
    n = randrange(1 << len(seq))
    return indexed_combination(seq, n)

print('Testing random_combination')
seq = 'abcdefghij'
for i in range(20):
    print(i, random_combination(seq))

output

Testing indexed_combination
0 
1 a
2 b
3 ab
4 c
5 ac
6 bc
7 abc

Testing random_combination
0 ['c', 'f', 'g', 'h']
1 ['a', 'b', 'e', 'f']
2 ['a', 'b', 'e', 'f', 'j']
3 ['a', 'c', 'e', 'f', 'g', 'h', 'i']
4 ['a', 'd', 'g', 'h', 'i']
5 ['a', 'c', 'd', 'e', 'i']
6 ['a', 'e', 'g', 'h']
7 ['b', 'e', 'f', 'h']
8 ['f', 'g', 'i', 'j']
9 ['a', 'g']
10 ['a', 'c', 'd', 'e', 'f']
11 ['a', 'b', 'c', 'd', 'e', 'f', 'h']
12 ['a', 'b', 'c', 'd', 'e', 'f', 'h', 'i']
13 ['c', 'd', 'e', 'g', 'h', 'i']
14 ['b', 'c', 'e', 'f']
15 ['a', 'b', 'c', 'e', 'h', 'i']
16 ['a', 'b', 'd', 'e', 'g', 'i', 'j']
17 ['a', 'b', 'g', 'h', 'i']
18 ['a', 'b', 'c', 'e', 'h', 'i', 'j']
19 ['a', 'd', 'e', 'f', 'j']

I call the random seed function at the start of the script with a fixed seed number. I find it's convenient to do that when developing code that uses pseudorandom numbers because it makes it easier to test and debug the code when the random numbers are reproducible. In a real application you should seed the radomizer with a system entropy source. You can easily do that by eliminating the seed call, or by doing seed(None). If you want more randomness than what the standard Mersenee Twister generator offers, you can hook into the system's random source via the random.SystemRandom class.

like image 133
PM 2Ring Avatar answered Oct 28 '25 20:10

PM 2Ring