Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I "multi-process" the itertools product module?

So I tried I tried calculating millions and millions of different combinations of the below string but I was only calculating roughly 1,750 combinations a second which isn't even near the speed I need. So how would I reshape this so multiple processes of the same thing are calculating different parts, while not calculating parts that have already been calculated and also maintaining fast speeds? The code below is partially what I've been using. Any examples would be appreciated!

from itertools import product
for chars in product("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!@#$%^&*?,()-=+[]/;", repeat = 4):
   print chars
like image 432
Noah R Avatar asked Apr 21 '12 19:04

Noah R


People also ask

What does Itertools Zip_longest return?

itertools. zip_longest (*iterables, fillvalue=None) Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.

What is product in Itertools in Python?

itertools. product() is used to find the cartesian product from the given iterator, output is lexicographic ordered.

What is combinations in Itertools?

itertools.combinations(iterable, r) This tool returns the length subsequences of elements from the input iterable. Combinations are emitted in lexicographic sorted order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.


2 Answers

One way to break the product up into parts is to break up the first component of the product, so that each independent job has all the elements starting with a certain set of first letters. For example:

import string
import multiprocessing as mp
import itertools

alphabet = string.ascii_letters+string.digits+"!@#$%^&*?,()-=+[]/;"
num_parts = 4
part_size = len(alphabet) // num_parts

def do_job(first_bits):
    for x in itertools.product(first_bits, alphabet, alphabet, alphabet):
        print(x)

if __name__ == "__main__":
    pool = mp.Pool()
    results = []
    for i in xrange(num_parts):
        if i == num_parts - 1:
            first_bit = alphabet[part_size * i :]
        else:
            first_bit = alphabet[part_size * i : part_size * (i+1)]
        results.append(pool.apply_async(do_job(first_bit)))

    pool.close()
    pool.join()

(where obviously you'd only use results if do_job actually returned something).

like image 139
Danica Avatar answered Sep 28 '22 06:09

Danica


Are you sure you're only getting 1750 combinations per second? I'm getting about 10 million.

def test(n):
    start = time.time()
    count = 0
    for chars in product("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ12234567890!@#$%^&*?,()-=+[]/;", repeat = 4):

        count += 1
        if count == n: break
    return time.time() - start    

>>> test(10000)
0.03300023078918457
>>> test(1000000)
0.15799999237060547
>>> test(10000000)
1.0469999313354492

I don't think my computer is that much faster than yours.

note: I posted this as an answer because I wanted to show code. It's really more of a comment. So please, no upvotes or downvotes.

like image 41
Steven Rumbalski Avatar answered Sep 28 '22 07:09

Steven Rumbalski