I have a very large dictionary with thousands of elements, and I need to execute a function with this dictionary as a parameter. Instead of passing the whole dictionary in a single call, I want to execute the function in batches, with x key-value pairs of the dictionary at a time.
I am doing the following:
mydict = ##some large hash
x = ##batch size
def some_func(data):
    pass  ##do something on data
temp = {}
for key,value in mydict.iteritems():
        if len(temp) != 0 and len(temp)%x == 0:
                some_func(temp)
                temp = {}
                temp[key] = value
        else:
                temp[key] = value
if temp != {}:
        some_func(temp)
This looks very hackish to me. I want to know if there is an elegant/better way of doing this.
You can iterate through a Python dictionary using the keys(), values(), and items() methods: keys() gives the dictionary's keys, values() gives its values, and items() gives its (key, value) pairs. In Python 2 these methods return lists, with iterkeys(), itervalues(), and iteritems() as the lazy equivalents; in Python 3 they return view objects that you can iterate over directly.
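As a small illustration of the three methods, here is a sketch using a made-up dictionary (the name `stock` and its contents are hypothetical). It assumes Python 3, where the methods return views rather than lists.

```python
# Hypothetical sample data, used only for illustration.
stock = {"apples": 3, "pears": 5, "plums": 8}

keys = list(stock.keys())      # the keys
values = list(stock.values())  # the values
pairs = list(stock.items())    # (key, value) tuples

# Iterating over the dictionary itself yields its keys.
for name in stock:
    print(name, stock[name])

# items() gives key and value together, avoiding the extra lookup.
for name, count in stock.items():
    print(name, count)
```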
I often use this little utility:
import itertools
def chunked(it, size):
    it = iter(it)
    while True:
        p = tuple(itertools.islice(it, size))
        if not p:
            break
        yield p
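For your use case (each chunk is a tuple of (key, value) pairs, so wrap it in dict() if the function needs a dictionary):
for chunk in chunked(mydict.iteritems(), x):
    some_func(dict(chunk))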
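Under Python 3, where `iteritems()` no longer exists, the same utility might be used as in the sketch below. The sample dictionary, the batch size of 3, and the body of `some_func` are placeholders invented for the demo, not part of the original answer.

```python
import itertools

def chunked(it, size):
    """Yield successive tuples of up to `size` items from any iterable."""
    it = iter(it)
    while True:
        p = tuple(itertools.islice(it, size))
        if not p:
            break
        yield p

# Placeholder function: just records the size of each batch it receives.
batch_sizes = []
def some_func(data):
    batch_sizes.append(len(data))

mydict = {i: i * i for i in range(10)}  # stand-in for the large dictionary

for chunk in chunked(mydict.items(), 3):
    some_func(dict(chunk))  # rebuild a small dict from the tuple of pairs

print(batch_sizes)  # [3, 3, 3, 1]
```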
Here are two solutions adapted from earlier answers of mine.
First, you can get the list of items from the dictionary and create new dicts from slices of that list. This is not optimal, though, as it does a lot of copying of that huge dictionary.
def chunks(dictionary, size):
    items = dictionary.items()
    return (dict(items[i:i+size]) for i in range(0, len(items), size))
Alternatively, you can use some of the itertools module's functions to yield (generate) new sub-dictionaries as you loop. This is similar to @georg's answer, just using a for loop.
from itertools import chain, islice
def chunks(dictionary, size):
    iterator = dictionary.iteritems()
    for first in iterator:
        yield dict(chain([first], islice(iterator, size - 1)))
Example usage for both cases:
mydict = {i+1: chr(i+65) for i in range(26)}
for sub_d in chunks(mydict, 10):
    some_func(sub_d)
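For Python 3, both variants need small changes: `items()` returns a view that cannot be sliced, and `iteritems()` is gone. The following is a sketch of how they might be ported; the names `chunks_by_slicing` and `chunks_lazy` are my own, chosen only to keep the two apart.

```python
from itertools import chain, islice

def chunks_by_slicing(dictionary, size):
    # Copies all items into a list first so that it can be sliced.
    items = list(dictionary.items())
    return (dict(items[i:i + size]) for i in range(0, len(items), size))

def chunks_lazy(dictionary, size):
    # Pulls items on demand; only one batch is built at a time.
    iterator = iter(dictionary.items())
    for first in iterator:
        yield dict(chain([first], islice(iterator, size - 1)))

mydict = {i + 1: chr(i + 65) for i in range(26)}
for sub_d in chunks_lazy(mydict, 10):
    print(len(sub_d), min(sub_d), max(sub_d))
# 10 1 10
# 10 11 20
# 6 21 26
```

Note that the lazy version iterates over the live dictionary, so the dictionary should not gain or lose keys while the batches are being consumed.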