Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find all differences between two dictionaries efficiently in python

So, I have 2 dictionaries, I have to check for missing keys and for matching keys, check if they have same or different values.

dict1 = {..}
dict2 = {..}
#key values in a list that are missing in each
missing_in_dict1_but_in_dict2 = []
missing_in_dict2_but_in_dict1 = []
#key values in a list that are mismatched between the 2 dictionaries
mismatch = []

What's the most efficient way to do this?

like image 367
jo2083248 Avatar asked Sep 08 '25 14:09

jo2083248


2 Answers

You can use dictionary view objects, which act as sets. Subtract sets to get the difference:

missing_in_dict1_but_in_dict2 = dict2.keys() - dict1
missing_in_dict2_but_in_dict1 = dict1.keys() - dict2

For the keys that are the same, use the intersection, with the & operator:

mismatch = {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}

If you are still using Python 2, use dict.viewkeys().

Using dictionary views to produce intersections and differences is very efficient, the view objects themselves are very lightweight the algorithms to create the new sets from the set operations can make direct use of the O(1) lookup behaviour of the underlying dictionaries.

Demo:

>>> dict1 = {'foo': 42, 'bar': 81}
>>> dict2 = {'bar': 117, 'spam': 'ham'}
>>> dict2.keys() - dict1
{'spam'}
>>> dict1.keys() - dict2
{'foo'}
>>> [key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]]
{'bar'}

and a performance comparison with creating separate set() objects:

>>> import timeit
>>> import random
>>> def difference_views(d1, d2):
...     missing1 = d2.keys() - d1
...     missing2 = d1.keys() - d2
...     mismatch = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> def difference_sets(d1, d2):
...     missing1 = set(d2) - set(d1)
...     missing2 = set(d1) - set(d2)
...     mismatch = {k for k in set(d1) & set(d2) if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> testd1 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> testd2 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_views as d', number=1000)
1.8643521590274759
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_sets as d', number=1000)
2.811345119960606

Using set() objects is slower, especially when your input dictionaries get larger.

like image 189
Martijn Pieters Avatar answered Sep 10 '25 03:09

Martijn Pieters


One easy way is to create sets from the dict keys and subtract them:

>>> dict1 = { 'a': 1, 'b': 1 }
>>> dict2 = { 'b': 1, 'c': 1 }
>>> missing_in_dict1_but_in_dict2 = set(dict2) - set(dict1)
>>> missing_in_dict1_but_in_dict2
set(['c'])
>>> missing_in_dict2_but_in_dict1 = set(dict1) - set(dict2)
>>> missing_in_dict2_but_in_dict1
set(['a'])

Or you can avoid casting the second dict to a set by using .difference():

>>> set(dict1).difference(dict2)
set(['a'])
>>> set(dict2).difference(dict1)
set(['c'])
like image 33
Duncan Avatar answered Sep 10 '25 04:09

Duncan