Python set difference with custom objects

I have two sets of custom objects that I build from the following tuples of dictionaries:

tupleOfDicts1 = ({'id': 1, 'name': 'peter', 'last': 'smith'},
                 {'id': 2, 'name': 'peter', 'last': 'smith'},
                 {'id': 3, 'name': 'mark', 'last':'white'},
                 {'id': 4, 'name': 'john', 'last': 'lennon'},)

tupleOfDicts2 = ({'id': 9, 'name': 'peter', 'last': 'smith'},
                 {'id': 8, 'name': 'peter', 'last': 'smith'},)

As you can see, some elements are identical except for the 'id' property.

Then I define the following class:

class Result:
    def __init__(self, **kwargs):
        self.id = kwargs['id']
        self.name = kwargs['name']
        self.last = kwargs['last']

    def __repr__(self):
        return 'Result(%s, %s, %s)' % (self.id, self.name, self.last)

    def __hash__(self):
        #  the hash must consider the id of the elements, but the id must not be considered when comparing
        return hash((self.id, self.name, self.last))

    def __eq__(self, other):
        if isinstance(other, Result):
            #  I want comparison to be made considering only name and last
            return (self.name, self.last) == (other.name, other.last)
        else:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)

As you can see, the constructor accepts the dictionaries as keyword arguments.

Now I define a function that returns a set of Result objects from the tuples containing the dictionaries:

def getSetFromTuple(tupleOfDicts):
    myset = set()
    for dictionary in tupleOfDicts:
        myset.add(Result(**dictionary))
    return myset

At this point I create my two sets:

mySet1 = getSetFromTuple(tupleOfDicts1)
mySet2 = getSetFromTuple(tupleOfDicts2)

I do all this because I want all the elements of mySet1 that are not in mySet2 (the 'id' property should not be involved in this comparison):

diff = mySet1 - mySet2

But I am not getting what I want. In this case, I am getting all the elements of mySet1:

print(len(mySet1 - mySet2))  # 4

Instead, I expect only two elements to remain from mySet1, because two of its elements are also in mySet2 (same name and same last; the id will always be different).

It seems to me that when I use the - operator between two sets, the set class compares the hash values of the elements. In that case the output of 4 makes sense. But is there a way to do what I want?

asked Dec 18 '25 09:12 by Rodriguez David


1 Answer

Contrary to your comment, I think id should not be in the hash. If two elements are equal, their hashes must be equal as well:

def __hash__(self):
    return hash((self.name, self.last))

Internally, the hash maps a value to a bucket. Elements with different hashes may end up in different buckets and then never get compared with __eq__ at all when a set de-duplicates them or a dictionary looks them up.
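To make the effect concrete, here is a minimal sketch of the question's class with the hash restricted to name and last. It reuses the question's data and names; the set comprehensions stand in for getSetFromTuple and are only a shorthand, not part of the original code.

```python
class Result:
    def __init__(self, **kwargs):
        self.id = kwargs['id']
        self.name = kwargs['name']
        self.last = kwargs['last']

    def __repr__(self):
        return 'Result(%s, %s, %s)' % (self.id, self.name, self.last)

    def __hash__(self):
        # Hash only the fields that __eq__ compares, so that equal
        # objects always produce the same hash.
        return hash((self.name, self.last))

    def __eq__(self, other):
        if isinstance(other, Result):
            return (self.name, self.last) == (other.name, other.last)
        return False


tupleOfDicts1 = ({'id': 1, 'name': 'peter', 'last': 'smith'},
                 {'id': 2, 'name': 'peter', 'last': 'smith'},
                 {'id': 3, 'name': 'mark', 'last': 'white'},
                 {'id': 4, 'name': 'john', 'last': 'lennon'},)

tupleOfDicts2 = ({'id': 9, 'name': 'peter', 'last': 'smith'},
                 {'id': 8, 'name': 'peter', 'last': 'smith'},)

# Shorthand for getSetFromTuple from the question.
mySet1 = {Result(**d) for d in tupleOfDicts1}
mySet2 = {Result(**d) for d in tupleOfDicts2}

diff = mySet1 - mySet2
print(len(mySet1))  # 3, because ids 1 and 2 now count as the same element
print(len(diff))    # 2
print(sorted(r.name for r in diff))  # ['john', 'mark']
```

Note the side effect: since the two 'peter smith' entries are now equal and hash alike, mySet1 itself holds three elements instead of four.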

That said, there is a much simpler way to de-duplicate on name and last, without involving OOP and just working with the data itself:

dictionaries = tupleOfDicts1 + tupleOfDicts2
unique_values = {(d['name'], d['last']): d for d in dictionaries}.values()
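The snippet above keeps one dictionary per (name, last) pair across both tuples. For the difference the question asks about, the same keying idea could be applied as in the sketch below. The names keys_to_exclude, unique_first and difference are illustrative, not from the original post, and the sketch assumes Python 3.7+ so that dictionaries preserve insertion order.

```python
tupleOfDicts1 = ({'id': 1, 'name': 'peter', 'last': 'smith'},
                 {'id': 2, 'name': 'peter', 'last': 'smith'},
                 {'id': 3, 'name': 'mark', 'last': 'white'},
                 {'id': 4, 'name': 'john', 'last': 'lennon'},)

tupleOfDicts2 = ({'id': 9, 'name': 'peter', 'last': 'smith'},
                 {'id': 8, 'name': 'peter', 'last': 'smith'},)

# (name, last) pairs that occur in the second tuple; 'id' is left out on purpose.
keys_to_exclude = {(d['name'], d['last']) for d in tupleOfDicts2}

# One dictionary per (name, last) pair from the first tuple.
# When a pair repeats, the later dictionary overwrites the earlier one.
unique_first = {(d['name'], d['last']): d for d in tupleOfDicts1}

# Keep only the entries whose pair does not occur in the second tuple.
difference = [d for key, d in unique_first.items() if key not in keys_to_exclude]

print(difference)
# [{'id': 3, 'name': 'mark', 'last': 'white'}, {'id': 4, 'name': 'john', 'last': 'lennon'}]
```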
answered Dec 20 '25 21:12 by Reut Sharabani


