Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to correctly handle duplicates in computing the complementary list?

While this question may seem to be related to previous ones (like this one: Python, compute list difference), it is not exactly the same, and even the best rated answer containing two suggestions will not exactly answer the following one.

I have a main (unordered) list L containing values with duplicates; take for instance a list of integers:

L = [3, 1, 4, 1, 5, 9, 2, 6, 5]

I have a smaller list contaning a choice of values from L, for instance:

x = [4, 1, 3]

The order of the elements in x is not related in any way to the order of the elements in L.

Now, I would like to compute the difference L-x in such a way that concatenating x and this difference would give the same list than L (except for the order); to be more precise:

list(sorted(x + D(L,x))) == list(sorted(L))

The first bad idea is obviously to use sets, since duplicated will not be handled correctly.

The second bad idea is to use some list comprehension with a filter like:

[ e for e in L if e not in x ]

since the value 1 in my example will be discarded though one instance of this value should occur in the expected difference.

As far as I can see, the most efficient way of doing it would be to sort both lists then iterate on both lists (an iterator could be helpful) and carefully take duplicates into account; this would be a O(n log n) solution.

I am not looking for speed; I rather wonder if some concise pythonic syntax could do it; even O(n²) or worse could be acceptable if it could do the expected task in one or two lines.

like image 924
Thomas Baruchel Avatar asked Dec 05 '25 13:12

Thomas Baruchel


2 Answers

Seems like a good use of collections.Counter:

>>> from collections import Counter
>>> 
>>> d = Counter(L) - Counter(x)
>>> list(d.elements())
[1, 5, 5, 9, 2, 6]
like image 82
bbqwings Avatar answered Dec 08 '25 21:12

bbqwings


You want the multiset operations provided by collections.Counter:

>>> L = [3, 1, 4, 1, 5, 9, 2, 6, 5]
>>> x = [4, 1, 3]
>>> list((Counter(L) - Counter(x)).elements())
[1, 5, 5, 9, 2, 6]

This is O(n). You can also preserve order and maintain O(n) using an OrderedCounter if necessary.

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict): 
    pass
like image 23
wim Avatar answered Dec 08 '25 20:12

wim



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!