Say I have a list
l = [1, 1 , 1, 2, 3, 4, 5, 5]
and two disjoint sets of equal length
a = (1, 3) and b = (2, 5)
and I want to get the elements in l that is in a and b separately like
[1, 1, 1, 3] and [2, 5, 5]
I tried list comprehension like [x for x in l if x in a] but that takes a long time if the length of l, a, and b is 10^5
EDIT: the sets are disjoint sets of equal length.
EDIT: What I need to do is count the elements in l that is common in a (with duplicates) minus that with elements of l in b (with duplicates too). So the above example should output 1. The problem is if the list and sets are as long as 10E5. Using filter and itertools still takes too long.
EDIT: I got it now! Apparently I have to wrap the input sets with set()! I didn't at first (I only got it via input().split()) because the inputs are unique already but didn't know list and sets are very different and sets are faster! Well, TIL for me.
The fundamental problem is that you aren't using appropriate data structures for the job. Using tuples to represent sets might be ok for small sets in this case, but for large sets, you can expect to search an average of half the combined size of the sets for each element in the list that is actually in one of the sets. For each element in the list that is not in either set, we must search all elements of both sets to determine that.
So any algorithm based on these data structures
(i.e., representing sets using tuples)
will at best be O(m*n), where m is the size of the list
and n is the size of the sets.
There really isn't any way we can reduce the m component
— we have to examine each element of the list to determine which set
(if any) it belongs to.
We can, however, reduce the n component.
How? By using a more efficient data structure for our sets.
Fortunately, this is not hard, as Python includes a built-in set type.
So the first step is to construct the two sets:
a = set((1, 3))
b = set((2, 5))
Now, we can easily (and efficiently) determine if an element e is in one of the sets:
e = 1
e in a # => True
e in b # => False
Now, we just need to loop over the input list and accumulate the result:
l = [1, 1, 3, 2, 5, 7, 8, 3, 2, 1]
result = 0 # accumulator for result
for e in l:
if e in a:
result += 1
elif e in b:
result -= 1
print result # prints "2"
You can use chain and repeat functions from itertools module :
>>> from itertools import repeat,chain
>>> a={1,3}
>>> list(chain.from_iterable((repeat(i,l.count(i)) for i in a)))
[1, 1, 1, 3]
Note : As a more efficient way you can use a set container for a which has O(1) complexity for membership checking, and you don't need call the list if you don't need the result as a list, chain.from_iterable returns an iterator.
Or as a pretty much optimized approach you can use numpy which is specially strong when you are dealing with huge lists:
>>> import numpy as np
>>> l = np.array([1, 1 , 1, 2, 3, 4, 5, 5])
>>> a = (1, 3)
>>> l[np.in1d(l,a)]
array([1, 1, 1, 3])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With