Say I have a list [2, 3, 7, 2, 3, 8, 7, 3]
I would like to produce lists that contain identical values from the list above.
Expected Output something like:
[2, 2]
[3, 3, 3]
[7, 7]
[8]
The order that these lists are produced doesn't matter.
Best approach is an O(n) solution with collections.defaultdict:
>>> l = [2, 3, 7, 2, 3, 8, 7, 3]
>>> d = defaultdict(list)
>>> for e in l:
...     d[e].append(e)
... 
>>> d
defaultdict(<class 'list'>, {2: [2, 2], 3: [3, 3, 3], 7: [7, 7], 8: [8]})
>>> d.values()
dict_values([[2, 2], [3, 3, 3], [7, 7], [8]])
Alternatively you can use itertools.groupby with sorted list:
>>> for _, l in itertools.groupby(sorted(l)):
...     print(list(l))
... 
[2, 2]
[3, 3, 3]
[7, 7]
[8]
Or a list comprehension with collections.Counter:
>>> from collections import Counter
>>> [[i]*n for i,n in Counter(l).items()]
[[2, 2], [3, 3, 3], [7, 7], [8]]
As I post, the defaultdict solution is O(n) and faster than the other aproaches. Here are the tests:
from timeit import timeit
setup = (
"from collections import Counter, defaultdict;"
"from itertools import groupby;"
"l = [2, 3, 7, 2, 3, 8, 7, 3];"
)
defaultdict_call = (
"d = defaultdict(list); "
"\nfor e in l: d[e].append(e);"
)
groupby_call = "[list(g) for _,g in groupby(sorted(l))]"
counter_call = "[[i]*n for i,n in Counter(l).items()]"
for call in (defaultdict_call, groupby_call, counter_call):
  print(call)
  print(timeit(call, setup))
Results:
d = defaultdict(list); 
for e in l: d[e].append(e);
7.02662614302244
[list(g) for _,g in groupby(sorted(l))]
10.126392606005538
[[i]*n for i,n in Counter(l).items()]
19.55539561196929
Here is the live test
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With