I have a list of tuples, each containing three values. I want to 'roll them up', i.e. group all tuples whose first two values are the same, and return a list of lists where each inner list contains: 1: the first value, 2: the second value, 3: a list of all the third values that share those first two values.
Because I am writing the whole script myself, I have some flexibility on data types, so if I am approaching this in completely the wrong way, please let me know. I also wondered whether there is an easier way to do this with Pandas.
I suspect this could be done with itertools.groupby(), probably combined with operator.itemgetter() to access the correct parts of each tuple.
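For reference, operator.itemgetter() returns a callable: given one index it returns a single element, and given several indices it returns a tuple of those elements, which is what makes it usable as a composite grouping key. A small illustration (the row value is just an example tuple taken from the data below):

```python
from operator import itemgetter

row = (2, 6, 14)

# One index: the callable returns that single element
first = itemgetter(0)
print(first(row))      # 2

# Several indices: the callable returns a tuple of those elements,
# suitable as a key for grouping or sorting by more than one field
first_two = itemgetter(0, 1)
print(first_two(row))  # (2, 6)
```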
import itertools
import operator

list = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]
list = sorted(list)

def sorter(list):
    grouper = itertools.groupby(list, operator.itemgetter(0))
    for key, subiter in grouper:
        l = []
        grouper2 = itertools.groupby(subiter, operator.itemgetter(0))
        for key, subiter in grouper2:
            l.append(subiter)
        yield key, l
This code shows the general direction I was thinking of, but it does not produce the desired output. The desired output for this input would be:
[[1, 1, [4, 9, 14]], [2, 1, [12, 99]], [2, 6, [14, 19]]]
Again, I have significant flexibility in terms of data types here, so if I am approaching this the wrong way I am willing to try something completely different.
There is no need for two nested groupby calls that each group by a single field. Instead, use itemgetter with two arguments (or a lambda) to group by the first two values at once, then a list comprehension to collect the third values.
>>> from itertools import groupby
>>> from operator import itemgetter
>>> lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]
>>> [(*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
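Note that the comprehension above produces tuples, whereas the question asks for a list of lists. A minimal variant that matches the desired output exactly, using the lambda key mentioned above instead of itemgetter and square brackets for each outer element:

```python
from itertools import groupby

lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]

# Group consecutive tuples by their first two values via a lambda key,
# unpack the key tuple into a and b, and build [a, b, [third values]]
result = [[a, b, [x[2] for x in g]]
          for (a, b), g in groupby(lst, key=lambda t: (t[0], t[1]))]
print(result)
# [[1, 1, [4, 9, 14]], [2, 1, [12, 99]], [2, 6, [14, 19]]]
```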
If, for whatever reason, you want to use two separate groupby calls, you can do this:
>>> [(k1, k2, [x[2] for x in g2]) for k1, g1 in groupby(lst, key=itemgetter(0))
... for k2, g2 in groupby(g1, key=itemgetter(1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
Of course, this also works as a regular (nested) loop, more in line with your original code:
def sorter(lst):
    for k1, g1 in groupby(lst, key=itemgetter(0)):
        for k2, g2 in groupby(g1, key=itemgetter(1)):
            yield (k1, k2, [x[2] for x in g2])
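Compared with the code in the question, two things differ: the inner groupby there uses itemgetter(0) again instead of itemgetter(1), and it appends the group iterators themselves to l instead of their contents. A group iterator shares the underlying data with its groupby object, so once groupby advances, the earlier group is no longer visible, which is why the version above extracts x[2] while each group is current. A small sketch of that pitfall on a shortened input (the behaviour shown is that of CPython 3.7 and later):

```python
from itertools import groupby
from operator import itemgetter

lst = [(1, 1, 4), (1, 1, 9), (2, 1, 12)]

# Keeping the group iterators: by the time we read them, groupby has
# moved past every group, so each stored iterator yields nothing
kept = [g for _, g in groupby(lst, key=itemgetter(0, 1))]
stale = [list(g) for g in kept]
print(stale)  # [[], []]

# Materialising each group while it is the current one keeps the data
fresh = [list(g) for _, g in groupby(lst, key=itemgetter(0, 1))]
print(fresh)  # [[(1, 1, 4), (1, 1, 9)], [(2, 1, 12)]]
```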
Or with the single groupby, returning a generator object:
def sorter(lst):
    return ((*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1)))
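Both sorter versions return a lazy, single-use generator, so nothing is computed until it is iterated, and it can only be iterated once. A short usage sketch with the question's data, wrapping the call in list() to get a reusable result:

```python
from itertools import groupby
from operator import itemgetter

def sorter(lst):
    # Single-groupby version from above: yields (first, second, [thirds])
    return ((*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1)))

lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]

gen = sorter(lst)
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # already exhausted, so this is empty
print(first_pass)   # [(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
print(second_pass)  # []
```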
As always, this assumes that lst is already sorted by the same key. If it is not, sort it first.
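The reason is that groupby only merges consecutive items with equal keys. A sketch with a made-up unsorted input, showing the split groups and the fix of sorting by the same key first:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative unsorted input: rows with equal first two values are not adjacent
data = [(2, 6, 14), (1, 1, 4), (2, 6, 19), (1, 1, 9)]
key = itemgetter(0, 1)

# Without sorting, each non-adjacent run becomes its own group
unsorted_groups = [(*k, [x[2] for x in g]) for k, g in groupby(data, key=key)]
print(unsorted_groups)  # [(2, 6, [14]), (1, 1, [4]), (2, 6, [19]), (1, 1, [9])]

# Sorting by the same key first places equal keys next to each other
sorted_groups = [(*k, [x[2] for x in g]) for k, g in groupby(sorted(data, key=key), key=key)]
print(sorted_groups)    # [(1, 1, [4, 9]), (2, 6, [14, 19])]
```

Because sorted() is stable, sorting by key keeps the third values in their original input order within each group; sorting the whole tuples, as the question does with sorted(list), also orders the third values.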