I have a huge list of tuples in this format. The second field of each tuple is the category field.
[(1, 'A', 'foo'),
(2, 'A', 'bar'),
(100, 'A', 'foo-bar'),
('xx', 'B', 'foobar'),
('yy', 'B', 'foo'),
(1000, 'C', 'py'),
(200, 'C', 'foo'),
..]
What is the most efficient way to break it down into sub-lists of the same category (A, B, C, etc.)?
Use itertools.groupby:
import itertools
import operator
data=[(1, 'A', 'foo'),
(2, 'A', 'bar'),
(100, 'A', 'foo-bar'),
('xx', 'B', 'foobar'),
('yy', 'B', 'foo'),
(1000, 'C', 'py'),
(200, 'C', 'foo'),
]
for key, group in itertools.groupby(data, operator.itemgetter(1)):
    print(list(group))
yields
[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]
Or, to create one list with each group as a sublist, you could use a list comprehension:
[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]
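If you would rather look the sub-lists up by category than by position, the same groupby call can feed a dict comprehension. This is a small sketch, not part of the original answer; the name `by_category` is just illustrative. Like the list version, it assumes each category forms a single contiguous run. Otherwise a later run would overwrite an earlier one with the same key.

```python
import itertools
import operator

data = [(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar'),
        ('xx', 'B', 'foobar'), ('yy', 'B', 'foo'),
        (1000, 'C', 'py'), (200, 'C', 'foo')]

# Map each category key to the tuples in its group.
# Assumes data is already grouped (contiguous) by category.
by_category = {key: list(group)
               for key, group in itertools.groupby(data, operator.itemgetter(1))}

print(by_category['B'])
# [('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
```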
The second argument to itertools.groupby is a function which itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.
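The word "contiguous" matters. Here is a quick sketch with a made-up three-item list (`unsorted` and `groups` are illustrative names) in which category 'A' appears in two separate runs. Because of that, groupby produces three groups instead of two.

```python
import itertools
import operator

# 'A' appears in two non-adjacent runs, separated by a 'B'.
unsorted = [(1, 'A', 'foo'), ('xx', 'B', 'foobar'), (2, 'A', 'bar')]

keys = [k for k, _ in itertools.groupby(unsorted, operator.itemgetter(1))]
groups = [list(g) for _, g in itertools.groupby(unsorted, operator.itemgetter(1))]

print(keys)    # ['A', 'B', 'A']
print(groups)  # [[(1, 'A', 'foo')], [('xx', 'B', 'foobar')], [(2, 'A', 'bar')]]
```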
operator.itemgetter(1) picks off the second item in a sequence.
For example, if
row=(1, 'A', 'foo')
then
operator.itemgetter(1)(row)
equals 'A'.
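To make that concrete, the snippet below checks the claim directly. It also shows that `operator.itemgetter(1)` behaves like the lambda `lambda r: r[1]` when used as a key function. The name `get_category` is only for illustration.

```python
import operator

row = (1, 'A', 'foo')

# itemgetter(1) returns a callable; calling it on row returns row[1].
get_category = operator.itemgetter(1)
print(get_category(row))  # A

# Same result as an equivalent lambda key function.
print(get_category(row) == (lambda r: r[1])(row))  # True
```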
As @eryksun points out in the comments, if the tuples' categories appear in arbitrary order, you must sort data first before applying itertools.groupby. This is because itertools.groupby only collects contiguous items with the same key into groups.
To sort the tuples by category (and then pass data2 rather than data to itertools.groupby), use:
data2=sorted(data,key=operator.itemgetter(1))
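The sketch below puts the sort-then-group step together on interleaved data. It also shows one alternative that is not from the answer above: a single pass with `collections.defaultdict(list)` that needs no sorting. The shuffled `data` and the names `category`, `sublists` and `buckets` are illustrative assumptions. Because `sorted()` is stable, tuples keep their original relative order within each category.

```python
import itertools
import operator
from collections import defaultdict

# Same records as above, but with the categories interleaved.
data = [(1, 'A', 'foo'), ('xx', 'B', 'foobar'), (1000, 'C', 'py'),
        (2, 'A', 'bar'), ('yy', 'B', 'foo'), (200, 'C', 'foo'),
        (100, 'A', 'foo-bar')]

category = operator.itemgetter(1)

# Approach 1: sort by category (stable), then group contiguous runs.
data2 = sorted(data, key=category)
sublists = [list(group) for key, group in itertools.groupby(data2, category)]

# Approach 2 (alternative): one pass, appending each tuple to a
# per-category list; no sorting required.
buckets = defaultdict(list)
for row in data:
    buckets[category(row)].append(row)

for sub in sublists:
    print(sub)
# [(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
# [('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
# [(1000, 'C', 'py'), (200, 'C', 'foo')]
```

Sorting costs O(n log n), while the defaultdict pass is O(n) on average. With the defaultdict, the categories come out in order of first appearance (dicts preserve insertion order in Python 3.7+), not in sorted order. For this input, the two orders happen to coincide.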