Split a list of tuples into sub-lists of the same tuple field [duplicate]

I have a huge list of tuples in this format. The second field of each tuple is the category field.

    [(1, 'A', 'foo'),
    (2, 'A', 'bar'),
    (100, 'A', 'foo-bar'),

    ('xx', 'B', 'foobar'),
    ('yy', 'B', 'foo'),

    (1000, 'C', 'py'),
    (200, 'C', 'foo'),
    ..]

What is the most efficient way to break it down into sub-lists of the same category (A, B, C, etc.)?

381

asked Nov 11 '11 10:11

Kaung Htet

1 Answer

Use itertools.groupby:

import itertools
import operator

data = [(1, 'A', 'foo'),
        (2, 'A', 'bar'),
        (100, 'A', 'foo-bar'),

        ('xx', 'B', 'foobar'),
        ('yy', 'B', 'foo'),

        (1000, 'C', 'py'),
        (200, 'C', 'foo'),
        ]

# Group consecutive tuples by their second field (the category).
for key, group in itertools.groupby(data, operator.itemgetter(1)):
    print(list(group))

yields

[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]

Or, to create one list with each group as a sublist, you could use a list comprehension:

[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]

The second argument to itertools.groupby is a function which itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.

operator.itemgetter(1) picks off the second item in a sequence.

For example, if

row=(1, 'A', 'foo')

then

operator.itemgetter(1)(row)

equals 'A'.
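To make the role of the key function concrete, here is a minimal sketch. The shortened `data` list and the names `get_category` and `grouped` are illustrative and not part of the original answer. It checks that `operator.itemgetter(1)` returns the same key as a plain lambda, then collects the groups into a dict keyed by category. The dict form assumes each category appears in a single contiguous run, as in the question's data.

```python
import itertools
import operator

# Shortened sample in the same shape as the question's data (illustrative).
data = [(1, 'A', 'foo'), (2, 'A', 'bar'), ('xx', 'B', 'foobar'), (1000, 'C', 'py')]

get_category = operator.itemgetter(1)

# A lambda that indexes position 1 returns the same key for every row.
same_key = all(get_category(row) == (lambda r: r[1])(row) for row in data)

# groupby yields (key, group) pairs; each group is a lazy iterator,
# so it is turned into a list before the next group is requested.
grouped = {key: list(group)
           for key, group in itertools.groupby(data, key=get_category)}

print(same_key)
print(grouped)
```

If a category could appear in more than one run, a later run would overwrite the earlier one in this dict, which is one more reason to sort first.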

As @eryksun points out in the comments, if the categories of the tuples appear in some random order, then you must sort data before applying itertools.groupby. This is because itertools.groupby only collects contiguous items with the same key into groups.

To sort the tuples by category, use:

data2=sorted(data,key=operator.itemgetter(1))
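The effect of sorting can be seen in a small sketch. The `shuffled` list below is a made-up sample in which category 'A' is not contiguous, and the variable names are illustrative only.

```python
import itertools
import operator

# Hypothetical sample: the two 'A' rows are separated by a 'B' row.
shuffled = [(1, 'A', 'foo'), ('xx', 'B', 'foobar'), (2, 'A', 'bar'), (1000, 'C', 'py')]
by_category = operator.itemgetter(1)

# Without sorting, groupby starts a new group each time the key changes,
# so 'A' shows up as two separate groups.
unsorted_keys = [key for key, _ in itertools.groupby(shuffled, key=by_category)]

# sorted() is stable, so rows keep their original relative order
# within each category.
ordered = sorted(shuffled, key=by_category)
sublists = [list(group) for _, group in itertools.groupby(ordered, key=by_category)]

print(unsorted_keys)
print(sublists)
```

Sorting on the category alone also avoids comparing the first fields, which mix integers and strings in the question's data.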

126

answered Oct 06 '22 01:10

unutbu
