I have a DataFrame in pandas like this:
a b c
A 1 2 3
B 4 5 6
C 7 8 9
I want to extract the pairs of index name and column name whose cell has a value bigger than 6.
In other words, I want to obtain
[["B","c"], ["C","a"], ["C","b"], ["C","c"]]
Is there any smart way to do this?
You could use stack to flatten the frame to a series, use boolean indexing to select the terms you want, and finally turn the resulting index into a list:
s = df.stack()
ii = s[s >= 6].index.tolist()
For example:
>>> s = df.stack()
>>> s
A a 1
b 2
c 3
B a 4
b 5
c 6
C a 7
b 8
c 9
dtype: int64
>>> s[s >= 6]
B c 6
C a 7
b 8
c 9
dtype: int64
>>> s[s >= 6].index
MultiIndex(levels=[[u'A', u'B', u'C'], [u'a', u'b', u'c']],
labels=[[1, 2, 2, 2], [2, 0, 1, 2]])
>>> s[s >= 6].index.tolist()
[('B', 'c'), ('C', 'a'), ('C', 'b'), ('C', 'c')]
Note that (1) I'm using >= 6 because that matches your example, and (2) this is strictly a list of tuples not a list of lists like you asked for, but you can convert if you really want to.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With