Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting listed column names of all not nan rows

Tags:

python

pandas

I've pandas dataframe based on pivot table with index and columns. Index are presented with values that are not nan at least in one column, while others are nans.

          col_1  col_2  col_3  col_4 ...  col_100
index_1     1      2      nan   nan  ...     5 
index_2    nan    nan      1     1   ...     10
...        ...    ...     ...   ...  ...     ...
index_100  nan     9       4    ...  ...     nan

How can I get column names of all the not nan values in a row and put them into automatically suffixed list names by each index? Need to get this:

list_1=[col_1, col_2, col_100]
list_2=[col_3, col_4, col_100]
list_100=[col_2, col_3]
like image 896
Alexey7 Avatar asked Sep 17 '25 00:09

Alexey7


2 Answers

You can use stack to remove nan and groupby to gather all column names:

(df.stack()
   .reset_index(level=1)
   .groupby(level=0, sort=False)
   ['level_1'].apply(list)
)

Output:

index_1      [col_1, col_2, col_100]
index_2      [col_3, col_4, col_100]
index_100             [col_2, col_3]
Name: level_1, dtype: object
like image 79
Quang Hoang Avatar answered Sep 18 '25 18:09

Quang Hoang


Generate sample data

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(100,100), columns=['col_' + str(i) for i in 
    range(1,101)], index=['index_' + str(i) for i in range(1, 101)])
for i in range(len(df)):
    df.iloc[i, np.random.randint(0, 100, size=90).astype(int)] = np.nan

df
Out[45]: 
              col_1     col_2     col_3  ...    col_98    col_99   col_100
index_1         NaN       NaN       NaN  ...  0.520094       NaN  0.914679
index_2         NaN       NaN       NaN  ...       NaN  0.606264  0.615714
index_3         NaN       NaN       NaN  ...  0.350248       NaN       NaN
index_4    0.018335  0.296533  0.128359  ...       NaN  0.463084       NaN
index_5    0.164180  0.321482  0.423314  ...  0.909450       NaN  0.968680
            ...       ...       ...  ...       ...       ...       ...
index_96   0.444807  0.010812       NaN  ...  0.763669  0.074926       NaN
index_97        NaN       NaN       NaN  ...       NaN       NaN       NaN
index_98        NaN  0.285217       NaN  ...  0.913199  0.699628       NaN
index_99   0.800824  0.004250       NaN  ...       NaN       NaN  0.302858
index_100       NaN       NaN       NaN  ...  0.875435  0.700814       NaN

[100 rows x 100 columns]

compute result

result = {}
for i in range(len(df)):
    l = df.columns[np.isfinite(df.iloc[i])]
    result['list_' + str(i)] = list(l)

get list_1

result['list_1']
Out[47]: 
['col_4',
 'col_6',
 'col_9',
 'col_10',
 'col_14',
 'col_15',
 'col_18',
 'col_19',
 'col_26',
 'col_27',
 'col_33',
 'col_34',
 'col_35',
 'col_37',
 'col_38',
 'col_39',
 'col_40',
 'col_42',
 'col_43',
 'col_50',
 'col_57',
 'col_59',
 'col_60',
 'col_63',
 'col_64',
 'col_65',
 'col_66',
 'col_67',
 'col_69',
 'col_71',
 'col_76',
 'col_80',
 'col_81',
 'col_85',
 'col_87',
 'col_88',
 'col_89',
 'col_92',
 'col_96',
 'col_97',
 'col_99',
 'col_100']
like image 24
Yeghishe Kerobyan Avatar answered Sep 18 '25 19:09

Yeghishe Kerobyan