
Find cells with data and use as index in dataframe

Tags: python, pandas

I'm reading an Excel file, but for the purposes of this question I'll provide an example of what my dataframe looks like:

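import pandas as pd

# Row labels a, b, c match the printed frame below
df = pd.DataFrame([
        ['Texas 1', '111', '222', '333'],
        ['Texas 1', '444', '555', '666'],
        ['Texas 2', '777','888','999']
    ], index=list('abc'))
# Blank out one cell to mimic an empty Excel cell
df[2] = df[2].replace('222', '')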


          0    1    2    3
a   Texas 1  111       333
b   Texas 1  444  555  666
c   Texas 2  777  888  999

I want to define a MultiIndex from the columns whose value in the first row (row a) is not blank. So something like this:

      0     1    3
Texas 1   111  333 444  555  666
Texas 2   111  333 777  888  999

The problem is that the values in row a will not always be in the same columns, so I need a way to find which columns have a value in the first row and use those column numbers as the index. So far, I read my Excel file like so:

df1 = pd.read_excel('excel.XLS', index_col=[1,11,24,37])

I've been looking for a way to find the cells in row a that are not NaN, store their column numbers in a list, and pass that list as index_col, but I can't figure out how. Any pointers in the right direction would be awesome!

rubito asked Jan 25 '26 10:01



1 Answer

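First of all, you say "not NaN", but your example replaces the value with an empty string ''.
I'll replace '' with np.nan in the first row and then call dropna; the index of what remains holds the column numbers you want: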

df.iloc[0].replace('', np.nan).dropna().index

Int64Index([0, 1, 3], dtype='int64')

df[df.iloc[0].replace('', np.nan).dropna().index]
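The two lines above can be put together into a self-contained sketch. This is only an illustration under a few assumptions: the row labels a, b, c are taken from the printed frame in the question, and the names `nonblank_columns`, `cols`, and `indexed` are hypothetical helpers that do not appear in the original post.

```python
import numpy as np
import pandas as pd

# Example frame from the question; the a/b/c row labels are an assumption
# taken from the printed output, not from the original constructor call.
df = pd.DataFrame(
    [
        ['Texas 1', '111', '222', '333'],
        ['Texas 1', '444', '555', '666'],
        ['Texas 2', '777', '888', '999'],
    ],
    index=list('abc'),
)
df[2] = df[2].replace('222', '')


def nonblank_columns(frame):
    """Return the labels of columns whose first-row cell is neither '' nor NaN.

    Hypothetical helper wrapping the replace/dropna idea from the answer.
    """
    first_row = frame.iloc[0].replace('', np.nan)
    return first_row.dropna().index.tolist()


# Columns that hold a value in the first row
cols = nonblank_columns(df)

# Use those columns as a MultiIndex; the remaining columns stay as data
indexed = df.set_index(cols)
```

If the data comes from a file, one possible workflow is to read it once without `index_col`, compute `cols` this way, and then either call `set_index(cols)` on that frame or pass `cols` as `index_col` in a second `pd.read_excel` call. That workflow is a suggestion and has not been checked against the asker's file.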


piRSquared answered Jan 28 '26 00:01



