I'm reading an Excel file, but for the purposes of this question I'll provide an example of what my dataframe looks like.
I have a dataframe like so:
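import pandas as pd

df = pd.DataFrame([
    ['Texas 1', '111', '222', '333'],
    ['Texas 1', '444', '555', '666'],
    ['Texas 2', '777', '888', '999']
], index=['a', 'b', 'c'])
# Blank out the '222' cell so that row a has an empty column
df[2] = df[2].replace('222', '')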
         0    1    2    3
a  Texas 1  111       333
b  Texas 1  444  555  666
c  Texas 2  777  888  999
I want to define a MultiIndex based on the columns whose values in the first row are not blank. Something like this:
0 1 3
Texas 1 111 333 444 555 666
Texas 2 111 333 777 888 999
The problem is that the values in row a will not always be in the same columns, so I need a way to find which columns have a value in the first row and use those column numbers as the index. So far, I read my Excel file like so:
df1 = pd.read_excel('excel.XLS', index_col=[1,11,24,37])
I've been looking for a way to find the cells in row a that are not NaN, collect their column numbers in a list, and pass that list as index_col. But I can't figure out how. Any pointers in the right direction would be awesome!
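A minimal sketch of the idea, assuming the data is loaded without a header so that row a is simply the first row: find the integer positions of the non-blank cells in that row, then use the list the way index_col would. The helper name nonblank_positions is hypothetical, and the DataFrame below is a stand-in for the Excel data.

```python
import numpy as np
import pandas as pd


def nonblank_positions(row: pd.Series) -> list[int]:
    """Hypothetical helper: integer positions of cells that are neither NaN nor ''."""
    # Treat empty strings as missing so both kinds of blank are handled alike
    mask = row.replace('', np.nan).notna().to_numpy()
    return np.flatnonzero(mask).tolist()


# Stand-in for the sheet; with a real file this could come from
# pd.read_excel('excel.XLS', header=None)
df = pd.DataFrame([
    ['Texas 1', '111', '', '333'],
    ['Texas 1', '444', '555', '666'],
    ['Texas 2', '777', '888', '999'],
])

positions = nonblank_positions(df.iloc[0])
print(positions)  # [0, 1, 3]

# Similar effect to index_col=positions: those columns become a MultiIndex
indexed = df.set_index([df.columns[i] for i in positions])
print(indexed)
```

With the file itself, one option would be to read only the first row (header=None, nrows=1), compute the positions from it, and then re-read with index_col=positions. Whether the positions line up depends on how the header row is handled, so check that against the actual sheet.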
First of all, you say "not NaN", but you replace the value with ''.
So I'll replace '' with np.nan and then call dropna:
import numpy as np
df.iloc[0].replace('', np.nan).dropna().index
Int64Index([0, 1, 3], dtype='int64')
df[df.iloc[0].replace('', np.nan).dropna().index]
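The answer's steps can be combined into one runnable sketch on the question's example frame. The .loc line with a boolean mask is an equivalent alternative and not part of the original answer. Note that dropna().index yields column labels; if integer positions are needed (for example, for read_excel's index_col), Index.get_indexer converts labels to positions. Here the two coincide because the columns are a default RangeIndex.

```python
import numpy as np
import pandas as pd

# The question's example frame, with row labels a, b, c
df = pd.DataFrame(
    [
        ['Texas 1', '111', '222', '333'],
        ['Texas 1', '444', '555', '666'],
        ['Texas 2', '777', '888', '999'],
    ],
    index=['a', 'b', 'c'],
)
df[2] = df[2].replace('222', '')

# Column labels whose value in the first row is not blank
keep = df.iloc[0].replace('', np.nan).dropna().index
subset = df[keep]

# Alternative: a boolean mask over the columns gives the same selection
mask = df.iloc[0].replace('', np.nan).notna()
subset_mask = df.loc[:, mask]

# Integer positions of those labels (index_col takes ints or a list of ints)
positions = df.columns.get_indexer(keep).tolist()

print(positions)  # [0, 1, 3]
print(subset)
```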
