
Parsing a Multi-Index Excel File in Pandas

I have a time series Excel file with a three-level column MultiIndex that I would like to parse. There are some results on Stack Overflow on how to do this for an index, but not for columns, and the parse function has a header parameter that does not seem to accept a list of rows.

The Excel file looks like the following:

  • Column A is all the time series dates starting on A4
  • Column B has top_level1 (B1) mid_level1 (B2) low_level1 (B3) data (B4-B100+)
  • Column C has null (C1) null (C2) low_level2 (C3) data (C4-C100+)
  • Column D has null (D1) mid_level2 (D2) low_level1 (D3) data (D4-D100+)
  • Column E has null (E1) null (E2) low_level2 (E3) data (E4-E100+)
  • ...

So there are two low_level values, many mid_level values and a few top_level values, but the trick is that the empty top and mid level cells are assumed to take the value to their left. For instance, all the columns above would have top_level1 as the top MultiIndex value.
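To make the "value to the left" rule concrete, here is a minimal sketch in plain Python. The helper name forward_fill and the three header lists are made up for illustration; they simply mirror columns B to E as described above, with None standing in for an empty cell.

```python
def forward_fill(labels):
    """Replace each None with the nearest non-None label to its left."""
    filled, last = [], None
    for label in labels:
        if label is not None:
            last = label
        filled.append(last)
    return filled

# Header cells for columns B..E from the layout above (None = empty cell).
top = ["top_level1", None, None, None]
mid = ["mid_level1", None, "mid_level2", None]
low = ["low_level1", "low_level2", "low_level1", "low_level2"]

# One (top, mid, low) tuple per column: the labels the MultiIndex should carry.
column_tuples = list(zip(forward_fill(top), forward_fill(mid), low))
for column in column_tuples:
    print(column)
```

Every column ends up under top_level1, and columns D and E pick up mid_level2, which is the structure the parsed columns should have.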

My best idea so far is to use transpose, but then it fills Unnamed: # everywhere and doesn't seem to work. In pandas 0.13, read_csv seems to have a header parameter that can take a list, but this doesn't seem to work with parse.

rhaskett asked Nov 21 '25 03:11


1 Answer

You can forward-fill the null values in the header rows. I don't have your file, so this is untested, but you can try:

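import pandas as pd

# Read the sheet with the headers as ordinary rows for now
df = pd.read_excel(xls_file, sheet_name=0, header=None, index_col=0)

# Forward-fill the empty cells in the three header rows only
# (filling the whole frame would also overwrite genuine gaps in the data)
header = df.iloc[:3].ffill(axis=1)

# Create the MultiIndex column names from the filled header rows
df.columns = pd.MultiIndex.from_arrays(header.values, names=['top', 'mid', 'low'])

# Just the name of the index
df.index.name = 'Date'

# Remove the 3 rows already used as column names (their index, A1:A3, is empty)
df = df[pd.notnull(df.index)]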
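
Since the file itself is not available, the steps above can be tried on an in-memory frame. This is only a sketch under assumptions: raw is a hypothetical stand-in for what read_excel returns with header=None and index_col=0, the dates and numbers are invented, and the forward fill is applied to the three header rows only so that a genuine gap in the data is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=None, index_col=0):
# three header rows (index is NaN because A1:A3 are empty), then two data rows.
raw = pd.DataFrame(
    [
        ["top_level1", np.nan, np.nan, np.nan],
        ["mid_level1", np.nan, "mid_level2", np.nan],
        ["low_level1", "low_level2", "low_level1", "low_level2"],
        [1.0, 2.0, 3.0, 4.0],
        [5.0, np.nan, 7.0, 8.0],  # one genuine gap in the data
    ],
    index=[np.nan, np.nan, np.nan,
           pd.Timestamp("2013-01-01"), pd.Timestamp("2013-01-02")],
)

# Forward-fill across columns, header rows only.
header = raw.iloc[:3].ffill(axis=1)

# Keep the data rows, convert them from object to float, and attach the columns.
data = raw.iloc[3:].astype(float)
data.columns = pd.MultiIndex.from_arrays(
    header.to_numpy().tolist(), names=["top", "mid", "low"]
)
data.index = pd.DatetimeIndex(data.index, name="Date")

print(data)
# Select a single series by its full (top, mid, low) key.
print(data[("top_level1", "mid_level2", "low_level1")])
```

The astype(float) step is an addition for convenience: because the header rows were read as data, every column starts out with object dtype, and converting after the header rows are dropped restores numeric columns.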
Happy001 answered Nov 22 '25 17:11


