
How do I save multi-indexed pandas dataframes to parquet?

Tags: pandas, parquet

How do I save the dataframe shown at the end to parquet?
It was constructed this way:

import numpy as np
import pandas as pd

df_test = pd.DataFrame(np.random.rand(6,4))
df_test.columns = pd.MultiIndex.from_arrays([('A', 'A', 'B', 'B'),
      ('c1', 'c2', 'c3', 'c4')], names=['lev_0', 'lev_1'])
df_test.to_parquet("c:/users/some_folder/test.parquet")

The last line of that code returns:

ValueError: parquet must have string column names

Should I assume I can't save a dataframe whose column headers are a MultiIndex (of strings)? Thanks.

The dataframe looks like this:

lev_0         A                   B          
lev_1        c1        c2        c3        c4
0      0.713922  0.551404  0.289861  0.178739
1      0.693925  0.425073  0.660924  0.695474
2      0.280258  0.827231  0.282844  0.523069
3      0.424731  0.380963  0.462356  0.491140
4      0.786677  0.102935  0.382453  0.199056
5      0.783115  0.295409  0.236880  0.388399
techvslife asked Dec 14 '25 18:12



1 Answer

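pyarrow can write a pandas dataframe with MultiIndex columns to a parquet file. Convert the dataframe to a pyarrow Table and write that with pyarrow directly: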

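import pandas as pd
import numpy as np
import pyarrow.parquet as pq
import pyarrow as pa

# Same dataframe as in the question: two column levels, all labels are strings
df_test = pd.DataFrame(np.random.rand(6,4))
df_test.columns = pd.MultiIndex.from_arrays([('A', 'A', 'B', 'B'),
      ('c1', 'c2', 'c3', 'c4')], names=['lev_0', 'lev_1'])

# Convert to a pyarrow Table, then write it with pyarrow instead of df.to_parquet
table = pa.Table.from_pandas(df_test)
pq.write_table(table, 'test.parquet')

# Read the file back with pandas
df_test_read = pd.read_parquet('test.parquet')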
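
If going through pyarrow directly isn't an option, another workaround (not part of the answer above, just a sketch) is to flatten the MultiIndex into plain string column names before saving and rebuild it after loading. The helper names `flatten_columns` and `restore_columns` and the `|` separator are made up for this example, and it assumes the separator never appears in a column label. The parquet calls are left as comments so the snippet runs without a parquet engine installed.

```python
import pandas as pd

SEP = "|"  # hypothetical separator; assumed not to occur in any column label


def flatten_columns(df, sep=SEP):
    """Return a copy whose MultiIndex columns are joined into plain strings."""
    out = df.copy()
    out.columns = [sep.join(map(str, col)) for col in df.columns]
    return out


def restore_columns(df, names, sep=SEP):
    """Rebuild MultiIndex columns from the joined string labels."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(col.split(sep)) for col in df.columns], names=names
    )
    return out


# Small stand-in for the dataframe in the question (fixed values, not random)
columns = pd.MultiIndex.from_arrays(
    [("A", "A", "B", "B"), ("c1", "c2", "c3", "c4")], names=["lev_0", "lev_1"]
)
df_test = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], columns=columns
)

# Columns become single strings such as "A|c1", so the string-name check passes
flat = flatten_columns(df_test)
# flat.to_parquet("test_flat.parquet")
# flat = pd.read_parquet("test_flat.parquet")

# Split the labels again to get the two-level columns back
restored = restore_columns(flat, names=list(df_test.columns.names))
```

The trade-off is that the level names (`lev_0`, `lev_1`) are not stored in the file with this approach, so they have to be passed back in by hand when restoring.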
cheekybastard answered Dec 16 '25 16:12



