I am trying to write a parquet file from a pandas DataFrame. Even though I delete the index name, it still appears when I read the parquet file back. Can anyone help me with this? I want index.name to be None.
>>> df = pd.DataFrame({'key': 1}, index=[0])
>>> df
  key
0    1
>>> df.to_parquet('test.parquet')
>>> df = pd.read_parquet('test.parquet')
>>> df
     key
index     
0        1
>>> del df.index.name
>>> df
     key
0    1
>>> df.to_parquet('test.parquet')
>>> df = pd.read_parquet('test.parquet')
>>> df
     key
index     
0        1
It works as expected with pyarrow, but not with fastparquet:
>>> df = pd.DataFrame({'key': 1}, index=[0])
>>> df.to_parquet('test.parquet', engine='fastparquet')
>>> df = pd.read_parquet('test.parquet')
>>> del df.index.name
>>> df
   key
0    1
>>> df.to_parquet('test.parquet', engine='fastparquet')
>>> df = pd.read_parquet('test.parquet')
>>> df
       key
index     
0        1   # index name reappears after deleting it when writing with fastparquet
>>> del df.index.name
>>> df.to_parquet('test.parquet', engine='pyarrow')
>>> df = pd.read_parquet('test.parquet')
>>> df
   key
0    1   # index name is None when writing with pyarrow
This works with pyarrow using the following:
df = pd.DataFrame({'key': 1}, index=[0])
df.to_parquet('test.parquet', engine='pyarrow', index=False)
df = pd.read_parquet('test.parquet', engine='pyarrow')
df.head()
As @alexopoulos7 mentioned, the to_parquet documentation states that you can pass the "index" argument. It seems to work, perhaps because I'm explicitly setting engine='pyarrow'.