
How to handle empty dictionary while writing table with pyarrow

I'm creating a Parquet file from a Python list of dictionaries with pandas and pyarrow, but I get the following error for an empty nested dictionary:

Cannot write struct type 'subject' with no child field to Parquet. Consider adding a dummy child field

My code is below.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

data = [
    {
        "name": "david",
        "subject": {}
    }
]

df = pd.DataFrame.from_records(data)
table = pa.Table.from_pandas(df)
pq.write_table(table, 'file1.parquet')
Akhilendra asked Nov 25 '25 07:11



1 Answer

Arrow cannot infer the type of "subject" from the data you gave it, because the only value is empty. "subject" could be either:

  • a map (what Python calls a dictionary, with arbitrary key/value pairs)
  • or a struct (a fixed set of named fields).

To resolve this ambiguity, pass an explicit schema to the Table.from_pandas function:


schema = pa.schema([
    pa.field("name", pa.string()),
    pa.field("subject", pa.map_(pa.string(), pa.string())),
])

table = pa.Table.from_pandas(df, schema=schema)

But even with the schema it doesn't work, because Arrow expects map data to be represented as a list of (key, value) tuples rather than a dict:

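data = [
    {"name": "david", "subject": []},
    {"name": "john", "subject": [("key1", "value1"), ("key2", "value2")]},
]

schema = pa.schema([
    pa.field("name", pa.string()),
    pa.field("subject", pa.map_(pa.string(), pa.string())),
])

# Rebuild the DataFrame from the new data before converting it
df = pd.DataFrame.from_records(data)
table = pa.Table.from_pandas(df, schema=schema)
pq.write_table(table, 'file1.parquet')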
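
If the records already hold plain dicts, as in the question, one option is to convert them to the list-of-tuples shape just before building the DataFrame. The following is only a sketch under that assumption: the helper names dicts_to_items and write_map_parquet are hypothetical (they are not part of pandas or pyarrow), and the map<string, string> schema is carried over from the answer above.

```python
def dicts_to_items(records, column):
    """Return copies of `records` in which the dict stored under `column`
    is replaced by a list of (key, value) tuples."""
    return [{**rec, column: list(rec[column].items())} for rec in records]


def write_map_parquet(records, path, column="subject"):
    """Write `records` to `path`, storing `column` as map<string, string>.

    Hypothetical helper: assumes every record has a "name" string and a
    dict of string keys and string values under `column`.
    """
    # Imported here so dicts_to_items stays usable without pandas/pyarrow.
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pa.schema([
        pa.field("name", pa.string()),
        pa.field(column, pa.map_(pa.string(), pa.string())),
    ])
    df = pd.DataFrame.from_records(dicts_to_items(records, column))
    table = pa.Table.from_pandas(df, schema=schema)
    pq.write_table(table, path)


# Dict-shaped data, as in the question (an empty dict is allowed).
data = [
    {"name": "david", "subject": {}},
    {"name": "john", "subject": {"key1": "value1", "key2": "value2"}},
]

# The original records are left untouched; only the copies are reshaped.
converted = dicts_to_items(data, "subject")
```

Calling write_map_parquet(data, 'file1.parquet') should then write the file without changing how the source data is built. An empty dict becomes an empty list, which a map column can hold.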
0x26res answered Nov 27 '25 00:11
