I get JSON data from an API service, and I would like to use a DataFrame to then output the data into CSV.
So, I am trying to convert a list of roughly 100,000 dictionaries, each with about 100 key-value pairs and nested up to 4 levels deep, into a Pandas DataFrame.
I am using the following code, but it is painfully slow:
from pandas import concat
from pandas import json_normalize

try:
    # Convert each JSON event (a dict) to a one-row DataFrame
    df_i = []
    for d in data:
        df_i.append(json_normalize(d))
    # Concatenate all the per-record DataFrames into a single one
    df = concat(df_i, axis=0)
except AttributeError:
    print("Error: expected a list of dictionaries to parse JSON data")
Does anyone know of a better and faster way to do this?
There's a whole section in the IO docs on reading JSON (as strings or files) directly using pd.read_json.
You ought to be able to do something like:
pd.concat((pd.read_json(d) for d in data), axis=0)
This will often be much faster than building a temporary DataFrame for each dictionary and concatenating them one by one.
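One caveat: pd.read_json expects a JSON string (or file/URL), so if your data is already a list of parsed dicts, the records would need to be re-serialized first. In that case, a single json_normalize call over the whole list flattens the nesting in one pass. A minimal sketch, using two made-up records standing in for the API payload:

```python
import pandas as pd

# Hypothetical sample records standing in for the API response:
# each dict may be nested several levels deep.
data = [
    {"id": 1, "user": {"name": "a", "address": {"city": "x"}}},
    {"id": 2, "user": {"name": "b", "address": {"city": "y"}}},
]

# One json_normalize call over the whole list flattens all nested
# keys into dotted column names (e.g. "user.address.city"), avoiding
# the cost of building one DataFrame per record.
df = pd.json_normalize(data)

# to_csv with no path returns the CSV text; pass a filename to write a file.
csv_text = df.to_csv(index=False)
```

The resulting frame has one row per record and one column per flattened key path, which is usually what you want before dumping to CSV.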