
JSON dictionaries to Pandas DataFrame in Python

I get JSON data from an API service, and I would like to load it into a DataFrame and then write it out as CSV.

So, I am trying to convert a list of about 100,000 dictionaries, each with about 100 key-value pairs nested up to 4 levels deep, into a Pandas DataFrame.
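For context, json_normalize flattens nested dictionaries into a single row whose column names join the nested keys with dots. The sketch below uses a made-up event record (not the real API payload) and assumes pandas 1.0 or later, where the function is exposed as pd.json_normalize. The exact column order can differ between pandas versions, so only the set of columns is meaningful.

```python
import pandas as pd

# Hypothetical event record with nested objects (illustrative only)
event = {
    "id": 1,
    "user": {"name": "alice", "geo": {"country": "US", "city": "NYC"}},
    "value": 3.5,
}

# json_normalize flattens nested dicts into dot-separated column names,
# producing a one-row DataFrame for a single dict
flat = pd.json_normalize(event)
print(sorted(flat.columns))
```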

I am using the following code, but it is painfully slow:

from pandas import concat, json_normalize  # top-level json_normalize needs pandas >= 1.0

try:
    # Convert each JSON data event to a one-row Pandas DataFrame
    df_i = []
    for d in data:
        df_i.append(json_normalize(d))

    # Concatenate all DataFrames into a single one
    df = concat(df_i, axis=0)

except AttributeError:
    print("Error: Expected a list of dictionaries to parse JSON data")

Does anyone know of a better and faster way to do this?

asked Sep 14 '25 at 09:09 by MauricioRoman


1 Answer

There's a whole section in the pandas IO docs on reading JSON (as strings or files) directly with pd.read_json.
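A minimal sketch of read_json parsing JSON text directly, assuming the API response is available as a raw string holding an array of flat records (the payload below is hypothetical). Literal strings are wrapped in StringIO because recent pandas versions deprecate passing JSON text directly. Note that read_json does not flatten nested objects the way json_normalize does; nested values would stay as dicts inside the cells.

```python
from io import StringIO

import pandas as pd

# Hypothetical raw API response: a JSON array of flat records, as text
raw = '[{"id": 1, "value": 3.5}, {"id": 2, "value": 4.0}]'

# orient="records" tells read_json the text is a list of row objects
df = pd.read_json(StringIO(raw), orient="records")
print(df)
```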

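If data holds the raw JSON text of each event (for example, each d is a JSON array of records), rather than already-decoded dicts, you ought to be able to do something like:

import pandas as pd
from io import StringIO

# pd.read_json parses JSON strings or files, not Python dicts
df = pd.concat((pd.read_json(StringIO(d)) for d in data), axis=0)

This skips building a temporary dict for every event and will often be much faster.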
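An alternative the answer does not mention: since the question already uses json_normalize, it can be called once on the whole list of dicts instead of once per record. This avoids creating and concatenating one small DataFrame per event, and it still flattens the nesting before the CSV is written. The sketch below uses hypothetical stand-in data and assumes pandas 1.0 or later; it is not a benchmark, and the actual speedup on the real payload is not measured here.

```python
import pandas as pd

# Hypothetical stand-in for the API payload: a list of nested dicts
data = [
    {"id": i, "user": {"name": f"user{i}", "geo": {"country": "US"}}, "value": i * 0.5}
    for i in range(1000)
]

# One json_normalize call over the whole list flattens every record at once
df = pd.json_normalize(data)

# With no path argument, to_csv returns the CSV as a string;
# index=False leaves out the row index column
csv_text = df.to_csv(index=False)
```

To write to disk instead, pass a file path as the first argument to to_csv.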

answered Sep 15 '25 at 23:09 by Andy Hayden