How to flatten nest Json data with json_normalize

Question

I'm trying to import JSON data to Dataframe via json_normalize but cannot get it to work.

My data:

a key is same as c1 key

[
  {
    "a": "A1",
    "b": "B1",
    "c": [
      {
        "c1": "C111",
        "c2": "C121",
        "c3": ["C1131","C1132"]
      }
    ]
  },
  {
    "a": "A2",
    "b": "B2",
    "c": [
      {
        "c1": "C211",
        "c2": "C212",
        "c3": ["C2131","C2132"]
      },
      {
        "c1": "C221",
        "c2": "C222",
        "c3": ["C2231"]
      }
    ]
  }
]

I want to make a DataFrame like

           a     c1(a)      c2                    c3
0         A1      C111    C121     ["C1131","C1132"]
1         A2      C211    C212     ["C2131","C2132"]
2         A2      C221    C222             ["C2231"]

When I use json_normalize it shows ValueError:

entity_df = json_normalize(data, 'c', 'a')

ValueError: Conflicting metadata name a, need distinguishing prefix

How should I change the json_normalize parameters? Any help will be appreciated.

kederrac · Accepted Answer

you can try:

from collections import defaultdict

norm_data = defaultdict(list)
for item in data:
    for element in item['c']:
        norm_data['a'].append(item['a'])
        for k, v in element.items():
            if k in {'a', 'c1'}:
                norm_data['c1(a)'].append(v)
            else:
                norm_data[k].append(v)

pd.DataFrame(norm_data)

enter image description here

Code Different · Answer

If you have already gone through the pain of obscuring your real data, make the mock data having the same features as the real one as well.

Assuming you have this JSON:

json_data =\
[{'a': 'A1',
  'b': 'B1',
  'c': [{'a': 'C111', 'c2': 'C121', 'c3': ['C1131', 'C1132']}]},
 {'a': 'A2',
  'b': 'B2',
  'c': [{'a': 'C211', 'c2': 'C212', 'c3': ['C2131', 'C2132']},
   {'a': 'C221', 'c2': 'C222', 'c3': ['C2231']}]}]

You only need one line of code to extract it:

pd.json_normalize(json_data, 'c', ['a', 'b'], record_prefix='data.')

Result:

  data.a data.c2         data.c3   a   b
0   C111    C121  [C1131, C1132]  A1  B1
1   C211    C212  [C2131, C2132]  A2  B2
2   C221    C222         [C2231]  A2  B2

The record_prefix='data.' is what the ValueError: Conflicting metadata name a, need distinguishing prefix error message meant.

How to flatten nest Json data with json_normalize

Tags:

python

json

pandas

iamshaojin

2 Answers

kederrac

Code Different

Recent Activity

Donate For Us

How to flatten nest Json data with json_normalize

Tags:

python

json

pandas

iamshaojin

2 Answers

kederrac

Code Different

Related questions

Recent Activity

Donate For Us