Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to flatten nest Json data with json_normalize

I'm trying to import JSON data to Dataframe via json_normalize but cannot get it to work.

My data:

a key is same as c1 key

[
  {
    "a": "A1",
    "b": "B1",
    "c": [
      {
        "c1": "C111",
        "c2": "C121",
        "c3": ["C1131","C1132"]
      }
    ]
  },
  {
    "a": "A2",
    "b": "B2",
    "c": [
      {
        "c1": "C211",
        "c2": "C212",
        "c3": ["C2131","C2132"]
      },
      {
        "c1": "C221",
        "c2": "C222",
        "c3": ["C2231"]
      }
    ]
  }
]

I want to make a DataFrame like

           a     c1(a)      c2                    c3
0         A1      C111    C121     ["C1131","C1132"]
1         A2      C211    C212     ["C2131","C2132"]
2         A2      C221    C222             ["C2231"]

When I use json_normalize it shows ValueError:

entity_df = json_normalize(data, 'c', 'a')

ValueError: Conflicting metadata name a, need distinguishing prefix 

How should I change the json_normalize parameters? Any help will be appreciated.

like image 688
iamshaojin Avatar asked Jan 29 '26 22:01

iamshaojin


2 Answers

you can try:

from collections import defaultdict

norm_data = defaultdict(list)
for item in data:
    for element in item['c']:
        norm_data['a'].append(item['a'])
        for k, v in element.items():
            if k in {'a', 'c1'}:
                norm_data['c1(a)'].append(v)
            else:
                norm_data[k].append(v)

pd.DataFrame(norm_data)

enter image description here

like image 105
kederrac Avatar answered Feb 01 '26 12:02

kederrac


If you have already gone through the pain of obscuring your real data, make the mock data having the same features as the real one as well.

Assuming you have this JSON:

json_data =\
[{'a': 'A1',
  'b': 'B1',
  'c': [{'a': 'C111', 'c2': 'C121', 'c3': ['C1131', 'C1132']}]},
 {'a': 'A2',
  'b': 'B2',
  'c': [{'a': 'C211', 'c2': 'C212', 'c3': ['C2131', 'C2132']},
   {'a': 'C221', 'c2': 'C222', 'c3': ['C2231']}]}]

You only need one line of code to extract it:

pd.json_normalize(json_data, 'c', ['a', 'b'], record_prefix='data.')

Result:

  data.a data.c2         data.c3   a   b
0   C111    C121  [C1131, C1132]  A1  B1
1   C211    C212  [C2131, C2132]  A2  B2
2   C221    C222         [C2231]  A2  B2

The record_prefix='data.' is what the ValueError: Conflicting metadata name a, need distinguishing prefix error message meant.

like image 36
Code Different Avatar answered Feb 01 '26 11:02

Code Different



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!