normalize a column in pandas dataframe

Question

I am able to import data from json file using this code...

import requests
from pandas.io.json import json_normalize

url = "https://datameetgeobk.s3.amazonaws.com/image_list.json"

resp = requests.get(url=url)
df = json_normalize(resp.json()['Images'])
df.head()

But the column "BlockDeviceMappings" is actually a list and each item has DeviceName and Ebs parameters those are string and dicts. How do I further normalize my dataframe to include all the details in separate columns?

My screenshot does not match with the one shown in the answer. The Ebs column (second from left) is a dictionary.

enter image description here

Trenton McKinney · Accepted Answer

import requests
import pandas as pd

url = "https://datameetgeobk.s3.amazonaws.com/image_list.json"

resp = requests.get(url=url)
resp = resp.json()

What you have so far:

df = pd.json_normalize(resp['Images'])

BlockDeviceMappings cast to all columns

inner_keys = [x for x in resp['Images'][0].keys() if x != 'BlockDeviceMappings']

df_bdm = pd.json_normalize(resp['Images'], record_path=['BlockDeviceMappings'], meta=inner_keys, errors='ignore')

Separate `bdm_df`:

bdm_df = pd.json_normalize(resp['Images'], record_path=['BlockDeviceMappings'])

You will no doubt wonder why df has 39995 entries, while bdm_df has 131691 entries. This is because BlockDeviceMappings is a list of dicts of varying lengths:

bdm_len = [len(x) for x in df.BlockDeviceMappings]
max(bdm_len)
>>> 31

Sample `BlockDeviceMappings` entry:

[{'DeviceName': '/dev/sda1',
  'Ebs': {'DeleteOnTermination': True,
   'SnapshotId': 'snap-0aac2591b85fe677e',
   'VolumeSize': 80,
   'VolumeType': 'gp2',
   'Encrypted': False}},
 {'DeviceName': 'xvdb',
  'Ebs': {'DeleteOnTermination': True,
   'SnapshotId': 'snap-0bd8d7828225924a7',
   'VolumeSize': 80,
   'VolumeType': 'gp2',
   'Encrypted': False}}]

`df_bdm.head()`

enter image description here

normalize a column in pandas dataframe

Tags:

python

pandas

json-normalize

shantanuo

1 Answers

What you have so far:

BlockDeviceMappings cast to all columns

Separate `bdm_df`:

Sample `BlockDeviceMappings` entry:

`df_bdm.head()`

Trenton McKinney

Recent Activity

Donate For Us

normalize a column in pandas dataframe

Tags:

python

pandas

json-normalize

shantanuo

1 Answers

What you have so far:

BlockDeviceMappings cast to all columns

Separate bdm_df:

Sample BlockDeviceMappings entry:

df_bdm.head()

Trenton McKinney

Related questions

Recent Activity

Donate For Us

Separate `bdm_df`:

Sample `BlockDeviceMappings` entry:

`df_bdm.head()`