Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert Excel file with multiple header and sub header to Nested Json as show

enter image description here

i have tried to convert the excel file data as shown in the picture from excel to Json format but not able to properly convert to the structure that i required, the structure I required is as below

{
  "ID": "000 001 234",
  "Label": "XYZ",
  "Category 1": { "Sub1": "129.75","Sub2" : "0.00","Sub3":"0.00" }
  "Order Amount": "234.00",
  "Penalty": "111.00",
  "Fees": "3,456.00",
  "Category2":  { "Sub21": "0.00","Sub22" : "0.00","Sub23": "0.00" }
  "Invoice": "11.00"
},

{
  "ID": "000 001 235",
  "Label": "XYZ",
  "Category 1": { "Sub1": "1.75","Sub2" : "0.00","Sub3":"0.00" }
  "Order Amount": "111.00",
  "Penalty": "0.00",
  "Fees": "2,343.00",
  "Category2":  { "Sub21": "0.00","Sub22" : "0.00","Sub23": "0.00" }
  "Invoice": "2.00"
},

code i tried is as below

from openpyxl import load_workbook
from json import dumps

# Load Excel workbook
wb = load_workbook("sample.xlsx")

# Choose a specific sheet
sheet = wb["Sheet1"]

# Find the number of rows and columns in the sheet
rows = sheet.max_row
columns = sheet.max_column
# List to store all rows as dictionaries
lst = []

# Iterate over rows and columns to extract data
for i in range(1, rows):
    row = {}
    for j in range(1, columns):
        column_name = sheet.cell(row=1, column=j)
        row_data = sheet.cell(row=i+1, column=j)

        row.update(
            {
                column_name.value: row_data.value
            }
        )
    lst.append(row)

# Convert extracted data into JSON format
json_data = dumps(lst)

# Print the JSON data
print(json_data)

Output I am getting is as below

[{
  "ID": null,
  "Label": null,
  "Category 1": "Sub3",
  "Order Amount": null,
  "Penalty": null,
  "Fees": null,
  "Category2": "Sub23"
}, 
{"ID": 1234, "Label": "XYZ", "Category 1": 0, "Order Amount": 234, "Penalty": 111, "Fees": 3456, "Category2": 0}, 
{"ID": 1235, "Label": "XYZ", "Category 1": 0, "Order Amount": 111, "Penalty": 0, "Fees": 2343, "Category2": 0}]

I am not able to get the nested json in the proper format that i actually require, Any help would be appreciated.

like image 308
JagaSrik Avatar asked Nov 15 '25 15:11

JagaSrik


1 Answers

You can handle multi-row headers easily with pandas, which automatically detects hierarchical columns.
To get exactly your desired nested JSON structure you can do this:

import pandas as pd
import json

# Read Excel file with two header rows
df = pd.read_excel("sample.xlsx", header=[0, 1])

records = []
for _, row in df.iterrows():
    record = {}
    for (main, sub), value in row.items():
        # If the subheader is empty or "Unnamed", treat it as a top-level key
        if pd.isna(sub) or str(sub).startswith("Unnamed"):
            record[main] = str(value)
        else:
            record.setdefault(main, {})[sub] = str(value)
    records.append(record)

# Convert to JSON
print(json.dumps(records, indent=2, ensure_ascii=False))

Output:

[
  {
    "ID": "000 001 234",
    "Label": "XYZ",
    "Category 1": {"Sub1": "129.75", "Sub2": "0.00", "Sub3": "0.00"},
    "Order Amount": "234.00",
    "Penalty": "111.00",
    "Fees": "3,456.00",
    "Category2": {"Sub21": "0.00", "Sub22": "0.00", "Sub23": "0.00"},
    "Invoice": "11.00"
  },
  {
    "ID": "000 001 235",
    "Label": "XYZ",
    "Category 1": {"Sub1": "1.75", "Sub2": "0.00", "Sub3": "0.00"},
    "Order Amount": "111.00",
    "Penalty": "0.00",
    "Fees": "2,343.00",
    "Category2": {"Sub21": "0.00", "Sub22": "0.00", "Sub23": "0.00"},
    "Invoice": "2.00"
  }
]

Explanation:

  • header=[0,1] reads both header rows as a hierarchical index.

  • The code checks if the subheader is "Unnamed" (which Excel automatically creates when there’s no actual subheader) and places that value directly at the top level.
    Without this check, you’ll get unwanted field names like:

    "ID": {
      "Unnamed: 0_level_1": "000 001 235"
    },
    "Label": {
      "Unnamed: 1_level_1": "XYZ"
    }
    
  • All other columns with real subheaders become nested dictionaries.

This produces exactly the nested JSON format you want.

like image 191
charly_0x13 Avatar answered Nov 17 '25 05:11

charly_0x13



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!