Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Multinested Dictonary To Pandas Dataframe Boolean Table

I am looking for any advice on how to cleanly convert a python multi level nested dictionary (from JSON) into a data frame boolean table.

Rules:

  • Only True is recorded. If empty it is False.
  • The list may be of N Length
  • The groups may be of N length
  • The bools may be of N types

Example Input:

{1:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
 2:{'group_a':{'bool_a':True,
               'bool_b':True,
               'bool_n':True},
    'group_n':{'bool_b':True,
               'bool_n':True}
   },
'n':{'group_a':{'bool_a':True,
                'bool_c':True},
     'group_n':{'bool_b':True}
   },
}

Desired Output:

   Ga_Ba, Ga_Bb, Ga_Bc, Ga_Bn, Gn_Ba, Gn_Bb, ... Gn_Bn....
1   True   True  False  True   False   True       True
2   True   True  False  True   False   True       True
n   True   False True   False  False   False      False
...

Ideas? Bonus points for speed and conciseness. I have a solution but I am looking for something more elegant than the for loop mess I have now. Alternative data structures may also be welcome.

like image 496
aris-t Avatar asked Sep 15 '25 16:09

aris-t


2 Answers

Goofy method #1

s = pd.DataFrame.from_dict(data, orient='index').stack()

pd.json_normalize(s).set_index(s.index) \
  .stack().unstack([1, 2], fill_value=False) \
  .sort_index(axis=1)

  group_a                      group_n       
   bool_a bool_b bool_c bool_n  bool_b bool_n
1    True   True  False   True    True   True
2    True   True  False   True    True   True
3    True  False   True  False    True  False

Straight from comprehension

pd.DataFrame.from_dict({
    k0: {
            f'G{k1.split("_")[1]}_B{k2.split("_")[1]}': val
            for k1, d1 in d0.items()
            for k2, val in d1.items()
        }
    for k0, d0 in data.items()
}, orient='index').fillna(False)

   Ga_Ba  Ga_Bb  Ga_Bn  Gn_Bb  Gn_Bn  Ga_Bc
1   True   True   True   True   True  False
2   True   True   True   True   True  False
3   True  False  False   True  False   True
like image 116
piRSquared Avatar answered Sep 18 '25 10:09

piRSquared


You could use a dictionary comprehension and concat:

import pandas as pd

values = {
    "1": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "2": {
        "group_a": {"bool_a": True, "bool_b": True, "bool_n": True},
        "group_n": {"bool_b": True, "bool_n": True},
    },
    "n": {"group_a": {"bool_a": True, "bool_c": True}, "group_n": {"bool_b": True}},
}
stacked_values = {k: pd.DataFrame(v).stack() for k, v in values.items()}
df = (
    pd.concat(stacked_values, axis=1)
    .T.fillna(False)
    .swaplevel(axis=1)  # optional
    .sort_index(axis=1)
)

Output:

  group_a                      group_n       
   bool_a bool_b bool_c bool_n  bool_b bool_n
1    True   True  False   True    True   True
2    True   True  False   True    True   True
n    True  False   True  False    True  False
like image 23
mozway Avatar answered Sep 18 '25 08:09

mozway