
Melt pandas dataframe containing column of dictionaries such that the dictionary values are also melted

This is not a duplicate

The question "Pandas column dict split to new column and rows" does not answer the question in this post. I have included an approach to converting a column of dictionaries to a dataframe at the end of this post; that is not the part I'm finding difficult.


Setup

Given the following data:

import pandas as pd

d1 = {'a': 12, 'b': 44}
d2 = {'this': 9, 'that': 33, 'there': 82}
d3 = {'x': 19, 'y': 38, 'z': 12, 't': 90}
df = pd.DataFrame(dict(
    var_1 = [1, 2, 3],
    var_2 = ['one', 'two', 'four'],
    var_3 = [d1, d2, d3]
))

Which looks as:

   var_1 var_2                                 var_3
0      1   one                    {'a': 12, 'b': 44}
1      2   two  {'this': 9, 'that': 33, 'there': 82}
2      3  four  {'x': 19, 'y': 38, 'z': 12, 't': 90}

I would like to .melt the dataframe, with particular id_vars, in a way that also expands the dictionaries in the var_3 column into rows.

Using just the first row:

   var_1 var_2                                 var_3
0      1   one                    {'a': 12, 'b': 44}

The expected interim result would be:

   var_1 var_2 key  value
0      1   one   a     12
1      1   one   b     44

After melting, this would be:

# using df.melt(id_vars = ['var_1', 'var_2'])

   var_1 var_2 variable value
0      1   one      key     a
1      1   one      key     b
2      1   one    value    12
3      1   one    value    44

I would like to do this across all the data.

Attempt

To be honest I'm quite unsure how to go about this.

# make key : value dataframe for a single row
row_i = 0
col_i = 2
key_value_df = (
    pd.DataFrame(df.iloc[row_i, col_i], index=[0])
    .T.reset_index()
    .rename(columns={'index': 'key', 0: 'value'})
)

# repeat the id columns of that row once per dictionary key
data_thing = pd.concat(
    [pd.DataFrame(df.iloc[row_i, [0, 1]].to_dict(), index=[0])] * len(key_value_df)
)

Then

data_thing.join(key_value_df).reset_index(drop=True)

will give

   var_1 var_2 key  value
0      1   one   a     12
1      1   one   a     12

But this feels like it could be dramatically improved, and I'm unsure how to generalise it to other rows.

Edit

I can get a column of dictionaries as a dataframe using something such as:

import functools

var3 = df['var_3']
all_keys = functools.reduce(lambda x, y: x + y, [list(x.keys()) for x in var3])
all_values = functools.reduce(lambda x, y: x + y, [list(x.values()) for x in var3])
pd.DataFrame(dict(keys=all_keys, values=all_values))

giving

    keys  values
0      a      12
1      b      44
2   this       9
3   that      33
4  there      82
5      x      19
6      y      38
7      z      12
8      t      90

But this doesn't answer the question that I've asked.

baxx asked Oct 20 '25 03:10


1 Answer

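Using your df:

import pandas as pd

# Expand the dicts into columns, then stack to one row per (row, key) pair.
# dropna() discards the NaN cells created for keys a row does not have,
# in case the installed pandas version keeps them when stacking.
var3 = pd.DataFrame(df['var_3'].values.tolist()).stack().dropna().reset_index(level=1)
var3.columns = ['keys','values']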

print(var3)

    keys    values
0   a       12.0
0   b       44.0
1   this    9.0
1   that    33.0
1   there   82.0
2   x       19.0
2   y       38.0
2   z       12.0
2   t       90.0

df = df.join(var3)

print(df)

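To get from the joined frame to the melted layout the question asks for, something like the following might work. It is only a sketch: it rebuilds the question's df so that it runs on its own, it adds a dropna() after stack() as a precaution (an assumption about pandas versions in which stack() keeps NaN cells), and the names wide and long_df are made up for illustration.

```python
import pandas as pd

# Rebuild the frame from the question so the example is self-contained.
d1 = {'a': 12, 'b': 44}
d2 = {'this': 9, 'that': 33, 'there': 82}
d3 = {'x': 19, 'y': 38, 'z': 12, 't': 90}
df = pd.DataFrame({
    'var_1': [1, 2, 3],
    'var_2': ['one', 'two', 'four'],
    'var_3': [d1, d2, d3],
})

# One row per (source row, dictionary key); the index repeats the source row label.
var3 = pd.DataFrame(df['var_3'].tolist()).stack().dropna().reset_index(level=1)
var3.columns = ['keys', 'values']

# Join on the shared index and drop the original dictionary column.
# "wide" is a hypothetical name for the interim key/value frame.
wide = df.drop(columns='var_3').join(var3).reset_index(drop=True)

# Melt with the identifier columns held fixed; "long_df" is also a hypothetical name.
long_df = wide.melt(id_vars=['var_1', 'var_2'])
print(long_df)
```

The values column comes out as float here because the intermediate frame is padded with NaN. The key/value columns are named keys and values; a column literally called value may clash with melt's default value_name in recent pandas versions, in which case passing value_name explicitly should avoid the conflict.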

pd.json_normalize

  • This might be better:
# dropna() guards against pandas versions where stack() keeps NaN cells
var3 = pd.DataFrame(pd.json_normalize(df.var_3).stack().dropna()).reset_index(level=1)
var3.columns = ['keys','values']
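
If the float upcast or the NaN-padded intermediate frame is a concern, one alternative (not one of the two approaches above, just a sketch) is to build the key/value rows directly with a comprehension over each row's dictionary. It assumes the question's df and the same keys/values column names; records, flat and long_df are illustrative names.

```python
import pandas as pd

# Rebuild the frame from the question so the example is self-contained.
d1 = {'a': 12, 'b': 44}
d2 = {'this': 9, 'that': 33, 'there': 82}
d3 = {'x': 19, 'y': 38, 'z': 12, 't': 90}
df = pd.DataFrame({
    'var_1': [1, 2, 3],
    'var_2': ['one', 'two', 'four'],
    'var_3': [d1, d2, d3],
})

# One record per (row, key, value); no NaN padding is ever created,
# so integer dictionary values stay integers.
records = [
    {'var_1': v1, 'var_2': v2, 'keys': k, 'values': v}
    for v1, v2, d in zip(df['var_1'], df['var_2'], df['var_3'])
    for k, v in d.items()
]
flat = pd.DataFrame(records)

# Same melt as before, applied to the flattened frame.
long_df = flat.melt(id_vars=['var_1', 'var_2'])
print(long_df)
```

This iterates in Python, so it may be slower than the stack-based versions on large frames, but it keeps the row order and the original value types.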
Trenton McKinney answered Oct 21 '25 18:10