I have a DataFrame where one column contains a list of dictionaries, something like this:
column1 column2
0 abc [{key1:value_A, key2:value_1}, {key1:value_B, key2:value_2}, {key1:value_C, key2:value_3},...]
.
.
.
n xyz [{key1:value_A, key2:value_4}, {key1:value_B, key2:value_5}, {key1:value_C, key2:value_6},...]
I want to convert this DataFrame into something like this:
column1 value_A value_B value_C ....
0 abc value_1 value_2 value_3
.
.
.
n xyz value_4 value_5 value_6
What is a fast and efficient way to do this?
You can use the following code snippet to generate the df:
import pandas as pd
df = pd.DataFrame([[1, [
{'id': 1144801690551941, 'value': 20},
{'id': 8202109018383881, 'value': 26},
{'id': 3025222222235562, 'value': 37},
{'id': 5834245818862827, 'value': 35},
{'id': 4689782481420271, 'value': 27},
{'id': 7385168421196875, 'value': 56},
]], [2, [
{'id': 1144801690551941, 'value': 25},
{'id': 8202109018383881, 'value': 26},
{'id': 3025222222235562, 'value': 38},
{'id': 5834245818862827, 'value': 35},
{'id': 4689782481420271, 'value': 21},
{'id': 7385168421196875, 'value': 53},
]], [3, [
{'id': 1144801690551941, 'value': 20},
{'id': 8202109018383881, 'value': 29},
{'id': 3025222222235562, 'value': 37},
{'id': 5834245818862827, 'value': 32},
{'id': 4689782481420271, 'value': 27},
{'id': 7385168421196875, 'value': 50},
]]], columns=['column1', 'column2'])
This results in:
column1 column2
0 1 [{'id': 1144801690551941, 'value': 20}, {'id':...
1 2 [{'id': 1144801690551941, 'value': 25}, {'id':...
2 3 [{'id': 1144801690551941, 'value': 20}, {'id':...
The output I expect:
column1 1144801690551941 8202109018383881 3025222222235562 ...
0 1 20 26 37
1 2 25 26 38
2 3 20 29 37
Thanks!
From column2, use tolist to build a new DataFrame, then stack it so that you get one dictionary {'id': ..., 'value': ...} per row. The resulting Series has a MultiIndex of (original row, position in the list).
s = pd.DataFrame(df['column2'].tolist()).stack()
print(s)
# 0 0 {'id': 1144801690551941, 'value': 20}
# 1 {'id': 8202109018383881, 'value': 26}
# 2 {'id': 3025222222235562, 'value': 37}
# 3 {'id': 5834245818862827, 'value': 35}
# 4 {'id': 4689782481420271, 'value': 27}
# 5 {'id': 7385168421196875, 'value': 56}
# 1 0 {'id': 1144801690551941, 'value': 25}
# 1 {'id': 8202109018383881, 'value': 26}
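To make the intermediate step concrete, here is a minimal sketch on a hypothetical two-row frame (the short ids 11 and 22 stand in for the long ids in the question). It builds the long id/value table that the next step reshapes, keeping level 0 of the stacked index so each row still points at its original row in df.

```python
import pandas as pd

# Hypothetical two-row sample with the same layout as column2 in the question
df = pd.DataFrame({
    'column1': [1, 2],
    'column2': [
        [{'id': 11, 'value': 20}, {'id': 22, 'value': 26}],
        [{'id': 11, 'value': 25}, {'id': 22, 'value': 26}],
    ],
})

# One dict per row; index is (original row, position in the list)
s = pd.DataFrame(df['column2'].tolist()).stack()

# Expand the dicts into id/value columns, reusing level 0 as the row index
long_df = pd.DataFrame(s.tolist(), s.index.get_level_values(0))
print(long_df)
```

Each original row now appears once per dictionary, which is the shape that set_index/unstack can pivot.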
Then call tolist again on this Series s and create a DataFrame from it, keeping the original row index (level 0 of the MultiIndex of s). Append the newly created id column to the index with set_index, and unstack so that every id number becomes a column header. This gives the wanted id-value shape; all that remains is to join it to column1.
res = (
    df[['column1']]
    .join(
        pd.DataFrame(s.tolist(), s.index.get_level_values(0))  # keep original row index
        .set_index('id', append=True)  # index becomes (row, id)
        ['value'].unstack()  # one column per id
        .rename_axis(columns=None)  # drop the 'id' name from the column axis
    )
)
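A variation worth trying, not the method above: since column2 already holds plain Python objects, a dict comprehension can build one {id: value} mapping per row and let pandas turn the keys into column headers, skipping the intermediate stacked frame. Whether it is faster on real data is an assumption (no timings were measured here), and the two-row sample is hypothetical. Two behavioral differences: columns come out in first-appearance order rather than sorted, and a repeated id within one list silently keeps the last value, whereas unstack raises a ValueError on duplicate entries.

```python
import pandas as pd

# Hypothetical two-row sample with the same layout as column2 in the question
df = pd.DataFrame({
    'column1': [1, 2],
    'column2': [
        [{'id': 11, 'value': 20}, {'id': 22, 'value': 26}],
        [{'id': 11, 'value': 25}, {'id': 22, 'value': 26}],
    ],
})

# One {id: value} dict per row; the dict keys become the column headers
wide = pd.DataFrame(
    [{d['id']: d['value'] for d in lst} for lst in df['column2']],
    index=df.index,  # align with df so join matches row by row
)
res = df[['column1']].join(wide)
print(res)
```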
and you get the expected result:
print(res)
column1 1144801690551941 3025222222235562 4689782481420271 \
0 1 20 37 27
1 2 25 38 21
2 3 20 37 27
5834245818862827 7385168421196875 8202109018383881
0 35 56 26
1 35 53 26
2 32 50 29
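Another sketch of the same reshape, using DataFrame.explode instead of building and stacking a frame of dicts: explode repeats the original index once per list element, and pivot with only columns and values set uses that existing index as the rows. This is an alternative, not the answer's approach, and the two-row data is hypothetical. Like unstack, pivot sorts the id columns.

```python
import pandas as pd

# Hypothetical two-row sample with the same layout as column2 in the question
df = pd.DataFrame({
    'column1': [1, 2],
    'column2': [
        [{'id': 11, 'value': 20}, {'id': 22, 'value': 26}],
        [{'id': 11, 'value': 25}, {'id': 22, 'value': 26}],
    ],
})

# One dict per row; the original index label is repeated for each element
exploded = df.explode('column2')

# Expand the dicts into id/value columns, keeping the repeated index
long_df = pd.DataFrame(exploded['column2'].tolist(), index=exploded.index)

# Existing index stays as rows, ids become columns, values fill the cells
wide = long_df.pivot(columns='id', values='value').rename_axis(columns=None)
res = df[['column1']].join(wide)
print(res)
```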