I have this DataFrame:
temp = pd.DataFrame({'Person': ['P1', 'P2'], 'Dictionary': [{'value1': 0.31, 'value2': 0.304}, {'value2': 0.324}]})
Person Dictionary
0 P1 {'value1': 0.31, 'value2': 0.304}
1 P2 {'value2': 0.324}
I want an output in this format:
temp1 = pd.DataFrame({'Person': ['P1', 'P1', 'P2'], 'Values_Number': ['value1', 'value2', 'value2'], 'Values': [0.31, 0.304, 0.324]})
Person Values_Number Values
0 P1 value1 0.310
1 P1 value2 0.304
2 P2 value2 0.324
I tried using this:
temp['Dictionary'].apply(pd.Series).T.reset_index()
But I am not able to concat this with the previous DataFrame. Also, there would be chances of error.
IIUC, we could use Series.tolist in order to build a new DataFrame that we can melt with DataFrame.melt:
new_df = (pd.DataFrame(temp['Dictionary'].tolist(), index=temp['Person'])
.reset_index()
.melt('Person', var_name='Values_Number', value_name='Values')
.dropna()
.reset_index(drop=True))
print(new_df)
Person Values_Number Values
0 P1 value1 0.310
1 P1 value2 0.304
2 P2 value2 0.324
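To see why the .dropna() step is needed, it may help to split the chain into its intermediate steps. This is only a sketch of the same approach; the names wide and long_with_nan are mine and not part of the original answer.

```python
import pandas as pd

temp = pd.DataFrame({
    'Person': ['P1', 'P2'],
    'Dictionary': [{'value1': 0.31, 'value2': 0.304}, {'value2': 0.324}],
})

# Step 1: one column per dictionary key, indexed by Person.
# Keys missing from a row's dictionary (value1 for P2) become NaN.
wide = pd.DataFrame(temp['Dictionary'].tolist(), index=temp['Person'])

# Step 2: melt to long form. The NaN for (P2, value1) is still present here.
long_with_nan = wide.reset_index().melt(
    'Person', var_name='Values_Number', value_name='Values')

# Step 3: drop the missing pairs and renumber the rows.
new_df = long_with_nan.dropna().reset_index(drop=True)

print(wide)
print(long_with_nan)
print(new_df)
```

Note that melt stacks the rows key by key (all of value1, then all of value2), so the result is not guaranteed to be grouped by Person in general. If that order matters, a sort on 'Person' could be added before the final reset_index.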
It is much more efficient to use pd.DataFrame(df['Dictionary'].tolist()) than .apply(pd.Series). You can see when you should use apply in your code here.
This is the result for apply(pd.Series) obtained in this publication:
%timeit s.apply(pd.Series)
%timeit pd.DataFrame(s.tolist())
2.65 ms ± 294 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
816 µs ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
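If you want to reproduce a comparison like this outside IPython, a rough sketch with the standard timeit module could look as follows. The benchmark above does not show how s was built, so the Series below is a made-up stand-in, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical input: the original benchmark does not show its `s`.
s = pd.Series([{'value1': 0.31, 'value2': 0.304}, {'value2': 0.324}] * 50)

# Build the expanded frame both ways.
via_apply = s.apply(pd.Series)
via_tolist = pd.DataFrame(s.tolist())

# Check whether the two approaches produce the same frame.
print(via_apply.equals(via_tolist))

# Time each approach over the same number of runs (total seconds).
runs = 20
t_apply = timeit.timeit(lambda: s.apply(pd.Series), number=runs)
t_tolist = timeit.timeit(lambda: pd.DataFrame(s.tolist()), number=runs)

print(f"apply(pd.Series):        {t_apply / runs * 1e3:.3f} ms per run")
print(f"pd.DataFrame(s.tolist()): {t_tolist / runs * 1e3:.3f} ms per run")
```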