I'm trying to create a DataFrame with subcolumns where the subcolumns are Dataframes.
Data:
import pandas as pd
from pandas import Timestamp
df1={'Open': {Timestamp('2020-12-15 01:05:00'): 152.28, Timestamp('2020-12-15 01:10:00'): 151.59, Timestamp('2020-12-15 01:15:00'): 152.19}, 'High': {Timestamp('2020-12-15 01:05:00'): 152.28, Timestamp('2020-12-15 01:10:00'): 152.39, Timestamp('2020-12-15 01:15:00'): 152.38}, 'Low': {Timestamp('2020-12-15 01:05:00'): 150.0, Timestamp('2020-12-15 01:10:00'): 151.34, Timestamp('2020-12-15 01:15:00'): 150.67}, 'Close': {Timestamp('2020-12-15 01:05:00'): 151.58, Timestamp('2020-12-15 01:10:00'): 152.21, Timestamp('2020-12-15 01:15:00'): 151.12}, 'price': {Timestamp('2020-12-15 01:05:00'): 149.305, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}, 'executedQty': {Timestamp('2020-12-15 01:05:00'): 6.991142857142856, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}, 'side': {Timestamp('2020-12-15 01:05:00'): 1.0, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}}
df2={'Open': {Timestamp('2020-12-15 01:05:00'): 5.385, Timestamp('2020-12-15 01:10:00'): 5.403, Timestamp('2020-12-15 01:15:00'): 5.419}, 'High': {Timestamp('2020-12-15 01:05:00'): 5.417999999999999, Timestamp('2020-12-15 01:10:00'): 5.428999999999999, Timestamp('2020-12-15 01:15:00'): 5.42}, 'Low': {Timestamp('2020-12-15 01:05:00'): 5.3839999999999995, Timestamp('2020-12-15 01:10:00'): 5.395, Timestamp('2020-12-15 01:15:00'): 5.351}, 'Close': {Timestamp('2020-12-15 01:05:00'): 5.406000000000001, Timestamp('2020-12-15 01:10:00'): 5.414, Timestamp('2020-12-15 01:15:00'): 5.37}, 'price': {Timestamp('2020-12-15 01:05:00'): nan, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}, 'executedQty': {Timestamp('2020-12-15 01:05:00'): nan, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}, 'side': {Timestamp('2020-12-15 01:05:00'): nan, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}}
df3={'Open': {Timestamp('2020-12-15 01:05:00'): 12.455, Timestamp('2020-12-15 01:10:00'): 12.429, Timestamp('2020-12-15 01:15:00'): 12.442}, 'High': {Timestamp('2020-12-15 01:05:00'): 12.458, Timestamp('2020-12-15 01:10:00'): 12.456, Timestamp('2020-12-15 01:15:00'): 12.443}, 'Low': {Timestamp('2020-12-15 01:05:00'): 12.425999999999998, Timestamp('2020-12-15 01:10:00'): 12.425, Timestamp('2020-12-15 01:15:00'): 12.383}, 'Close': {Timestamp('2020-12-15 01:05:00'): 12.435, Timestamp('2020-12-15 01:10:00'): 12.442, Timestamp('2020-12-15 01:15:00'): 12.401}, 'price': {Timestamp('2020-12-15 01:05:00'): nan, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}, 'executedQty': {Timestamp('2020-12-15 01:05:00'): nan, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}, 'side': {Timestamp('2020-12-15 01:05:00'): nan, Timestamp('2020-12-15 01:10:00'): nan, Timestamp('2020-12-15 01:15:00'): nan}}
df1=pd.DataFrame(df1)
df2=pd.DataFrame(df2)
df3=pd.DataFrame(df3)
Something like this would be expected output, however I want them to share the Timestamp index. So one row would contain all data from all dataframes under one index.
df3=pd.DataFrame()
dfList=[df1,df2,df3]
for df in dfList:
cols = pd.MultiIndex.from_frame(df, ['Open','High','Low','Close','price','executedQty' ,'side'])
df=pd.DataFrame(df, columns=cols)
df3=df3.join(df)
print(df3)
df1
Open High Low Close price executedQty \
2020-12-15 01:05:00 152.28 152.28 150.00 151.58 149.305 6.991143
2020-12-15 01:10:00 151.59 152.39 151.34 152.21 NaN NaN
2020-12-15 01:15:00 152.19 152.38 150.67 151.12 NaN NaN
side
2020-12-15 01:05:00 1.0
2020-12-15 01:10:00 NaN
2020-12-15 01:15:00 NaN
df2
Open High Low Close price executedQty side
2020-12-15 01:05:00 5.385 5.418 5.384 5.406 NaN NaN NaN
2020-12-15 01:10:00 5.403 5.429 5.395 5.414 NaN NaN NaN
2020-12-15 01:15:00 5.419 5.420 5.351 5.370 NaN NaN NaN
df3
Open High Low Close price executedQty side
2020-12-15 01:05:00 12.455 12.458 12.426 12.435 NaN NaN NaN
2020-12-15 01:10:00 12.429 12.456 12.425 12.442 NaN NaN NaN
2020-12-15 01:15:00 12.442 12.443 12.383 12.401 NaN NaN NaN
I would also like to have this function in a loop where the other DataFrames are created. This since there's alot more than 3 DataFrames and they are created through data from a request and otherwise I would have to name each DataFrame before concat which would be too much for my usecase.
So something like this
dfList=[df1,df2,df3]
dataFrame=pd.DataFrame
for d in dfList:
df=requestFuncThatCreatesDf(d)
dataFrame=dataFrame.concat([df],key=(d))
You can use pd.concat along axis=1 with optional keys parameter to concat the dataframes such that the resulting frame shares the same Timestamp index and has the MultiIndex columns:
pd.concat([df1, df2, df3], axis=1, keys=('df1', 'df2', 'df3'))
EDIT (If you want to dynamically generate keys corresponding to order of dataframes in dfList):
dfs = [requestFuncThatCreatesDf(d) for d in dfList]
pd.concat(dfs, axis=1, keys=[f'df{i + 1}' for i in range(len(dfs))])
Result:
df1 df2 df3
Open High Low Close price executedQty side Open High Low Close price executedQty side Open High Low Close price executedQty side
2020-12-15 01:05:00 152.28 152.28 150.00 151.58 149.305 6.991143 1.0 5.385 5.418 5.384 5.406 NaN NaN NaN 12.455 12.458 12.426 12.435 NaN NaN NaN
2020-12-15 01:10:00 151.59 152.39 151.34 152.21 NaN NaN NaN 5.403 5.429 5.395 5.414 NaN NaN NaN 12.429 12.456 12.425 12.442 NaN NaN NaN
2020-12-15 01:15:00 152.19 152.38 150.67 151.12 NaN NaN NaN 5.419 5.420 5.351 5.370 NaN NaN NaN 12.442 12.443 12.383 12.401 NaN NaN NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With