Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging two multiindex dataframes

I have 2 dataframes:

df1 = pd.DataFrame.from_dict({('category', ''): {0: 'A',
  1: 'B',
  2: 'C',
  3: 'D',
  4: 'E',
  5: 'F',
  6: 'G'},
 (pd.Timestamp('2021-06-28 00:00:00'),
  'metric_1'): {0: 4120.549999999999, 1: 11226.016666666665, 2: 25049.443333333333, 3: 18261.083333333332, 4: 2553.1208333333334, 5: 2843.01, 6: 73203.51333333334},
 (pd.Timestamp('2021-06-28 00:00:00'), 'metric_2'): {0: 9907.79,
  1: 7614.650000000001,
  2: 13775.259999999998,
  3: 13158.250000000004,
  4: 1457.85,
  5: 1089.5600000000002,
  6: 38864.9},
 (pd.Timestamp('2021-07-05 00:00:00'),
  'metric_1'): {0: 5817.319999999998, 1: 10799.45, 2: 23521.51, 3: 22062.350833333334, 4: 1249.5974999999999, 5: 3229.77, 6: 52796.06083333332},
 (pd.Timestamp('2021-07-05 00:00:00'), 'metric_2'): {0: 6321.21,
  1: 5606.01,
  2: 10239.689999999999,
  3: 17476.600000000002,
  4: 943.7199999999999,
  5: 1410.33,
  6: 29645.45}}).set_index('category')
df2 = pd.DataFrame.from_dict({'category': {0: 'A',
  1: 'B',
  2: 'C',
  3: 'D',
  4: 'E',
  5: 'F',
  6: 'G'},
 1: {0: 36234.035577957984,
  1: 69078.07089184562,
  2: 128879.5397517309,
  3: 178376.63536908248,
  4: 9293.956915067887,
  5: 8184.780211399392,
  6: 177480.74540313095},
 2: {0: 37887.581678419825,
  1: 72243.67956241772,
  2: 134803.02342121338,
  3: 186603.8963173654,
  4: 9716.385738295368,
  5: 8555.606693927,
  6: 185658.87577993725}}).set_index('category')

First I change the column names of df2 to be the same as df

date_mappings = {
1 : '2021-06-28',
2 : '2021-07-05'}

df2 = df2.rename(columns=date_mappings)

Then I try to merge it

f = lambda x: pd.to_datetime(x)
df = (df2.merge(df1.unstack(), left_index=True, right_index=True).sort_index(axis=1))

But I get an error:

ValueError: Cannot merge a Series without a name

What is my mistake?

My goal is to add columns from df2 to df1 in each week like so that df1 would have 3 columns instead of 2.

enter image description here

After using

c = [df2.columns.map(date_mappings.get), df2.columns]
df1.join(df2.set_axis(c, axis=1)).sort_index(axis=1)

I get the values appended to the end of the dataframe rather than to the same columns with the same week naming:

enter image description here

Maybe this could be an issue that df2 holds dates from 2021-06-28 to 2022-06-27 while df1 holds dates from 2020 to today.

Unwanted adding to the end of the df enter image description here

like image 630
Jonas Palačionis Avatar asked Nov 18 '25 04:11

Jonas Palačionis


1 Answers

Idea is create MultiIndex in both DataFrames:

date_mappings = {
1 : '2021-06-28',
2 : '2021-07-05'}

#create MultiIndex in df2 with datetimes in first level
df2.columns = pd.MultiIndex.from_product([pd.to_datetime(df2.columns.map(date_mappings)), 
                                          ['metric_3']])

#removed unused levels, here category, so possible convert first leve to datetimes
df1.columns = df1.columns.remove_unused_levels()
df1.columns = df1.columns.set_levels(pd.to_datetime(df1.columns.levels[0]), level=0)

#join together and sorting MultiIndex
df = df1.join(df2).sort_index(axis=1)

print (df)
            2021-06-28                             2021-07-05            \
              metric_1  metric_2       metric_3      metric_1  metric_2   
category                                                                  
A          4120.550000   9907.79   36234.035578   5817.320000   6321.21   
B         11226.016667   7614.65   69078.070892  10799.450000   5606.01   
C         25049.443333  13775.26  128879.539752  23521.510000  10239.69   
D         18261.083333  13158.25  178376.635369  22062.350833  17476.60   
E          2553.120833   1457.85    9293.956915   1249.597500    943.72   
F          2843.010000   1089.56    8184.780211   3229.770000   1410.33   
G         73203.513333  38864.90  177480.745403  52796.060833  29645.45   

                         
               metric_3  
category                 
A          37887.581678  
B          72243.679562  
C         134803.023421  
D         186603.896317  
E           9716.385738  
F           8555.606694  
G         185658.875780  

If need remove greater datetimes like maximal df1 datetimes use:

#change mapping for test
date_mappings = {
1 : '2021-06-28',
2 : '2022-07-05'}

df2.columns = pd.MultiIndex.from_product([pd.to_datetime(df2.columns.map(date_mappings)), 
                                          ['metric_3']])

df1.columns = df1.columns.remove_unused_levels()
df1.columns = df1.columns.set_levels(pd.to_datetime(df1.columns.levels[0]), level=0)


df2 = df2.loc[:, df2.columns.get_level_values(0) <= df1.columns.get_level_values(0).max()]
print (df2)
             2021-06-28
               metric_3
category               
A          36234.035578
B          69078.070892
C         128879.539752
D         178376.635369
E           9293.956915
F           8184.780211
G         177480.745403

#join together and sorting MultiIndex
df = df1.join(df2).sort_index(axis=1)

print (df)
            2021-06-28                             2021-07-05          
              metric_1  metric_2       metric_3      metric_1  metric_2
category                                                               
A          4120.550000   9907.79   36234.035578   5817.320000   6321.21
B         11226.016667   7614.65   69078.070892  10799.450000   5606.01
C         25049.443333  13775.26  128879.539752  23521.510000  10239.69
D         18261.083333  13158.25  178376.635369  22062.350833  17476.60
E          2553.120833   1457.85    9293.956915   1249.597500    943.72
F          2843.010000   1089.56    8184.780211   3229.770000   1410.33
G         73203.513333  38864.90  177480.745403  52796.060833  29645.45
like image 94
jezrael Avatar answered Nov 19 '25 18:11

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!