Given three dataframes:
df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': 'dog'})
how can one combine them into a single dataframe, by adding the values of a subset of given dataframes, such that the result becomes:
pd.DataFrame({'A': [8, 2], 'B': [10, 8], 'C': 'dog'})
for this example? My problem is that I also have columns which are identical, but cannot be summed (like 'C' here).
One possible solution with sum if numeric values and if strings then join unique values per groups in GroupBy.agg after concat list of DataFrames:
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 dog
1 2 8 dog
If possible different values like cat and dog:
df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': ['cat','dog']})
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 dog,cat
1 2 8 dog
If need lists:
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else x.unique().tolist()
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 [dog, cat]
1 2 8 [dog]
And for combination lists with scalars for nonnumeric values use custom function:
def f(x):
if np.issubdtype(x.dtype, np.number):
return x.sum()
else:
u = x.unique().tolist()
if len(u) == 1:
return u[0]
else:
return u
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print (df)
A B C
0 8 10 [dog, cat]
1 2 8 dog
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With