 

Merge dataframes without duplicating rows in python pandas [duplicate]

I'd like to combine two dataframes on their shared column 'A':

>>> df1
    A   B
0   I   1
1   I   2
2   II  3

>>> df2
    A   C
0   I   4
1   II  5
2   III 6

To do so I tried using:

merged = pd.merge(df1, df2, on='A', how='outer')

Which returned:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 4
2   II  3.0 5
3   III NaN 6
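The 4 appears twice because merge pairs every left row with every right row that has the same key. A key occurring n times in df1 and m times in df2 therefore yields n × m rows. Here 'I' occurs twice in df1 and once in df2, so both df1 rows get C == 4. Below is a minimal sketch that rebuilds the frames shown above (their construction is an assumption reconstructed from the printed output) and checks this for 'I':

```python
import pandas as pd

# Rebuild the two frames from the question
df1 = pd.DataFrame({'A': ['I', 'I', 'II'], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': ['I', 'II', 'III'], 'C': [4, 5, 6]})

merged = pd.merge(df1, df2, on='A', how='outer')

# For a key present on both sides, merge emits n_left * n_right rows
n_left = df1['A'].value_counts()
n_right = df2['A'].value_counts()
print(n_left['I'] * n_right['I'])                     # 2 pairs for 'I'
print((merged['A'] == 'I').sum())                     # 2 rows for 'I' in the result
print(merged.loc[merged['A'] == 'I', 'C'].tolist())   # the single C value, repeated
```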

However, since df2 contains only one row where A == 'I', I don't want its C value to be duplicated in the merged dataframe. Instead, I would like the following output:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 NaN
2   II  3.0 5
3   III NaN 6

What is the best way to do this? I am new to Python and still slightly confused by all the join/merge/concatenate/append operations.

asked May 17 '26 18:05 by Martijn


1 Answer

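Create a helper column g with groupby('A').cumcount(), which numbers the repeated occurrences of each value in A (0 for the first 'I', 1 for the second, and so on). Merging on both A and g then pairs the first 'I' in df1 with the first 'I' in df2, so the second 'I' in df1 has no partner and gets NaN in C: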

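import pandas as pd

# Number repeated occurrences of each key: 0 for the first 'I', 1 for the second, ...
df1['g'] = df1.groupby('A').cumcount()
df2['g'] = df2.groupby('A').cumcount()
# Merge on A and g, then drop the helper column
# (drop(columns='g') instead of drop('g', 1): pandas 2.x no longer accepts a positional axis)
merged = df1.merge(df2, on=['A', 'g'], how='outer').drop(columns='g')
print(merged)
     A    B    C
0    I  1.0  4.0
1    I  2.0  NaN
2   II  3.0  5.0
3  III  NaN  6.0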
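The code above adds the g column to df1 and df2 themselves. A self-contained variant of the same cumcount technique, sketched below, uses DataFrame.assign so the caller's frames stay unchanged. The helper name merge_by_occurrence and the column name _occ are hypothetical choices for illustration, not part of pandas:

```python
import pandas as pd

def merge_by_occurrence(left, right, key):
    """Hypothetical helper: outer-merge left and right on `key`, pairing the
    k-th occurrence of each key value in `left` with the k-th one in `right`."""
    # cumcount numbers repeated key values 0, 1, 2, ... within each group;
    # assign returns new frames, so the inputs are not modified
    left = left.assign(_occ=left.groupby(key).cumcount())
    right = right.assign(_occ=right.groupby(key).cumcount())
    # Merge on the key plus the occurrence number, then drop the helper column
    return left.merge(right, on=[key, '_occ'], how='outer').drop(columns='_occ')

df1 = pd.DataFrame({'A': ['I', 'I', 'II'], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': ['I', 'II', 'III'], 'C': [4, 5, 6]})

merged = merge_by_occurrence(df1, df2, 'A')
print(merged)
print(list(df1.columns))  # still ['A', 'B']: no helper column left behind
```

The helper column name only needs to avoid clashing with existing columns in either frame.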
answered May 20 '26 06:05 by BENY


