I have two dataframes:
import pandas as pd

df1 = pd.DataFrame(data={'Invoice': [1, 2, 3, 4, 5],
                         'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame(data={'Invoice': [2, 3, 5, 2],
                         'Value': [25, 11, 15, 25],
                         'TestData': ['A', 'B', 'C', 'D']})
I've already merged them and got df3:
df3 = pd.merge(df1, df2, left_on=["Invoice", "Value"], right_on=["Invoice", "Value"])
df3 output:
Invoice Value TestData
0 2 25 A
1 2 25 D
2 5 15 C
My question is how to get a "one-to-one" merge: when invoice number 2 occurs only once (or generally fewer times) in one of the two dataframes, don't create an extra row for invoice number 2 in the merged dataframe. I'd like to get something like this:
Invoice Value TestData
0 2 25 A
1 5 15 C
or this:
Invoice Value TestData
0 2 25 D
1 5 15 C
I've only tried left and right merges, but that doesn't work: there are always two rows with invoice number 2.
Thank you,
Jarek
Use drop_duplicates with the column names specified; the parameter keep='last' keeps the last duplicated row:
df2 = df2.drop_duplicates(["Invoice", "Value"])
# same as the default:
# df2 = df2.drop_duplicates(["Invoice", "Value"], keep='first')
df3 = pd.merge(df1, df2, on=["Invoice", "Value"])
print (df3)
Invoice Value TestData
0 2 25 A
1 5 15 C
Alternatively, starting again from the original df2, use keep='last':
df2 = df2.drop_duplicates(["Invoice", "Value"], keep='last')
df3 = pd.merge(df1, df2, on=["Invoice", "Value"])
print (df3)
Invoice Value TestData
0 2 25 D
1 5 15 C
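As an aside (an addition beyond the original answer), pandas 0.21+ can also assert the one-to-one property at merge time via the validate argument, raising a MergeError instead of silently duplicating rows, so it pairs well with the drop_duplicates step above:
# validate="one_to_one" raises pandas.errors.MergeError if the merge keys
# are not unique on both sides (assumes pandas >= 0.21)
df3 = pd.merge(df1, df2.drop_duplicates(["Invoice", "Value"]),
               on=["Invoice", "Value"], validate="one_to_one")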
EDIT:
If you need to pair duplicates one-to-one across all rows (the n-th occurrence of a key in df1 matches the n-th occurrence in df2), add a helper column with groupby and cumcount to make each key unique:
df1['g'] = df1.groupby(['Invoice', 'Value']).cumcount()
df2['g'] = df2.groupby(['Invoice', 'Value']).cumcount()
print (df1)
Invoice Value g
0 1 10 0
1 2 25 0
2 3 40 0
3 4 10 0
4 5 15 0
print (df2)
Invoice TestData Value g
0 2 A 25 0
1 3 B 11 0
2 5 C 15 0
3 2 D 25 1
df3 = pd.merge(df1, df2, on=["Invoice", "Value", "g"]).drop('g', axis=1)
print (df3)
Invoice Value TestData
0 2 25 A
1 5 15 C
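A minimal sketch of the same cumcount trick wrapped in a helper, so the original frames are not mutated (merge_one_to_one is a hypothetical name, not a pandas function):
def merge_one_to_one(left, right, keys):
    # Number repeated keys 0, 1, 2, ... on each side, then merge on the
    # keys plus that counter so each row matches at most one partner.
    lg = left.assign(g=left.groupby(keys).cumcount())
    rg = right.assign(g=right.groupby(keys).cumcount())
    return pd.merge(lg, rg, on=keys + ['g']).drop('g', axis=1)

df3 = merge_one_to_one(df1, df2, ['Invoice', 'Value'])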
You can also deduplicate the merged result by specific columns:
df3.drop_duplicates(subset="Invoice")
Invoice Value TestData
0 2 25 A
2 5 15 C
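If you prefer the second desired output (the row with TestData D), keep='last' works the same way; reset_index(drop=True) just tidies the index:
df3.drop_duplicates(subset="Invoice", keep="last").reset_index(drop=True)
   Invoice  Value TestData
0        2     25        D
1        5     15        C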
For more information:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html