In the following script:
import pandas as pd

def start():
    df_dict = {"A": [1,2,3,3,4], "B": [1,2,2,3,4]}
    df = pd.DataFrame(df_dict)
    df.drop_duplicates(inplace = True, keep = "last")
    print(df)

if __name__ == "__main__":
    start()
the duplicates in df are not removed. What could be the reason?
Current output:
A B
0 1 1
1 2 2
2 3 2
3 3 3
4 4 4
Expected output:
A B
0 1 1
1 2 2
3 3 3
4 4 4
By default, .drop_duplicates() treats a row as a duplicate only when it matches another row in every column of the DataFrame. No two rows in df are identical in both A and B, so nothing is removed. To drop duplicates within each column, call .drop_duplicates() once per column using the subset argument, then take the intersection of the two results with an inner merge. It is also probably more useful for your function to return the resulting DataFrame than to print it.
import pandas as pd

def start():
    df_dict = {"A": [1,2,3,3,4], "B": [1,2,2,3,4]}
    df = pd.DataFrame(df_dict)
    # drop duplicates within each column
    df1 = df.drop_duplicates(subset='A', keep='last')
    df2 = df.drop_duplicates(subset='B', keep='last')
    return pd.merge(df1,df2,how='inner')

if __name__ == "__main__":
    result = start()
Output:
>>> result
A B
0 1 1
1 3 3
2 4 4
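One thing worth checking: the merged result above drops the row with A=2, B=2, which the expected output in the question keeps. The sketch below is a rough comparison rather than a definitive fix. It first confirms that no full-row duplicates exist, then deduplicates on column A alone, which appears to be what the expected output shows, and finally repeats the per-column intersection for reference. The variable names (full_row_dupes, by_a, by_b, both) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 3, 4], "B": [1, 2, 2, 3, 4]})

# With no subset, a row counts as a duplicate only if every column matches.
# No two rows here agree on both A and B, so nothing is flagged.
full_row_dupes = df.duplicated(keep="last")
print(full_row_dupes.any())

# Restricting the comparison to column A keeps the last row for each A value.
# This retains index labels 0, 1, 3 and 4.
by_a = df.drop_duplicates(subset="A", keep="last")
print(by_a)

# The per-column intersection used in the answer, for comparison.
by_b = df.drop_duplicates(subset="B", keep="last")
both = pd.merge(by_a, by_b, how="inner")
print(both)
```

Which variant is right depends on what "duplicate" should mean for your data: subset="A" alone keeps one row per value of A, while the inner merge keeps only rows that survive deduplication on both columns.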