First, I have a DataFrame; when I group it by a column, will that remove duplicate values? Second, how do I know which groups have duplicate values? (I tried to find out how to tell which columns of a df have duplicate values, but couldn't find anything; existing answers only cover whether each individual element is duplicated or not.)
For example, I have a df like this:

   A  B  C
1  1  2  3
2  1  4  3
3  2  2  2
4  2  3  4
5  2  2  3
after groupby('A'):

A  B  C
1  2  3
   4  3
2  2  2
   3  4
   2  3
I want to know how many A groups have B duplicated, and how many A groups have C duplicated.
result:

   B  C
   1  1

or maybe better, calculate the percentage:

B : 50%
C : 50%
thanks
You could use a lambda function inside GroupBy.agg to check whether the number of unique values differs from the number of values in the group. Series.nunique gives the number of unique values and Series.size gives the number of values in a group.
df.groupby('A').agg(lambda x: x.size != x.nunique())
#        B      C
# A
# 1  False   True
# 2   True  False
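The boolean result can be turned directly into the percentages the question asks for: the column-wise mean of a boolean frame is the fraction of groups that contain a duplicate. A self-contained sketch (note that with the sample data, C is duplicated in only one of the two groups, so both columns come out at 50%):

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'A': [1, 1, 2, 2, 2],
                   'B': [2, 4, 2, 3, 2],
                   'C': [3, 3, 2, 4, 3]},
                  index=[1, 2, 3, 4, 5])

# True where a group contains duplicated values in that column
has_dup = df.groupby('A').agg(lambda x: x.size != x.nunique())

# mean of booleans = fraction of groups with duplicates
pct = has_dup.mean().mul(100)
print(pct)
# B    50.0
# C    50.0
```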
Let us try Series.duplicated instead:

out = df.groupby('A').agg(lambda x: x.duplicated().any())
#        B      C
# A
# 1  False   True
# 2   True  False
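If the lambda is slow over many groups, a vectorized variant (a sketch comparing per-group unique counts against group sizes, using GroupBy.nunique and GroupBy.size) produces the same boolean table:

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({'A': [1, 1, 2, 2, 2],
                   'B': [2, 4, 2, 3, 2],
                   'C': [3, 3, 2, 4, 3]})

g = df.groupby('A')
# a column has duplicates in a group when its unique count
# is smaller than the group's row count
out = g.nunique().ne(g.size(), axis=0)
print(out)
```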