This must be a simple question, but, however, it is bugging my head for a while.
For a dataframe below:
df = pd.DataFrame({'c0': ['a','b','a'],'c1': ['a','bb','a'],'c2':[10,20,30]})
  c0  c1  c2
0  a   a  10
1  b  bb  20
2  a   a  30
How to get output where count > 1?
I have tried:
df.groupby(['c0','c1'])['c2'].count()
c0  c1
a   a     2
b   bb    1
Required is:
c0  c1
a   a     2
I am looking other than
x = df.groupby(['c0','c1'])['c2'].count()
x[x>1]
i.e. a one-liner answer.
Use count() by Column Name Use pandas DataFrame. groupby() to group the rows by column and use count() method to get the count for each group by ignoring None and Nan values.
The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.
le (equivalent to <= ) — less than or equals to. lt (equivalent to < ) — less than. ge (equivalent to >= ) — greater than or equals to. gt (equivalent to > ) — greater than.
Use GroupBy.transform for Series with same size like original DataFrame:
df1 = df[df.groupby(['c0','c1'])['c2'].transform('count') > 1]
Or use DataFrame.duplicated for filtered all dupe rows by specified columns in list:
df1 = df[df.duplicated(['c0','c1'], keep=False)]
If performance is in not important or small DataFrame use DataFrameGroupBy.filter:
df1 = df.groupby(['c0','c1']).filter(lambda x: len(x) > 1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With