
Drop rows of a pandas DataFrame with unique elements in a given column (by unique I mean values that appear only once)

Let's say I have the following DataFrame and I want to drop the rows containing 10 and 100, i.e. the values that appear only once in col1.

[DataFrame shown as an image in the original post]
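Since the DataFrame was posted as an image, here is a minimal reconstruction that is consistent with the output shown in the answer below; the col2 and months values in the two singleton rows are placeholders, not taken from the original:

import pandas as pd

df = pd.DataFrame({
    'col1':   [1, 1, 10, 100, 4, 4, 4],
    'col2':   [3, 4, 5, 50, 20, 11, 12],   # 5 and 50 are placeholder values
    'months': [6, 6, 6, 6, 6, 7, 7],       # months for rows 2 and 3 are placeholders
})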

I can do the following:

a = df.groupby('col1').size()
b = list(a[a == 1].index)

and then have a for loop and drop the rows one by one:

for val in b:
    d_ind = df[df['col1'] == val].index
    df.drop(d_ind, axis=0, inplace=True)

Is there any faster, more efficient way?

asked Sep 14 '25 by Hossein Noorazar

1 Answer

You can use the duplicated method on col1 with the keep=False parameter, which marks every value that has duplicates as True and returns a boolean Series you can use to subset/filter/drop rows:

df[df.col1.duplicated(keep=False)]

#   col1  col2  months
#0     1     3       6
#1     1     4       6
#4     4    20       6
#5     4    11       7
#6     4    12       7
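For comparison, the same filter can also be written with a groupby size transform, which stays closer to the groupby approach used in the question; this is a sketch of an equivalent alternative, not part of the original answer:

# keep only the rows whose col1 value occurs more than once
df[df.groupby('col1')['col1'].transform('size') > 1]

Both versions build a boolean mask over the whole column, so no Python-level loop over the unique values is needed.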
answered Sep 17 '25 by Psidom