Let's say I have the following DataFrame, and I want to drop the rows containing 10 and 100, i.e. the values that appear only once in col1.
I can do the following:
a = df.groupby('col1').size()
b = list(a[a == 1].index)
and then use a for loop to drop the rows one by one:
for val in b:
    d_ind = df[df['col1'] == val].index
    df.drop(d_ind, axis=0, inplace=True)
Is there any faster, more efficient way?
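For reference, the loop-based approach described above can be run end-to-end like this; the sample DataFrame is an assumption reconstructed from the output shown further down, so the col2/months values for the dropped rows are illustrative only:

```python
import pandas as pd

# Sample frame (assumed): col1 values 10 and 100 each occur once
df = pd.DataFrame({
    'col1':   [1, 1, 10, 100, 4, 4, 4],
    'col2':   [3, 4, 5, 6, 20, 11, 12],
    'months': [6, 6, 6, 6, 6, 7, 7],
})

# Count occurrences of each col1 value, then collect the singletons
a = df.groupby('col1').size()
b = list(a[a == 1].index)   # [10, 100]

# Drop the matching rows one group at a time
for val in b:
    d_ind = df[df['col1'] == val].index
    df.drop(d_ind, axis=0, inplace=True)

print(df)
```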
You can use the duplicated method on col1 with the keep=False parameter, which marks every element that has duplicates. It returns a boolean Series you can use to subset/filter/drop rows:
df[df.col1.duplicated(keep=False)]
#    col1  col2  months
# 0     1     3       6
# 1     1     4       6
# 4     4    20       6
# 5     4    11       7
# 6     4    12       7
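As a self-contained sketch, the one-liner can be run against a sample frame reconstructed from the output above (the col2/months values for the two dropped rows are assumptions):

```python
import pandas as pd

# Sample data matching the output shown above
df = pd.DataFrame({
    'col1':   [1, 1, 10, 100, 4, 4, 4],
    'col2':   [3, 4, 5, 6, 20, 11, 12],
    'months': [6, 6, 6, 6, 6, 7, 7],
})

# keep=False marks ALL rows whose col1 value appears more than once;
# rows with a unique col1 value (10, 100) get False and are filtered out
result = df[df.col1.duplicated(keep=False)]
print(result)
```

Because duplicated works on the whole column at once, this avoids the per-value Python loop entirely.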