 

pandas: How to get the value_counts() above a threshold

How can I keep only the rows whose value in a column occurs more than a threshold number of times (according to value_counts())? I tried

df[df[col].value_counts(dropna=False) > 3]

to get all counts greater than 3, but I am getting

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Any hint? Thanks
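The error comes from index alignment: value_counts() returns a Series labelled by the distinct values of the column, while df is labelled by its row index, so pandas cannot line the boolean mask up with the rows. The sketch below shows this on hypothetical sample data (the column name "color" stands in for the question's col):

```python
import pandas as pd

# Hypothetical sample data; "color" stands in for the question's `col`.
df = pd.DataFrame({"color": ["red"] * 4 + ["blue", "green", "green"]})
col = "color"

mask = df[col].value_counts(dropna=False) > 3

# The mask is labelled by the distinct values of the column, most frequent first...
print(list(mask.index))
# ...while df is labelled by its row index 0..6, so the labels never line up.
print(list(df.index))

try:
    df[mask]
except Exception as exc:  # pandas raises the IndexingError from the question here
    print(type(exc).__name__)
```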

AndreasInfo asked Nov 15 '25 07:11


2 Answers

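The boolean Series from value_counts(...) > 3 is indexed by the distinct values of col, not by the rows of df, so it cannot be used directly as a row mask. Broadcast the counts back onto the rows instead. Try: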

df[df.groupby(col)[col].transform('size')>3]
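A minimal runnable sketch of the transform approach, using hypothetical sample data ("color" and "value" are made-up column names). The point is that transform('size') returns one count per row, with the same index as df, which is exactly what a boolean mask needs:

```python
import pandas as pd

# Hypothetical sample data; "color" stands in for the question's `col`.
df = pd.DataFrame({
    "color": ["red"] * 4 + ["blue", "green", "green"],
    "value": range(7),
})
col = "color"

# transform('size') writes each group's size onto every row of that group,
# so the result shares df's index and can be used as a row mask.
sizes = df.groupby(col)[col].transform("size")
out = df[sizes > 3]
print(out)
```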

Or with value_counts:

counts = df[col].value_counts(dropna=False) 
valids = counts[counts>3].index

df[df[col].isin(valids)]
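The dropna=False in this approach matters when the column has missing values: NaN then gets its own count and can pass the threshold, and isin treats NaN as matching NaN. A sketch on hypothetical numeric data ("rating" stands in for col), comparing against the default dropna=True:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1.0 and NaN each appear four times, 2.0 once.
df = pd.DataFrame({"rating": [1.0] * 4 + [np.nan] * 4 + [2.0]})
col = "rating"

# dropna=False counts NaN as its own value, so frequent NaN rows are kept.
counts = df[col].value_counts(dropna=False)
valids = counts[counts > 3].index
out = df[df[col].isin(valids)]

# With the default dropna=True, NaN is never counted and those rows drop out.
counts_default = df[col].value_counts()
out_default = df[df[col].isin(counts_default[counts_default > 3].index)]
print(len(out), len(out_default))
```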

Another approach with value_counts and map:

counts = df[col].value_counts(dropna=False)
df[df[col].map(counts)>3]
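Here map() looks each row's value up in the counts Series, so the per-row result is aligned with df.index, unlike the raw value_counts() output. The sketch below, on hypothetical sample data, checks that it selects the same rows as the groupby/transform version:

```python
import pandas as pd

# Hypothetical sample data; "color" stands in for the question's `col`.
df = pd.DataFrame({
    "color": ["red"] * 4 + ["blue", "green", "green"],
    "value": range(7),
})
col = "color"

counts = df[col].value_counts(dropna=False)
per_row = df[col].map(counts)  # one count per row, indexed like df
out = df[per_row > 3]

# Compare with the groupby/transform approach on the same data.
same = out.equals(df[df.groupby(col)[col].transform("size") > 3])
print(same)
```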
Quang Hoang answered Nov 17 '25 19:11


Try isin, chained onto your original value_counts:

out = df[df[col].isin(df[col].value_counts(dropna=False).loc[lambda x: x > 3].index)].copy()
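A sketch of this one-liner split into steps, on hypothetical data ("color", "value" and the threshold variable are made-up names). It uses df[col] rather than df.col, since attribute access looks for a column literally named "col". .loc accepts a callable, which receives the value_counts Series, and .copy() makes out an independent frame, so adding a column to it leaves df untouched (and avoids a possible SettingWithCopyWarning):

```python
import pandas as pd

# Hypothetical sample data; "color" stands in for the question's `col`.
df = pd.DataFrame({"color": ["red"] * 4 + ["blue", "green", "green"], "value": range(7)})
col = "color"
threshold = 3  # hypothetical name for the cut-off

# The lambda is called with the counts Series and returns a boolean mask over it.
frequent = df[col].value_counts(dropna=False).loc[lambda s: s > threshold].index
out = df[df[col].isin(frequent)].copy()

# Because of .copy(), this new column exists only on `out`.
out["flag"] = True
print(out)
```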

You can also use groupby().filter:

out = df.groupby(col).filter(lambda x : len(x)>3)
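A sketch of the filter approach on hypothetical data ("rating" and "value" are made-up column names). filter() calls the lambda once per group and keeps the rows of groups for which it returns True. One difference from the value_counts(dropna=False) answers: groupby drops NaN keys by default, so frequent NaN rows are not in any group and do not survive the filter:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1.0 and NaN each appear four times, 2.0 once.
df = pd.DataFrame({"rating": [1.0] * 4 + [np.nan] * 4 + [2.0], "value": range(9)})
col = "rating"

# Keep rows belonging to groups with more than 3 members.
out = df.groupby(col).filter(lambda g: len(g) > 3)

# Only the four 1.0 rows remain; the NaN rows were never grouped.
print(out)
```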
BENY answered Nov 17 '25 20:11


