I have a data frame in pandas and would like to get all the values of a certain column that appear more than X times. I know this should be easy but somehow I am not getting anywhere with my current attempts.
Here is an example:
>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])
>>> df2
   mi  uid
0   1    0
1   2    0
2   1    0
3   1    0

Now suppose I want to get all values from column "mi" that appear more than 2 times. The result should be

>>> <fancy query>
array([1])

I have tried a couple of things with groupby and count, but I always end up with a series of the values and their respective counts, and I don't know how to extract the values whose count is more than X from that:

>>> df2.groupby('mi').mi.count() > 2
mi
1     True
2    False
dtype: bool

But how can I use this now to get the values of mi for which the result is True?
Any hints appreciated :)
To count the number of times each value appears, use the pandas value_counts() method. The mode() method can be used to get the most frequently occurring value.
How do you count the number of occurrences in a data frame? To count the occurrences of each value in a column, you can use the pandas value_counts() method. For example, df['condition'].value_counts() gives the frequency of each unique value in the column "condition".
value_counts() returns a Series containing counts of unique values. The resulting object is in descending order, so the first element is the most frequently occurring one.
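As a small illustration of the two methods described above, here is a sketch using the data frame from the question. The variable names (counts, most_frequent, modes) are only for illustration.

```python
import pandas as pd

# Example frame from the question.
df2 = pd.DataFrame(
    [{"uid": 0, "mi": 1}, {"uid": 0, "mi": 2}, {"uid": 0, "mi": 1}, {"uid": 0, "mi": 1}]
)

# Counts of each unique value in "mi", most frequent first.
# The index holds the values, the data holds how often each one occurs.
counts = df2["mi"].value_counts()

# Because of the descending order, the first index label is the most frequent value.
most_frequent = counts.index[0]

# mode() returns a Series, since several values can tie for most frequent.
modes = df2["mi"].mode()
```

Note that value_counts() keeps the values in the index, not in the data, which is why filtering on the counts and then reading the index gives the values themselves.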
Or how about this:
Create the table:
>>> import pandas as pd
>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])

Get the counts of each occurrence:

>>> vc = df2.mi.value_counts()
>>> print(vc)
1    3
2    1

Print out the first value that occurs more than 2 times:

>>> print(vc[vc > 2].index[0])
1

I use this:

 df2.mi.value_counts().reset_index(name="count").query("count > 5")["index"]

The part before query() gives me a data frame with two columns: index and count. The query() filters on count and then we pull out the values. In pandas 2.0 and later, the first column is named after the original column ("mi") rather than "index", so the final selection has to use that name instead.
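Putting the answers together, a minimal sketch that returns every value above the threshold (not only the first one) could look like this. The helper name values_appearing_more_than is hypothetical, and reading the values from the index is just one way to avoid depending on the column name that reset_index() produces.

```python
import pandas as pd


def values_appearing_more_than(df, column, x):
    """Return the values of df[column] that occur more than x times.

    Hypothetical helper for illustration, not part of pandas.
    """
    counts = df[column].value_counts()
    # The boolean mask keeps only counts above the threshold;
    # the index of what remains holds the values themselves.
    return counts[counts > x].index.to_numpy()


df2 = pd.DataFrame(
    [{"uid": 0, "mi": 1}, {"uid": 0, "mi": 2}, {"uid": 0, "mi": 1}, {"uid": 0, "mi": 1}]
)

result = values_appearing_more_than(df2, "mi", 2)

# The same idea, starting from the groupby attempt in the question:
# index the counts with the boolean series, then take the index.
grouped = df2.groupby("mi")["mi"].count()
result_groupby = grouped[grouped > 2].index.to_numpy()
```

For the example frame, both result and result_groupby contain only the value 1, which matches the array the question asks for.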