I have a dataframe of taxi data with two columns that looks like this:
    Neighborhood    Borough        Time
    Midtown         Manhattan      X
    Melrose         Bronx          Y
    Grant City      Staten Island  Z
    Midtown         Manhattan      A
    Lincoln Square  Manhattan      B

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough by number of pickups. I tried this:
    df['Neighborhood'].groupby(df['Borough']).value_counts()

Which gives me something like this:
    borough
    Bronx          High Bridge        3424
                   Mott Haven         2515
                   Concourse Village  1443
                   Port Morris        1153
                   Melrose             492
                   North Riverdale     463
                   Eastchester         434
                   Concourse           395
                   Fordham             252
                   Wakefield           214
                   Kingsbridge         212
                   Mount Hope          200
                   Parkchester         191
    ......
    Staten Island  Castleton Corners     4
                   Dongan Hills          4
                   Eltingville           4
                   Graniteville          4
                   Great Kills           4
                   Castleton             3
                   Woodrow               1

How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title, but they weren't helpful in my case.
Pandas provides easy ways to aggregate data and calculate metrics. Finding the top 5 values in each group can be done as part of the group-by itself. The method that helps here is nlargest().
value_counts() returns a Series containing counts of unique values. The resulting object is in descending order, so the first element is the most frequently occurring value.
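As a quick illustration of that ordering, here is a minimal sketch. The `pickups` Series and its values are made up for the example and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
pickups = pd.Series(
    ["Midtown", "Melrose", "Midtown", "Grant City", "Midtown", "Melrose"]
)

# value_counts() counts each distinct value and sorts by count, descending.
counts = pickups.value_counts()
print(counts)

# The first entry is therefore the most frequent value.
print(counts.index[0], counts.iloc[0])
```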
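You can use pandas DataFrame.groupby().size() or DataFrame.groupby().count() to group by one or more columns and compute a count aggregate: size() returns the number of rows for each group combination, while count() returns the number of non-null values in each remaining column.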
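The two aggregates only differ when a group contains missing values. The sketch below uses a small made-up dataframe (the column names follow the question, the rows are hypothetical) with one missing neighborhood to show the difference.

```python
import pandas as pd

# Hypothetical sample data; one Manhattan row has no neighborhood recorded.
df = pd.DataFrame({
    "Borough": ["Manhattan", "Manhattan", "Bronx", "Manhattan"],
    "Neighborhood": ["Midtown", "Midtown", "Melrose", None],
})

# size() counts every row in each group, including rows with missing values.
rows_per_borough = df.groupby("Borough").size()

# count() counts only the non-null entries of each remaining column.
non_null_per_borough = df.groupby("Borough").count()

# Grouping on both columns gives one count per (Borough, Neighborhood) pair.
# Rows whose key is missing are dropped by default.
pair_counts = df.groupby(["Borough", "Neighborhood"]).size()

print(rows_per_borough)
print(non_null_per_borough)
print(pair_counts)
```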
I think you can use nlargest - you can change 1 to 5:
    s = df['Neighborhood'].groupby(df['Borough']).value_counts()
    print(s)

    Borough
    Bronx          Melrose            7
    Manhattan      Midtown           12
                   Lincoln Square     2
    Staten Island  Grant City        11
    dtype: int64

    # Group by the Borough level only, so nlargest ranks neighborhoods within each borough.
    print(s.groupby(level=0).nlargest(1))

    Bronx          Bronx          Melrose        7
    Manhattan      Manhattan      Midtown       12
    Staten Island  Staten Island  Grant City    11
    dtype: int64

The borough appears twice in the result because the group key is added as an extra index level. Passing group_keys=False to groupby should avoid the extra level.
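For reference, here is a self-contained sketch of the same idea that can be run as-is. The dataframe is hypothetical sample data shaped like the one in the question, and it takes the top 2 instead of the top 5 so the effect is visible on a small input. It assumes the default descending sort of value_counts() within each group.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's dataframe.
df = pd.DataFrame({
    "Neighborhood": (
        ["Midtown"] * 3 + ["Lincoln Square"] * 2 + ["Chelsea"]
        + ["Melrose"] * 2 + ["Fordham"]
    ),
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3,
})

# Pickups per neighborhood, grouped by borough.
s = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# group_keys=False keeps the original (Borough, Neighborhood) index
# instead of prepending the group key as an extra level.
top2 = s.groupby(level=0, group_keys=False).nlargest(2)
print(top2)

# Alternative: the counts are already sorted in descending order within
# each borough (value_counts default), so head(n) per group picks the same rows.
top2_head = s.groupby(level=0).head(2)
print(top2_head)
```

Change the 2 to 5 to get the top 5 neighborhoods per borough.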