
How would I group, summarize and filter a DF in pandas in dplyr-fashion?

I'm currently studying pandas and I come from an R/dplyr/tidyverse background.

I find the pandas API not very intuitive. How would I elegantly rewrite the following dplyr operation using pandas syntax?

library("nycflights13")
library("tidyverse")

delays <- flights %>%
  group_by(dest) %>%
  summarize(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")
Pedro Vinícius asked Oct 22 '25 00:10


1 Answer

"pd.DataFrame.agg method doesn't allow much flexibility for changing columns' names in the method itself"

That's not exactly true. You can rename the columns inside agg, much as in R, by using named aggregation. It is better, though, not to use count as a column name, since count is also the name of a DataFrame method:

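delays = (
    flights
    .groupby('dest', as_index=False)
    .agg(
        count=('year', 'count'),      # non-null year values per group; equals n() when year has no NaN
        dist=('distance', 'mean'),    # mean skips NaN by default, like na.rm = TRUE
        delay=('arr_delay', 'mean'))
    .query('count > 20 & dest != "HNL"')
    .reset_index(drop=True)
)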
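
To try the same chain without the nycflights13 data, here is a minimal sketch on a small hand-made table. The table contents, the threshold of 2 and the column name n_flights are assumptions made up for illustration; they are not part of the original question. It uses 'size' rather than 'count' so that every row in a group is counted regardless of missing values, which is closer to what n() does in dplyr, and it avoids naming a column count.

```python
import pandas as pd

# Hypothetical stand-in for the flights table (values invented for illustration).
flights = pd.DataFrame({
    "dest": ["ATL", "ATL", "ATL", "HNL", "HNL", "HNL", "BOS"],
    "distance": [762.0, 762.0, 762.0, 4983.0, 4983.0, 4983.0, 187.0],
    "arr_delay": [10.0, None, 20.0, 5.0, -3.0, 1.0, 7.0],
})

delays = (
    flights
    .groupby("dest", as_index=False)
    .agg(
        # 'size' counts all rows in the group, including rows where arr_delay is NaN
        n_flights=("arr_delay", "size"),
        # 'mean' skips NaN by default, like mean(..., na.rm = TRUE)
        dist=("distance", "mean"),
        delay=("arr_delay", "mean"),
    )
    # threshold lowered from 20 to 2 only because the toy table is tiny
    .query("n_flights > 2 and dest != 'HNL'")
    .reset_index(drop=True)
)

print(delays)
```

On this toy table only ATL survives the filter: HNL is excluded by name and BOS has a single flight.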
Nuri Taş answered Oct 24 '25 15:10


