In pandas, what would be the idiomatic way to select multiple columns based on different patterns?

Question

I'm trying to replicate some of R4DS's dplyr exercises using Python's pandas, with the nycflights13.flights dataset. What I want to do is select, from that dataset:

Columns from year through day (inclusive);
All columns that end with "delay";
The distance and air_time columns.

In the book, Hadley uses the following syntax:

library("tidyverse")
library("nycflights13")

flights_sml <- select(flights,
   year:day,
   ends_with("delay"),
   distance,
   air_time
)

In pandas, I came up with the following "solution":

import pandas as pd
from nycflights13 import flights

flights_sml = pd.concat([
    flights.loc[:, 'year':'day'],
    flights.loc[:, flights.columns.str.endswith("delay")],
    flights.distance,
    flights.air_time,
], axis=1)

Another possible implementation:

flights_sml = flights.filter(regex='^year$|^month$|^day$|delay$|^distance$|^air_time$', axis=1)

But I'm sure this is not the idiomatic way to write such a DataFrame operation. I dug around, but haven't found anything in the pandas API that fits this situation.

Shaido · Accepted Answer

You are correct. The pd.concat approach creates multiple DataFrames/Series and then concatenates them, which is a lot of extra work. Instead, you can build a list of the columns you want and select them in a single step.

For example (keeping the same column order):

cols = ['year', 'month', 'day'] + [col for col in flights.columns if col.endswith('delay')] + ['distance', 'air_time']
flights_sml = flights[cols]
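
If you would rather not hard-code the year-to-day range, the same idea can be wrapped in a small helper. This is only a sketch: select_columns is a hypothetical name (not part of pandas), the toy DataFrame stands in for nycflights13.flights with made-up values, and it assumes the column labels are unique so that Index.get_loc returns a single position.

```python
import pandas as pd

# Toy stand-in for nycflights13.flights: same column names, made-up values.
flights = pd.DataFrame({
    "year": [2013, 2013],
    "month": [1, 1],
    "day": [1, 2],
    "dep_time": [517, 533],
    "dep_delay": [2, 4],
    "arr_time": [830, 850],
    "arr_delay": [11, 20],
    "carrier": ["UA", "UA"],
    "air_time": [227, 227],
    "distance": [1400, 1416],
})


def select_columns(df, first, last, suffix, extra):
    """Hypothetical helper: labels from `first` through `last` (inclusive),
    then labels ending in `suffix`, then the labels listed in `extra`."""
    cols = df.columns
    # Positional bounds of the inclusive label range (assumes unique labels).
    start, stop = cols.get_loc(first), cols.get_loc(last)
    wanted = (
        list(cols[start:stop + 1])
        + [c for c in cols if c.endswith(suffix)]
        + list(extra)
    )
    # Drop repeats while keeping first-seen order, since a label may match twice.
    wanted = list(dict.fromkeys(wanted))
    # A single indexing step; no intermediate frames are concatenated.
    return df[wanted]


flights_sml = select_columns(flights, "year", "day", "delay", ["distance", "air_time"])
print(list(flights_sml.columns))
```

Selecting with an explicit list keeps the order you ask for (distance before air_time here), whereas DataFrame.filter(regex=...) returns the matching columns in the frame's original order.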

Tags: python, pandas, r

Pedro Vinícius
