
Pipe or sequence of functions in Python pandas: filter then summarize (as in dplyr)

For context: I'm a heavy R user, currently switching between R and Python (with pandas). Let's say I have this data frame:

import pandas as pd

data = {'participant': ['p1','p1','p2','p3'],
        'metadata': ['congruent_1','congruent_2','incongruent_1','incongruent_2'],
        'reaction': [22000,25000,27000,35000]
        }

df_s1 = pd.DataFrame(data, columns = ['participant','metadata', 'reaction'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the same 16 copies
df_s1 = pd.concat([df_s1]*16, ignore_index=True)
df_s1

and I want to reproduce what I can easily do in R (pipe functions), by:

df_s1[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")].df_s1["reaction"].mean()

This does not work. I only succeed when I split the code into separate steps/variables:

x = df_s1[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1")]
x = x["reaction"].mean()
x

In dplyr way, I'd go with

ds_s1 %>% 
  filter(metadata == "congruent_1" | metadata == "incongruent_1") %>% 
  summarise(mean(reaction))

Note: I would highly appreciate concise references to a site that could help me translate my R code to Python. A lot of material is available, but in mixed formats and inconsistent styles.

Thanks

Luis asked Sep 13 '25 18:09



2 Answers

You can use .loc here:

df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), 'reaction'].mean()
Out[117]: 24500.0

As Quang mentioned, you can switch to isin to shorten the line.
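If the goal is something that reads like the dplyr pipe, one option is to wrap the chain in parentheses and put each step on its own line. This is only a sketch: it rebuilds the question's data frame with pd.concat, and the `result` name and the lambda inside .loc are my own choices, not something from the question.

```python
import pandas as pd

data = {
    "participant": ["p1", "p1", "p2", "p3"],
    "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
    "reaction": [22000, 25000, 27000, 35000],
}
# 16 stacked copies of the 4-row frame, as in the question
df_s1 = pd.concat([pd.DataFrame(data)] * 16, ignore_index=True)

# Roughly: filter(metadata %in% c(...)) %>% summarise(mean(reaction))
result = (
    df_s1
    # the lambda receives the frame at this point in the chain,
    # so no intermediate variable is needed
    .loc[lambda d: d["metadata"].isin(["congruent_1", "incongruent_1"]), "reaction"]
    .mean()
)
print(result)
```

The lambda matters mostly when earlier steps in the chain have already changed the frame; for a single filter, referring to df_s1 directly works just as well.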


In base R

mean(ds_s1$reaction[ds_s1$metadata%in%c('congruent_1','incongruent_1')])
BENY answered Sep 16 '25 08:09



Do you mean:

df_s1.loc[(df_s1.metadata == "congruent_1") | (df_s1.metadata == "incongruent_1"), "reaction"].mean()

Or simpler with isin:

df_s1.loc[df_s1.metadata.isin(["congruent_1", "incongruent_1"]), "reaction"].mean()

Out:

24500.0
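Two other spellings that may feel closer to filter() followed by summarise() are DataFrame.query and DataFrame.pipe. A minimal sketch on the four base rows from the question; `mean_of` is a hypothetical helper written for illustration, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p3"],
    "metadata": ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"],
    "reaction": [22000, 25000, 27000, 35000],
})

# query() takes the filter as a string, a bit like dplyr's filter()
mean_reaction = (
    df
    .query("metadata in ['congruent_1', 'incongruent_1']")
    ["reaction"]
    .mean()
)


def mean_of(frame, column):
    """Hypothetical helper: mean of one column of a data frame."""
    return frame[column].mean()


# pipe() passes the frame so far as the first argument of any function
piped = (
    df
    .query("metadata in ['congruent_1', 'incongruent_1']")
    .pipe(mean_of, "reaction")
)
print(mean_reaction, piped)
```

query() is convenient for short conditions; for anything built programmatically, the boolean mask with isin is usually easier to maintain.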
Quang Hoang answered Sep 16 '25 08:09
