I am trying to filter a list of data frames based on the mean value of one of their columns. Take the following example:
library(dplyr)  # for as_tibble()
# creating df1
df1 <- as_tibble(mtcars)
# creating df2
df2 <- as_tibble(iris)
# creating list of df (df_list)
df_list <- list(df1, df2)
# Checking the structure of the list
str(df_list)
List of 2
$ : tibble [32 × 11] (S3: tbl_df/tbl/data.frame)
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:32] 160 160 108 258 360 ...
..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ : tibble [150 × 5] (S3: tbl_df/tbl/data.frame)
..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
I would like to obtain the mean of the 3rd column of each df (disp and Petal.Length in this example), and then keep only the dfs for which that mean is > 10.
I have tried the following approach.
I created a function that returns a logical value depending on the calculated mean:
# returns TRUE when the mean of `column` is greater than 10
mean_logical <- function(column) {
  mean(column) > 10  # the comparison is already TRUE/FALSE, so if_else() is not needed
}
Then I wanted to use keep from {purrr} and apply my function (mean_logical) to drop the dfs whose third column has a mean < 10. However, I am struggling with how to tell it to check the third column of each df in my list.
Of note, the only way I found to "access" the third column of each df in a list is by using the following:
lapply(df_list, "[", 3)
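A short sketch of why "[" is awkward here: it returns a one-column data frame for each element, whereas "[[" extracts the column itself as a vector, which mean() can use directly. It assumes plain data frames (mtcars and iris from the datasets package) so it runs without extra packages; "[[" behaves the same way on tibbles.

```r
# Plain data frames from the datasets package (attached by default)
df_list <- list(mtcars, iris)

# "[" keeps the data frame structure: each element is a one-column data frame
third_as_df <- lapply(df_list, "[", 3)
class(third_as_df[[1]])

# "[[" extracts the column itself as an atomic vector
third_as_vec <- lapply(df_list, "[[", 3)

# mean() works on the vectors; vapply() guarantees exactly one number per data frame
col_means <- vapply(third_as_vec, mean, numeric(1))
col_means > 10
```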
Any suggestion? Thanks in advance!
You can use Filter from base R:
Filter(\(x) mean(x[[3]]) > 10, df_list)
or keep from purrr:
purrr::keep(df_list, \(x) mean(x[[3]]) > 10)
In both cases the predicate is an anonymous function that extracts the third column with [[ and returns TRUE or FALSE for each df.
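The mean_logical() helper from the question can be plugged in the same way, as long as the predicate extracts the column before calling it. A minimal base-R sketch, assuming a named list (the names cars and flowers are illustrative only) so it is easy to see which data frame survived; mean_logical() is written here without if_else() so no packages are needed.

```r
# Helper from the question, without dplyr::if_else(): the comparison already returns TRUE/FALSE
mean_logical <- function(column) {
  mean(column) > 10
}

# Illustrative names so the result is easy to inspect
df_list <- list(cars = mtcars, flowers = iris)

# Extract the third column with [[ and pass it to the helper
kept <- Filter(function(x) mean_logical(x[[3]]), df_list)
names(kept)
```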
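An approach using subset or indexing with [:
subset(df_list, sapply(df_list, function(x) mean(x[[3]]) > 10))
df_list[sapply(df_list, function(x) mean(x[[3]]) > 10)]
Use x[[3]] rather than x[, 3]: on a tibble, x[, 3] returns a one-column tibble, so mean() warns and returns NA.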
Since R 4.1.0, function(x) can be shortened to \(x).
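If the column position or the threshold needs to change, the predicate can be wrapped in a small function. This is a hypothetical helper, keep_by_col_mean(), not part of base R or purrr. It uses vapply() instead of sapply() so that an empty list still yields a logical index, and it skips columns that are not numeric (such as a factor) instead of letting mean() warn and return NA.

```r
# Hypothetical helper: keep the data frames whose column `col` has a mean above `threshold`
keep_by_col_mean <- function(dfs, col = 3, threshold = 10) {
  # One TRUE/FALSE per data frame; vapply() returns logical(0) for an empty list
  keep_flags <- vapply(dfs, function(x) {
    v <- x[[col]]
    # Non-numeric columns (e.g. a factor) are never kept;
    # isTRUE() turns NA/NaN (e.g. an all-NA column) into FALSE
    is.numeric(v) && isTRUE(mean(v, na.rm = TRUE) > threshold)
  }, logical(1))
  dfs[keep_flags]
}

df_list <- list(cars = mtcars, flowers = iris)
names(keep_by_col_mean(df_list))                 # third column, threshold 10
names(keep_by_col_mean(df_list, threshold = 1))  # both means exceed 1
```

For example, keep_by_col_mean(df_list, col = 1, threshold = 15) compares the first columns instead.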