What is the fastest way to derive the conditional minimum value of an R data frame column?

Suppose we have this data frame:

> data
  ID Period_1 Values
1  1  2020-03     -5
2  1  2020-04     25
3  2  2020-01     35
4  2  2020-02     45
5  2  2020-03     55
6  2  2020-04     87
7  3  2020-02     10
8  3  2020-03     20
9  3  2020-04     30

data <- 
  data.frame(
    ID = c(1,1,2,2,2,2,3,3,3),
    Period_1 = c("2020-03", "2020-04", "2020-01", "2020-02", "2020-03", "2020-04", "2020-02", "2020-03", "2020-04"),
    Values = c(-5, 25, 35, 45, 55, 87, 10, 20, 30)
  )

I would like to extract the minimum of "Values", subject to a condition on Period_1 (such as Period_1 == "2020-04"). My inclination is to use dplyr's group_by(Period_1) %>% ..., but I don't need the minimum for every Period_1 group; I only need the minimum of Values for a single specified period. The actual database I am working with has more than 2 million rows, and I suspect my heavy use of group_by(...) is slowing things down dramatically.

Other Stack Overflow (and Google, etc.) posts I reviewed also rely on group_by. Maybe that is the quickest way to do this; I don't know, but I suspect not.

I tried the following, but it didn't work: data %>% select(where(data$Period_1 == "2020-04"))%>% min(data$Values, na.rm=TRUE). It returns the error "Error: Can't convert a logical vector to function".

In terms of processing speed, what is the fastest way to extract a conditional minimum, including with dplyr?

Curious Jorge - user9788072 asked Jan 31 '26 02:01


2 Answers

Here is a base R option (if you are looking for speed). We can subset the data, then get the minimum value for the third (i.e., Values) column.

min(data[data$Period_1 == "2020-04", ][,3], na.rm = TRUE)

# [1] 25
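A minimal base R variant, assuming the same data object as in the question: indexing the Values column vector directly, instead of first building a subset data frame and then taking column 3, avoids copying the other columns and does not depend on the column's position. The helper name cond_min is hypothetical (not from any package); it also returns NA rather than Inf (with a warning) when no row matches the period.

```r
data <- data.frame(
  ID = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  Period_1 = c("2020-03", "2020-04", "2020-01", "2020-02", "2020-03",
               "2020-04", "2020-02", "2020-03", "2020-04"),
  Values = c(-5, 25, 35, 45, 55, 87, 10, 20, 30)
)

# Hypothetical helper: minimum of `values` where `key == target`.
cond_min <- function(values, key, target) {
  # Logical index on the two vectors only; NA keys are treated as non-matching
  v <- values[key == target & !is.na(key)]
  # No matching rows (or only NA values): return NA instead of Inf + warning
  if (all(is.na(v))) return(NA_real_)
  min(v, na.rm = TRUE)
}

cond_min(data$Values, data$Period_1, "2020-04")
# [1] 25
cond_min(data$Values, data$Period_1, "2021-01")
# [1] NA
```

The one-line equivalent without the helper is min(data$Values[data$Period_1 == "2020-04"], na.rm = TRUE).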

Benchmark

[Figure: benchmark results (image not available)]
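The benchmark in this answer was posted as an image. A rough way to run this kind of comparison on data closer to the question's size is sketched below. It assumes a synthetic data frame of 2 million rows with random periods and values (a stand-in, not the asker's real data) and uses base R's system.time() instead of a benchmarking package, so timings are noisy and machine-dependent; no particular result is claimed here.

```r
set.seed(42)
n <- 2e6
periods <- sprintf("2020-%02d", 1:12)

# Synthetic stand-in for a 2M-row table (assumption, not the real data)
big <- data.frame(
  ID = sample.int(1e5, n, replace = TRUE),
  Period_1 = sample(periods, n, replace = TRUE),
  Values = rnorm(n)
)

# 1. Subset the whole data frame, then take column 3 (as in the answer above)
t_df <- system.time(
  m_df <- min(big[big$Period_1 == "2020-04", ][, 3], na.rm = TRUE)
)

# 2. Subset only the Values vector
t_vec <- system.time(
  m_vec <- min(big$Values[big$Period_1 == "2020-04"], na.rm = TRUE)
)

# 3. Compute the minimum of every period, then pick one (grouping-style work)
t_all <- system.time(
  m_all <- tapply(big$Values, big$Period_1, min, na.rm = TRUE)[["2020-04"]]
)

# All three approaches must agree on the value
stopifnot(identical(m_df, m_vec), identical(m_vec, m_all))

# Elapsed seconds for each approach
rbind(df_subset = t_df, vec_subset = t_vec, all_groups = t_all)[, "elapsed"]
```

For more reliable numbers, repeat each expression several times (or use a dedicated benchmarking package) rather than trusting a single system.time() call.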

AndrewGB answered Feb 02 '26 19:02


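You are confusing dplyr::filter with dplyr::select. select(where(fn)) selects columns: where() takes a predicate function, applies it to each whole column, and keeps the columns for which it returns TRUE, as in select(where(is.numeric)), which selects the numeric columns. Passing it a logical vector such as data$Period_1 == "2020-04" instead of a function is what produces "Can't convert a logical vector to function".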

To select rows that meet a condition, use filter.

library(dplyr)

data %>%
   filter(Period_1 == "2020-04") %>%
   pull(Values) %>%
   min(na.rm = TRUE)

# OR with `summarise`

data %>%
   filter(Period_1 == "2020-04") %>%
   summarise(min_Values = min(Values, na.rm = TRUE))
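The column-versus-row distinction drawn above has a direct base R counterpart, sketched here assuming data is the question's data frame. Filter() applies a predicate function to each column (much like select(where(is.numeric))), while subset() keeps the rows for which a logical condition holds (much like filter()). These are base R substitutes used for illustration, not part of dplyr.

```r
data <- data.frame(
  ID = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  Period_1 = c("2020-03", "2020-04", "2020-01", "2020-02", "2020-03",
               "2020-04", "2020-02", "2020-03", "2020-04"),
  Values = c(-5, 25, 35, 45, 55, 87, 10, 20, 30)
)

# Column-wise: the predicate is a function applied to each whole column
num_cols <- Filter(is.numeric, data)
names(num_cols)  # keeps ID and Values, drops the character column Period_1

# Row-wise: a logical vector with one entry per row
apr <- subset(data, Period_1 == "2020-04")
nrow(apr)
# [1] 3
min(apr$Values, na.rm = TRUE)
# [1] 25
```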

GuedesBF answered Feb 02 '26 19:02
