Sorry if I formatted this incorrectly or if the title isn't quite right; I am new to R and Stack Overflow. I am working with a list (called climates) that holds 20 data frames, one per province, each with year, month, day, and temperature columns (along with some other stuff). I want to find the rows where the temperature is above a certain threshold, but this threshold is different for each province. I've been able to use lapply to find the threshold for each province, but when I try to use those thresholds to find the rows where the temp is above the threshold, the output isn't correct. My code does return a bunch of numbers, but they don't seem to be related to being greater than the threshold, and I also don't know how to get it to return the entire row instead of just the temperature value.
example climate list:
A <- data.frame("D" = c(1:30), "T" = c(sample(10:30, size = 30, replace = TRUE)))
B <- data.frame("D" = c(1:30), "T" = c(sample(4:22, size = 30, replace = TRUE)))
C <- data.frame("D" = c(1:30), "T" = c(sample(14:35, size = 30, replace = TRUE)))
climate <- list("Alist" = A, "Blist" = B, "Clist" = C)
climate
I've used lapply to find the threshold,
thresh95 <- lapply(lapply(
  climate, `[[`, 2), # this one takes my list of climate data and selects the T column for all provinces
  quantile, probs = c(0.95), na.rm = TRUE) # this one takes the previous list and finds 95th percentile value
thresh95
but when I try to then find the temperatures that are above the threshold, something goes wrong.
tmax95 <-  lapply(lapply(climate, `[[`, 2), # this one takes my list of climate data and selects the T column for all provinces
  function(x) x[which(x>thresh95)])# this one takes my list of climate data and selects the temps that are greater than the threshold
tmax95
Is there a way to write something that will return a subset of each province's data frame where the condition is that the temperature is greater than the threshold? Thanks!
Your thresh95 is a list like
> thresh95
$Alist
95%
 29
$Blist
95%
 22
$Clist
95%
 34
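but inside your function x is just a vector of temperatures for one province. So x > thresh95 compares that vector against the whole list of thresholds instead of against that province's own threshold, and the result is not what you want.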
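To make the mismatch concrete, here is a minimal sketch that stays close to the original lapply style but pulls out one threshold per province by name. The seed value and the helper name `nm` are my own assumptions for illustration; the column names D and T and the list names come from the question.

```r
set.seed(1234)  # assumed seed, only so the sketch is reproducible
A <- data.frame(D = 1:30, T = sample(10:30, size = 30, replace = TRUE))
B <- data.frame(D = 1:30, T = sample(4:22, size = 30, replace = TRUE))
C <- data.frame(D = 1:30, T = sample(14:35, size = 30, replace = TRUE))
climate <- list(Alist = A, Blist = B, Clist = C)

# One 95th-percentile threshold per province, kept as a named list
thresh95 <- lapply(climate, function(df) quantile(df$T, probs = 0.95, na.rm = TRUE))

# Loop over province names so each data frame is compared
# only with its own threshold, and whole rows are returned
tmax95 <- sapply(names(climate), function(nm) {
  df <- climate[[nm]]
  df[which(df$T > thresh95[[nm]]), , drop = FALSE]
}, simplify = FALSE)

tmax95
```

Indexing both lists by the same name is what keeps the data frame and its threshold paired up.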
You can run the code below (data borrowed from @Edward)
lapply(
  climate,
  function(x) {
    subset(
      x,
      T > quantile(T, probs = 0.95)
    )
  }
)
which gives
$Alist
    D  T
19 19 30
$Blist
[1] D T
<0 rows> (or 0-length row.names)
$Clist
    D  T
17 17 35
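The empty result for Blist is worth a note: with a strict >, a province whose 95th percentile equals its maximum temperature returns no rows. Here is a small sketch with a made-up data frame (`toy` and `th` are hypothetical names, not from the question) showing the difference between > and >=, in case rows sitting exactly at the threshold should count too.

```r
# Hypothetical toy province whose hottest value is repeated
toy <- data.frame(D = 1:6, T = c(10, 12, 15, 22, 22, 22))

# 95th percentile of the temperature column; here it equals the maximum, 22
th <- quantile(toy$T, probs = 0.95, na.rm = TRUE)

strict    <- subset(toy, T > th)   # rows strictly above the threshold
inclusive <- subset(toy, T >= th)  # rows at or above the threshold

nrow(strict)     # 0
nrow(inclusive)  # 3
```

Which comparison is right depends on how "above the threshold" is meant to be read for your data.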
You need mapply.
But first, always set the seed when simulating data.
set.seed(1234)
A <- data.frame("D" = c(1:30), "T" = c(sample(10:30, size = 30, replace = TRUE)))
B <- data.frame("D" = c(1:30), "T" = c(sample(4:22, size = 30, replace = TRUE)))
C <- data.frame("D" = c(1:30), "T" = c(sample(14:35, size = 30, replace = TRUE)))
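climate <- list("Alist" = A, "Blist" = B, "Clist" = C)
thresh95 <- lapply(lapply(climate, `[[`, 2), quantile, probs = c(0.95), na.rm = TRUE)
# pair each data frame (x) with its own threshold (y) and keep the matching rows
mapply(\(x,y) x[which(x[,2] > y),], x=climate, y=thresh95, SIMPLIFY=FALSE)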
$Alist
    D  T
19 19 30
$Blist
[1] D T
<0 rows> (or 0-length row.names)
$Clist
    D  T
17 17 35
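As a sanity check, the mapply approach and the subset approach from the other answer should agree on which rows they keep. This is only a sketch under the same seed and simulated data as above; the names `by_mapply` and `by_subset` are made up for illustration.

```r
set.seed(1234)
A <- data.frame(D = 1:30, T = sample(10:30, size = 30, replace = TRUE))
B <- data.frame(D = 1:30, T = sample(4:22, size = 30, replace = TRUE))
C <- data.frame(D = 1:30, T = sample(14:35, size = 30, replace = TRUE))
climate <- list(Alist = A, Blist = B, Clist = C)

# Thresholds computed up front, one per province
thresh95 <- lapply(climate, function(df) quantile(df$T, probs = 0.95, na.rm = TRUE))

# Approach 1: walk the data frames and thresholds in parallel
by_mapply <- mapply(function(x, y) x[which(x[, 2] > y), ],
                    x = climate, y = thresh95, SIMPLIFY = FALSE)

# Approach 2: compute the threshold inside the loop with subset()
by_subset <- lapply(climate, function(x) subset(x, T > quantile(T, probs = 0.95)))

# Compare the kept day numbers province by province
mapply(function(a, b) identical(a$D, b$D), by_mapply, by_subset)
```

mapply is handy when the thresholds already exist as a separate list; the subset version avoids keeping that second list around.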