How do I count the number of values per column above a sequence of thresholds?
That is: for each column, count the number of values above 100, then above 150, then above ..., and store the results in a data frame.
# Reproducible data
# (the original data are daily streamflow values, one column per year)
set.seed(1234)
data = data.frame("1915" = runif(365, min = 60, max = 400),
                  "1916" = runif(365, min = 60, max = 400),
                  "1917" = runif(365, min = 60, max = 400))
# My attempt
mymin = 75
mymax = 400
mystep = 25
apply(data, 2, function(x) {
  for (i in seq(mymin, mymax, mystep)) {
    res = sum(x > i)  # or nrow(data[x > i, ])
    return(res)       # exits the function after the first threshold
  }
})
This code works for a single threshold, but I can't store the result of each iteration in a data frame.
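The first attempt stops early because `return()` exits the anonymous function on the first pass through the `for` loop. One way to keep a loop is to preallocate a matrix with one row per year and one column per threshold, then fill it cell by cell. This is a minimal sketch assuming the reproducible data from the question; `thresholds` and `counts` are names introduced here for illustration.

```r
# Reproducible data, as in the question
# (data.frame() renames "1915" to "X1915" because check.names = TRUE by default)
set.seed(1234)
data <- data.frame("1915" = runif(365, min = 60, max = 400),
                   "1916" = runif(365, min = 60, max = 400),
                   "1917" = runif(365, min = 60, max = 400))

mymin  <- 75
mymax  <- 400
mystep <- 25
thresholds <- seq(mymin, mymax, mystep)  # hypothetical name: 75, 100, ..., 400

# Preallocate: one row per column of `data`, one column per threshold
counts <- matrix(NA_integer_, nrow = ncol(data), ncol = length(thresholds),
                 dimnames = list(names(data), paste0("values_above_", thresholds)))

# Fill every cell instead of returning from inside the loop
for (j in seq_along(data)) {
  for (i in seq_along(thresholds)) {
    counts[j, i] <- sum(data[[j]] > thresholds[i])
  }
}
counts
```

The nested loop is easy to follow but slower than the vectorized alternatives for large data; for three years and 14 thresholds the difference is negligible.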
I also tried approaches such as:
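seuil = seq(mymin, mymax, mystep)  # "seuil" = threshold
for (i in seq_along(seuil)) {
  lapply(data, function(x) {
    res[[i]] = nrow(data[x > seuil[i], ])  # `res` is never created
    return(res)
  })  # the list returned by lapply() is discarded at each iteration
}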
This does not work either.
The expected output would look something like this:
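The `lapply()` attempt fails because `res` does not exist and an assignment made inside the anonymous function only changes a local copy, so nothing survives the loop. A working version lets the function *return* the counts. The sketch below does that while swapping the per-column counting for `colSums()` on the logical matrix `data > s` (a different technique from the question's code); `thresholds` and `counts` are names introduced here for illustration.

```r
set.seed(1234)
data <- data.frame("1915" = runif(365, min = 60, max = 400),
                   "1916" = runif(365, min = 60, max = 400),
                   "1917" = runif(365, min = 60, max = 400))

thresholds <- seq(75, 400, 25)  # hypothetical name, same values as seq(mymin, mymax, mystep)

# `data > s` compares every cell with the threshold and returns a logical matrix;
# colSums() then counts the TRUE values in each column (one count per year).
# sapply() binds the per-threshold vectors as columns: one row per year.
counts <- sapply(thresholds, function(s) colSums(data > s))
colnames(counts) <- paste0("values_above_", thresholds)
counts
```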
year | n values above 75 | n values above 100 | n values above ... |
---|---|---|---|
1915 | 348 | 329 | ... |
1916 | 351 | 325 | ... |
... | ... | ... | ... |
Thanks for your comments and suggestions :)
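You can try:
vals <- seq(mymin, mymax, mystep)  # thresholds: 75, 100, ..., 400
# For each threshold x, count the values above x in every column of data;
# the result has one row per column (year) and one column per threshold
mat <- sapply(vals, function(x) sapply(data, function(y) sum(y > x)))
colnames(mat) <- paste0('values_above_', vals)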
mat
#      values_above_75 values_above_100 values_above_125 values_above_150 values_above_175
#X1915             348              329              303              276              235
#X1916             351              325              305              277              252
#X1917             345              315              291              260              236
#      values_above_200 values_above_225 values_above_250 values_above_275 values_above_300
#X1915              212              186              153              126              104
#X1916              226              204              181              146              118
#X1917              208              186              161              133               99
#      values_above_325 values_above_350 values_above_375 values_above_400
#X1915               74               49               28                0
#X1916               92               62               40                0
#X1917               81               60               34                0
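`mat` is a matrix whose row names are the data frame's column names (`X1915`, ... because `data.frame()` prefixes syntactically invalid names that start with a digit). To get a data frame with a `year` column, as in the expected output from the question, one option is the sketch below. `result` is a name introduced here, and stripping the leading `X` assumes every column name is an `X` followed by a year.

```r
set.seed(1234)
data <- data.frame("1915" = runif(365, min = 60, max = 400),
                   "1916" = runif(365, min = 60, max = 400),
                   "1917" = runif(365, min = 60, max = 400))
mymin <- 75; mymax <- 400; mystep <- 25

vals <- seq(mymin, mymax, mystep)
mat <- sapply(vals, function(x) sapply(data, function(y) sum(y > x)))
colnames(mat) <- paste0("values_above_", vals)

# Assumption: row names look like "X1915"; drop the "X" to recover the year
result <- data.frame(year = as.integer(sub("^X", "", rownames(mat))), mat)
rownames(result) <- NULL  # the year is now a column, so drop the row names
result
```

Alternatively, passing `check.names = FALSE` when building `data` keeps the column names as `1915`, `1916`, ... so no prefix needs stripping.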