I have some data and I want to find runs of consecutive values <= 2, where each run has length >= 3. My objectives are to (1) find the length of each run of consecutive values and (2) find the location of the first value in each run. I have tried the following code:
set.seed(100)
pre = sample(x = 1:5, size = 90, replace = TRUE)
which(pre<=2)
and this will produce the result below:
[1] 1 2 4 8 10 13 14 17 18 19 26 30 33 37 40 41 49 50 51 52 53 54 56 57 58 60 66 69 72 80 85 88 89
So, the groups of consecutive values include: (1) 17, 18, 19; (2) 40, 41; (3) 49, 50, 51, 52, 53, 54; (4) 56, 57, 58; (5) 88, 89.
However, as I only need groups of consecutive values with length >= 3, groups (2) and (5) should be excluded from the results. How can I do this in R? Thanks for any help.
Use rle to get the length of each run, get each run's start position from the cumulative sum of those lengths, and then subset to keep the desired runs. In the first line, I have to unclass the rle result first, as otherwise data.frame doesn't know how to handle it.
out <- data.frame(unclass(rle(pre<=2)))
out$pos <- head(cumsum(c(1, out$lengths)), -1)
out[out$lengths>=3 & out$values,c("pos", "lengths")]
## pos lengths
## 17 3
## 49 6
## 56 3
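As a self-contained check of the approach above, here is a minimal sketch that does not rely on sample(). The output of sample() for a given seed depends on the R version (the default sampling algorithm changed in R 3.6.0), so this sketch rebuilds the logical vector from the indices listed in the question instead. The names idx, is_low, starts, keep and runs are illustrative and not part of the original code.

```r
# Positions where pre <= 2, copied from the which() output in the question
idx <- c(1, 2, 4, 8, 10, 13, 14, 17, 18, 19, 26, 30, 33, 37, 40, 41,
         49, 50, 51, 52, 53, 54, 56, 57, 58, 60, 66, 69, 72, 80, 85, 88, 89)

# Logical vector of length 90 standing in for `pre <= 2`
is_low <- logical(90)
is_low[idx] <- TRUE

# Run-length encode, then derive the start position of every run
r <- rle(is_low)
starts <- head(cumsum(c(1, r$lengths)), -1)

# Keep only runs of TRUE with length >= 3
keep <- r$values & r$lengths >= 3
runs <- data.frame(pos = starts[keep], lengths = r$lengths[keep])
print(runs)  # pos: 17, 49, 56; lengths: 3, 6, 3
```

Working on the rle components directly avoids the unclass step, at the cost of keeping starts and lengths in separate vectors until the final data.frame call.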
If you prefer chaining with dplyr, here's a version with that idiom.
library(dplyr)
rle(pre <= 2) %>% unclass() %>% data.frame() %>%
  mutate(pos = c(1, lengths) %>% cumsum %>% head(-1)) %>%
  filter(lengths >= 3 & values) %>% select(pos, lengths)
(In a previous version, I used do.call in the first line to put the results from rle into a data.frame; do.call calls the function specified by its first argument, using the list given as its second argument as that function's arguments. It's helpful when you have a list of things (such as the one rle returns) that you want to pass as arguments to a function. The code could certainly be written without that step; it just made it easier to keep the parts together and output only the rows you want.)
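To illustrate what do.call does here, the following sketch compares it with passing the rle components to data.frame by hand. The input vector and the names r, via_do_call and direct are made up for illustration, and the unclass call is an assumption about how the earlier version might have looked, since that version is not shown.

```r
# A small made-up logical vector with runs of length 2, 1 and 3
r <- rle(c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE))

# do.call passes each list element as a named argument, so this is
# equivalent to data.frame(lengths = ..., values = ...)
via_do_call <- do.call(data.frame, unclass(r))

# The same data.frame built by naming the arguments explicitly
direct <- data.frame(lengths = r$lengths, values = r$values)

print(via_do_call)
print(all.equal(via_do_call, direct))
```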