Count the number of negative values at the end of a vector

Tags: r

How can I count the number of consecutive negative values at the end of a vector?

Example:

200 120 80 7 -12 -20 15 70 85 -12 -19 -43

Should return

3

Because the last three values are negative.

189 321 234 -87 -19 -8 -1 10 12 21 9 -23

Should return

1

And

145 321 213 187 87 78 -23 -43 12 -35 21

Should return

0

Because the last value isn't negative.

I know I could make some loop that would stop on the first non-negative value, but I don't think that would be computationally efficient. Is there a better and simpler way to do it?
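For reference, a minimal sketch of such a loop is shown here. The function name count_trailing_neg_loop is a placeholder chosen for illustration, and the code assumes the vector contains no NA values:

```r
# Hypothetical helper: walk backwards from the end of x and stop at the
# first value that is not negative. Assumes x contains no NA.
count_trailing_neg_loop <- function(x) {
  n <- 0L
  i <- length(x)
  # && short-circuits, so x[i] is never read once i drops to 0
  while (i >= 1L && x[i] < 0) {
    n <- n + 1L
    i <- i - 1L
  }
  n
}

count_trailing_neg_loop(c(200, 120, 80, 7, -12, -20, 15, 70, 85, -12, -19, -43))
# [1] 3
count_trailing_neg_loop(c(189, 321, 234, -87, -19, -8, -1, 10, 12, 21, 9, -23))
# [1] 1
count_trailing_neg_loop(c(145, 321, 213, 187, 87, 78, -23, -43, 12, -35, 21))
# [1] 0
```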

João Daniel asked Dec 29 '25 16:12


2 Answers

You can use rle:

z <- rnorm(20)
r <- rle(sign(z))
n <- length(r$values)
# compare with 0, not 1, so that a trailing run of zeros is not counted
ifelse(r$values[n] < 0, r$lengths[n], 0)
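
To see what rle produces here, the following sketch applies the same steps to the first example vector from the question. The variable name trailing is only for illustration, and the sign of the last run is tested with < 0 so that a trailing run of zeros would not be counted:

```r
# First example vector from the question
z <- c(200, 120, 80, 7, -12, -20, 15, 70, 85, -12, -19, -43)

r <- rle(sign(z))
r$values   # sign of each run:    1 -1  1 -1
r$lengths  # length of each run:  4  2  3  3

# The last run is negative, so its length is the answer
n <- length(r$values)
trailing <- ifelse(r$values[n] < 0, r$lengths[n], 0)
trailing
# [1] 3
```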
Hong Ooi answered Dec 31 '25 09:12


This will likely be faster than rle, since it only has to locate the first non-negative value counting from the end instead of run-length encoding the whole vector. Note that both @HongOoi's solution and mine assume your data does not contain any NA, which is probably the case for you:

first.pos <- match(TRUE, rev(x) >= 0)
if (is.na(first.pos)) length(x) else first.pos - 1L
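
As a quick check, here is a sketch that wraps those two lines in a function and runs it on the three vectors from the question. The name count_trailing_neg is hypothetical, and the data is assumed to contain no NA:

```r
# Hypothetical wrapper around the two lines above; assumes x has no NA
count_trailing_neg <- function(x) {
  # position of the first non-negative value, counting from the end
  first.pos <- match(TRUE, rev(x) >= 0)
  # no non-negative value at all: every element is negative
  if (is.na(first.pos)) length(x) else first.pos - 1L
}

count_trailing_neg(c(200, 120, 80, 7, -12, -20, 15, 70, 85, -12, -19, -43))
# [1] 3
count_trailing_neg(c(189, 321, 234, -87, -19, -8, -1, 10, 12, 21, 9, -23))
# [1] 1
count_trailing_neg(c(145, 321, 213, 187, 87, 78, -23, -43, 12, -35, 21))
# [1] 0
count_trailing_neg(c(-1, -2, -3))
# [1] 3
```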

Edit: Somewhat to my surprise, you can also compute first.pos as which(rev(x) >= 0)[1], and it seems to be even faster across various input lengths.


Benchmarks:

flodel <- function(x) {
  first.pos <- which(rev(x) >= 0)[1]
  if (is.na(first.pos)) length(x) else first.pos - 1L
}

hong <- function(z) {
  r <- rle(sign(z))
  n <- length(r$values)
  ifelse(r$values[n] < 0, r$lengths[n], 0)
}

alexis <- function(x) sum(Reduce(`==`, ifelse(rev(sign(x)) < 0, 1, NA),
                                 accumulate = TRUE), na.rm = TRUE)

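library(microbenchmark)

# Note: the simon(x) rows in the results come from a function that is
# not shown in this answer.
x <- rnorm(1e1)
microbenchmark(flodel(x), hong(x), alexis(x))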
# Unit: microseconds
#       expr    min      lq   median      uq      max neval
#  flodel(x) 15.079  17.003  19.8910  22.938 1434.925   100
#    hong(x) 60.632  68.652  79.7190 108.430 5778.838   100
#  alexis(x) 92.711 100.410 117.4125 151.256 2176.288   100
#   simon(x) 47.158  56.782  64.3205  86.616  791.728   100

x <- rnorm(1e4)
microbenchmark(flodel(x), hong(x), alexis(x))
# Unit: microseconds
#       expr       min        lq     median         uq       max neval
#  flodel(x)   207.877   230.013   261.6110   309.2485  3619.233   100
#    hong(x)   893.420   972.497  1047.8840  2135.0650 41202.528   100
#  alexis(x) 25922.325 28983.209 31241.9405 34402.9145 75246.148   100
#   simon(x)   465.798   518.249   548.7245   646.5670  3048.535   100

One more edit. There has been a lot of discussion about handling NAs, so here is a method that is not necessarily optimized but is robust and, IMHO, follows how R functions usually handle NAs:

foo <- function(x, na.rm = FALSE) {
  x.rev     <- rev(x)
  first.pos <- match(TRUE, x.rev >= 0)
  first.neg <- if (is.na(first.pos)) x.rev else head(x.rev, first.pos - 1L)
  sum(first.neg < 0, na.rm = na.rm)
}

foo(c())
# [1] 0
foo(1:3)
# [1] 0
foo(c(1, -1, NA, -1, NA, -1))
# [1] NA
foo(c(1, -1, NA, -1, NA, -1), na.rm = TRUE)
# [1] 3
flodel answered Dec 31 '25 07:12