R count number of pattern matches in consecutive columns

Tags: string, r

I have a dataframe of '0's and '1's, like so:

DATA <- data.frame("V1" = c(0,0,0,0,1,1,0,1,1,1),
                   "V2" = c(1,0,0,0,1,1,0,1,1,1),
                   "V3" = c(0,0,0,0,1,0,0,1,1,1),
                   "V4" = c(1,1,1,0,1,1,0,1,1,1),
                   "V5" = c(0,0,0,0,1,1,0,1,1,1))

I want to know how many times in each row a '0' is followed by a '1' in the next column. If the first column value is a '1', this should also be counted.

I have a loop which pastes each row into a single string and then counts the occurrences of '01' using either stringi::stri_count_fixed or stringr::str_count:

  # Fix the set of 0/1 columns up front, so the Count column added below
  # is never pasted into the string on later iterations.
  COLS <- names(DATA)
  DATA$Count <- NA_integer_

  for(n in seq_len(nrow(DATA))) {
    # Paste row into a single string, with extra 0 at start in case
    # the first column value is 1.
    STRING <- do.call(paste0, c(0, DATA[n, COLS]))

    # Count number of 0-1 transitions.
    COUNT <- stringr::str_count(STRING, pattern = "01")

    # Add this to the summary column.
    DATA$Count[n] <- COUNT
  }

However, both of these are very slow with my real dataset (3000 - 4000 columns). Any ideas for speeding this up?

Desired output:

> DATA$Count
[1] 2 1 1 0 1 2 0 1 1 1
EcologyTom asked Dec 05 '25 16:12


1 Answer

A possible solution, in base R:

DATA$Count <- 
  apply(DATA, 1, \(x) x[1] + sum((x[2:length(x)] - x[1:(length(x)-1)]) > 0))
DATA

#>    V1 V2 V3 V4 V5 Count
#> 1   0  1  0  1  0     2
#> 2   0  0  0  1  0     1
#> 3   0  0  0  1  0     1
#> 4   0  0  0  0  0     0
#> 5   1  1  1  1  1     1
#> 6   1  1  0  1  1     2
#> 7   0  0  0  0  0     0
#> 8   1  1  1  1  1     1
#> 9   1  1  1  1  1     1
#> 10  1  1  1  1  1     1
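
The same comparison can also be written on the whole matrix at once, which avoids calling a function once per row. This is only a sketch, assuming every column is numeric 0/1 and that `DATA` does not yet contain a `Count` column; the names `M` and `count_rises` are introduced here for illustration, and the speed on 3000 - 4000 columns has not been measured.

```r
# Rebuild the example data from the question.
DATA <- data.frame("V1" = c(0,0,0,0,1,1,0,1,1,1),
                   "V2" = c(1,0,0,0,1,1,0,1,1,1),
                   "V3" = c(0,0,0,0,1,0,0,1,1,1),
                   "V4" = c(1,1,1,0,1,1,0,1,1,1),
                   "V5" = c(0,0,0,0,1,1,0,1,1,1))

# Work on a numeric matrix of the 0/1 columns.
M <- as.matrix(DATA)

# Compare each column (from the second on) with its left neighbour:
# TRUE marks a 0 -> 1 step. Adding column 1 counts a leading 1.
count_rises <- M[, 1] +
  rowSums(M[, -1, drop = FALSE] > M[, -ncol(M), drop = FALSE])

DATA$Count <- count_rises
DATA$Count
#> [1] 2 1 1 0 1 2 0 1 1 1
```

`drop = FALSE` keeps both operands as matrices even if only two columns are present, so `rowSums` still receives a two-dimensional object.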
PaulS answered Dec 08 '25 10:12