I have a dataframe of '0's and '1's, like so:
DATA <- data.frame("V1" = c(0,0,0,0,1,1,0,1,1,1),
                   "V2" = c(1,0,0,0,1,1,0,1,1,1),
                   "V3" = c(0,0,0,0,1,0,0,1,1,1),
                   "V4" = c(1,1,1,0,1,1,0,1,1,1),
                   "V5" = c(0,0,0,0,1,1,0,1,1,1))
I want to know how many times in each row a '0' is followed by a '1' in the next column. If the first column value is a '1', this should also be counted.
I have a loop which pastes each row into a single string and then counts the number of '01's using either stringi::stri_count_fixed or stringr::str_count:
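# Fix the data columns up front, so the Count column added below is never
# pasted into the string on later iterations.
COLS <- names(DATA)
DATA$Count <- NA_integer_
for (n in seq_len(nrow(DATA))) {
  # Paste the row into a single string, with an extra 0 at the start in case
  # the first column value is 1.
  STRING <- do.call(paste0, c(0, DATA[n, COLS]))
  # Count the number of 0-1 transitions.
  COUNT <- stringr::str_count(STRING, pattern = "01")
  # Add this to the summary column.
  DATA$Count[n] <- COUNT
}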
However, both of these are very slow with my real dataset (3000 - 4000 columns). Any ideas for speeding this up?
Desired output:
> DATA$Count
[1] 2 1 1 0 1 2 0 1 1 1
A possible solution, in base R:
DATA$Count <-
  apply(DATA, 1, \(x) x[1] + sum((x[2:length(x)] - x[1:(length(x)-1)]) > 0))
DATA
#>    V1 V2 V3 V4 V5 Count
#> 1   0  1  0  1  0     2
#> 2   0  0  0  1  0     1
#> 3   0  0  0  1  0     1
#> 4   0  0  0  0  0     0
#> 5   1  1  1  1  1     1
#> 6   1  1  0  1  1     2
#> 7   0  0  0  0  0     0
#> 8   1  1  1  1  1     1
#> 9   1  1  1  1  1     1
#> 10  1  1  1  1  1     1
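With 3000 - 4000 columns, apply() still calls an R function once per row, so a fully vectorised variant may be worth trying. The following is only a sketch, assuming every column holds nothing but 0 and 1 (no NAs); count_rises is a made-up helper name, not a function from any package. It compares each column with its left neighbour in a single matrix operation and adds the first column, which counts whenever it is 1:

```r
# Rebuild the example data from the question.
DATA <- data.frame(V1 = c(0,0,0,0,1,1,0,1,1,1),
                   V2 = c(1,0,0,0,1,1,0,1,1,1),
                   V3 = c(0,0,0,0,1,0,0,1,1,1),
                   V4 = c(1,1,1,0,1,1,0,1,1,1),
                   V5 = c(0,0,0,0,1,1,0,1,1,1))

# count_rises() is a hypothetical helper written for this sketch.
count_rises <- function(df) {
  m <- as.matrix(df)   # one numeric matrix instead of a list of columns
  p <- ncol(m)
  # The first column counts when it is 1, as if preceded by an implicit 0.
  first <- m[, 1]
  if (p < 2) return(unname(first))
  # A 0 -> 1 step is wherever a value exceeds its left neighbour.
  rises <- m[, -1, drop = FALSE] > m[, -p, drop = FALSE]
  unname(first + rowSums(rises))
}

# Pass only the data columns, so an existing Count column is not included.
DATA$Count <- count_rises(DATA[, paste0("V", 1:5)])
DATA$Count
#> [1] 2 1 1 0 1 2 0 1 1 1
```

Whether this is actually faster on the real dataset would need to be timed there; the comparison `>` only behaves as a 0-1 transition test under the 0/1 assumption above.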