Conditional mutate based on order of variables

Question

I have vectors that index the events A, B, C. Each event occurs at least once but can occur multiple times. For example:

Sequence 1: c("A", "B", "C")
Sequence 2: c("A", "A", "B", "B", "C")
Sequence 3: c("B", "A", "C")
Sequence 4: c("C", "B", "A")
Sequence 5: c("A", "B", "B", "C", "A")

For each sequence, I want to identify events that occur in the following order: A-B-C. I want to index these sequences with a vector of 0/1's indicating whether or not the matches this condition. There should only be one A or C per sequence, but multiple B's between them are acceptable. When events are not in the order A-B-C, they do not match the condition. These rules should return the vectors:

Sequence 1: c(1,1,1)
Sequence 2: c(0,1,1,1,1)
Sequence 3: c(0,0,0)
Sequence 4: c(0,0,0)
Sequence 5: c(1,1,1,1,0)

I'm not sure what general type of problem this falls under so am having trouble searching for a solution. Any suggestions would be appreciated!

Edit: open to either a base R solution applied to each vector (e.g., via lapply) or a tidyverse solution would be more appropriate so edited my question to include both list of vectors and a data frame with input/desired output Edit^2: added an additional test case

## sequences
s1<-c("A", "B", "C")  
s2<-c("A", "A", "B", "B", "C")  
s3<-c("B", "A", "C")  
s4<-c("C", "B", "A")
s5<-c("A", "B", "B", "C", "A")

## make into a list
s.list <- list(s1,s2,s3,s4,s5)

## s.list as a data frame
# https://stackoverflow.com/questions/57270001/list-to-dataframe-conversion-in-r-keeping-list-indexes
s.df <- s.list %>% 
  purrr::map(~as_tibble(.)) %>% 
  dplyr::bind_rows(.id="group")

## solution...

## desired output
# as a list
s.indexed <- list(c(1,1,1),
                  c(0,1,1,1,1),
                  c(0,0,0),
                  c(0,0,0),
                  c(1,1,1,1,0)
)
s.indexed 

# as a data frame
s.df <- s.df %>%
  bind_cols(index = unlist(s.indexed))

s.df

Andre Wildberg · Accepted Answer

This uses str_locate to find the desired characters/events. It works for one occurrence of the event per entry. It's possible to extend this to work on any number of occurrences (by using str_locate_all) but it would add some layers of complexity.

library(stringr)

lapply(s.list, \(x){
  strng <- paste0(x, collapse="")
  loc <- str_locate(strng, "AB+C")
  zeros <- rep(0, nchar(strng))
  if(!is.na(loc[1])){ 
    zeros[seq(loc[1], loc[2])] <- 1
  }
  zeros
})
[[1]]
[1] 1 1 1

[[2]]
[1] 0 1 1 1 1

[[3]]
[1] 0 0 0

[[4]]
[1] 0 0 0

[[5]]
[1] 1 1 1 1 0

thelatemail · Answer

An attempt, which relies on more manual rules then I hoped to deal with the allowed duplication of B, but not A or C:

result <- lapply(s.list, \(x) {
    ox <- ordered(x, levels=c("A","B","C"))
    os <- is.unsorted(ox)
    out <- rep(0, length(ox))
    if(!os) {
        out[ox == "A" & (!duplicated(ox, fromLast=TRUE))] <- 1
        out[ox == "C" & (!duplicated(ox))] <- 1
        out[ox == "B"] <- 1
    }
    out
})
result
#[[1]]
#[1] 1 1 1
#
#[[2]]
#[1] 0 1 1 1 1
#
#[[3]]
#[1] 0 0 0
#
#[[4]]
#[1] 0 0 0

Checks out:

identical(result, s.indexed)
#[1] TRUE

The base logic is to use an ordered factor with the desired A-B-C order, which allows for identifying the unsorted vectors first. Then, only the last A and the first C are marked as 1 using duplicated flagging.

Conditional mutate based on order of variables

Tags:

r

conditional-statements

dplyr

gregor-fausto

2 Answers

Andre Wildberg

thelatemail

Recent Activity

Donate For Us

Conditional mutate based on order of variables

Tags:

r

conditional-statements

dplyr

gregor-fausto

2 Answers

Andre Wildberg

thelatemail

Related questions

Recent Activity

Donate For Us