Find and label matching pairs of parentheses

Tags: string, r
Given the following string of nested parentheses

a = "[[[][]]][[[][][]]]"

I am trying to find each matching pair of opening and closing brackets in a and label both positions with a common ID. For example, I would like to create a vector of IDs that looks like this:

b = c(1,2,3,3,4,4,2,1,5,6,7,7,8,8,9,9,6,5)

Here, for example, the two 1s and the two 2s in the vector b mark the positions of the following pairs of brackets, and so on:

 [[[][]]][[[][][]]]
 1      1

 [[[][]]][[[][][]]]
  2    2
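
To make the target concrete, here is a minimal sketch, assuming a and b exactly as defined above (the names s and pairs are only illustrative), that lists the two positions sharing each ID and checks that the first is an opening and the second a closing bracket:

```r
a <- "[[[][]]][[[][][]]]"
b <- c(1,2,3,3,4,4,2,1,5,6,7,7,8,8,9,9,6,5)

s <- strsplit(a, "")[[1]]

# one row per ID: the position of its opening and of its closing bracket
pairs <- t(sapply(split(seq_along(b), b), range))
colnames(pairs) <- c("open", "close")
pairs
# row "1" is 1 8, row "2" is 2 7

# every "open" position holds "[" and every "close" position holds "]"
all(s[pairs[, "open"]] == "[") && all(s[pairs[, "close"]] == "]")
# [1] TRUE
```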

Any input in this regard is much appreciated.

GPVS asked Jan 31 '26 06:01


2 Answers

It's ugly

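a <- "[[[][]]][[[][][]]]"
s <- unlist(strsplit(a, ''))
# number the opening brackets 1, 2, ...; closing brackets start as 0
i <- cumsum(s == '[') * (s == '[')

# fill in the closing brackets from left to right
while (any(idx <- i == 0)) {
  ii <- min(which(idx))   # first closing bracket that is still unlabelled
  jj <- table(i[1:ii])    # how often each ID occurs up to that position
  # the largest ID seen only once so far is the innermost bracket still open
  i[ii] <- max(as.integer(names(jj[jj < 2])))
}
i
# [1] 1 2 3 3 4 4 2 1 5 6 7 7 8 8 9 9 6 5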
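
The loop above can also be sketched with an explicit stack, which is a different technique from the table() lookup used in this answer. This is only a minimal sketch: label_pairs is a hypothetical helper name, and it assumes the input contains nothing but [ and ].

```r
# label_pairs() is a hypothetical helper, not part of the original answer
label_pairs <- function(a) {
  s <- strsplit(a, "")[[1]]
  ids <- integer(length(s))
  stack <- integer(0)   # IDs of brackets opened but not yet closed
  next_id <- 0L
  for (k in seq_along(s)) {
    if (s[k] == "[") {
      next_id <- next_id + 1L
      ids[k] <- next_id
      stack <- c(stack, next_id)       # push the new ID
    } else if (s[k] == "]") {
      if (length(stack) == 0L) stop("unmatched ']' at position ", k)
      ids[k] <- stack[length(stack)]   # closing bracket takes the ID on top
      stack <- stack[-length(stack)]   # pop it
    } else {
      stop("unexpected character at position ", k)
    }
  }
  if (length(stack) > 0L) stop("unmatched '[' in input")
  ids
}

label_pairs("[[[][]]][[[][][]]]")
# [1] 1 2 3 3 4 4 2 1 5 6 7 7 8 8 9 9 6 5
```

Each character is visited once, and unbalanced input raises an error instead of looping.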
rawr answered Feb 01 '26 20:02


@rawr, no, this is ugly:

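library(data.table)
# a is the string from the question
d = data.table(x = strsplit(a, "")[[1]])
# g: id of each top-level block; a new block starts right after a position
# where the counts of "[" and "]" are equal
d[ , g := cumsum(shift(cumsum(x == "[") == cumsum(x == "]"), fill = FALSE))]

# ix: within each block, count "[" over the first half and mirror the counts
# onto the second half (this relies on the mirrored layout of the blocks here)
d[ , ix := d[d[ , .I[1:(.N / 2)], by = g]$V1, {
  i = cumsum(x == "[")
  c(i, rev(i))}, by = g]$V1]

# pair: consecutive twos of each ix value, numbered in order of first appearance
d[ , pair := .GRP, by = .(ix, (rowid(ix) - 1) %/% 2)]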

I assume speed is not an issue here, but out of curiosity I checked, and my data.table monstrosity was faster on larger strings, e.g. a = paste(rep("[[[][]]][[[][][]]]", 1000), collapse = "").

all.equal(d$pair, i)
# TRUE
Henrik answered Feb 01 '26 19:02