
How to split a string according to a specific sequence of lengths?

Suppose I have a string in R, say "ABCDEFG". I can split it into chunks of two characters each with the following regex:

 strsplit("ABCDEFG", "(?<=(..))", perl = TRUE)
[[1]]
[1] "AB" "CD" "EF" "G" 

But I want to split it according to a specific sequence of lengths: first two characters, then one character, then two again, then one, and so on.

For the input string "ABCDEFG" I want "AB" "C" "DE" "F" "G" as output (the last element holds the single remaining character).

How can I do this? I would rather not count the characters with nchar() beforehand, as I want the split to work dynamically.

AnilGoyal asked Jan 21 '26 09:01


1 Answer

We could generalize Edward's and rawr's ideas into a function that splits a string by a recycled pattern of piece lengths.

> spl_pat <- \(x, p) {
+   stopifnot(all(is.na(p) | p >= 0))
+   if (any(is.na(p))) return(x)  ## compatibility w/ strsplit()
+   if (is.null(p)) p <- 1  ## compatibility w/ strsplit()
+   .spl <- \(x) {
+     ## recycle p to one piece length per character (at least one)
+     pat <- rep_len(p, length.out=max(1L, nchar(x)))
+     first <- cumsum(c(1, pat[-length(pat)]))  ## start of each piece
+     last <- cumsum(pat)  ## end of each piece
+     Filter(nzchar, substring(x, first, last))  ## drop empty pieces
+   }
+   if (length(x) > 1L) lapply(x, .spl) else .spl(x)
+ }

Usage

Single strings, length(x) == 1L:

> spl_pat('ABCDEFG', 2:1)
[1] "AB" "C"  "DE" "F"  "G" 
> spl_pat('ABCDEFG', c(1, 4))
[1] "A"    "BCDE" "F"    "G"   
> spl_pat('ABCDEFG', c(0, 4))
[1] "ABCD" "EFG" 
> spl_pat('ABCDEFG', 1:1e3)
[1] "A"   "BC"  "DEF" "G"  
> spl_pat('ABCDEFG', 2)
[1] "AB" "CD" "EF" "G" 
> spl_pat('ABCDEFG', 1)
[1] "A" "B" "C" "D" "E" "F" "G"
> spl_pat('ABCDEFG', 0)
character(0)
> spl_pat('ABCDEFG', NA)
[1] "ABCDEFG"
> spl_pat('ABCDEFG', NULL)
[1] "A" "B" "C" "D" "E" "F" "G"

Multiple strings, length(x) > 1L:

> spl_pat(c('ABCDEFG', 'ABCDEFGHIJ'), 2:1)
[[1]]
[1] "AB" "C"  "DE" "F"  "G" 

[[2]]
[1] "AB" "C"  "DE" "F"  "GH" "I"  "J" 

Different patterns:

> Vectorize(spl_pat)(c('ABCDEFG', 'ABCDEFGHIJ'), list(2:1, 1:2))
$ABCDEFG
[1] "AB" "C"  "DE" "F"  "G" 

$ABCDEFGHIJ
[1] "A"  "BC" "D"  "EF" "G"  "HI" "J" 

> Vectorize(spl_pat)(c('ABCDEFG', 'ABCDEFGHIJ', 'ABCDEFGHIJ'), list(2:1, 1:2, 0))
$ABCDEFG
[1] "AB" "C"  "DE" "F"  "G" 

$ABCDEFGHIJ
[1] "A"  "BC" "D"  "EF" "G"  "HI" "J" 

$ABCDEFGHIJ
character(0)

Negative values in p would not make sense, so they are rejected:

> spl_pat('ABCDEFG', -1)
Error in spl_pat("ABCDEFG", -1) : all(is.na(p) | p >= 0) is not TRUE
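
The index arithmetic that this kind of splitting relies on can be traced by hand for a single string. The following is a minimal sketch, not a replacement for the function above: the names lens, first, last and pieces are chosen here purely for illustration, and the pattern is simply recycled to nchar(x) piece lengths, which is assumed to be enough (it is whenever every length is at least 1).

```r
x <- "ABCDEFG"
p <- 2:1                              # piece lengths, recycled as needed

lens  <- rep_len(p, nchar(x))         # 2 1 2 1 2 1 2
last  <- cumsum(lens)                 # 2 3 5 6 8 9 11 (end of each piece)
first <- last - lens + 1              # 1 3 4 6 7 9 10 (start of each piece)

# substring() is vectorised over first/last; positions past the end of
# the string yield "", so those empty pieces are dropped afterwards
pieces <- substring(x, first, last)
pieces <- pieces[nzchar(pieces)]
print(pieces)
```

The last non-empty piece is simply whatever is left of the string, here "G", because substring() truncates at the end of the input instead of raising an error.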
jay.sf answered Jan 24 '26 00:01
