Consider two vectors.
a <- c(123, 234, 432, 223)
b <- c(234, 238, 342, 325, 326)
Now, I want to match last two digits of a to first two digits of b and create a new vector pasting first digit of a, the matched part and  last digit of b. My expected output is : 
[1] 1234 1238 2342 4325 4326 2234 2238
For simplicity purpose consider all the elements would always be of length 3.
I have tried :
sub_a <- substr(a, 2, 3)   #get last two digits of a
sub_b <- substr(b, 1, 2)   #get first two digits of b
common <- intersect(sub_a, sub_b) 
common gives me the common elements in both a and b which are :
[1] "23" "34" "32"
and then I use match and paste0 together and I get incomplete output.
paste0(a[match(common, sub_a)], substr(b[match(common, sub_b)], 3, 3))
#[1] "1234" "2342" "4325"
as match matches only with the first occurrences. 
How can I achieve my expected output?
A possible solution:
a <- setNames(a, substr(a, 2, 3))
b <- setNames(b, substr(b, 1, 2))
df <- merge(stack(a), stack(b), by = 'ind')
paste0(substr(df$values.x, 1, 1), df$values.y)
which gives:
[1] "1234" "1238" "2234" "2238" "4325" "4326" "2342"
A second alternative:
a <- setNames(a, substr(a, 2, 3))
b <- setNames(b, substr(b, 1, 2))
l <- lapply(names(a), function(x) b[x == names(b)])
paste0(substr(rep(a, lengths(l)), 1, 1), unlist(l))
which gives the same result and is considerably faster (see the benchmark).
Probably a little complex but works:
unlist( sapply( a, function(x) {
  regex <- paste0( substr(x, 2, 3), '(\\d)')
  z <- sub(regex, paste0(x, "\\1"), b)
  z[!b %in% z] 
} ))
which give: [1] "1234" "1238" "2342" "4325" "4326" "2234" "2238"
The main idea is to create a regex for each entry in a, apply this regex to b and replace the values with the current a value and append only the last digit captured (the (\\d) part of the regex, then filter the resulting vector to get back only the modified values.
Out of curiosity, I did a small benchmark (adding sub_a and sub_b creation into Sotos and Heikki answers so everyone start on the same initial vectors a of 400 observations and b of 500 observations):
Unit: milliseconds
            expr      min       lq     mean   median       uq      max neval
      Jaap(a, b) 341.0224 342.6853 345.2182 344.3482 347.3161 350.2840     3
     Tensi(a, b) 415.9175 416.2672 421.9148 416.6168 424.9134 433.2100     3
    Heikki(a, b) 126.9859 139.6727 149.3252 152.3594 160.4948 168.6302     3
     Sotos(a, b) 151.1264 164.9869 172.0310 178.8474 182.4833 186.1191     3
 MattWBase(a, b) 286.9651 290.8923 293.3795 294.8195 296.5867 298.3538     3
Another way could be to use expand.grid, so picking up at your sub_a and sub_b,
d1 <- expand.grid(a, b, stringsAsFactors = FALSE)
d2 <- expand.grid(sub_a, sub_b, stringsAsFactors = FALSE)
i1 <- d2$Var1 == d2$Var2
d1 <- d1[i1,] 
d1$Var1 <- substr(d1$Var1, 1, 1)
do.call(paste0, d1)
#[1] "1234" "2234" "1238" "2238" "2342" "4325" "4326"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With