 

How to create a regex to get a substring between two pipes

I have a dataset in which I need to get the text between two pipe delimiters. The length of that text varies, so I can't extract it by a fixed length or position. This is the string:

ENST00000000233.10|ENSG00000004059.11|OTTHUMG000

I want to get the text between the first and second pipes, that is, ENSG00000004059.11. I've tried several regular expressions, but I can't figure out the correct syntax. What should the regex be?

Ben Tanner asked Dec 18 '25 02:12



1 Answer

Here is a regex.

x <- "ENST00000000233.10|ENSG00000004059.11|OTTHUMG000"
sub("^[^\\|]*\\|([^\\|]+)\\|.*$", "\\1", x)
#> [1] "ENSG00000004059.11"

Created on 2022-05-03 by the reprex package (v2.0.1)
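Since the question mentions a dataset, here is a minimal sketch of applying the same call to a whole column. The data frame `df`, its columns `id` and `gene`, and the second string "tx1|gene2.5|other3" are hypothetical, made up for illustration; the point is only that sub() is vectorised, so no loop is needed.

```r
# Hypothetical data frame: `id` holds pipe-delimited strings
df <- data.frame(
  id = c("ENST00000000233.10|ENSG00000004059.11|OTTHUMG000",
         "tx1|gene2.5|other3"),   # made-up example row
  stringsAsFactors = FALSE
)

# sub() works element-wise, so one call handles the whole column
df$gene <- sub("^[^\\|]*\\|([^\\|]+)\\|.*$", "\\1", df$id)
df$gene
#> [1] "ENSG00000004059.11" "gene2.5"
```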

Explanation:

  • ^ beginning of string;
  • [^\\|]* not the pipe character, zero or more times (inside the brackets the pipe is already literal, so [^|]* works too);
  • \\| the pipe character itself; here it must be escaped because, outside brackets, | is the alternation meta-character;
  • ^[^\\|]*\\| the three above combined match everything from the beginning of the string up to and including the first pipe;
  • ([^\\|]+) a capture group matching one or more characters that are not a pipe, i.e. the second field;
  • \\|.*$ the second pipe plus anything until the end of the string.
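To make the list above concrete, the sketch below assembles the same pattern one piece per bullet with paste0(). It uses the unescaped [^|] form inside the brackets, which for this input gives the same result as the original [^\\|]; the variable name `pattern` is just for illustration.

```r
x <- "ENST00000000233.10|ENSG00000004059.11|OTTHUMG000"

# The pattern assembled piece by piece, one element per bullet
pattern <- paste0(
  "^",        # beginning of string
  "[^|]*",    # zero or more non-pipe characters (| is literal inside [...])
  "\\|",      # the first pipe, escaped because | means alternation outside [...]
  "([^|]+)",  # capture group: one or more non-pipe characters
  "\\|.*$"    # the second pipe plus anything up to the end of the string
)

sub(pattern, "\\1", x)
#> [1] "ENSG00000004059.11"
```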

The replacement "\\1" then substitutes the whole match, which here is the entire string, with the contents of the first (and only) capture group, thus removing everything else.
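One caveat with the sub() approach: an element that does not match the pattern (for example, a string with only one pipe) is returned unchanged rather than flagged. The sketch below shows this, and then swaps in a different base R technique, regexec() with regmatches(), which returns the capture group directly and makes a non-match easy to turn into NA. The string "only|two" and the names `m` and `second` are hypothetical, for illustration only.

```r
x <- c("ENST00000000233.10|ENSG00000004059.11|OTTHUMG000",
       "only|two")   # hypothetical string with a single pipe

# sub() leaves non-matching elements as they are
sub("^[^|]*\\|([^|]+)\\|.*$", "\\1", x)
#> [1] "ENSG00000004059.11" "only|two"

# regexec() + regmatches(): each list element is c(full match, group 1),
# or character(0) when there is no match
m <- regmatches(x, regexec("^[^|]*\\|([^|]+)\\|", x))
second <- vapply(m, function(g) if (length(g) == 2) g[2] else NA_character_,
                 character(1))
second
#> [1] "ENSG00000004059.11" NA
```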

Rui Barradas answered Dec 20 '25 17:12



