Good day everyone!
I'm losing my mind trying to use sed to replace a string pattern.
I have searched old threads about sed and escaping special characters, but I still can't get it done. I think I'm now so deep into overcomplicating the problem that I can't see the easy way.
I have a .tsv document, in which the second column represents tag-annotations that come in the form of these possibilities:
B-something
B-something-something
B-something_something
B-something-something_something
I-something
I-something-something
I-something_something
I-something-something_something
I need to replace every B-* tag with B, and likewise every I-* tag with I.
I know I could do it in Python, but I need to learn sed for quick pre-processing in the future.
I played with regex101 and the pattern that seems to work is the following:
\b([BI]-[a-zA-Z_-]+)\b
Using sed, I could capture the first part, i.e. 'B-first_character' by using:
sed /s/\([BI]-[a-zA-Z]\)/replacing_word/g' input > output
Nothing is replaced when I use:
sed /s/\([BI]-\)\([a-zA-Z_-]+\)/replacing_word/g'
The last command probably contains some horrible mistake; my mind is a bit blurry now. Sorry for the stupid question, and thanks to all.
The sed command is malformed: there must be no / before the s substitution command. You evidently meant an opening single quote there, so that the whole expression sits inside single quotes.
Also, + is a literal + in a POSIX BRE pattern. Use -E (to switch to ERE) or replace + with \{1,\}.
To restore the captured value use a \NUMBER in the replacement pattern.
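As a small sketch of these points, the same substitution can be written in either dialect. The sample line is made up for illustration, and -E is assumed to be supported by your sed (GNU and BSD sed both accept it):

```shell
# Hypothetical sample line: a token, a TAB, then a tag
line=$(printf 'Paris\tB-loc-city_name')

# BRE: a bare + would be literal, so the interval \{1,\} expresses "one or more";
# \1 in the replacement restores what the \(...\) group captured
bre=$(printf '%s\n' "$line" | LC_ALL=C sed 's/\([BI]\)-[a-zA-Z_-]\{1,\}/\1/g')

# ERE (-E): + is a quantifier and groups are written without backslashes
ere=$(printf '%s\n' "$line" | LC_ALL=C sed -E 's/([BI])-[a-zA-Z_-]+/\1/g')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should leave only the B prefix in the tag column.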
You may use
LC_ALL=C sed 's/\([BI]\)-[a-zA-Z_-]\{1,\}/\1/g' file
See the online demo.
The LC_ALL=C will make all character classes behave the same way as at regex101.com.
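To try the command on the tag shapes from the question, here is a minimal sketch that uses a throwaway file. The tokens w1 to w5 and the untagged O row are invented for illustration, not taken from the real data:

```shell
# Throwaway sample file (hypothetical tokens; tag shapes taken from the question)
tmp=$(mktemp)
printf 'w1\tB-something\nw2\tI-something-something\nw3\tB-something_something\nw4\tI-something-something_something\nw5\tO\n' > "$tmp"

# Keep only the B or I prefix of each tag; lines without a B-/I- tag pass through unchanged
result=$(LC_ALL=C sed 's/\([BI]\)-[a-zA-Z_-]\{1,\}/\1/g' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Note that the expression is applied to the whole line rather than to the second column only, so a first-column token such as I-beam would be shortened as well.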
Pattern details
\([BI]\) - Group 1: B or I
- - a hyphen
[a-zA-Z_-]\{1,\} - one or more ASCII letters, _ or - chars.