
Python regex \number adding \0x

I am trying to do a simple regex replace on a string in Python. This is my code:

>>> s = "num1 1 num2 5"
>>> re.sub("num1 (.*?) num2 (.*?)","1 \1 2 \2",s)

I would expect an output like this, with the \numbers being replaced with their corresponding groups.

'1 1 2 5'

However, this is the output I am getting:

'1 \x01 2 \x025'

And I'm kinda stumped as to why the \x0s are there instead of the group contents I expected. Many thanks for any help.

ACarter asked Mar 18 '26 07:03


2 Answers

You need to start using raw strings (prefix the string with r):

>>> import re
>>> s = "num1 1 num2 5"
>>> re.sub(r"num1 (.*?) num2 (.*?)", r"1 \1 2 \2", s)
'1 1 2 5'

Otherwise you would need to double each backslash in the replacement string, so that Python's string parser passes a literal backslash on to the regex engine, like this:

>>> re.sub("num1 (.*?) num2 (.*?)", "1 \\1 2 \\2", s)
'1 1 2 5'

(This gets really old really fast; check out the opening paragraphs of the Python regex docs.)
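To see why the two working forms behave the same, here is a minimal sketch that compares the three ways of writing the replacement string. The variable names are made up for illustration; only `re.sub` and the strings from the question are taken from the thread.

```python
import re

s = "num1 1 num2 5"
pattern = r"num1 (.*?) num2 (.*?)"

raw_repl = r"1 \1 2 \2"       # raw string: backslashes are kept as written
escaped_repl = "1 \\1 2 \\2"  # each \\ collapses to a single backslash
plain_repl = "1 \1 2 \2"      # \1 and \2 become the characters chr(1) and chr(2)

# The raw and doubled-backslash literals produce the same string.
print(raw_repl == escaped_repl)

# The plain literal is shorter because each escape is a single character.
print(len(raw_repl), len(plain_repl))

result_raw = re.sub(pattern, raw_repl, s)
result_plain = re.sub(pattern, plain_repl, s)
print(repr(result_raw))
print(repr(result_plain))
```

The last two lines should reproduce the expected output and the surprising output from the question, respectively.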

Nolen Royalty answered Mar 19 '26 21:03


\1 and \2 are being interpreted by Python as octal character escapes (producing the characters \x01 and \x02) before the string ever reaches the regex engine. Using raw strings, r"\1" instead of "\1", prevents this interpretation.

>>> "\17"
'\x0f'
>>> r"\17"
'\\17'
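As a rough check of the octal explanation, the sketch below looks at what each literal actually contains. The variable names are hypothetical; `ord`, `chr`, and `len` are ordinary built-ins.

```python
octal_escape = "\17"  # a single character whose code is octal 17
raw_text = r"\17"     # three characters: a backslash, '1', and '7'

# Octal 17 is decimal 15, which repr() displays as '\x0f'.
print(ord(octal_escape) == 0o17 == 15)
print(len(octal_escape), len(raw_text))

# The same rule turns "\1" into chr(1), which is why '\x01' showed up
# in the question's output instead of the captured group.
backref_as_plain = "\1"
print(backref_as_plain == chr(1))
```

So the non-raw string never contains a backslash at all by the time `re.sub` sees it, and the regex engine has no backreference to expand.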
Amber answered Mar 19 '26 21:03


