Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I convert a Perl regex with named groups to Python?

Tags:

python

regex

perl

I am trying to convert the following Perl regex I found in the Video::Filename Perl module to a Python 2.5.4 regex to parse a filename

# Perl > v5.10
re => '^(?:(?<name>.*?)[\/\s._-]*)?(?<openb>\[)?(?<season>\d{1,2})[x\/](?<episode>\d{1,2})(?:-(?:\k<season>x)?(?<endep>\d{1,2}))?(?(<openb>)\])(?:[\s._-]*(?<epname>[^\/]+?))?$',

I would like to use named groups too, and I know in Python the regex extension for named groups is different, but I am not 100% sure on the syntax.

This is what I tried:

# Python (not working)
r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]*)?(?P<openb>\[)?(?P<season>\d{1,2})[x\/](?P<episode>\d{1,2})(?:-(?:\kP<season>x)?(?P<endep>\d{1,2}))?(?(P<openb>)\])(?:[\s._-]*(?P<epname>[^\/]+?))?$')

The error I get:

   raise error, v # invalid expression
sre_constants.error: bad character in group name

For example, this one I managed to convert and it works. But the one above I can't seem to get right. I get a compilation error in Python.

# Perl:
re => '^(?:(?<name>.*?)[\/\s._-]+)?(?:s|se|season|series)[\s._-]?(?<season>\d{1,2})[x\/\s._-]*(?:e|ep|episode|[\/\s._-]+)[\s._-]?(?<episode>\d{1,2})(?:-?(?:(?:e|ep)[\s._]*)?(?<endep>\d{1,2}))?(?:[\s._]?(?:p|part)[\s._]?(?<part>\d+))?(?<subep>[a-z])?(?:[\/\s._-]*(?<epname>[^\/]+?))?$',

# Python (working):
r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]+)?(?:s|se|season|series)[\s._-]?(?P<season>\d{1,2})[x\/\s._-]*(?:e|ep|episode|[\/\s._-]+)[\s._-]?(?P<episode>\d{1,2})(?:-?(?:(?:e|ep)[\s._]*)?(?P<endep>\d{1,2}))?(?:[\s._]?(?:p|part)[\s._]?(?P<part>\d+))?(?P<subep>[a-z])?(?:[\/\s._-]*(?P<epname>[^\/]+?))?$')

I am not sure where to start looking.

like image 901
Andre Miller Avatar asked Oct 25 '25 07:10

Andre Miller


2 Answers

There are 2 problems with your translation. First of all, the second mention of openb has extra parenthesis around it making it a conditional expression, not a named expression.

Next is that you didn't translate the \k<season> backreference, Python uses (P=season) to match the same. The following compiles for me:

r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]*)?(?P<openb>\[)?(?P<season>\d{1,2})[x\/](?P<episode>\d{1,2})(?:-(?:(?P=season)x)?(?P<endep>\d{1,2}))?(?(openb)\])(?:[\s._-]*(?P<epname>[^\/]+?))?$')

If I were you, I'd use re.VERBOSE to split this expression over multiple lines and add copious documentation so you can keep understanding the expression in the future if this is something that needs to remain maintainable though.

(edited after realising the second openb reference was a conditional expression, and to properly translate the backreference).

like image 112
Martijn Pieters Avatar answered Oct 28 '25 00:10

Martijn Pieters


I found the offending part but can't figure out what exactly is wrong without wrapping my mind around the whole thing.

r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]*)?(?P<openb>\[)?(?P<season>\d{1,2})[x\/](?P<episode>\d{1,2})(?:-(?:\kP<season>x)?(?P<endep>\d{1,2}))?

(?(P<openb>)\]) // this part here causes the error message

(?:[\s._-]*(?P<epname>[^\/]+?))?$')

The problem seems to be with the fact that group names in python must be valid python identifiers (check documentation). The parentheses seem to be the problem. Removing them gives

(?(P<openb>)\]) //with parentheses
(?P<openb>\])   //without parentheses

redefinition of group name 'openb' as group 6; was group 2
like image 35
jitter Avatar answered Oct 27 '25 22:10

jitter



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!