
Python regex to parse into a 2D array

Tags: python, regex

I have a string like this that I need to parse into a 2D array:

 str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"

The equivalent array would be:

arr[0][0] = 813702104
arr[0][1] = 813702106
arr[1][0] = 813702141
arr[1][1] = 813702143
#... etc ...

I'm trying to do this with a regex. The string above is buried in an HTML page, but I can be certain it's the only string matching that pattern on the page. I'm not sure if this is the best way, but it's all I've got right now.

imgRegex = re.compile(r"(?:'(?P<main>\d+)\[(?P<thumb>\d+)\]',?)+")

If I run imgRegex.match(str).groups() I only get one result (the first couplet). How do I either get multiple matches back or a 2d match object (if such a thing exists!)?

Note: Contrary to how it might look, this is not homework

Note part deux: The real string is embedded in a large HTML file and therefore splitting does not appear to be an option.

I'm still getting answers for this, so I thought I'd better edit it to show why I'm not changing the accepted answer. Splitting, though more efficient on this test string, isn't going to extract the parts from a whole HTML file. I could combine a regex and splitting, but that seems silly.

If you do have a better way to find the parts from a load of HTML (the pattern \d+\[\d+\] is unique to this string in the source), I'll happily change accepted answers. Anything else is academic.

asked Oct 21 '25 01:10 by Oli


1 Answer

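I would try findall or finditer instead of match. match returns a single match object whose groups hold only one (main, thumb) pair, whereas findall and finditer return every non-overlapping match in the string.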

Edit by Oli: Yeah, findall works brilliantly, but I had to simplify the regex to:

r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?"
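For anyone who wants to see the whole thing end to end, here is a minimal sketch of how the simplified pattern could be used to build the 2D array. The names `text`, `html`, `img_regex`, `arr`, and `by_name` are my own, and the HTML wrapper is a made-up stand-in for the real page, so treat it as an illustration rather than the exact code Oli ended up with.

```python
import re

# Sample string from the question; named `text` rather than `str`
# so the built-in type is not shadowed.
text = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"

# Hypothetical HTML wrapper, only to show that surrounding markup is skipped.
html = "<html><body><script>var imgs = [" + text + "];</script></body></html>"

# Simplified pattern: one couplet per match, no outer repetition.
img_regex = re.compile(r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?")

# findall returns one (main, thumb) tuple of strings per match.
arr = [[int(main), int(thumb)] for main, thumb in img_regex.findall(html)]

# finditer yields match objects, so the named groups stay accessible.
by_name = [
    {"main": int(m.group("main")), "thumb": int(m.group("thumb"))}
    for m in img_regex.finditer(html)
]

# The original pattern wraps the couplet in (?:...)+, so the whole list is
# consumed as a single match and findall yields only one tuple.
repeated = re.compile(r"(?:'(?P<main>\d+)\[(?P<thumb>\d+)\]',?)+")
single = repeated.findall(text)

print(arr)
print(len(single))
```

That last part is probably why the regex needed simplifying: with the outer `(?:...)+` the three couplets form one match, and a repeated group only keeps what its final repetition captured.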
answered Oct 22 '25 16:10 by stesch