
Extract nth occurrence with Perl Regex

Tags: regex, perl

I am trying to find the best way to parse a line that looks like this:


Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah

I just want to extract the text between the 6th and 7th vertical bar (|).
I tried something like:

if ($line =~ /^(.*\|){6}(\w*)\|/ ) {  
    print $2;  
}

The problem is that the first part seems to match the longest sequence possible because of .*; perhaps there is something different I should be using. Between the vertical bars there are alphanumeric characters, spaces and punctuation.

Should I be matching the shortest sequence between them instead?

MCH asked Sep 06 '25 21:09


1 Answer

You can use .*? instead; the trailing ? makes the * non-greedy, so it prefers to match fewer characters rather than more.

This could still match in the wrong place if the field you want has non-word characters; to prevent this you can either explicitly say anything-but-| ( ([^|]*\|){6} ) or disable backtracking for that part ( ((?>.*?\|)){6} ).
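As a rough sketch of the anything-but-| approach applied to the sample line from the question: the sub name seventh_field_regex is just a placeholder, and the capture uses [^|]* instead of \w* so that spaces and punctuation in the wanted field are kept.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# Hypothetical helper: skip six "field plus bar" chunks, then capture
# the seventh field. [^|]* can never cross a bar, so backtracking
# cannot move the match to a later field.
sub seventh_field_regex {
    my ($text) = @_;
    return $text =~ /^(?:[^|]*\|){6}([^|]*)/ ? $1 : undef;
}

print seventh_field_regex($line), "\n";    # prints "and"
```

The group is non-capturing ((?:...)) so the wanted field ends up in $1 rather than $2.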

Or you could just use split:

# defined() so that an empty or "0" seventh field still counts as found
if ( defined( my $seventh = ( split /\|/, $line, 8 )[6] ) ) {
    print $seventh;
}

(The 8 is optional; it caps the result at 8 fields, so split stops looking for separators once it has passed the 7th |.)
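If you need this for other positions too, the split idea generalizes easily. This is only a sketch: nth_field is a made-up name, positions are assumed to be 1-based, and the caller is expected to test the result with defined so that an empty field or a field containing just 0 is not mistaken for a missing one.

```perl
use strict;
use warnings;

# Hypothetical helper: return the field at a 1-based position,
# or undef if the line has fewer fields than that.
sub nth_field {
    my ($text, $n) = @_;
    # A limit of $n + 1 stops splitting once the wanted field is isolated.
    my @fields = split /\|/, $text, $n + 1;
    return $fields[ $n - 1 ];
}

my $line    = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';
my $seventh = nth_field( $line, 7 );
print "$seventh\n" if defined $seventh;    # prints "and"
```

With a positive limit, split keeps trailing empty fields, so an empty last field comes back as an empty string rather than undef.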

ysth answered Sep 08 '25 10:09