I am trying to find the best way to parse a line that looks like this:
Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah
I just want to extract the text between the 6th and 7th vertical bars (|).
I tried something like:
if ($line =~ /^(.*\|){6}(\w*)\|/ ) {
print $2;
}
The problem is that the first part seems to be matching the longest sequence possible because of the .*; perhaps there is something different I should be using. Between the vertical bars, there are alphanumeric characters, spaces, and punctuation.
Should I be matching the shortest between them?
You can use .*? instead; the ? modifies the * to prefer matching fewer times rather than more.
This could still match in the wrong place if the field you want has non-word characters: when \w* fails there, the regex backtracks and lets .*? extend across extra bars. To prevent this, you can either explicitly match anything-but-| with ([^|]*\|){6}, or disable backtracking for that part with ((?>.*?\|)){6}.
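As a minimal sketch of both variants, run against the sample line from the question: the capture here uses ([^|]*) rather than (\w*), which is an assumption on my part so that fields containing spaces or punctuation are also captured. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# Skip exactly six "field plus bar" units, then capture the next field.
# [^|]* cannot cross a bar, so the count of six stays reliable.
my ($negated) = $line =~ /^(?:[^|]*\|){6}([^|]*)\|/;

# Same idea with an atomic group: once .*?\| has matched up to the
# nearest bar, the engine may not backtrack into it to extend it.
my ($atomic) = $line =~ /^(?>.*?\|){6}([^|]*)\|/;

print "$negated\n";   # prints "and"
print "$atomic\n";    # prints "and"
```

Both patterns require a 7th bar after the captured field; if the wanted field could be the last one on the line, the trailing \| would need to become optional.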
Or you could just use split:
# defined() so that a field containing "0" or nothing at all is not skipped
if ( defined( my $seventh = ( split /\|/, $line, 8 )[6] ) ) {
    print $seventh;
}
(The 8 is optional; it tells split to stop splitting after the 7th |, so the rest of the line is left as one final field.)
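If the field position varies, the split approach could be wrapped in a small helper. This is only a sketch: nth_field is a hypothetical name, and the 1-based numbering is my assumption to match the "7th field" wording above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the field at 1-based position $n,
# or undef if the line has fewer than $n fields.
sub nth_field {
    my ($line, $n) = @_;
    # A limit of $n + 1 stops splitting once field $n is complete.
    my @fields = split /\|/, $line, $n + 1;
    return $fields[$n - 1];
}

my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';
my $seventh = nth_field($line, 7);
print "$seventh\n" if defined $seventh;   # prints "and"
```

Checking with defined rather than plain truth means an empty field (such as the 6th one in the sample) or a field containing 0 is still reported as present.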