I can see from this answer that if I do
    sub match_all_positions {
        my ($regex, $string) = @_;
        my @ret;
        # $-[0] holds the start offset of the most recent successful match
        while ($string =~ /$regex/g) { push @ret, $-[0] }
        return @ret;
    }
    print join ',', match_all_positions('0{3}', '001100010000');
I get
4,8
What do I need to do to get the indexes of all matches, even when they overlap, such as positions 8 and 9 in the example above?
I can do
    sub match_all_positions_b {
        my ($substr, $string) = @_;
        return if index($string, $substr) < 0;    # no occurrence at all
        my @res;
        my $i = 0;
        while ($i <= length($string) - length($substr)) {
            $i = index($string, $substr, $i);
            last if $i < 0;
            push @res, $i++;    # record the match, then resume one character later
        }
        return @res;
    }
    print join ',', match_all_positions_b('000', '001100010000');
which only lets me match a literal substring, or
    sub match_all_positions_c {
        my ($regex, $string) = @_;
        # Anchor the pattern; (?:...) keeps any alternation inside the anchor
        my $re = qr/^(?:$regex)/;
        my @res;
        for (0 .. length($string) - 1) {
            push @res, $_ if substr($string, $_) =~ $re;
        }
        return @res;
    }
    print join ',', match_all_positions_c('0{3}', '001100010000');
which is twice as slow.
Is there a way to get all matches, even when they overlap? Or should I just accept the speed loss as inherent to using regex matches?
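You need to wrap your regex in a zero-width look-ahead. A look-ahead matches without consuming any characters, so after each match the engine moves on by just one position and overlapping matches are no longer skipped.
Try calling your function like this:
    print join ',', match_all_positions('(?=0{3})', '001100010000');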
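As a minimal sketch of the same idea, the look-ahead can also be added inside the function, so callers keep passing a plain pattern. The name `match_all_overlapping` is hypothetical and not part of the question's code; the sketch assumes the pattern is meant to match at least one character.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): wraps the caller's pattern
# in a look-ahead so a plain regex such as '0{3}' can be passed in.
sub match_all_overlapping {
    my ($regex, $string) = @_;
    my @ret;
    # (?=...) succeeds without consuming characters. With /g, Perl will not
    # accept a second zero-length match at the same position, so the search
    # resumes one character further along each time.
    while ($string =~ /(?=$regex)/g) {
        push @ret, $-[0];    # start offset of the current match
    }
    return @ret;
}

print join(',', match_all_overlapping('0{3}', '001100010000')), "\n";
# prints 4,8,9
```

This still makes a single regex pass over the string rather than one match per offset, so it should avoid most of the overhead of the `substr` loop, though that is worth benchmarking on your own data.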