How can I find _all_ locations of a regex match in Perl?

Tags: regex, perl

I can see from this answer that if I do

sub match_all_positions {
    my ($regex, $string) = @_;
    my @ret;
    while ($string =~ /$regex/g) { push @ret, $-[0] }
    return @ret
}

print join ',', match_all_positions('0{3}', '001100010000');

I get

4,8

What do I need to do to get the indexes of all matches, even when they overlap, such as positions 8 and 9 in the example above?

I can do

sub match_all_positions_b {
    my ($substr, $string) = @_;
    my $length = length $substr;
    return unless index($string, $substr) >= 0;    # a hit at offset 0 is valid
    my @res;
    my $i = 0;
    while ($i <= (length($string) - $length)) {
        $i = index($string, $substr, $i);
        last if $i < 0;
        push @res, $i++;    # step forward by one so overlapping hits are found
    }
    return @res;
}

print join ',', match_all_positions_b('000', '001100010000');

which just lets me match a substring, or

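sub match_all_positions_c {
    my ($substr, $string) = @_;
    my $re = qr/\A(?:$substr)/;    # anchor the whole pattern, alternations included
    my @res;
    # The length of a regex match is not known in advance, so try every offset.
    for (0 .. length($string)) {
        push @res, $_ if substr($string, $_) =~ $re;
    }
    return @res;
}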

print join ',', match_all_positions_c('0{3}', '001100010000');

which is twice as slow.

Is there a way to get all matches, even when they overlap? Or should I just accept the speed loss as inherent to using regex matches?

asked Aug 17 '17 09:08 by simone


1 Answer

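You need to wrap your regex in a zero-width look-ahead. A look-ahead asserts that the pattern matches at the current position without consuming any characters, so the /g loop moves forward one position at a time instead of jumping past the matched text, and overlapping matches are no longer skipped.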

Try calling your function like this:

print join ',', match_all_positions('(?=0{3})', '001100010000');
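
As a minimal sketch of the difference, the snippet below runs the question's match_all_positions with and without the look-ahead. The wrapper match_all_overlapping is a hypothetical helper introduced here for illustration only; it is not part of the question or of any module.

```perl
use strict;
use warnings;

# Same loop as in the question: record the start offset of every match.
sub match_all_positions {
    my ($regex, $string) = @_;
    my @ret;
    while ($string =~ /$regex/g) { push @ret, $-[0] }
    return @ret;
}

# Hypothetical helper (an assumption, not from the question): wraps the
# caller's pattern in a zero-width look-ahead so the plain pattern can be passed.
sub match_all_overlapping {
    my ($regex, $string) = @_;
    return match_all_positions("(?=$regex)", $string);
}

print join(',', match_all_positions('0{3}', '001100010000')), "\n";    # 4,8
print join(',', match_all_overlapping('0{3}', '001100010000')), "\n";  # 4,8,9
```

If the matched text is needed as well as the offset, a capture group inside the look-ahead, such as (?=(0{3})), still fills $1 even though the match itself has zero width.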

answered Oct 22 '22 14:10 by pitseeker