Until now I've only worked with bash regular expressions, grep, sed, awk, etc. After trying Perl 6 regexes I have the impression that they run slower than I would expect, but the reason is probably that I'm using them incorrectly.
I've made a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
my @search = <abcde cdeff fabcd>;
my token search {
@search
}
my @new_array = @array.grep({/ <search> /});
say @new_array;
Then I printed @array into a file named array (with 7776 lines), made a file named search with 3 lines (abcde, cdeff, fabcd) and made a simple grep search.
$ grep -f search array
Both programs produced the same result, as expected, so I measured how long each one took to run.
$ time perl6 search.p6
real 0m6,683s
user 0m6,724s
sys 0m0,044s
$ time grep -f search array
real 0m0,009s
user 0m0,008s
sys 0m0,000s
So, what am I doing wrong in my Perl 6 code?
Update: If I loop over the @search array and pass each search token to grep separately, the program runs much faster:
my @array = "aaaaa" .. "fffff";
say +@array;
my @search = <abcde cdeff fabcd>;
for @search -> $token {
say +@array.grep({/$token/});
}
$ time perl6 search.p6
real 0m1,378s
user 0m1,400s
sys 0m0,052s
And if I define each search pattern manually, it works even faster:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
say +@array.grep({/abcde/});
say +@array.grep({/cdeff/});
say +@array.grep({/fabcd/});
$ time perl6 search.p6
real 0m0,587s
user 0m0,632s
sys 0m0,036s
The grep command is much simpler than Perl 6's regular expressions, and it has had many more years to get optimized. Regexes are also one of the areas that haven't seen as much optimization in Rakudo, partly because this is seen as a difficult thing to work on.
For a more performant example, you could pre-compile the regex:
my $search = "/@search.join('|')/".EVAL;
# $search = /abcde|cdeff|fabcd/;
say +@array.grep($search);
That change causes it to run in about half a second.
If there is any chance of malicious data in @search and you have to do this, it may be safer to use:
"/@search».Str».perl.join('|')/".EVAL
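Because the three patterns in the question are plain literal strings, another way to avoid EVAL is to skip the regex engine and test for substrings with a junction. This is only a sketch under that assumption: it is not equivalent for patterns that contain regex syntax, the short @array below is a made-up stand-in for the full 7776-element list, and no timing is claimed for it.

```raku
# Stand-in data; the question's list is "aaaaa" .. "fffff".
my @array  = <aaaaa abcde bcdef cdeff fabcd fffff>;
my @search = <abcde cdeff fabcd>;

# .contains autothreads over the junction, so an element is kept
# if it contains at least one of the strings in @search.
# `so` collapses the resulting junction to a plain Bool.
my @found = @array.grep({ so .contains(any(@search)) });

say @found;   # [abcde cdeff fabcd]
```

No code is compiled from the contents of @search here, so the quoting concern above does not arise.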
The compiler can't quite generate that optimized code for /@search/, as @search could change after the regex gets compiled. What could happen is that the first time the regex is used it gets re-compiled into the better form, which is then cached for as long as @search doesn't get modified.
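A small sketch of why the array cannot simply be baked into the regex at compile time: the regex looks @search up each time it runs, so a later change to the array is visible through the same regex object. The variable name $rx and the shortened list are only for illustration.

```raku
my @search = <abcde cdeff>;
my $rx = / @search /;        # @search is consulted whenever $rx is run

say so 'fabcd' ~~ $rx;       # False: neither 'abcde' nor 'cdeff' occurs in 'fabcd'

@search.push: 'fabcd';       # modify the array after the regex was compiled
say so 'fabcd' ~~ $rx;       # True: the same regex now matches the new element
```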
(I think Perl 5 does something similar)
One important fact you have to keep in mind is that a regex in Perl 6 is just a method that is written in a domain specific sub-language.
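To make that concrete, here is a minimal sketch (the regex name vowel is just an illustration) showing that a declared regex is a code object whose type, Regex, is a subclass of Method:

```raku
my regex vowel { <[aeiou]> }

say &vowel.^name;            # Regex
say &vowel ~~ Method;        # True: Regex inherits from Method

# It is called by name from inside another regex, like <search> in the question.
say so 'xyz'   ~~ / <vowel> /;   # False
say so 'hello' ~~ / <vowel> /;   # True
```

So every use of a regex carries roughly the cost of a method call, on top of the matching work itself.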