We allow users to supply REs for filtering email. Early on we ran into performance problems with REs containing, for example, .*, when matching against arbitrarily large emails. A simple fix we found was to run s/\*/{0,1024}/ on the user-supplied RE. However, this is not a perfect solution, as it breaks on the following pattern:
/[*]/
Rather than coming up with some convoluted recipe to account for every possible variation of user-supplied RE input, I'd like to simply limit Perl's interpretation of the * and + quantifiers so that each matches at most 1024 characters.
Is there any way to do this?
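To make the failure concrete, here is a minimal sketch of the naive rewrite from the question, applied with /g so that every * is replaced. Inside a character class the * is a literal character, so rewriting it changes what the class matches. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Naive rewrite: replace every * with {0,1024}, regardless of context.
my $user_re = '[*]';                       # intended: match a literal *
(my $naive = $user_re) =~ s/\*/{0,1024}/g;

print "$naive\n";                          # [{0,1024}]

# The class now holds the characters { 0 , 1 2 4 } instead of *,
# so a literal * no longer matches.
print '*' =~ /$naive/ ? "match\n" : "no match\n";   # no match
```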
This does not really answer your question, but you should be aware of other issues with user-supplied regular expressions; see, for example, this summary at OWASP. Depending on your exact situation, it might be better to write or find a simple custom pattern-matching library.
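One way to follow that advice, sketched here under the assumption that your users only need literal text plus a * wildcard (no other regex syntax), is to escape everything with quotemeta and turn each * into a bounded .{0,1024}. The name glob_to_regex is hypothetical, and the /s modifier (so the wildcard can span newlines in a message body) is a design choice, not a requirement.

```perl
use strict;
use warnings;

# Hypothetical "simple pattern" matcher: literal text plus * wildcards.
# quotemeta escapes every other character, so no user-controlled regex
# syntax reaches the engine, and each wildcard is capped at 1024 chars.
sub glob_to_regex {
    my ($glob) = @_;
    # A limit of -1 keeps empty leading/trailing fields, so a leading or
    # trailing * still becomes a wildcard.
    my $src = join '.{0,1024}', map { quotemeta($_) } split(/\*/, $glob, -1);
    return qr/$src/s;    # /s: let the wildcard cross newlines
}

print "match\n" if "urgent: pay invoice" =~ glob_to_regex('urgent*invoice');
```

Because characters such as ?, + and [ are escaped, a pattern like 1+1 matches the literal text "1+1" rather than being interpreted as a quantifier.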
Update
I added a (?<!\\) before the quantifiers, because an escaped * or + should not be replaced. The replacement will still fail on \\* (match \ zero or more times).
An improvement would be this (note the /g, so that every occurrence is rewritten, not just the first):
s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/g
s/(?<!\\)\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/g
See it here on Regexr
That means: match * or + only if there is no closing ] ahead of it without a [ in between, i.e. only if it is not inside a character class. The (?<!\\) parts make sure that escaped brackets (\[ and \]) are not counted as class delimiters.
(?! ... ) is a negative lookahead
(?<! ... ) is a negative lookbehind
See perlretut for details
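The following sketch wraps the two substitutions above in a helper (bound_v1 is a hypothetical name), applied with /g, and runs them on a few sample patterns. The last sample is the \\* case mentioned in the update, which is left unbounded.

```perl
use strict;
use warnings;

# Update 1 rules: bound * and + unless escaped or inside a [...] class.
sub bound_v1 {
    my ($re) = @_;
    $re =~ s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/g;
    $re =~ s/(?<!\\)\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/g;
    return $re;
}

print bound_v1('a.*b'),   "\n";   # a.{0,1024}b   (bounded)
print bound_v1('[*]'),    "\n";   # [*]           (inside a class: untouched)
print bound_v1('\*'),     "\n";   # \*            (escaped: untouched)
print bound_v1('[a]+x'),  "\n";   # [a]{1,1024}x  (class closed before the +)
print bound_v1('\\\\*'),  "\n";   # \\*           (known limitation: unbounded)
```

Note that Perl single-quoted strings turn '\\\\*' into the three characters \\*, which is the pattern "backslash, zero or more times".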
Update 2: account for possessive quantifiers
s/(?<!(?<!\\)[\\+*?])\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/g   # for + (apply this one first)
s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/g    # for *
See it here on Regexr
It seems to work, but it's getting really complicated now!
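Here is a sketch of the Update 2 rules as a helper (bound_v2 is a hypothetical name), with /g on both substitutions. The order matters: if the * rule ran first, a*+ would become a{0,1024}+, and the + rule would then see a + preceded by } and rewrite it too. Running the + rule first leaves the possessive + attached to the now-bounded quantifier.

```perl
use strict;
use warnings;

# Update 2 rules. The + rule skips a + preceded by an unescaped \, +, *
# or ? (so \+ stays literal and ++, *+, ?+ stay possessive), and both
# rules skip quantifier characters inside a [...] class.
sub bound_v2 {
    my ($re) = @_;
    $re =~ s/(?<!(?<!\\)[\\+*?])\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/g;  # first
    $re =~ s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/g;
    return $re;
}

print bound_v2('a++'),   "\n";   # a{1,1024}+          (possessive kept)
print bound_v2('a*+'),   "\n";   # a{0,1024}+          (possessive kept)
print bound_v2('x+y*'),  "\n";   # x{1,1024}y{0,1024}
print bound_v2('\++'),   "\n";   # \+{1,1024}          (literal + then quantifier)
print bound_v2('[*+]'),  "\n";   # [*+]                (inside a class: untouched)
```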
Parse the regex into a tree with Regexp::Parser and modify it however you want, or provide a GUI front end to Regexp::English.
You mean, other than patching the source?