Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLite regex - not PCRE? What are the rules?

Tags:

regex

sqlite

I'm pretty used to the PCRE regex flavor at this point, and some other alternatives like plain grep's style. But, I couldn't figure out how to do some basic things in the new built-in SQLite regexes (version 3.37.2)

Attempting to match a whole string with ^ and $ anchors works as expected:

sqlite> SELECT 'pancakes' REGEXP '^pancakes$';
1

I'd expect this next one to work. I always thought | had "everything to left" and "everything to right" behavior (low "precedence"?):

sqlite> SELECT 'pancakes' REGEXP '^pan|cakes$';
0

That works fine in Python:

>>> re.search(r"^pan|cakes$", "pancakes")
<re.Match object; span=(0, 3), match='pan'>

Maybe I have to escape |? But doesn't work:

sqlite> SELECT 'pancakes' REGEXP '^pan\|cakes$';
0

Does | work at all? Yes, it seems to, but maybe not doing what I think:

sqlite> SELECT 'pancakes' REGEXP 'pan|cakes';
1

Maybe the precedence is off, I'll group to be safe. Uh oh:

sqlite> SELECT 'pancakes' REGEXP '(^pan)|(cakes$)';
Error: stepping, unmatched '(' (1)

(escaping the parens also gave the same error)

How can I do grouping, anchoring, | etc? I wasn't able to find much about the regex style in the docs, just something saying that REGEXP operator is really using some regex function under the hood.

like image 731
dcc310 Avatar asked Oct 14 '25 12:10

dcc310


1 Answers

I found the code for the default regexp implementation: https://sqlite.org/src/file/ext/misc/regexp.c

The syntax documented in a code comment block suggests that the syntax you used should work, but I find it does not. I tested on 3.37.0 and then upgraded to 3.39.3 but it still fails.

I was able to get it to work this way:

sqlite> SELECT 'pancakes' REGEXP '(^pan)|cakes$';
1

It seems in the SQLite regex parser, | binds tighter than ^ or $, so without the parentheses, it evaluates as if we had written it this way:

sqlite> SELECT 'pancakes' REGEXP '^(pan|cakes)$';
0

This is obviously not going to match, because it would only match the whole string "pan" or "cakes".

This is not compatible with PCRE semantics.

I also tried to use explicit parentheses to override the precedence of |:

sqlite> SELECT 'pancakes' REGEXP '(^pan)|(cakes$)';
Runtime error: unmatched '('

This should work, but it hits a bug in SQLite with respect to ^ and $. Read these threads for the discussion from July 2022:

  • https://sqlite.org/forum/forumpost/96692f8ba5
  • https://sqlite.org/forum/forumpost/18f87fdcdf
like image 142
Bill Karwin Avatar answered Oct 17 '25 07:10

Bill Karwin



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!