Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL REGEXP Changes in 8.X

Tags:

regex

sql

mysql

My Ubuntu Web development system was recently updated from 19.04 to 19.10 that included an update of MySQL, which is now version 8.0.18. The live server has version 5.6.45 (it is a shared server running CentOS so I cannot change it) and I discovered that something in REGEXP has apparently changed with the update.

The following query, when run locally, gives Illegal argument to a regular expression but runs properly on the live server. I am not too familiar with regular expressions much less REGEXP in MySQL so how can I make this work in the newer MySQL without breaking it in the old one?

These are the two REGEXP lines by themselves, followed by the complete query.

REGEXP '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]'

REGEXP '^[0-9]+(-[0-9]+)*$'

SELECT t.ID AS partID, t.partNo,
  TRIM(CONCAT(
    IF(N.n = 0, '', LEFT(t.Model, 2)),
    SUBSTRING_INDEX(SUBSTRING_INDEX(t.Model, '-', N.n + 1), '-', -1)
  )) AS modelNo, NULL AS bodyNo, t.xHD AS isRHD
FROM (
  SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(t.part, ' ', N.n + 1), ' ', -1) AS Model,
    CASE LEFT(t.part, 3) WHEN 'LHD' THEN 1 WHEN 'RHD' THEN 2 ELSE 0 END AS xHD,
    t.ID, t.GroupNumber, t.partNo, t.Models
  FROM (
    SELECT
      LTRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(p.Models, ';', N.n + 1), ';', -1)) part,
      p.ID, p.GroupNumber, p.partNo, p.Models
    FROM parts_listing AS p CROSS JOIN parts_temp AS N
    WHERE Models REGEXP '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]' AND
          N.n <= LENGTH(Models) - LENGTH(REPLACE(Models, ';', ''))
  ) AS t CROSS JOIN parts_temp AS N
  WHERE N.n <= LENGTH(t.part) - LENGTH(REPLACE(t.part, ' ', ''))
) AS t CROSS JOIN parts_temp AS N
WHERE t.Model REGEXP '^[0-9]+(-[0-9]+)*$' AND
      N.n <= LENGTH(t.model) - LENGTH(REPLACE(t.Model, '-', '')); 

1 Answers

MySQL 8.0.4 introduced changed the implementation of its regexes engine from Henry Spencer's implementation to Internation Components for Unicode (ICU). This involves some non-backward compatible changes, which are listed in the documentation. Here is the part that is of interest for your use case:

The Spencer library supports word-beginning and word-end boundary markers ([[:<:]] and [[:>:]] notation). ICU does not. For ICU, you can use \b to match word boundaries; double the backslash because MySQL interprets it as the escape character within strings.

In other words, the following regexp is invalid in MySQL 8.0.4 onwards: '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]', because it contains Henry Spencer's boudary markers.

how can I make this work in the newer MySQL without breaking it in the old one?

Short answer: you can't. Changes are not backward compatible. You may be able to work around this situation by defining a custom expression (or character class) that represents word word boundaries for your use case. Here is an extremely simplified version, that handles a space or the beginning/end of the string:

'(^|\s)[0-9]+(-[0-9]+)*(\s|$)
like image 143
GMB Avatar answered Oct 23 '25 07:10

GMB