
I don't understand this Textile Regex

Tags: regex, php, perl

I found the following regex in the PHP code of the Textism Textile:

/\b ?[([]TM[])]/i

I consider myself experienced in reading regexes, but this one is a mystery to me. The beginning is easy, but I don't understand why there are two empty character classes inside an already opened character class: [[][]]?

Can someone shed some light on this issue?

asked Dec 05 '25 17:12 by micxer


1 Answer

It is a rather cryptic one...

Here's what it means:

/     # start regex pattern
\b    # word boundary
 ?    # an optional space
[([]  # char class: either '(' or '['
TM    # literal 'TM'
[])]  # char class: either ']' or ')'
/     # end regex pattern
i     # match case insensitive

Some things to note:

  • inside a character class, [ is not special and need not be escaped ([([] is therefore valid!)
  • inside a character class, a ] that appears as the very first character is taken literally and need not be escaped ([])] is therefore valid!)
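The two notes above can be checked by testing each character class on its own. This is only a small sketch: the variable names and the ^...$ anchors are illustrative additions and are not part of the Textile pattern.

```php
<?php
// Each character class from the pattern, anchored so it must match exactly one character.
$open  = '/^[([]$/';  // '(' or '[' (the inner '[' is a literal)
$close = '/^[])]$/';  // ']' or ')' (the leading ']' is a literal)

$results = [];
foreach (['(', '[', ')', ']'] as $ch) {
    $results[$ch] = [
        'open'  => preg_match($open, $ch),   // 1 if $ch is '(' or '['
        'close' => preg_match($close, $ch),  // 1 if $ch is ']' or ')'
    ];
}
print_r($results);
// '(' and '[' give open => 1, close => 0
// ')' and ']' give open => 0, close => 1
```

Neither class is empty: each one contains exactly two literal characters.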

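To summarize, it matches "TM", case-insensitively, preceded by either [ or ( and followed by either ] or ). The brackets do not need to form a matching pair: "[TM)" will be matched in most cases. I say in most cases because the leading \b ? requires a word boundary, optionally followed by a single space, immediately before the opening bracket. In the demo below, "[tm)" is excluded from the matches because it is preceded by ". ", and there is no word boundary between "." and the space (both are non-word characters):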

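<?php
// Collect every match of the Textile pattern in a sample string.
preg_match_all(
    '/\b ?[([]TM[])]/i',
    "... [tm) foo (TM) bar [TM] baz (tm] ...",
    $matches
);
print_r($matches);

/* Output (each match includes its leading space):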
Array
(
    [0] => Array
        (
            [0] =>  (TM)
            [1] =>  [TM]
            [2] =>  (tm]
        )

)
*/
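If the terse form is hard to read, the same classes can be written with explicit escapes. The sketch below assumes nothing beyond the demo string above; the $escaped pattern is an illustrative rewrite, not something found in the Textile source.

```php
<?php
$subject = "... [tm) foo (TM) bar [TM] baz (tm] ...";

$terse   = '/\b ?[([]TM[])]/i';    // pattern as quoted in the question
$escaped = '/\b ?[(\[]TM[\])]/i';  // same classes, brackets escaped for readability

preg_match_all($terse, $subject, $a);
preg_match_all($escaped, $subject, $b);

// Both patterns find the same three matches: " (TM)", " [TM]", " (tm]"
var_dump($a[0] === $b[0]); // bool(true)
```

The escapes are optional inside a character class, but they make it obvious at a glance where each class starts and ends.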
answered Dec 08 '25 08:12 by Bart Kiers


