A while ago, I saw in regex (at least in PHP) you can make a capturing group not capture by prepending ?:.

    $str = 'big blue ball';
    $regex = '/b(ig|all)/';
    preg_match_all($regex, $str, $matches);
    var_dump($matches);

Outputs...

    array(2) {
      [0]=>
      array(2) {
        [0]=>
        string(3) "big"
        [1]=>
        string(4) "ball"
      }
      [1]=>
      array(2) {
        [0]=>
        string(2) "ig"
        [1]=>
        string(3) "all"
      }
    }

In this example, I don't care about what was matched in the parentheses, so I prepended ?: to the group ('/b(?:ig|all)/') and got this output:

    array(1) {
      [0]=>
      array(2) {
        [0]=>
        string(3) "big"
        [1]=>
        string(4) "ball"
      }
    }

This is very useful, at least I think so. Sometimes you just don't want to clutter your matches with unnecessary values.
I was trying to look up documentation and the official name for this (I call it a non-capturing group, and I think I've heard that name before).
Being symbols, it seemed hard to Google for.
I have also looked at a number of regex reference guides, with no mention.
Being prefixed with ? and appearing as the first characters inside the parentheses would lead me to believe it has something to do with lookaheads or lookbehinds.
So, what is the proper name for these, and where can I learn more?
In this case, ?: lets the group take part in matching as usual but excludes the text it matched from the captured results. It does not change which strings the pattern matches: whitespace, numbers and other characters are treated exactly as they would be with a plain group.
Python docs: (?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
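
To see what "cannot be retrieved ... or referenced later in the pattern" means in practice, here is a minimal PHP sketch. The patterns and variable names are made up for illustration; the point is only that a (?:...) group gets no number, so it cannot be the target of a backreference and does not show up in the matches array.

```php
<?php
// Capturing group: group 1 holds "ab" or "cd", so \1 demands the same text again.
$capturing = '/(ab|cd)-\1/';
var_dump(preg_match($capturing, 'ab-ab', $m1)); // int(1)
var_dump(preg_match($capturing, 'ab-cd'));      // int(0), \1 would have to be "ab"

// Non-capturing group: (?:ab|cd) gets no number, so (\d+) becomes group 1.
$nonCapturing = '/(?:ab|cd)-(\d+)/';
preg_match($nonCapturing, 'cd-42', $m2);
var_dump($m2); // [0] => "cd-42", [1] => "42"; the "cd" part is not stored separately
```

With a plain group in the second pattern, "cd" would have been stored as group 1 and "42" pushed to group 2.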
To match a character that has a special meaning in regex, prefix it with a backslash ( \ ) to form an escape sequence. E.g., \. matches "."; \+ matches "+"; and \( matches "(". You also need \\ to match "\" (backslash).
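
As a rough illustration of the escaping rule in PHP, the sketch below compares an escaped and an unescaped dot, then uses preg_quote() to add the backslashes automatically. The sample strings are invented for the example. Note that in a PHP string literal the regex \\ has to be written as four backslashes.

```php
<?php
// Escaped dot matches only a literal "."; an unescaped dot matches any character.
$escapedHitsDot    = preg_match('/a\.b/', 'a.b'); // 1
$escapedHitsDigit  = preg_match('/a\.b/', 'a5b'); // 0
$unescapedHitsDigit = preg_match('/a.b/', 'a5b'); // 1

// The regex \\ matches one backslash; in PHP source that is written '\\\\'.
$hitsBackslash = preg_match('/\\\\/', 'C:\\dir'); // 1

// preg_quote() escapes every special character in a literal string for you.
$literal = '1+1=(2)';
$pattern = '/' . preg_quote($literal, '/') . '/';
$hitsLiteral = preg_match($pattern, 'so 1+1=(2) holds'); // 1

var_dump($escapedHitsDot, $escapedHitsDigit, $unescapedHitsDigit, $hitsBackslash, $hitsLiteral);
```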
For example, the terminology rule regular expression, "/a.b/", matches all text where there is an "a" followed by any single character, followed by a "b", as in, "a5b". The asterisk matches the preceding pattern or character zero or more times. Combining the period and asterisk, "/a.
It's available on the Subpatterns page of the official documentation.
The fact that plain parentheses fulfill two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 99, and the maximum number of all subpatterns, both capturing and non-capturing, is 200.
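
The "white queen" example from that passage can be checked directly. This is just a small sketch that wraps the documented pattern in / delimiters and prints the result:

```php
<?php
// (?:red|white) groups the alternatives but is skipped when numbering,
// so the outer group is 1 and (king|queen) is 2.
$pattern = '/the ((?:red|white) (king|queen))/';
preg_match($pattern, 'the white queen', $matches);
print_r($matches);
// [0] => the white queen
// [1] => white queen
// [2] => queen
```

If (?:red|white) were a plain group, "white" would appear as an extra entry and "queen" would move from index 2 to index 3.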
It's also good to note that you can set options for the subpattern with it. For example, if you want only the sub-pattern to be case insensitive, you can do:
(?i:foo)bar

Will match:

    foobar
    FOObar
    FoObar

But not:

    fooBar
    FOOBAR
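
Here is a quick way to try the inline option yourself. It is only a sketch: the candidate strings are my own picks, chosen so that some vary the case of "foo" and others vary the case of "bar".

```php
<?php
// Only the subpattern "foo" is case-insensitive; "bar" must be lowercase.
$pattern = '/(?i:foo)bar/';

$results = [];
foreach (['foobar', 'FOObar', 'FoObar', 'fooBar', 'FOOBAR'] as $candidate) {
    $results[$candidate] = preg_match($pattern, $candidate) === 1;
}
var_dump($results);
// foobar, FOObar and FoObar => true; fooBar and FOOBAR => false

// The option group is also non-capturing: only the full match is stored.
preg_match($pattern, 'FOObar', $m);
var_dump(count($m)); // int(1)
```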
Oh, and while the official documentation doesn't actually explicitly name the syntax, it does refer to it later on as a "non-capturing subpattern" (which makes complete sense, and is what I would call it anyway, since it's not really a "group", but a subpattern)...
(?:) as a whole represents a non-capturing group. 
Regular-expressions.info mentions this syntax:
The question mark and the colon after the opening round bracket are the special syntax that you can use to tell the regex engine that this pair of brackets should not create a backreference. Note the question mark [...] is the regex operator that makes the previous token optional. This operator cannot appear after an opening round bracket, because an opening bracket by itself is not a valid regex token. Therefore, there is no confusion between the question mark as an operator to make a token optional, and the question mark as a character to change the properties of a pair of round brackets. The colon indicates that the change we want to make is to turn off capturing the backreference.