
Add R to a dynamically created regex pattern

Tags: c++, regex

I would like to create a regex pattern dynamically. I am able to build the string that contains the pattern. Now, in a definition like

  std::regex pattern{ R"((\w)+)" };

the "((\w)+)" needs to be my string containing the pattern I am creating.

But how can I add the R outside of the string?

I could create the string using escape characters. But I am curious if there is a way around it.

Suppose

std::string myPatternWithoutEscapeChar;

is my string that contains a regex without escape characters, say ([\]+) as opposed to ([\\]+)

I would like to do what amounts to

  std::regex pattern{ R... };

with that R as in the first definition above and where ... is the content of myPatternWithoutEscapeChar inside "".

Kae asked Jan 21 '26 14:01


1 Answer

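It doesn't matter whether you use the raw string construct or not.
It's what gets passed to the regular expression engine that counts.
The R prefix only changes how the compiler reads a literal in your source code, so a std::string built at run time already holds the raw text and can be passed to std::regex as is.

As always, there are two parsing phases: first the language parses the string literal, then the regex engine parses the resulting text.
That's why it is important to write and debug the entire regex in its raw state first, and only afterwards decide how to spell it as a string.
This avoids any confusion.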

Use a tool to create and test the regex first. A good one is RegexFormat 5.
It's like a Swiss army knife for regex processing. It has embedded regex engines in a
complete find/replace test harness. It formats, compresses, and error-checks a regex, and it will make
any kind of string out of it, including a raw string, that you can drop into your source code. It
can also take your source code strings, parse them for the language, then for the regex, then process the regex.

Your only concern is then with regular expressions themselves, which you should learn.

The first lesson is that regex is a language; it is made of metacharacters and constructs that combine them.

A sample of metacharacters: . ? \ + * ^ $ # [ ] ( ) They all have special meaning depending
on how they are used. A construct can be a series of metacharacters and normal characters that start and end it,
for example the named group (?'Var' ... ) in engines that support named groups.

As with all languages, there needs to be a way to introduce literals within the code constructs.
A conflict develops if a literal being matched is itself a metacharacter.

To mark it as having literal meaning, the escape character \ is placed in front of it.

But what happens when the literal being matched is the escape character itself?
The escape character is escaped too, \\, and now it has literal meaning.

You really don't want to guess at what the regex looks like to the engine. Reason about the raw representation instead.
The raw representation is the pattern text itself, free of any language-level quoting, escaping, or regex delimiters.

For instance, you mention ([\]+) as opposed to ([\\]+)

In raw form, ([\]+) won't compile into a regex object.
It has an opening metacharacter [, then an escaped literal \], and no closing metacharacter ].

This one, ([\\]+), is better: it has an opening [, one literal \ (written \\), then a closing ].

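So, ([\\]+) is the RAW regex.

Then, it is presented to the language as a double quoted string "([\\\\]+)"
or as a raw string R"(([\\]+))". Note that the outermost ( ) of a C++ raw string literal are its delimiters, not part of the text, so the capture group needs its own pair inside them.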

I have only glanced at the C++11 raw string constructs. I know you can choose your own series of
characters as the delimiter, so treat this as general information.

Good luck!

