When the input string contains \r, std::regex and boost::regex behave differently. Why?
code:
#include <iostream>
#include <string>
#include <regex>
#include <boost/regex.hpp>
int main()
{ 
    std::string content = "123456728\r,234";
    std::string regex_string = "2.*?4";
    boost::regex reg(regex_string);
    boost::sregex_iterator it(content.begin(),content.end(),reg);
    boost::sregex_iterator end;
    std::cout <<"content size:" << content.size() << std::endl;
    //boost match 234 and 28\r,234
    while (it != end) 
    {
        std::cout <<"boost match: " << it->str(0) <<" size: " <<it->str(0).size() << std::endl;
        ++it;
    }
    std::regex regex_std(regex_string);
    std::sregex_iterator it_std(content.begin(),content.end(),regex_std);
    std::sregex_iterator std_end;
    //std match 234 and 234
    while (it_std != std_end) 
    {
        std::cout <<"std match: " << it_std->str(0) <<" size: " << it_std->str(0).size() << std::endl;
        ++it_std;
    }
    return 0;
}
I think the boost library behaves normally, but I don't understand why the standard library is implemented this way.
That is expected.
The default std::regex flavor is ECMAScript (ECMA-262), and in ECMAScript, . matches any character except a LineTerminator character:
The production Atom :: . evaluates as follows:
- Let A be the set of all characters except LineTerminator.
- Call CharacterSetMatcher(A, false) and return its Matcher result.
And then section 7.3, Line Terminators, says:
Line terminators are included in the set of white space characters that are matched by the \s class in regular expressions.
| Code Unit Value | Name | Formal Name | 
|---|---|---|
| \u000A | Line Feed | <LF> | 
| \u000D | Carriage Return | <CR> | 
| \u2028 | Line separator | <LS> | 
| \u2029 | Paragraph separator | <PS> | 
In Boost regex, however, . matches any single character except:
- The NULL character when the flag match_not_dot_null is passed to the matching algorithms.
- The newline character when the flag match_not_dot_newline is passed to the matching algorithms.
So, by default, . in Boost regex matches \r, while in std::regex it does not.
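If the Boost-like result is wanted from std::regex, one common ECMAScript idiom is to replace . with the class [\s\S], which also matches line terminators. The following is only a minimal sketch using the standard library alone. The helper name find_all is hypothetical and not part of the original code, and the [\s\S] substitution is a suggested workaround rather than something the standard prescribes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): collect every match of
// `pattern` in `text` using std::regex with its default ECMAScript grammar.
std::vector<std::string> find_all(const std::string& pattern,
                                  const std::string& text)
{
    const std::regex re(pattern);
    std::vector<std::string> matches;
    // Default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        matches.push_back(it->str(0));
    return matches;
}
```

Called with the content string from the question, find_all("2.*?4", content) should give the two matches 234 and 234, because . stops at \r, while find_all("2[\\s\\S]*?4", content) should give 234 and 28\r,234, the same matches as the Boost output above.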