
How to handle or avoid exceptions from C++11 <regex> matching functions (§28.11)?

Starting from C++11, the <regex> header defines the functions std::regex_match, std::regex_search and std::regex_replace in §28.11. I guess there is a valid reason for these functions not to be noexcept, but I couldn't find any reference about what they might throw or why.

  1. What types of exceptions may these functions throw?
  2. What runtime conditions cause these exceptions to be thrown?
    • Does the standard ensure that for some sets of arguments these functions never throw, e.g. does it ensure that regex_match(anyString, regex(".")) never throws?

PS: Since some of these exceptions probably inherit from std::runtime_error, they might throw std::bad_alloc during their construction.

jotik asked Mar 19 '16 19:03


2 Answers

regex_error is the only exception mentioned as being thrown from any of the classes or algorithms in <regex>. There are two basic categories of errors: malformed regular expressions and failure to process the match.

The constructors for basic_regex can throw a regex_error (as per [re.regex.construct]/3, /7, /14, and /17) if the argument (or sequence) passed in is "not a valid regular expression." The same is true if you try to assign an invalid regular expression to a basic_regex ([re.regex.assign]/15).
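A minimal sketch of the first category, assuming the default ECMAScript grammar. The helper name is_valid_pattern is made up for illustration and is not part of <regex>:

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of <regex>): reports whether `pattern`
// compiles as a regular expression under the default ECMAScript grammar.
bool is_valid_pattern(const std::string& pattern) {
    try {
        std::regex re(pattern);  // throws std::regex_error if the pattern is malformed
        (void)re;                // only construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling is_valid_pattern("(unclosed") should take the catch branch, since an unbalanced parenthesis is not a valid regular expression. Note that the helper itself is not noexcept: constructing the regex may still fail with bad_alloc.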

Separately from that, the algorithms can also throw regex_error ([re.except]/1):

The functions described in this Clause report errors by throwing exceptions of type regex_error. If such an exception e is thrown, e.code() shall return either regex_constants::error_complexity or regex_constants::error_stack.

where those two error codes mean ([re.err]):

error_complexity: The complexity of an attempted match against a regular expression exceeded a pre-set level.
error_stack: There was insufficient memory to determine whether the regular expression could match the specified character sequence.
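As a rough sketch, the code carried by a caught regex_error can be mapped to these two cases. The function name describe_match_failure and the returned strings are invented for illustration; only the regex_constants values come from the standard:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: maps the error code of a regex_error thrown by a
// matching algorithm to a short description. Plain comparisons are used
// because the standard leaves the exact type of error_type to the
// implementation.
std::string describe_match_failure(std::regex_constants::error_type code) {
    if (code == std::regex_constants::error_complexity)
        return "match exceeded the implementation's complexity limit";
    if (code == std::regex_constants::error_stack)
        return "insufficient memory to complete the match";
    return "unexpected regex error code";
}
```

Typical use would be describe_match_failure(e.code()) inside a catch (const std::regex_error& e) block wrapped around regex_match or regex_search.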

Barry answered Oct 30 '22 16:10


C++11 §28.6 states

The class regex_error defines the type of objects thrown as exceptions to report errors from the regular expression library.

This means that the <regex> library should not throw anything else by itself. You are correct that constructing a regex_error, which inherits from runtime_error, may throw bad_alloc under out-of-memory conditions, so your error handling code must check for that as well. Unfortunately, this makes it impossible to determine which regex_error construction actually threw the bad_alloc.

For the regular expression algorithms in §28.11, §28.11.1 states that

The algorithms described in this subclause may throw an exception of type regex_error. If such an exception e is thrown, e.code() shall return either regex_constants::error_complexity or regex_constants::error_stack.

This means that if the functions in §28.11 ever throw a regex_error, it shall hold one of these codes and nothing else. However, note that things you pass to the <regex> library, such as allocators, might also throw. For example, the allocator of match_results may throw when results are added to the given match_results container. Also note that §28.11 has shorthand functions which "as if" construct match_results, such as

template <class BidirectionalIterator, class charT, class traits>
bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                 const basic_regex<charT, traits> & e,
                 regex_constants::match_flag_type flags =
                 regex_constants::match_default);

template <class BidirectionalIterator, class charT, class traits>
bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                  const basic_regex<charT, traits> & e,
                  regex_constants::match_flag_type flags =
                  regex_constants::match_default); 

and possibly others. Since these might construct and use a match_results with the standard allocator internally, they might throw anything std::allocator throws. Therefore your simple example of regex_match(anyString, regex(".")) might also throw due to construction and use of the default allocator.
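If you still want to use <regex>, one defensive option is to funnel every call through a noexcept wrapper that catches regex_error, bad_alloc and anything else. The following is only a sketch under that assumption. The names MatchResult and safe_regex_match are invented here and appear nowhere in the standard:

```cpp
#include <new>
#include <regex>
#include <string>

// Hypothetical result type for the wrapper below.
enum class MatchResult {
    matched,
    not_matched,
    regex_failure,   // regex_error (error_complexity or error_stack per §28.11.1)
    out_of_memory,   // bad_alloc from an allocator or from building the exception
    unknown_failure  // anything else, e.g. thrown by user-supplied types
};

// Hypothetical wrapper: never lets an exception escape from regex_match.
MatchResult safe_regex_match(const std::string& text, const std::regex& re) noexcept {
    try {
        return std::regex_match(text, re) ? MatchResult::matched
                                          : MatchResult::not_matched;
    } catch (const std::regex_error&) {
        return MatchResult::regex_failure;
    } catch (const std::bad_alloc&) {
        return MatchResult::out_of_memory;
    } catch (...) {
        return MatchResult::unknown_failure;
    }
}
```

The regex is taken by reference on purpose: constructing it can itself throw, so that step has to be guarded separately by the caller.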

Another caveat to note is that for some <regex> functions and classes it is currently impossible to determine whether a bad_alloc was thrown by some allocator or during construction of a regex_error exception.

In general, if you need something with better exception specifications, avoid using <regex>. If you only require simple pattern matching, you're better off rolling your own safe match/search/replace functions, because it is impossible to constrain your regular expressions to avoid these exceptions in a portable or forward-compatible manner. Even using an empty regular expression "" might give you an exception.
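To illustrate what "rolling your own" could look like for very simple needs, here is a sketch of a noexcept wildcard matcher. It is not a regex replacement: it only understands '?' (any one character) and '*' (any run of characters), and the name wildcard_match is made up for this example:

```cpp
// Hypothetical hand-rolled matcher: no allocation, no exceptions.
// Both arguments must be null-terminated strings.
bool wildcard_match(const char* pattern, const char* text) noexcept {
    const char* star = nullptr;   // position of the most recent '*' in pattern
    const char* retry = nullptr;  // text position where that '*' started matching
    while (*text != '\0') {
        if (*pattern == '*') {
            star = pattern++;     // remember the star, first try matching nothing
            retry = text;
        } else if (*pattern == '?' || *pattern == *text) {
            ++pattern;            // single character matched
            ++text;
        } else if (star != nullptr) {
            pattern = star + 1;   // backtrack: let the star absorb one more char
            text = ++retry;
        } else {
            return false;
        }
    }
    while (*pattern == '*') ++pattern;  // trailing stars match the empty rest
    return *pattern == '\0';
}
```

Because it works on raw pointers and never allocates, the noexcept here is a guarantee you can actually verify by reading the code, which is exactly what the <regex> algorithms cannot offer.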

PS: Note that the C++11 standard is rather poorly written in some respects and lacks complete cross-referencing. E.g. the clauses for the member functions of match_results say nothing explicit about throwing, whereas §28.10.1.1 states (emphasis mine):

In all match_results constructors, a copy of the Allocator argument shall be used for any memory allocation performed by the constructor or member functions during the lifetime of the object.

So take care when browsing the standards like a lawyer! ;-)

mceo answered Oct 30 '22 17:10