
Does the standard require `operator->()` to be defined for past-the-end non-contiguous iterators?

Does the standard require that operator->() is defined for non-contiguous past-the-end iterators?

Background:

  • Regardless of the iterator category, it is allowed for operator*() to exhibit undefined behavior when the iterator points to past-the-end. That is explicit at https://en.cppreference.com/w/cpp/iterator, section "Dereferenceability and validity", which says "Values of an iterator i for which the expression *i is defined are called dereferenceable. The standard library never assumes that past-the-end values are dereferenceable."
  • EDIT: The conclusion in this bullet is wrong: see my answer below. For contiguous iterators, I believe it is disallowed for operator->() to exhibit undefined behavior when the iterator points to past-the-end. This can be inferred from two sections at cppreference: (1) At https://en.cppreference.com/w/cpp/iterator/contiguous_iterator, the "Semantic requirements" section defines the non-dereferenceable iterator c and states requirements for std::to_address(c) which imply that std::to_address(c) does not exhibit undefined behavior. (2) At https://en.cppreference.com/w/cpp/memory/to_address, the "Possible implementation" shows std::to_address depending on operator->(). EDIT: The "Possible implementation" does not use operator->() when std::pointer_traits<Ptr>::to_address is defined for the iterator type; in that case it seems allowed for operator->() to be undefined for end iterators.

It is less clear whether non-contiguous iterators are allowed to implement operator->() so that its behavior is undefined for past-the-end iterators. Here are various pieces of "evidence"/"hints" I found related to this:

  • Non-contiguous C++20 iterator concepts do not seem to require that operator->() is defined at all. That's simply not mentioned among all requirements defined at https://en.cppreference.com/w/cpp/iterator/random_access_iterator, or the requirements they depend on, AFAICS.
  • The contiguous C++20 iterator concept, std::contiguous_iterator, requires operator->(): as mentioned above, that can be inferred since std::to_address must be defined, and the "Possible implementation" for std::to_address uses operator->(). And the page https://en.cppreference.com/w/cpp/memory/to_address explicitly mentions std::contiguous_iterator. Since it does not mention other iterator types, this does not imply anything for non-contiguous iterators, IIUC.
  • The legacy iterator categories require that operator->() is defined for LegacyInputIterator and stronger (see the table at https://en.cppreference.com/w/cpp/named_req/InputIterator) and this is qualified by "Precondition: i is dereferenceable". In other words, there is an explicit exception allowing for past-the-end iterators to exhibit undefined behavior. I don't see this precondition removed for any of the stronger legacy iterator categories (FWIW not even https://en.cppreference.com/w/cpp/named_req/ContiguousIterator, so that seems like a difference between LegacyContiguousIterator and std::contiguous_iterator).
  • On the other hand, at https://en.cppreference.com/w/cpp/iterator, section "Dereferenceability and validity", it mentions that operator*() does not need to be defined for past-the-end iterators, but does not mention operator->(). So this may suggest that the exception that allows some operations on past-the-end iterators to be undefined, does not necessarily apply to operator->().

So, easy to get confused, and I see quite a bit of "evidence"/"hints" related to non-contiguous iterators, operator->(), and past-the-end. But no explicit requirement, as far as I could find, which settles whether non-contiguous iterators are allowed to exhibit undefined behavior in operator->() when the iterator points to past-the-end. Does anyone have a more definite answer?

Edit: Thanks for several helpful comments. To give some background, possibly answering some of the replies: The practical side, slightly simplified, is that I defined iterator wrappers, i.e., custom iterator classes whose implementation depends on an existing ("wrapped") iterator. The type of the wrapped iterator is given as a template parameter and the wrapped iterator is given as a constructor argument. I carelessly assumed that I could define operator->() in the wrapper class as &*wrapped_iterator. This worked on clang, gcc, MSVC release builds, and even in MSVC debug builds for non-contiguous iterators. However, it resulted in assertion failures in MSVC debug builds for contiguous iterators, in functions like std::vector::assign(first, last) where first and last are wrappers around contiguous iterators and are both past-the-end iterators. The reason was that MSVC's vector::assign invokes std::to_address(first) even if first == last, which I didn't anticipate. In turn, std::to_address invoked first.operator->() for my contiguous iterator wrapper. Since I had defined operator->() as &*wrapped_iterator, it invoked operator*() on the wrapped iterator, whose behavior under the given conditions is undefined. In this particular case it resulted in an assertion failure, because MSVC's debug mode has special code that checks things like this (_ITERATOR_DEBUG_LEVEL).

So I need to change my iterator wrapper's implementation of operator->(). My first idea was to make it invoke operator->() on the wrapped iterator. However, that is not guaranteed to be defined (see the accepted answer). What I have to do instead is invoke std::to_address on the wrapped iterator.

Sven Sandberg asked Dec 02 '25 22:12


1 Answer

By my reading of the standard, a random_access_iterator is not required to define operator->. Thus, generic algorithms must only use operator*. Of course, concrete iterators are allowed to define additional methods; AFAIK the standard poses no requirements on those. So a non-contiguous iterator with UB in end.operator->() should be fine; C++20 algorithms should not be calling operator-> at all.

(I think the rationale here is that some iterators might want to return elements by value, but operator-> is required to return a pointer. So those iterators are forced to not define any operator->.)

It's only contiguous_iterator that introduces the operator-> requirement (indirectly via std::to_address). There, despite the common "obviously it's undefined behavior" reflex in the comments here, it must not have undefined behavior: std::to_address(c) == std::to_address(a) + std::iter_difference_t<I>(c - a) must hold even when c is a past-the-end iterator. This makes sense when you consider that a past-the-end contiguous iterator can be converted to a past-the-end pointer via c.operator->().

The situation is different for legacy iterators (pre-C++20 algorithms): these expect that a->m is equivalent to (*a).m ([input.iterators] table). This also allows UB for non-contiguous iterators.
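A short illustration of the legacy rule, using a made-up Point type and sum_x function. The loop only uses it->x while it != end(), which is exactly the "i is dereferenceable" precondition; nothing is implied about end()->x.

```cpp
#include <cassert>
#include <list>

struct Point { int x; int y; };  // hypothetical element type

// Sums the x members of a list via a legacy (non-contiguous) iterator.
int sum_x(const std::list<Point>& pts) {
    int total = 0;
    for (auto it = pts.begin(); it != pts.end(); ++it) {
        // it is dereferenceable here, so it->x is equivalent to (*it).x.
        total += it->x;
    }
    return total;
}
```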

Note that these legacy iterator requirements are still used in C++20, e.g. for std::sort; it's only the newer C++20-only functions like std::ranges::sort that use the new concept-based iterator requirements.

(By the way, language-lawyer questions like this one should be based on the actual standard text, not cppreference. Though in this case it turns out there's no significant difference between them.)

Daniel answered Dec 05 '25 11:12


