My colleague's code looked like this:
void copy(std::string const& s, char *d) {
    for (int i = 0; i <= s.size(); i++, d++)
        *d = s[i];
}
His application crashes, and I think that is because this reads s out of range: the loop condition should only go up to s.size() - 1.
But the people sitting next to me say there was a discussion in the past about this being legal. Can anyone please clear this up for me?
Let's put aside the possibility that *d is invalid, since that has nothing to do with what the question seems to be aimed at: whether std::string::operator[]() has well-defined behavior when accessing the "element" at index std::string::size().
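For completeness, here is a sketch of a caller for which *d is valid. The loop writes s.size() + 1 characters (the contents plus the value read from s[s.size()]), so the destination needs room for that many. The colleague's function is repeated with an unsigned index (std::string::size_type) to avoid a signed/unsigned comparison; copy_to_vector is a hypothetical name used only for illustration, and buf.data() assumes C++11 or later.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// The colleague's loop with an unsigned index. It writes s.size() + 1
// characters: the string contents plus the value of s[s.size()].
void copy(std::string const& s, char *d) {
    for (std::string::size_type i = 0; i <= s.size(); i++, d++)
        *d = s[i];
}

// Hypothetical caller: allocates exactly s.size() + 1 chars so every write is in bounds.
std::vector<char> copy_to_vector(std::string const& s) {
    std::vector<char> buf(s.size() + 1);
    copy(s, buf.data());
    return buf;
}
```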
The C++03 standard has the following description of string::operator[]() (21.3.4 "basic_string element access"):
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT(). Otherwise, the behavior is undefined.
Since s in the example code is a reference to const, the const overload is selected, so the behavior is well defined and s[s.size()] returns a null character. However, if s were not a const string, the behavior would be undefined.
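Which overload runs depends only on the static type of the object expression. The sketch below makes that visible with static_assert and decltype (C++11 features, used here purely for illustration; the C++03 rules being demonstrated are the ones quoted above). The function name element_at_size is illustrative.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// A non-const std::string selects the non-const overload, which returns char&.
// Under C++03, calling that overload with pos == size() is undefined behavior.
static_assert(std::is_same<decltype(std::declval<std::string&>()[0]), char&>::value,
              "non-const overload returns char&");

// Through a reference-to-const, s[i] selects the const overload (const char&),
// so s[s.size()] is well defined even in C++03 and yields charT(), i.e. '\0'.
char element_at_size(std::string const& s) {
    static_assert(std::is_same<decltype(s[0]), const char&>::value,
                  "const overload returns const char&");
    return s[s.size()];
}
```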
C++11 remedies this odd behavior, in which the const version behaves so differently from the non-const version in this edge case. C++11 21.4.5 "basic_string element access" says:
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
Requires: pos <= size().
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
So for a C++11 compiler, the behavior is well-defined whether or not the string is const.
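A minimal sketch of the C++11 rule: the string is taken by value, so s is non-const and s[s.size()] goes through the non-const overload, which is now allowed to be called with pos == size(). It only reads the element, never writes it. Assumes C++11 or later; the function name is illustrative.

```cpp
#include <cassert>
#include <string>

// C++11 and later: pos == size() is valid for the non-const overload too.
// Reads (but never modifies) the element at index size().
bool nonconst_end_is_null(std::string s) {   // by value: s is a non-const string
    char& end = s[s.size()];                 // non-const overload, pos == size()
    return end == '\0';
}
```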
Unrelated to the question, I find it a little strange that C++11 says "the referenced value shall not be modified"; it's not clear to me whether that clause applies only when pos == size(). I'm pretty sure there's a ton of existing code that does things like s[i] = some_character; where s is a non-const std::string and i < s.size(). Is that undefined behavior now? I suspect the clause applies only to the special-case charT() object.
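For reference, writing through s[i] with i < s.size() is the ordinary use of the non-const operator[]. The sketch below does exactly that and leaves the element at s[s.size()] untouched; the function name upcase is illustrative.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Upper-cases s in place via s[i] = ...; only indices i < s.size() are written,
// so the element at s[s.size()] is never modified.
void upcase(std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}
```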
Another interesting thing is that neither standard's description of operator[] seems to require that the address of the object returned for s[s.size()] be in any way related to the address of the object returned for s[s.size() - 1]. In other words, the returned charT() reference doesn't appear to have to be contiguous with the end of the string data. I suspect this was to give implementers the option of returning a reference to a single static copy of that sentinel element (which would also explain C++11's "shall not be modified" restriction, assuming it applies only to the special case). C++11 does rule that option out indirectly, though: c_str() and data() must return a pointer p such that p + i == &operator[](i) for each i in [0, size()], so s[s.size()] has to be the element immediately following the last character.
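A sketch that checks the contiguity question directly by comparing &s[s.size()] with s.c_str() + s.size(). It assumes a C++11 or later implementation, where the specification of c_str() requires the two pointers to be equal. The helper name is illustrative.

```cpp
#include <cassert>
#include <string>

// C++11 and later: the element returned for s[s.size()] must directly
// follow the last character of the string's contiguous buffer.
bool end_is_contiguous(std::string const& s) {
    return &s[s.size()] == s.c_str() + s.size();
}
```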
cppreference says this:
reference operator[]( size_type pos );
const_reference operator[]( size_type pos ) const;
If pos == size(),
- The const version returns a reference to the character with value CharT() (the null character). (until C++11)
- Both versions return a reference to the character with value CharT() (the null character). Modifying the null character through a non-const reference results in undefined behavior. (since C++11)
So, since C++11, it is OK as long as you don't modify the null character.
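If you would rather not depend on the pos == size() rule at all (for instance, in code that must also work with a non-const string under C++03), the terminator can be written explicitly. A sketch using std::string::copy(), which copies characters but does not append a null; the name copy_with_null is illustrative, and, as in the original, d is assumed to have room for s.size() + 1 chars.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copies s plus a terminating '\0' into d without ever reading s[s.size()].
// Assumes d points to at least s.size() + 1 writable chars.
void copy_with_null(std::string const& s, char* d) {
    std::string::size_type n = s.copy(d, s.size());  // copies s.size() chars, no '\0'
    d[n] = '\0';
}
```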