Both C and C++ support a seemingly equivalent set of escape sequences such as \b, \t, \n, \" and others starting with the backslash character (\). How is a backslash handled if an ordinary character follows? As far as I remember from several compilers, the escape character \ is silently skipped. On cppreference.com, I read these articles
I only found this note (in the C article) about orphan backslashes
ISO C requires a diagnostic if the backslash is followed by any character not listed here: [...]
above the reference table. I also had a look at some online compilers
#include <stdio.h>
#include <string.h>  /* for strcmp */
int main(void) {
    printf("%d", !strcmp("\\ x", "\\ x"));
    printf("%d", !strcmp("\\ x", "\\\ x"));   /* "\ " (backslash, space) is the orphan backslash in question */
    printf("%d", !strcmp("\\ x", "\\\\ x"));
    return 0;
}
#include <iostream>
#include <string>
using namespace std;
int main() {
cout << (string("\\ x") == "\\ x");
cout << (string("\\ x") == "\\\ x");
cout << (string("\\ x") == "\\\\ x");
return 0;
}
Both treat "\\ x" and "\\\ x" as equivalent, with only a (kind of) warning via syntax highlighting. In other words, "\\\ x" has been transformed into "\\ x".
Can I assume this to be defined behavior?
Edit #2: Focus even more on the constant being generated (and portability).
The answer is no. It is an invalid C program, and in C++ its behavior is implementation-defined.
The C standard says it is syntactically wrong (emphasis mine): it does not produce a valid token, thus the program is invalid:
5.2.1 Character sets
2/ In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters.
6.4.4.4 Character constants
3/ The single-quote ', the double-quote ", the question-mark ?, the backslash \, and arbitrary integer values are representable according to the following table of escape sequences:
- single quote ': \'
- double quote ": \"
- question mark ?: \?
- backslash \: \\
- octal character: \octal digits
- hexadecimal character: \xhexadecimal digits
8/ In addition, characters not in the basic character set are representable by universal character names and certain nongraphic characters are representable by escape sequences consisting of the backslash \ followed by a lowercase letter: \a, \b, \f, \n, \r, \t, and \v. Note: If any other character follows a backslash, the result is not a token and a diagnostic is required.
The C++ standard says differently (emphasis mine):
5.13.3 Character literals
7/ Certain non-graphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 8. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. Escape sequences in which the character following the backslash is not listed in Table 8 are conditionally-supported, with implementation-defined semantics. An escape sequence specifies a single character.
Thus for C++, you need to look at your compiler's manual for the semantics, but the program is syntactically valid.
You need to compile with a conforming C compiler. Various online compilers tend to use gcc which is by default set to "lax non-standard mode", aka GNU C. This may or may not enable some non-standard escape sequences, but it also won't produce compiler errors even when you violate the C language - you might get away with a "warning", but that doesn't make the code valid C.
If you tell gcc to behave as a conforming C compiler with -std=c17 -pedantic-errors, you get this error:
error: unknown escape sequence: '\040'
040 is octal for 32 which is the ASCII code for ' '. (For some reason gcc uses octal notation for escape sequences internally, might be because \0 is octal, I don't know why.)