
Storing a string in an array of chars without the null character

I'm reading the C++ Primer Plus by Stephen Prata. He gives this example:

char dog[8] = { 'b', 'e', 'a', 'u', 'x', ' ', 'I', 'I'}; // not a string!
char cat[8] = {'f', 'a', 't', 'e', 's', 's', 'a', '\0'}; // a string!

with the comment that:

Both of these arrays are arrays of char, but only the second is a string. The null character plays a fundamental role in C-style strings. For example, C++ has many functions that handle strings, including those used by cout. They all work by processing a string character-by-character until they reach the null character. If you ask cout to display a nice string like cat in the preceding example, it displays the first seven characters, detects the null character, and stops. But if you are ungracious enough to tell cout to display the dog array from the preceding example, which is not a string, cout prints the eight letters in the array and then keeps marching through memory byte-by-byte, interpreting each byte as a character to print, until it reaches a null character. Because null characters, which really are bytes set to zero, tend to be common in memory, the damage is usually contained quickly; nonetheless, you should not treat nonstring character arrays as strings.

Now, if I declare my variables as globals, like this:

#include <iostream>
using namespace std;

char a[8] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'};
char b[8] = {'1', '2', '3', '4', '5', '6', '7', '8'};

int main(void)
{
    cout << a << endl;
    cout << b << endl;

    return 0;
}

the output will be:

abcdefgh12345678
12345678

So, indeed, cout "keeps marching through memory byte-by-byte", but only to the end of the second character array. The same thing happens with any combination of char arrays. I'm thinking that all the other addresses are initialized to 0 and that's why cout stops. Is this true? If I do something like:

for (int i = 0; i < 100; ++i)
{
    cout << *(&a + i) << endl;
}

I get mostly empty space in the output (perhaps 95%), but not everywhere.

If, however, I declare my char arrays a little shorter, like:

char a[3] = {'a', 'b', 'c'};
char b[3] = {'1', '2', '3'};

and keep everything else the same, I get the following output:

abc
123

Now cout doesn't even get past the first char array, let alone reach the second. Why is this happening? I've checked the memory addresses and they are sequential, just like in the first scenario. For example,

cout << &a << endl;
cout << &b << endl;

gives

003B903C
003B9040

Why is the behavior different in this case? Why doesn't it read beyond the first char array?

And lastly, if I declare my variables inside main, then I do get the behavior described by Prata: a lot of junk gets printed before a null character is finally reached somewhere.

I'm guessing that in the first case the char arrays are placed on the heap, and that this memory is initialized to 0 (but not everywhere, why?), and that cout behaves differently based on the length of the char array (why?).

I'm using Visual Studio 2010 for these examples.

asked Feb 03 '26 17:02 by mihai


2 Answers

It looks like your C++ compiler is allocating space in 4-byte chunks, so that every object has an address that is a multiple of 4 (the hex addresses in your dump are divisible by 4). Compilers like to do this because it keeps larger data types such as int and float (4 bytes wide) aligned to 4-byte boundaries, and some kinds of computer hardware take longer to load, move, or store unaligned int and float values.

In your first example, each array needs 8 bytes of memory (a char fills a single byte), so the compiler allocates exactly 8 bytes. In the second example, each array is 3 bytes, so the compiler allocates 4 bytes, fills the first 3 bytes with your data, and leaves the 4th byte unused.

In this second case, the unused byte apparently held a null, which explains why cout stopped at the end of the array. But as others have pointed out, you cannot depend on unused bytes being initialized to any particular value, so the behaviour of the program cannot be guaranteed.

If you change your sample arrays to hold 4 bytes each, the program will most likely behave as in the first example.

answered Feb 06 '26 06:02 by Peter Raynham


The contents of memory outside an array's bounds are indeterminate. Accessing memory you do not own, even just for reading, leads to undefined behavior.

answered Feb 06 '26 08:02 by Some programmer dude