When you concatenate two vectors in R using c(), it "combines the arguments and results in a vector". Does it combine them by creating a new vector to take on the elements of the two vectors or is there a way to literally combine the data spaces allocated for both vectors?
When I searched I couldn't find an explanation. The visual representations of c() all literally just attach the second vector to the end of the first vector, but I think that's just so we can easily understand what this function does and not what actually happens.
When you call c(), a new vector is allocated, and the elements of the existing vectors are copied into it. The allocation happens at this line in the underlying C code:
PROTECT(ans = allocVector(mode, data.ans_length));
This might seem wasteful, since we already have the values written to memory, so why not just wrap up a couple of pointers to this memory and call that a vector?
There are several reasons for this.
Firstly, many of the arithmetic and statistical operations that R carries out on vectors work by iterating through elements in contiguous memory. If a vector's elements were scattered across separate allocations, every access would need extra address checks and jumps between memory locations, which would make things a lot slower. Outside of R, concatenating vectors in C or in C++ is also typically done by allocating a new vector, for much the same reason.
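To make the cost of non-contiguous storage concrete, here is a minimal sketch in plain R. The `pieces` list and the `get_elem()` helper are hypothetical and only mimic what a "concatenation without copying" would have to do; they are not how R works internally.

```r
# Hypothetical "view" that keeps the two pieces separate instead of copying them.
pieces <- list(c(1, 2, 3), c(4, 5))

# Looking up element i means first working out which piece holds it.
get_elem <- function(pieces, i) {
  for (p in pieces) {
    if (i <= length(p)) return(p[i])
    i <- i - length(p)  # skip past this piece
  }
  stop("index out of range")
}

from_pieces <- get_elem(pieces, 4)

# With a single contiguous vector, indexing is direct.
flat <- c(pieces[[1]], pieces[[2]])
from_flat <- flat[4]
```

Both lookups return the same value, but the first has to walk the pieces on every access, which is the kind of overhead the contiguous layout avoids.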
A second reason is to avoid fragmentation and memory leaks. If we created a vector from concatenating subsets of other vectors without allocating dedicated memory, we would end up with a bunch of pointers to different locations in the memory free store. If we then used subsets of this vector, we would have a nightmare of memory pointers to memory pointers to fragments of vectors, and chunks of unused fragments of vectors which could not be re-used or reclaimed by the garbage collector.
A third reason is that R users expect copy-on-modify behaviour. For example, if we have:
a <- c(1, 2, 3)
b <- c(a, a)
b
#> [1] 1 2 3 1 2 3
Then we expect to be able to change a single element:
b[6] <- 6
b
#> [1] 1 2 3 1 2 6
Whereas, if b did not have its own data allocated and merely pointed at a's memory twice, this assignment would also change the third element of b, and a[3] itself.
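A quick way to check this for yourself is to modify b and then look at a and at the other half of b. This is just a small sketch reusing the same variable names as above:

```r
a <- c(1, 2, 3)
b <- c(a, a)

b[6] <- 6   # change only the last element of b

a           # unchanged: 1 2 3
b[3]        # unchanged: 3
b[6]        # 6
```

Because b holds its own copy of the data, neither a nor the first half of b is affected.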
As Nicola points out in the comments, another reason is that c will carry out type checking and implicit conversion between types to ensure that the underlying storage mode of the new vector is consistent. This allows some straightforward and well-defined flexibility between integers, doubles, logical vectors, factors and character strings which would be impossible if vectors created by c were composed of pieces of existing vectors.
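As a small illustration of that coercion, the following sketch checks the storage type of a few mixed calls to c(). The variable names are only for the example:

```r
int_dbl <- c(1L, 2.5)    # integer + double
lgl_int <- c(TRUE, 2L)   # logical + integer
dbl_chr <- c(1, "a")     # double + character

typeof(int_dbl)  # "double"
typeof(lgl_int)  # "integer"
typeof(dbl_chr)  # "character"
```

In each case the result is a single vector with one storage type, which is only possible because c() writes the converted values into freshly allocated memory.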
Conceptually, the memory allocation in R works like this: each R object is handled in C through a SEXP. This is a pointer to the object itself, which is stored in memory as a structure called a SEXPREC holding the object's header and its data.
Therefore, if we run the code:
A <- 1:4
B <- 5:14
the vectors A and B might be stored in memory like this:

If we then do
C <- c(A, B)
Then in memory we get:

The data in the SEXPREC pointed to by C has been copied from the data in the two other SEXPREC objects pointed to by A and B.
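From the R side, you can see the effect of that copy by changing C and checking that A and B are untouched. This is only a behavioural check, not a look at the actual memory layout:

```r
A <- 1:4
B <- 5:14
C <- c(A, B)

length(C)     # 14

C[1] <- 100L  # modify the combined vector only

A[1]          # still 1
B             # still 5 6 7 ... 14
```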