 

Linux memcpy restrict keyword syntax

I know that the restrict qualifier in C specifies that the memory regions pointed to by two pointers should not overlap. It was my understanding that the Linux (not SUS) prototype for memcpy looks like -

    void *memcpy(void *restrict dest, const void *restrict src, size_t count);

However, when I looked at man7.org/memcpy it seems that the declaration is -

    void *memcpy(void dest[restrict .n], const void src[restrict .n], size_t n);

My questions are -

  1. When did this syntax get introduced? C99 or later or is this some GNU extension?
  2. What does the . before n signify? I am familiar with the variable length array declaration. Is the . for the variable appearing after the array specification? Is this part of the standard?
tinkerbeast asked Sep 06 '25 03:09


2 Answers

TLDR: It's an ad hoc syntax that came out of a discussion on the linux-man mailing list. It expresses the size of a VLA parameter before the size variable is declared: the . in .n means that n refers to a parameter of the current function declaration, even if that parameter appears later in the list. The man pages also extend the usual int a[restrict n] parameter declaration to void, which is not valid C. I have not found this syntax in any official documentation, but the mailing list has all the details.
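To make that reading concrete, here is a minimal sketch of how the man-page notation maps onto C that actually compiles. copy_bytes is a hypothetical stand-in I made up for illustration, not glibc's implementation; the man-page form appears only in a comment because no compiler accepts it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Man-page notation (documentation only, not valid C):
 *     void *memcpy(void dest[restrict .n], const void src[restrict .n], size_t n);
 * Read it as: dest and src each refer to n bytes, n is the parameter
 * declared later in the list, and the two regions must not overlap.
 *
 * The same interface in valid C. copy_bytes is a hypothetical name.
 */
void *copy_bytes(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(n == 0 || (dest != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dest;
}
```

Both spellings describe the same contract; only the pointer form can be fed to a compiler.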


The change to the memcpy syntax in the Linux library functions manual was introduced by commit c64cd13e. The commit message is copied here verbatim for reference.

Various pages: SYNOPSIS: Use VLA syntax in 'void *' function parameters

Use VLA syntax also for void *, even if it's a bit more weird.

Admittedly, it is weird enough from the C language perspective, because while void f(int n, int[restrict n]) is valid VLA syntax, void f(int n, void[restrict n]) is not because we are not allowed to have arrays of void.
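A short sketch of the distinction the commit message draws, assuming a C99 or later compiler with VLA support. sum_ints and zero_bytes are hypothetical names of mine; the rejected void form is shown only in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Valid since C99: n is declared before the array parameter that uses it. */
long sum_ints(size_t n, const int a[restrict n])
{
    long total = 0;

    assert(n == 0 || a != NULL);
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/*
 * Not valid C, because arrays of void do not exist:
 *     void f(size_t n, void p[restrict n]);
 * For raw memory the compilable spelling remains a plain pointer.
 */
void zero_bytes(size_t n, void *restrict p)
{
    unsigned char *bytes = p;

    assert(n == 0 || p != NULL);
    for (size_t i = 0; i < n; i++)
        bytes[i] = 0;
}
```

Note that a[restrict n] is adjusted to a restrict-qualified pointer, so the n in the brackets documents the expected size rather than enforcing it.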

For the . before n, if we dig deeper we can find this thread from the linux-man mailing list.

Let's take an example:

    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char *restrict host, socklen_t hostlen,
                    char *restrict serv, socklen_t servlen,
                    int flags);

and some transformations:

    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);


    int getnameinfo(socklen_t hostlen;
                    socklen_t servlen;
                    const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);

(I'm not sure if I used correct GNU syntax, since I never used that extension myself.)

The first transformation above is non-ambiguous, as concise as possible, and its only issue is that it might complicate the implementation a bit too much. I don't think forward-using a parameter's size would be too much of a parsing problem for human readers.

I personally find the second form not terrible. Being able to read code left-to-right, top-down is helpful in more complicated examples.

The second one is unnecessarily long and verbose, and semicolons are not very distinguishable from commas, for human readers, which may be very confusing.

    int foo(int a; int b[a], int a);
    int foo(int a, int b[a], int o);

Those two are very different to the compiler, and yet very similar to the human eye. I don't like it. The fact that it allows for simpler compilers isn't enough to overcome the readability issues.

This is true, I would probably use it with a comma and/or syntax highlighting.

I think I'd prefer having the forward-using syntax as a non-standard extension --or a standard but optional language feature-- to avoid forcing small compilers to implement it, rather than having the GNU extension standardized in all compilers.

The problems with the second form are:

  • it is not 100% backwards compatible (which maybe ok though) as the semantics of the following code changes:

    int n; int foo(int a[n], int n); // refers to different n!

Code written for new compilers could then be misunderstood by old compilers when a variable with 'n' is in scope.

  • it would generally be fundamentally new to C to have backwards references and parsers might need to be changed to allow this

  • a compiler or tool then has to deal also with ugly corner cases such as mutual references:

    int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
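(An aside from me, not from the thread.) The first problem in the list above can be observed with today's compilers. In this sketch, row_length and demo_row_length are hypothetical names; the point is that the n inside the first parameter's declarator is the file-scope n, because the parameter n is not yet in scope there.

```c
#include <assert.h>
#include <stddef.h>

int n = 3;  /* file-scope n, visible while the parameter list is parsed */

/*
 * The size in a's type comes from the file-scope n, evaluated on entry.
 * The parameter n is declared too late to influence it.
 */
size_t row_length(int (*a)[n], int n)
{
    (void)n;  /* the parameter plays no part in a's type */
    return sizeof *a / sizeof **a;
}

/* Self-check: passing 10 as the second argument does not change the result. */
void demo_row_length(void)
{
    int row[3] = {0};

    assert(row_length(&row, 10) == 3);
}
```

If forward references were added with the plain a[n] spelling, the same declaration would silently start meaning the parameter instead.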

We could consider new syntax such as

    int foo(char buf[.n], int n);

Personally, I would prefer the conceptual simplicity of forward declarations and the fact that these exist already in GCC over any alternative. I would also not mind new syntax, but then one has to define the rules more precisely to avoid the aforementioned problems.
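For readers who have not seen the GCC extension mentioned in the thread, here is a minimal sketch of a parameter forward declaration. count_zeros is a hypothetical function of mine, not something from the thread. The preprocessor guard reflects my assumption that only GCC proper accepts this syntax, so other compilers get a plain pointer fallback with the same call signature.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GNU C only: "size_t n;" before the real list forward-declares n,
 * so buf[n] may use it although n is the second parameter. */
size_t count_zeros(size_t n; const char buf[n], size_t n)
#else
/* Fallback for compilers without the extension. */
size_t count_zeros(const char *buf, size_t n)
#endif
{
    size_t zeros = 0;

    assert(n == 0 || buf != NULL);
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

Callers write count_zeros(data, sizeof data) in both cases. The forward declaration only changes what the parameter list itself is allowed to reference.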

As I understand it, the . marks a reference to a VLA size parameter that is declared later in the parameter list, and one motivation for a dedicated syntax is to avoid problems such as mutual references.

There is a follow-up thread that states,

I am ok with the syntax, but I am not sure how this would work. If the type is determined only later you would still have to change parsers (some C compilers do type checking and folding during parsing, so need the types to be known during parsing) and you also still have the problem with the mutual dependencies.

We thought about using this syntax

    int foo(char buf[.n], int n);

because it is new syntax which means we can restrict the size to be the name of a parameter instead of allowing arbitrary expressions, which then makes forward references less problematic. It is also consistent with designators in initializers and could also be extended to annotate flexible array members or for storing pointers to arrays in structures:

    struct { int n; char buf[.n]; };

    struct { int n; char (*buf)[.n]; };
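The analogy with designators can be illustrated with standard C99 designated initializers, where .name already refers to a sibling member. This is only a sketch: struct span, greeting and span_length are hypothetical names of mine, and the proposed forms stay in a comment because they are not valid C today.

```c
#include <assert.h>
#include <stddef.h>

struct span {
    size_t n;
    const char *buf;
};

/* Standard C99 designators: ".n" names a member of the object being initialized. */
static const struct span greeting = { .buf = "hello", .n = 5 };

/*
 * Proposed notation from the thread, by analogy (not valid C today):
 *     int foo(char buf[.n], int n);        ".n" names a sibling parameter
 *     struct { int n; char buf[.n]; };     ".n" names a sibling member
 */
size_t span_length(const struct span *s)
{
    assert(s != NULL);
    return s->n;
}
```

In both the existing and the proposed use, the leading dot tells the reader to look for the name among the siblings of the current declaration rather than in the enclosing scope.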

Of course, there were also objections, which I think many people in the SO community would agree with:

the only point i strongly care about is this one:

Manual pages should not use

  • non-standard syntax
  • non-portable syntax
  • ambiguous syntax (i.e. syntax that might have different meanings with different compilers or in different contexts)
  • syntax that might be invalid or dangerous with some widely used compiler collections like GCC or LLVM

Weijun Zhou answered Sep 07 '25 23:09


For both questions: the VLA notation appears to follow from a C23 design principle whereby "APIs should be self-documenting when possible". See Programming Language C - C23 Charter.

The dot notation does not appear in the April 2023 C23 draft, and I speculate it is a wish-list item for a future revision of the standard. The author of the dot notation openly admits that it's not valid syntax, and gives reasons why he chose it, in commit 1eed67e.

The notation seems to originate in the Linux development community, and its use in published man-pages documentation appears to be somewhat speculative. It was introduced with commits 1eed67e (the commit message is a better answer to this question than I can manage) and c64cd13, whose message says "Use VLA syntax also for void *, even if it's a bit more weird."

The language "even if it's a bit more weird" tells me that the author hopes the syntax might eventually be considered for inclusion in the C standard, since he doesn't cite any authoritative source like a draft or a compiler implementation.

As for the variable length array feature itself, GCC has supported it as an extension in C90 mode and as a standard feature since C99: GCC Variable Length documentation. The dot notation used in man-pages is not yet implemented in any GCC version, AFAIK.
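For reference, this is the standard VLA feature that the man-pages notation builds on, as a minimal sketch. sum_of_squares is a hypothetical name. The __STDC_NO_VLA__ check reflects that C11 made VLAs optional, so an implementation without them defines that macro.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__STDC_NO_VLA__)
#error "this sketch needs variable length array support"
#endif

/* C99 VLA: the length of squares is only known at run time. */
long sum_of_squares(size_t n)
{
    assert(n > 0);  /* a zero-length VLA is undefined behaviour */

    long squares[n];
    long total = 0;

    for (size_t i = 0; i < n; i++)
        squares[i] = (long)(i * i);
    for (size_t i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```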

glibc uses the void * notation in the header files at the time of this writing (Sep 3, 2023).

Jaredo Mills answered Sep 07 '25 22:09