Consider the following assembly program:
bits 64
global _start
_start:
    mov rax, 0x0000111111111111
    add byte [rax*1+0x0], al
    jmp _start
When you compile this with nasm and ld (on Ubuntu, kernel 5.4.0-48-generic, Ryzen 3900X), you get a segfault:
$ ./segfault-addr
[1]    107116 segmentation fault (core dumped)  ./segfault-addr
When you attach gdb you can see the address that caused this fault:
(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x111111111111
However, if you set any of the 16 most significant bits to 1 like this:
bits 64
global _start
_start:
    mov rax, 0x0001111111111111
    add byte [rax*1+0x0], al
    jmp _start
You obviously still get a segfault, but now the address is NULL:
(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x0
Why is this happening? Is it caused by gdb, Linux, or the CPU itself?
Is there anything I can do to prevent this behavior?
It's the difference between canonical and non-canonical addresses, coming from the fact that the x86-64 doesn't have a full 64-bit virtual address space. Your second example is a non-canonical address as it isn't a sign-extended 48-bit value (you apparently don't have the 5-level page table extension on your machine or it would be 57 bits); such addresses can never resolve to a physical memory location.
Invalid accesses to canonical addresses generate a page fault (#PF), for which the CPU provides the faulting address to the kernel (in the CR2 register), and the kernel passes it along to userspace in the si_addr field of struct siginfo as you see.  But accesses to non-canonical addresses are always invalid and the CPU raises a general protection exception (#GP), or in rare cases, a stack fault (#SS).  The designers of the x86 architecture chose, in their infinite wisdom, not to provide the faulting address to software in case of a #GP or #SS exception, so the kernel doesn't get it and neither do you.
If you really need the address, your only choice is to decode the instruction that caused the exception, and inspect the contents of registers as needed to work out what it was trying to do.
I presume this decision was because the kernel really needs the address in case of a page fault. An access to a not-present page may be a memory violation that should kill the process; or, for instance, it may simply be a page that has been swapped out from physical memory. In the latter case, the kernel uses the fault address to find the appropriate page on disk and load it back into physical memory. Then it updates the page tables and returns from the exception handler to restart the faulting instruction, and the program can continue.
However, a general protection fault is typically unrecoverable, and the process will have to be killed, or at least signaled so it can try to clean up. In this case there is nothing actionable to be done with the faulting address, and I guess the architecture designers didn't think its potential value for debugging was worth the effort of having the CPU save it. Anyway, many possible causes of #GP don't arise from a memory access at all (e.g. trying to read or write control registers from unprivileged mode), in which case there is no faulting address.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With