
Reading SSE registers (XMM, YMM) in a signal handler

I have an x86_64 instruction "vgatherqpd %ymm7,(%r9,%ymm1,8),%ymm3" for which I need to construct the memory address at runtime in a signal handler in Linux. The signal handler ucontext uc_mcontext.gregs[XED_REG_R9] gives me the value contained in %r9.

But how do I get the value contained in %ymm1? Linux has a sys/ucontext.h header that declares struct _libc_fpxreg and struct _libc_xmmreg, but I am not sure how to make sense of them. Moreover, there is no reg_ymm.

Appreciate help in solving this problem.

asked Sep 06 '25 03:09 by user1205476


1 Answer

Getting the AVX registers in a signal handler

The kernel makes all registers available to userspace through the third parameter of the signal handler. However, it is not as clear as reading a struct field, mostly because struct _libc_fpxreg is rather a red herring.

Since we are interested in the YMM registers, the CPU we're running on must support xsave, and that is what the kernel will use to store the FPU context. In the following I'll concentrate on that; bear in mind that on older CPUs you'd have to adapt the code a bit.

The xsave format

The xsave format consists of multiple blocks, each corresponding to a specific CPU feature; which blocks are present depends on the CPU. The first block contains the good old x87 FPU stack and the XMM registers and is 512 bytes long. Toward the end of that block there is reserved space that the kernel uses to record the size and validity of the buffer, which you can use to verify it as well. See struct _fpx_sw_bytes.
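As a rough sketch of that verification, the helper below reads the software-reserved bytes straight out of the buffer. The offsets and magic values are my reading of the kernel's struct _fpx_sw_bytes (magic1 at byte 464, xstate_size 16 bytes later, a second magic word right after the xsave data), so check them against <asm/sigcontext.h> on your system. The function name and the buf_len parameter are hypothetical; buf_len is just a caller-supplied upper bound so the helper never reads past memory you know is mapped.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: values taken from the Linux definition of struct _fpx_sw_bytes;
   verify against <asm/sigcontext.h> before relying on them. */
#define SW_BYTES_OFFSET 464u         /* software-reserved area in the 512-byte block */
#define XSTATE_MAGIC1   0x46505853u  /* FP_XSTATE_MAGIC1 */
#define XSTATE_MAGIC2   0x46505845u  /* FP_XSTATE_MAGIC2 */

/* Hypothetical helper: returns the size of the xsave data if both magic
   words check out, or 0 if the buffer does not look like an xsave frame. */
static uint32_t xsave_checked_size(const unsigned char *xsave, size_t buf_len)
{
    uint32_t magic1, xstate_size, magic2;

    if (buf_len < 512)
        return 0;
    memcpy(&magic1, xsave + SW_BYTES_OFFSET, 4);
    memcpy(&xstate_size, xsave + SW_BYTES_OFFSET + 16, 4);
    if (magic1 != XSTATE_MAGIC1)
        return 0;
    /* 576 = 512-byte legacy block + 64-byte header */
    if (xstate_size < 576 || (size_t)xstate_size + 4 > buf_len)
        return 0;
    /* the second magic word sits immediately after the xsave data */
    memcpy(&magic2, xsave + xstate_size, 4);
    return magic2 == XSTATE_MAGIC2 ? xstate_size : 0;
}
```

If this returns 0, the frame is probably a plain fxsave image without the extended state, and the YMM high halves are not available from it.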

The next block is a header, and the extensions follow. The first of these is the YMM extension, which stores the high halves of the ymm registers (the low halves are stored in the XMM part of the first block, just as they share space on the processor).

You can see the details in the structures below the one I linked earlier; however, the layout is not described exactly in the architecture programming manual (at least the one from AMD). Therefore I think it is best to leave the structure alone and use the CPU instruction xrstor to read it; after all, it knows best what's in there.
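Purely to illustrate the layout described above (this is not the approach recommended here), a ymm register could also be pieced together by hand from the two halves. The sketch assumes the standard, non-compacted xsave format: XMM0 at byte 160 of the first block, the XSTATE_BV bitmap at the start of the header (byte 512), and the YMM high halves at an offset passed in by the caller. 576 is the usual value, but that is an assumption; the authoritative number comes from CPUID leaf 0xD, sub-leaf 2 (EBX). The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define XMM_OFFSET         160u  /* XMM0 inside the 512-byte legacy block */
#define XSTATE_BV_OFFSET   512u  /* first field of the 64-byte header */
#define AVX_OFFSET_TYPICAL 576u  /* ASSUMPTION: usual offset of the YMM high halves */

/* Hypothetical helper: copy register ymm<n> (n = 0..15) out of a
   standard-format xsave buffer. A component whose XSTATE_BV bit is clear
   is in its initial state, which for XMM/YMM means all zeroes. */
static void read_ymm(const unsigned char *xsave, unsigned n,
                     size_t avx_offset, unsigned char out[32])
{
    uint64_t xstate_bv;

    memcpy(&xstate_bv, xsave + XSTATE_BV_OFFSET, 8);
    memset(out, 0, 32);
    if (xstate_bv & 2)   /* bit 1: SSE state, holds the low 128 bits */
        memcpy(out, xsave + XMM_OFFSET + 16u * n, 16);
    if (xstate_bv & 4)   /* bit 2: AVX state, holds the high 128 bits */
        memcpy(out + 16, xsave + avx_offset + 16u * n, 16);
}
```

For the register in the question this would be called as read_ymm(xsave, 1, AVX_OFFSET_TYPICAL, buf), with xsave being the pointer discussed in the next section.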

Where is the buffer located

Because the CPU requires the xsave buffer to be 64-byte aligned, it had better not be part of another structure like ucontext_t. The kernel allocates aligned storage for it on the stack with some dynamic padding and stores a pointer to it in the ucontext_t structure (this is uc_mcontext.fpregs). See get_sigframe() for the allocation and the functions around it that fill in the pointer.

That's not all, though. In 64-bit mode, this pointer is all you need. In 32-bit mode, it points to a legacy x87 fsave buffer (which is what struct _libc_fpstate describes on i386), and the xsave buffer follows it, so you must add the size of that structure to the pointer.

The code

In the end, getting ymm0 from the interrupted code looks like this:

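#define _GNU_SOURCE
#include <signal.h>
#include <ucontext.h>

/* on 32-bit x86 the legacy fsave image precedes the xsave buffer */
#ifdef __i386__
#define XSAVE_POINTER(p) ((char*)(p)+sizeof(struct _libc_fpstate))
#else
#define XSAVE_POINTER(p) (p)
#endif

/* build with -mxsave so that GCC accepts __builtin_ia32_xrstor */
void sighandler(int sig, siginfo_t* info, void* v_context)
{
    ucontext_t* context = v_context;
    /* this will be our memory to put the values to */
    char values[32] __attribute__((aligned(32)));

    /* get ymm registers from the context; mask 7 = x87 | SSE | AVX.
       Note that this overwrites the handler's own FPU and vector registers. */
    __builtin_ia32_xrstor(XSAVE_POINTER(context->uc_mcontext.fpregs), 7);

    /* move ymm0 to memory so we can print it out */
    __asm__ volatile("vmovdqa %%ymm0, %0"
                     : "=m"(values));

    /* values[] now holds the 32 bytes of the interrupted code's ymm0 */
}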
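To tie this back to the gather in the question: once the index register (ymm1) and the mask register (ymm7) have been stored to memory the same way as values above, the addresses the instruction touches can be computed roughly as follows. This is a sketch under the usual VGATHERQPD semantics (four signed 64-bit indices, address = base + index * scale + displacement, an element is only loaded if the top bit of its mask element is set). The function name and parameters are hypothetical, and the byte copies assume a little-endian host, which x86 is.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compute the four element addresses of a
   "vgatherqpd mask,(base,index,scale),dest" with ymm operands.
   Returns a bitmask saying which elements are actually loaded
   (bit i set = element i has the top bit of its mask qword set). */
static unsigned gather_qpd_addresses(uint64_t base,
                                     const unsigned char index_reg[32],
                                     const unsigned char mask_reg[32],
                                     unsigned scale, int64_t disp,
                                     uint64_t addr[4])
{
    unsigned active = 0;
    int i;

    for (i = 0; i < 4; i++) {
        uint64_t index, mask;

        memcpy(&index, index_reg + 8 * i, 8);
        memcpy(&mask, mask_reg + 8 * i, 8);
        /* unsigned arithmetic wraps modulo 2^64, so negative indices
           stored in two's complement come out right */
        addr[i] = base + index * scale + (uint64_t)disp;
        if (mask >> 63)
            active |= 1u << i;
    }
    return active;
}
```

For the instruction in the question, base would be the saved %r9 from uc_mcontext.gregs, scale would be 8 and disp would be 0.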
answered Sep 07 '25 22:09 by jpalecek


