
New posts in x86

AVX2 Transpose of a matrix represented by 8x __m256i registers

c x86 transpose simd avx2

Cache coherence literature generally only refers to store buffers but not read buffers. Yet one somehow needs both?

Why does _umul128 work slower than scalar code for a mul128x64x2 function?

How is the transitivity/cumulativity property of memory barriers implemented micro-architecturally?

Why are there too many demand RFO offcore responses/offcore requests?

Can a “PUSH” instruction's operation be performed using other instructions?

Does RDTSCP increment monotonically across multi-cores?

c++ assembly x86 multicore rdtsc

What is the fastest virtual machine design for x86?

Why do segments begin on paragraph boundaries?

CPUID: Why must MISC_ENABLE.LCMV be set to 0 for some functions? Can I temporarily overwrite it?

assembly x86 x86-64 cpuid msr

Loading a file on an ISO 9660 File System

Stack allocation, why the extra space?

How does this asm code setup SEH?

exception assembly x86 masm seh

Least intrusive compile barrier for Java on x86

How are the C++11 memory barriers implemented for x86-like systems?

Do atomic CAS-operations on x86_64 and ARM always use std::memory_order_seq_cst?

CLI instruction not executed in Linux kernel module

Meaning of CS and SS registers on x86-64 Linux in userland?

Do the x86 segment registers have special meaning/usage on modern CPUs and OSes?

Understanding of vectorization with SSE instructions