New posts in micro-optimization

Fastest way to set highest order bit of rax register to lowest order bit in rdx register

Optimized 53->32 bit modulo computation on 32-bit processors

Set an XMM register to a repeating byte pattern (broadcast a constant byte)

Performance / Space implications when ordering SQL Server columns?

Using the operand-size override prefix 0x66 for instruction alignment

Assembly function address table and data under the function or in data section

Fastest way to set a single memory cell to zero or a constant in x86 assembly?

How to exchange between 2 bits in a 1-byte number

Bit packing of groups of n repeated bits in a 32-bit word, compact to 1 bit per group

Can the compiler/JIT optimize away short-circuit evaluation if there are no side-effects?

Understanding a specific CIL / CLR optimization

Fastest way to take the average of two signed integers in x86 assembly?

Why do C compilers still prefer push over mov for saving registers, even when mov appears faster in llvm-mca?

Is the fall-through side of a conditional branch more efficient? Is it a good idea to make that the error-handling side?

Efficient UTF-8 character-length decoding for a non-zero character in a 32 bit register

Advantage of using LEA over MOV for passing parameters in Assembly compiled from C++

Is there a faster algorithm for max(ctz(x), ctz(y))?

repz ret: why all the hassle?

Why is my operator ++ more than twice as fast as its equivalent instance method?

_mm256_fmadd_ps is slower than _mm256_mul_ps + _mm256_add_ps?