New posts in micro-optimization

How to force NASM to encode [1 + rax*2] as disp32 + index*2 instead of disp8 + base + index?

Most efficient popcount on `__uint128_t`?

What's the easiest way to determine if a register's value is equal to zero or not?

Difference between "or eax,eax" and "test eax,eax" [duplicate]

Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?

Why are bitwise operators slower than multiplication/division/modulo?

Is thread time spent in synchronization too high?

Does calling the constructor of an empty class actually use any memory?

Faster implementation of Math.round?

Java: micro-optimizing array manipulation

Check the existence of a HashMap key

Extreme optimization of integer binary search

Why is `arr.take(idx)` faster than `arr[idx]`

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads a ZMM register, writing a k mask?

What are the costs of failed store-to-load forwarding on x86?

What's the most efficient way to perform bitwise operations on a C array?

SSE micro-optimization instruction order

AND faster than integer modulo operation?

LINQ Count() until, is this more efficient?