
New posts in micro-optimization

Check the existence of a HashMap key

Extreme optimization of integer binary search

Why is `arr.take(idx)` faster than `arr[idx]`

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads a ZMM register, writing a k mask?

What are the costs of failed store-to-load forwarding on x86?

What's the most efficient way to make bitwise operations in a C array

SSE micro-optimization instruction order

AND faster than integer modulo operation?

LINQ Count() until, is this more efficient?

Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?

Why does declaring a counter variable outside of a nested function make a loop 5x slower?

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256?

Inlining of a recursive function

C pointers vs direct member access for structs

Improving the Quick sort

Does rearranging a conditional evaluation speed up a loop?

Which of these pieces of code is faster in Java?

Implementing "logical not" using less than 5 bitwise operators