New posts in micro-optimization

How to improve performance on a function that operates on two arrays in Clojure

Why move 32-bit register to stack then from stack to xmm register?

Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Adding an extra load speeds it up?

Is there ever any performance difference between the Java >> and >>> right shift operators?

Why does the compiler not always optimize away local variables?

Non-virtual interface? (Need a very performant low level abstraction)

Is it useful to check if a Java collection is empty before beginning iteration?

C# lambda allocation and collection

MIPS (curiosity) faster way of clearing a register?

Accessing arbitrary 16-bit elements packed in a 128-bit register

What is the most optimal way to use a C# struct as the key of a dictionary?

Numpy performance gap between len(arr) and arr.shape[0]

Do most compilers optimize MATMUL(TRANSPOSE(A),B)?

Can AVX2 be used to implement faster LZCNT processing on a word array?

Efficient mod 3 in x86 assembly

Why is POP slow when using register R12?

Microoptimization: iterating with local variable vs. class member

C++ micro-optimization

Why is an empty function call in Python around 15% slower for dynamically compiled Python code?

Alternative schemes for implementing vptr?

How to MOVe 3 bytes (24bits) from memory to a register?