
New posts in avx

Macro for generating immediates for AVX shuffle intrinsics

c macros intel intrinsics avx

Optimising column-wise maximum with SIMD

c++ sse simd intrinsics avx

Find Absolute in AVX

Force compiler to use memory operand from Intrinsics

c memory intrinsics avx operands

AVX-512 Instruction Encoding - {er} Meaning

assembly x86 avx avx512

Improving a recursive Hadamard transformation

c simd avx

No insert and extract for float/double in SSE and AVX?

c++ floating-point sse simd avx

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?

Why is gcc so much worse at std::vector<float> vectorization of a conditional multiply than clang?

Penalty for switching from SSE to AVX?

c++ sse avx sse2

Getting wrong results with using AVX instructions and -O3 compiling option

c compiler-optimization avx

Testing whether AVX register contains some equal integer numbers

c++ x86 simd avx avx2

Why is this code using VMULPD to write registers that will be overwritten by VFMADD? Isn't that useless?

assembly avx fma

Why does _umul128 work slower than scalar code for a mul128x64x2 function?

How to optimise my AVX Code

Does Hyperthreading have trouble with AVX?

Vectorization of modulo multiplication

c++ algorithm sse simd avx

How to run bitwise OR on big vectors of u64 in the most performant manner?

c++ performance assembly cpu avx

_mm256_fmadd_ps is slower than _mm256_mul_ps + _mm256_add_ps?

Why is (V)SHUFPS not in Intel's constant time instruction list?