New posts in sse

How to move (up to) 16 single bytes into an XMM register?

assembly x86 intel sse simd

No insert and extract for float/double in SSE and AVX?

c++ floating-point sse simd avx

Auto-vectorize shuffle instruction

c sse avx2 auto-vectorization

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?

Reading SSE registers (XMM, YMM) in a signal handler

Why do x86 FP compares set CF like unsigned integers, instead of using signed conditions?

assembly x86 sse sse2 x87

Extract scalar value from SSE vector

c x86 sse simd

Penalty for switching from SSE to AVX?

c++ sse avx sse2

Shifting a __m128i using _mm_slli_epi64

c sse

GCC access memory above stack top [duplicate]

assembly gcc x86-64 sse red-zone

SSE intrinsics: masking a float and using bitwise AND?

c++ sse intrinsics

Questions about the performance of different implementations of strlen [closed]

Fast implementation of covariance of two 8-bit arrays

How do I initialize a SIMD vector with a range from 0 to N?

c x86 sse simd intrinsics

Fast copy every second byte to new memory area

c performance sse memcpy sse2

Intel SIMD: why is in-place multiplication so slow?

Vectorization of modulo multiplication

c++ algorithm sse simd avx

Does RSQRTSS break the dependency on the destination register?

_mm256_fmadd_ps is slower than _mm256_mul_ps + _mm256_add_ps?

Call libmvec functions manually on __m128 vectors?

c simd sse glibc intrinsics