New posts in sse

Out-of-range floating point to integer conversion breaks in VS2022 executable when linking VS2017 or VS2019 libraries

Optimising column-wise maximum with SIMD

c++ sse simd intrinsics avx

Should you pass __m128 (and other register types) by reference or by copy?

c++ simd sse intrinsics

average operation ARM NEON

arm sse simd neon intrinsics

How to compile a project which requires SSE2 on MacBook with M1 chip?

Why is SIMD slower than scalar counterpart

assembly x86 sse simd

CVTTSD2SI - a truncating instruction - uses rounding with "inexact" results?

How to store four 32-bit floats into one 128-bit XMM register?

assembly x86 x86-64 sse simd

gcc vector extensions don't work as stated in docs

gcc sse vectorization

How to move (up to) 16 single bytes into an XMM register?

assembly x86 intel sse simd

No insert and extract for float/double in SSE and AVX?

c++ floating-point sse simd avx

Auto-vectorize shuffle instruction

c sse avx2 auto-vectorization

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?

Reading SSE registers (XMM, YMM) in a signal handler

Why do x86 FP compares set CF like unsigned integers, instead of using signed conditions?

assembly x86 sse sse2 x87

Extract scalar value from SSE vector

c x86 sse simd

Vectorization of modulo multiplication

c++ algorithm sse simd avx

Does RSQRTSS break the dependency on the destination register?

_mm256_fmadd_ps is slower than _mm256_mul_ps + _mm256_add_ps?

Call libmvec functions manually on __m128 vectors?

c simd sse glibc intrinsics