New posts in sse

Constexpr and SSE intrinsics

An SSE Stdlib-esque Library?

c++ c visual-c++ assembly sse

Best way to load a 64-bit integer into a double-precision SSE2 register?

assembly double sse sse2 int64

Get index of first element that is not zero in a __m256 variable

c++ c sse simd avx

Does rewriting memcpy/memcmp/... with SIMD instructions make sense?

performance sse simd

Optimizing code using Intel SSE intrinsics for vectorization

c sse sse3 sse4

Intel Intrinsics guide - Latency and Throughput

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

x86 sse simd sse2 sse3

Fast vectorized rsqrt and reciprocal with SSE/AVX depending on precision

performance sse simd avx

Converting float vector to 16-bit int without saturating

c++ c performance sse

Load address calculation when using AVX2 gather instructions

x86 sse simd avx2

SIMD the following code

c x86 sse simd

Parallel prefix (cumulative) sum with SSE

c sum openmp sse

GCC emits vastly different code using "-march=native" on similar architectures

c gcc assembly sse avx

How can I disable vectorization while using GCC?

Fast 24-bit array -> 32-bit array conversion?

Getting max value in a __m128i vector with SSE?

c assembly x86 sse

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? Or not using that insn at all

gcc assembly x86 sse avx

Does Java's strictfp modifier have any effect on modern CPUs?

Compact a hex number

c++ bit-manipulation sse