New posts in sse

Reordering 3D vector triplets in column-major order is slow

c++ c sse simd

Understanding whether a code sample is CPU-bound or memory-bound

c performance optimization sse

x64 SSE data types

assembly 64-bit sse

What gcc option enables loop unrolling for SSE intrinsics with immediate operands?

c gcc sse

Vectorized sum in Fortran

fortran sse gfortran simd avx

Bilinear filter with SSE4.1 intrinsics

Learning SSE/SSE2 and asm optimizations

assembly graphics x86 x86-64 sse

SSE4.2 & STTNI: is PCMPESTRM really twice as slow as PCMPISTRM?

c++ performance sse sse4

Why move 32-bit register to stack then from stack to xmm register?

SSE2: How To Load Data From Non-Contiguous Memory Locations?

SIMD/SSE newbie: simple image filtering

How would you write code for unsigned addition likely to be optimized into one SSE instruction?

c++ c sse

Is there any situation where using MOVDQU and MOVUPD is better than MOVUPS?

assembly x86 x86-64 intel sse

Is shufps slower than memory access?

c++ assembly sse simd

Find NaN in an array of doubles using SIMD

c nan sse simd avx

How do I perform 8x8 matrix operations using SSE?

c++ sse intrinsics

SIMD array add for arbitrary array lengths

c arrays sse simd sse2

How to store the lower or upper half of an AVX/AVX2 (YMM) register to memory, as SSE movlps/movhps do?

x86 sse simd avx avx2