New posts in sse

Using SSE instructions

Why is my hand-tuned, SSE-enabled code so slow?

c++ optimization opencv sse

What are the best instruction sequences to generate vector constants on the fly?

assembly x86 sse simd avx

Best cross-platform method to get aligned memory

Can one construct a "good" hash function using CRC32C as a base?

hash intel sse crc32

Are different mmx, sse and avx versions complementary or supersets of each other?

x86 sse avx mmx

SSE instructions: which CPUs can do atomic 16B memory operations?

Difference between MOVDQA and MOVAPS x86 instructions?

assembly x86 sse simd mov intel

Intel SSE and AVX Examples and Tutorials [closed]

intel sse vectorization avx

What does ordered / unordered comparison mean?

Why is strcmp not SIMD optimized?

c++ sse simd strcmp sse2

AVX2: what is the most efficient way to pack left based on a mask?

c++ vectorization sse simd avx2

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

Using AVX intrinsics instead of SSE does not improve speed -- why?

c++ performance gcc sse avx

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX

c sse cpu-architecture avx fma

How to determine if memory is aligned?

c optimization memory sse simd

Getting started with Intel x86 SSE SIMD instructions

c gcc x86 sse simd

Why is this SSE code 6 times slower without VZEROUPPER on Skylake?

performance x86 intel sse avx

How is a vector's data aligned?

SSE intrinsic functions reference

c++ c gcc sse simd