New posts in sse

Compiling SSE intrinsics in GCC gives an error

gcc x86 intel sse simd

Is there a good way of finding modulus of two variables using SSE? (without SVML)

c++ sse

Move quadword between xmm and general-purpose register in ml64?

AVX2, How to Efficiently Load Four Integers to Even Indices of a 256 Bit Register and Copy to Odd Indices?

x86 sse simd avx avx2

SSE and iostream: wrong output for floating point types

SSE intrinsics cause normal float operation to return -1.#INV

c++ sse intrinsics

Why does _mm_stream_ps produce L1/LL cache misses?

c performance caching gcc sse

Where do SSE instructions outperform normal instructions?

c x86-64 sse

What is the difference between MOVDQA and MOVNTDQA, and VMOVDQA and VMOVNTDQ for WB/WC marked region?

assembly x86 sse simd avx

Visual Studio 2017: _mm_load_ps often compiled to movups

How do you move 128-bit values between XMM registers?

assembly simd sse

Use both SSE2 intrinsics and gcc inline assembler

SSE3 intrinsics: How to find the maximum of a large array of floats

c++ sse intrinsics

Setting __m256i to the value of two __m128i values

c sse simd avx

Loading 8 chars from memory into an __m256 variable as packed single precision floats

c++ sse simd avx avx2

Shuffling by mask with Intel AVX

c++ sse simd intrinsics avx

Control flow divergence in SIMT and SIMD

cuda sse simd

Are there SIMD(SSE / AVX) instructions in the x86-compatible accelerators Intel Xeon Phi?

intel sse simd avx intel-mic

Faster lookup tables using AVX2

Does using mix of pxor and xorps affect performance?

assembly x86 sse simd