
New posts in avx

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructions?

c++ assembly sse avx avx2

8 bit shift operation in AVX2 with shifting in zeros

c sse simd avx avx2

Disabling AVX2 in CPU for testing purposes

Does the Linux kernel have its own SSE/AVX context?

Fastest way to expand bits in a field to all (overlapping + adjacent) set bits in a mask?

c assembly x86 sse avx

What's the difference between vextracti128 and vextractf128?

x86 simd avx avx2

Horizontal minimum and maximum using SSE

c++ max sse minimum avx

Using SIMD on amd64, when is it better to use more instructions vs. loading from memory?

Half-precision floating-point arithmetic on Intel chips

Unexpectedly good performance with OpenMP parallel for loop

Aligned and unaligned memory access with AVX/AVX2 intrinsics

gcc avx avx2

Efficiently find least significant set bit in a large array?

Difference between the AVX instructions vxorpd and vpxor

vectorization intel xor simd avx

Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

windows assembly sse avx avx512

Are older SIMD-versions available when using newer ones?

c++ c sse simd avx

How to get data out of AVX registers?

c++ visual-c++ avx fma

How to clear the upper 128 bits of __m256 value?

c x86 simd avx avx2

Generate code for multiple SIMD architectures

gcc simd avx sse4

Find index of maximum element in x86 SIMD vector

c++ x86 sse simd avx intel

Practical BigNum AVX/SSE possible?