New posts in avx2

Fast modulo-12 algorithm for 4 uint16_t's packed in a uint64_t

What do you do without fast gather and scatter in AVX2 instructions?

How to implement an efficient _mm256_madd_epi8?

c++ x86 simd intrinsics avx2

Efficient implementation of log2(__m256d) in AVX2

Parallel programming using Haswell architecture [closed]

sse cpu-architecture avx avx2

How can I add together two SSE registers

c++ c intel sse avx2

Efficient way to set first N or last N bits of __m256i to 1, the rest to 0

Fastest way to unpack 32 bits to a 32 byte SIMD vector

x86 simd avx bitmask avx2

Do all CPUs which support AVX2 also support SSE4.2 and AVX?

sse simd avx avx2

AVX2 slower than SSE on Haswell

c++ x86 sse simd avx2

Is this incorrect code generation with arrays of __m256 values a clang bug?

Packing and de-interleaving two __m256 registers

c++ x86 simd avx avx2

Fallback implementation for conflict detection in AVX2

c++ x86 intrinsics avx2 avx512

Why both? vperm2f128 (avx) vs vperm2i128 (avx2)

intel simd avx avx2

Where is VPERMB in AVX2?

assembly x86 intel sse avx2

Is it possible to use SIMD instructions in Rust?

rust simd avx avx2

Is there an inverse instruction to the movemask instruction in Intel AVX2?

x86 intrinsics avx avx2 icc

Fastest Implementation of Exponential Function Using AVX

x86 simd avx exponential avx2