New posts in avx2

What's the difference between vextracti128 and vextractf128?

x86 simd avx avx2

Why does storing to and loading from an AVX2 256-bit vector have different results in debug and release mode? [duplicate]

Aligned and unaligned memory access with AVX/AVX2 intrinsics

gcc avx avx2

What's the fastest stride-3 gather instruction sequence?

c++ x86 vectorization avx2

How to clear the upper 128 bits of __m256 value?

c x86 simd avx avx2

Load address calculation when using AVX2 gather instructions

x86 sse simd avx2

Can I use the AVX FMA units to do bit-exact 52-bit integer multiplications?

floating-point x86 simd avx2 fma

Scatter intrinsics in AVX

intrinsics avx avx2

AVX2: Computing dot product of 512 float arrays

c++ simd avx2 dot-product fma

Transpose an 8x8 float using AVX/AVX2

simd avx avx2

How to find the horizontal maximum in a 256-bit AVX vector

Haswell memory access

How are the gather instructions in AVX2 implemented?

intel ram simd avx avx2

In what situation would the AVX2 gather instructions be faster than individually loading the data?

How to tell if a Linux machine supports AVX/AVX2 instructions?

linux unix avx suse avx2

Why is Intel Haswell Xeon CPU sporadically miscomputing FFTs and ART?

AVX2: what is the most efficient way to pack left based on a mask?

c++ vectorization sse simd avx2