
New posts in avx2

Get sum of values stored in __m256d with SSE/AVX

c++ optimization sse avx avx2

What's the fastest way to perform an arbitrary 128/256/512 bit permutation using SIMD instructions?

c++ assembly sse avx avx2

8 bit shift operation in AVX2 with shifting in zeros

c sse simd avx avx2

Disabling AVX2 in CPU for testing purposes

Why are some Haswell AVX latencies advertised by Intel as 3x slower than Sandy Bridge?

What's the difference between vextracti128 and vextractf128?

x86 simd avx avx2

Why does storing to and loading from an AVX2 256bit vector have different results in debug and release mode? [duplicate]

Aligned and unaligned memory access with AVX/AVX2 intrinsics

gcc avx avx2

What's the fastest stride-3 gather instruction sequence?

c++ x86 vectorization avx2

How to clear the upper 128 bits of __m256 value?

c x86 simd avx avx2

Load address calculation when using AVX2 gather instructions

x86 sse simd avx2

Can I use the AVX FMA units to do bit-exact 52 bit integer multiplications?

floating-point x86 simd avx2 fma

Scatter intrinsics in AVX

intrinsics avx avx2

AVX2: Computing dot product of 512 float arrays

c++ simd avx2 dot-product fma

Transpose an 8x8 float using AVX/AVX2

simd avx avx2

How to find the horizontal maximum in a 256-bit AVX vector

Haswell memory access

How are the gather instructions in AVX2 implemented?

intel ram simd avx avx2

In what situation would the AVX2 gather instructions be faster than individually loading the data?

How to tell if a Linux machine supports AVX/AVX2 instructions?

linux unix avx suse avx2