New posts in avx2

Sparse array compression using SIMD (AVX2)

perf report shows that the function "__memset_avx2_unaligned_erms" has overhead. Does this mean memory is unaligned?

c++ profiling avx perf avx2

GCC auto-vectorization with control flow in a loop

c gcc avx2 auto-vectorization

Can AVX2 be used to implement faster LZCNT processing on a word array?

AVX2: How to Efficiently Load Four Integers to Even Indices of a 256-Bit Register and Copy Them to Odd Indices?

x86 sse simd avx avx2

How to convert 32-bit float to 8-bit signed char? (4:1 packing of int32 to int8 __m256i)

c x86 simd intrinsics avx2

_mm_alignr_epi8 (PALIGNR) equivalent in AVX2

x86 simd intrinsics avx avx2

Loading 8 chars from memory into an __m256 variable as packed single precision floats

c++ sse simd avx avx2

Using a variable to index a simd vector with _mm256_extract_epi32() intrinsic

simd intrinsics avx avx2

Why do processors with only AVX out-perform AVX2 processors for many SIMD algorithms?

c# c++ simd avx avx2

Does /arch:AVX enable AVX2?

Best way to load/store from/to general purpose registers to/from xmm/ymm register

assembly x86 simd sse2 avx2

Fully utilizing pipelines on Kaby Lake

How to concatenate two vectors efficiently using AVX2? (a lane-crossing version of VPALIGNR)

c simd intrinsics avx avx2

Counting 1 bits (population count) on large data using AVX-512 or AVX2

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros

x86 sse simd avx avx2

Efficient way of rotating a byte inside an AVX register

c sse simd avx avx2

Count leading zero bits for each element in AVX2 vector, emulate _mm256_lzcnt_epi32

Optimal SIMD algorithm to rotate or transpose an array