
New posts in avx

SSE runs slow after using AVX

c++ gcc x86 avx sse2

Does Clang have something like #pragma GCC target?

clang intrinsics avx pragma

What is the most efficient way to clear a single or a few ZMM registers on Knights Landing?

Packing and de-interleaving two __m256 registers

c++ x86 simd avx avx2

How to do an indirect load (gather-scatter) in AVX or SSE instructions?

c vector intel sse avx

Why both? vperm2f128 (avx) vs vperm2i128 (avx2)

intel simd avx avx2

Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?

Is it okay to mix legacy SSE encoded instructions and VEX encoded ones in the same code path?

assembly x86 sse avx intel

Is it possible to use SIMD instructions in Rust?

rust simd avx avx2

When using a mask register with AVX-512 loads and stores, is a fault raised for invalid accesses to masked-out elements?

x86 avx avx512

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256?

Using AVX with GCC - avxintrin.h missing

c++ gcc avx

AVX/SSE version of xorshift128+

c performance sse avx

L1 memory bandwidth: 50% drop in efficiency using addresses which differ by 4096+64 bytes

c caching memory x86 avx

Is there an inverse instruction to the movemask instruction in Intel AVX2?

x86 intrinsics avx avx2 icc

Bitwise xor of two 256-bit integers

sse simd avx

Fastest Implementation of Exponential Function Using AVX

x86 simd avx exponential avx2

Get sum of values stored in __m256d with SSE/AVX

c++ optimization sse avx avx2

Why is GCC's AVX slower while LLVM's faster?

gcc assembly llvm julia avx