New posts in avx

(Vec4 x Mat4x4) product using SIMD and improvements

c++ matrix simd avx sse3

Why not use the AVX registers as an ultra-fast cache?

Automatically generate FMA instructions in MSVC

c++ visual-c++ x86 avx fma

Computing 8 horizontal sums of eight AVX single-precision floating-point vectors

Efficiently gather individual bytes, separated by a byte-stride of 4

c intrinsics avx

Need for fast data demuxing in C# by using multi-threading, AVX, GPU or whatever

Preventing GCC from automatically using AVX and FMA instructions when compiled with -mavx and -mfma

c++ gcc vectorization avx fma

Large (0,1) matrix multiplication using bitwise AND and popcount instead of actual int or float multiplies?

How to align stack at 32 byte boundary in GCC?

gcc stack sse avx

How to force gcc to use all SSE (or AVX) registers?

Horizontal XOR in AVX

c++ assembly x86 simd avx

Do 128-bit cross-lane operations in AVX-512 give better performance?

performance x86 intel avx avx512

Parallel programming using Haswell architecture [closed]

sse cpu-architecture avx avx2

Does vzeroall zero registers ymm16 to ymm31?

assembly x86 intel avx avx512

Is the L2 hardware prefetcher really helpful?

AVX log intrinsics (_mm256_log_ps) missing in g++-4.8?

c++ g++ intrinsics avx

How to efficiently combine comparisons in SSE?

c optimization assembly sse avx

Fastest way to unpack 32 bits to a 32-byte SIMD vector

x86 simd avx bitmask avx2

Do all CPUs which support AVX2 also support SSE4.2 and AVX?

sse simd avx avx2