
New posts in sse

How to work with a 128-bit C variable and 128-bit xmm asm?

c sse simd

SSE micro-optimization instruction order

Initializing an __m128 type from a 64-bit unsigned int

c++ sse intrinsics

How to optimize "u[0]*v[0] + u[2]*v[2]" code line with SSE or GLSL

c++ c optimization sse glm-math

Unable to detect why the following piece of code was not vectorized

c sse vectorization icc stencils

Aligned types and passing arguments by value

c++ stl alignment sse

approximating log10[x^k0 + k1]

TensorFlow installation using SSE instructions with pip

How to do an indirect load (gather-scatter) in AVX or SSE instructions?

c vector intel sse avx

Is there a good double-precision small matrix SIMD library for x86?

Atomic 16 byte read on x64 CPUs

c++ c 64-bit sse lock-free

Is it possible to use SSE and SSE2 to make a 128-bit wide integer?

assembly sse sse2

Most efficient way to store 4 dot products into a contiguous array in C using SSE intrinsics

Is it okay to mix legacy SSE encoded instructions and VEX encoded ones in the same code path?

assembly x86 sse avx intel

Fast counting the number of equal bytes between two arrays [duplicate]

c++ c sse simd sse2

Where is VPERMB in AVX2?

assembly x86 intel sse avx2

Vectorizing Modular Arithmetic

c assembly x86-64 sse intrinsics

Load constant floats into SSE registers

assembly sse

Is it possible to vectorize myNum += a[b[i]] * c[i]; on x86_64?

What's the difference between __popcnt() and _mm_popcnt_u32()?

x86 sse intrinsics sse4