
New posts in sse

Minimum and maximum of signed zero

Best way to shuffle 64-bit portions of two __m128i's

intel sse simd intrinsics

Java performance in numerical algorithms

SSE code runs 30% faster, yet when in use shows over 20% CPU increase

c sse

Using ymm registers as a "memory-like" storage location

assembly x86 sse avx

Efficient way to convert scatter indices into gather indices?

Permuting bytes inside SSE __m128i register

optimization sse simd

How to merge a scalar into a vector without the compiler wasting an instruction zeroing upper elements? Design limitation in Intel's intrinsics?

c gcc x86 sse intrinsics

Can PTEST be used to test if two registers are both zero or some other condition?

assembly x86 sse intrinsics sse4

libc's system() when the stack pointer is not 16-padded causes segmentation fault

Neon equivalent to SSE intrinsics

c arm sse multiplication neon

Faster assembly optimized way to convert between RGB8 and RGB32 image

Is there still any development on SIMD in Mono?

c# mono sse simd

Matrix-vector-multiplication in AVX not proportionately faster than in SSE

Print value of __m128 datatype in gdb debugger

c++ gdb sse simd intrinsics

How to convert 'long long' (or __int64) to __m64

Bypass delays when switching execution unit domains

assembly intel sse

Optimal SSE unsigned 8 bit compare

c x86 sse simd sse4

Questions regarding operations on NaN