New posts in sse

SSE/AVX floating point convert exceptions

Writing a piece of C code such that the compiler uses SSE4.1 instructions when generating assembly code

c optimization gcc sse simd

Intel x86_64 assembly compare signed double precision floats

Testing which trits are set in a binary representation

Euclidean distance using intrinsic instruction

Convert 16 bits mask to 16 bytes mask

Broadcast one arbitrary element of __m128 vector

c++ x86 sse simd sse2

Most efficient way to convert vector of float to vector of uint32?

assembly floating-point sse

SSE2 8x8 byte-matrix transpose code twice as slow on Haswell+ than on Ivy Bridge

Loop is not vectorized when variable extent is used

Sign of the maximum absolute value in an __m128, SSE4

c++ sse simd

cost of if check vs sse operation?

c sse

Moving 2 QWORDs from general purpose registers into an XMM register as high/low [duplicate]

assembly x86-64 masm sse

Fast way to set single bit in SSE datatypes (__m128i)?

c++ bit-manipulation intel sse

different results with and without SSE (float arrays multiplication)

c++ arrays floating-point sse

C++ load and store optimizations and heap objects

c++ sse simd

AVX vs. SSE: expect to see a larger speedup

performance sse simd avx

Is there a way to mask one end of a __m128i register based on mask length that is not known at compile time?

sse simd avx

Using SSE to speed up lower_bound function

c assembly x86 x86-64 sse