New posts in sse

SSE Instructions: Byte+Short

x86 sse instructions

std::bitset and SSE instructions

c++ sse bitset

For an SSE vector that has all the same components, generate on the fly or precompute?

c++ sse simd avx

How to write c++ code that the compiler can efficiently compile to SSE or AVX?

Find the first instance of a character using simd

x86 sse simd avx avx2

Need some constructive criticism on my SSE/Assembly attempt

assembly x86 sse

What is the best way to perform branching using Intel SSE?

What is the fastest way to do a SIMD gather without AVX(2)?

x86 sse simd sse4

How many clock cycles does AVX/SSE exponentiation cost on a modern x86_64 CPU?

c++ x86 x86-64 sse avx

Forcing AVX intrinsics to use SSE instructions instead

difference between load1 and broadcast intrinsics

x86 sse simd intrinsics intel

Extracting SSE shuffled 32 bit value with only SSE2

c optimization sse

SSE and AVX intrinsics mixture

c++ performance sse simd avx

How does endianness work with SIMD registers?

x86 sse endianness simd

Is there a more direct method to convert float to int with rounding than adding 0.5f and converting with truncation?

transpose for 8 registers of 16-bit elements on SSE2/SSSE3

assembly matrix x86 sse simd

How to convert a hex float to a float in C/C++ using the _mm_extract_ps SSE GCC intrinsic function

c++ gcc floating-point hex sse

Cannot use SSSE3 on enabled cpu

c linux ubuntu intel sse

Segmentation fault while working with SSE intrinsics due to incorrect memory alignment

c memory sse icc

Why is permute needed in parallel SIMD/SSE/AVX?

permutation sse simd avx