
New posts in sse

How to get the number of unique elements of a simd vector in C

c simd sse

First use of AVX 256-bit vectors slows down 128-bit vector and AVX scalar ops

assembly x86-64 sse simd avx

Aligning memory on 16-byte and 32-byte boundaries

memory alignment sse simd avx

Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?

c++ simd sse avx avx2

Performance of SSE and AVX when both are memory-bandwidth limited

performance caching sse avx

Set an XMM register to a repeating byte pattern (broadcast a constant byte)

How to best emulate the logical meaning of _mm_slli_si128 (128-bit bit-shift), not _mm_bslli_si128

c sse simd intrinsics sse2

Aliasing of NEON vector data types

c++ c sse simd neon

Meaning of XMM register values shown in Visual Studio debugger's register window

How to convert int 64 to int 32 with avx (but without avx-512)

simd sse avx

Why does __m128 cause alignment issues in a union with float x/y/z?

Out-of-range floating point to integer conversion breaks in VS2022 executable when linking VS2017 or VS2019 libraries

Optimising column-wise maximum with SIMD

c++ sse simd intrinsics avx

Should you pass __m128 (and other register types) by reference or by copy?

c++ simd sse intrinsics

Average operation on ARM NEON

arm sse simd neon intrinsics

How to compile a project which requires SSE2 on MacBook with M1 chip?

Why is SIMD slower than scalar counterpart

assembly x86 sse simd

CVTTSD2SI - a truncating instruction - uses rounding with "inexact" results?

How to store 4 32 bit floats into one 128 bit xmm register?

assembly x86 x86-64 sse simd

gcc vector extensions don't work as stated in docs

gcc sse vectorization