
What is the fastest inverse of _mm_movemask_ps()?

Tags: simd, sse

In "How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?", the OP asks for the inverse of _mm256_movemask_epi8. Is there a simpler version for SSE's _mm_movemask_ps()? This is the best I could come up with, and it isn't too bad.

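#include <xmmintrin.h>  /* SSE intrinsics */

__m128 movemask_inverse(int x) {
    /* Lane i is nonzero exactly when bit i of x is set. */
    __m128 m = _mm_setr_ps(x & 1, x & 2, x & 4, x & 8);
    /* Nonzero lanes compare not-equal to 0.0f and become all-ones. */
    return _mm_cmpneq_ps(m, _mm_setzero_ps());
}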
Vortico asked Oct 31 '25 10:10


1 Answer

How efficient your inverse movemask is depends strongly on the compiler. With gcc it compiles to about 21 instructions.

With clang -std=c99 -O3 -m64 -Wall -march=nehalem, however, the code vectorizes well and the result is actually not too bad:

movemask_inverse_original:              # @movemask_inverse_original
        movd    xmm0, edi
        pshufd  xmm0, xmm0, 0           # xmm0 = xmm0[0,0,0,0]
        pand    xmm0, xmmword ptr [rip + .LCPI0_0]
        cvtdq2ps        xmm1, xmm0
        xorps   xmm0, xmm0
        cmpneqps        xmm0, xmm1
        ret
    

Nevertheless, you don't need the cvtdq2ps integer-to-float conversion. It is more efficient to compute the mask in the integer domain and then cast the result to float, which only reinterprets the bits and converts nothing. Peter Cordes' answer to "is there an inverse instruction to the movemask instruction in intel avx2?" discusses many ideas for the AVX2 case, and most of them can be adapted to SSE in some form. The LUT solution and the ALU solution are both suitable for your case.
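
To make the cast-versus-convert distinction concrete, here is a minimal sketch. The helper names (lane0_bits, bits_after_cast, bits_after_convert, check_cast_vs_convert) are made up for this illustration, and it assumes an x86 target with SSE2:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: raw 32-bit pattern stored in lane 0 of v. */
int lane0_bits(__m128 v) {
    return _mm_cvtsi128_si32(_mm_castps_si128(v));
}

/* Cast: the integer bits are reinterpreted as float, nothing is computed. */
int bits_after_cast(int x) {
    return lane0_bits(_mm_castsi128_ps(_mm_set1_epi32(x)));
}

/* Convert: cvtdq2ps produces the float whose value equals the integer. */
int bits_after_convert(int x) {
    return lane0_bits(_mm_cvtepi32_ps(_mm_set1_epi32(x)));
}

void check_cast_vs_convert(void) {
    assert(bits_after_cast(1) == 1);              /* bits unchanged */
    assert(bits_after_convert(1) == 0x3F800000);  /* IEEE-754 encoding of 1.0f */
}
```

Since the mask only needs all-ones or all-zeros lanes, the bit pattern is what matters, so the cast is sufficient.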

ALU solution with intrinsics:

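#include <emmintrin.h>  /* SSE2 integer intrinsics */

__m128 movemask_inverse_alternative(int x) {
    __m128i msk8421 = _mm_set_epi32(8, 4, 2, 1);  /* lane i holds 1 << i */
    __m128i x_bc = _mm_set1_epi32(x);             /* broadcast x to all four lanes */
    __m128i t = _mm_and_si128(x_bc, msk8421);     /* isolate bit i in lane i */
    /* Lane i becomes all-ones when bit i of x is set; the cast only reinterprets bits. */
    return _mm_castsi128_ps(_mm_cmpeq_epi32(msk8421, t));
}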
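
As a sanity check, the following sketch feeds all 16 possible 4-bit masks through the ALU version and confirms that _mm_movemask_ps() returns the original value. The function is repeated so the snippet compiles on its own; check_roundtrip is a hypothetical name, and an x86 target with SSE2 is assumed:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2; also provides _mm_movemask_ps */

/* ALU version, repeated here so the sketch is self-contained. */
__m128 movemask_inverse_alternative(int x) {
    __m128i msk8421 = _mm_set_epi32(8, 4, 2, 1);
    __m128i x_bc = _mm_set1_epi32(x);
    __m128i t = _mm_and_si128(x_bc, msk8421);
    return _mm_castsi128_ps(_mm_cmpeq_epi32(msk8421, t));
}

/* Hypothetical check: every 4-bit mask must survive the round trip. */
void check_roundtrip(void) {
    for (int x = 0; x < 16; ++x) {
        assert(_mm_movemask_ps(movemask_inverse_alternative(x)) == x);
    }
}
```

Only the low 4 bits of x influence the result, so a round trip through _mm_movemask_ps() is equivalent to x & 15 for wider inputs.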

Generated assembly for movemask_inverse_alternative with gcc 8.3 (gcc -std=c99 -O3 -m64 -Wall -march=nehalem):

movemask_inverse_alternative:
  movd xmm1, edi
  pshufd xmm0, xmm1, 0
  pand xmm0, XMMWORD PTR .LC0[rip]
  pcmpeqd xmm0, XMMWORD PTR .LC1[rip]
  ret
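
For comparison, the LUT solution mentioned above might look roughly like the sketch below. This is one possible adaptation and not code from the linked answer: the table name, the ROW macro, and movemask_inverse_lut are hypothetical, the index is masked with 15 on the assumption that only the low 4 bits matter, and an unaligned load is used so the table needs no alignment attribute.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* One 16-byte row per mask value: lane i is all-ones if bit i of m is set. */
#define ROW(m) { ((m) & 1) ? -1 : 0, ((m) & 2) ? -1 : 0, \
                 ((m) & 4) ? -1 : 0, ((m) & 8) ? -1 : 0 }

static const int32_t movemask_lut[16][4] = {
    ROW(0),  ROW(1),  ROW(2),  ROW(3),  ROW(4),  ROW(5),  ROW(6),  ROW(7),
    ROW(8),  ROW(9),  ROW(10), ROW(11), ROW(12), ROW(13), ROW(14), ROW(15)
};

/* Hypothetical LUT variant: a single 16-byte load indexed by the low 4 bits. */
__m128 movemask_inverse_lut(int x) {
    const __m128i *row = (const __m128i *)movemask_lut[x & 15];
    return _mm_castsi128_ps(_mm_loadu_si128(row));
}

/* Hypothetical check: every 4-bit mask must survive the round trip. */
void check_lut(void) {
    for (int x = 0; x < 16; ++x) {
        assert(_mm_movemask_ps(movemask_inverse_lut(x)) == x);
    }
}
```

The table costs 256 bytes of read-only data. Whether it beats the ALU version likely depends on cache behavior and the surrounding code; it is not benchmarked here.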
wim answered Nov 02 '25 23:11


