
AVX512 compare to vector not to mask

Tags: x86-64, avx512

I miss the compare instructions in AVX2 that produce a vector instead of a mask. What is the most efficient way to accomplish the same thing in AVX-512? Is it _mm512_cmp_ps_mask followed by an expand?

bumpbump asked Jan 19 '26 21:01



1 Answer

Yes, I think just a compare and vpmovm2d (the _mm512_movm_epi32 intrinsic, which requires AVX-512DQ). Very often, though, you can use merge-masking or zero-masking (possibly with a set1(-1) constant) for the next step, instead of whatever you were going to do with a vector. For example, to count matches, instead of _mm_sub_epi32() with the 0/-1 vector compare result, just do a merge-masked add.
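As a rough illustration of both options, here is a minimal sketch in C. The function names cmp_lt_to_vector and count_lt, the USE_AVX512 macro, and the choice of a less-than predicate are assumptions made for this example, not something from the question. The AVX-512 path is only compiled when the compiler targets AVX-512F and AVX-512DQ; otherwise the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX512F__) && defined(__AVX512DQ__)
#include <immintrin.h>
#define USE_AVX512 1   /* hypothetical macro, local to this sketch */
#endif

/* Hypothetical helper: out[i] = -1 (all bits set) where a[i] < b[i], else 0.
   This is the AVX2-style vector result, rebuilt from an AVX-512 mask. */
void cmp_lt_to_vector(const float *a, const float *b, int32_t *out, size_t n)
{
    size_t i = 0;
#ifdef USE_AVX512
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        /* The compare produces a 16-bit mask, one bit per float. */
        __mmask16 k = _mm512_cmp_ps_mask(va, vb, _CMP_LT_OQ);
        /* vpmovm2d (AVX-512DQ): broadcast each mask bit to a 0 / -1 dword. */
        __m512i v = _mm512_movm_epi32(k);
        _mm512_storeu_si512(out + i, v);
    }
#endif
    for (; i < n; i++)            /* scalar tail, or full scalar fallback */
        out[i] = (a[i] < b[i]) ? -1 : 0;
}

/* Hypothetical helper: count elements with a[i] < b[i] without ever turning
   the mask back into a vector, using a merge-masked add instead. */
size_t count_lt(const float *a, const float *b, size_t n)
{
    size_t i = 0, total = 0;
#ifdef USE_AVX512
    __m512i counts = _mm512_setzero_si512();
    const __m512i one = _mm512_set1_epi32(1);
    for (; i + 16 <= n; i += 16) {
        __mmask16 k = _mm512_cmp_ps_mask(_mm512_loadu_ps(a + i),
                                         _mm512_loadu_ps(b + i), _CMP_LT_OQ);
        /* Lanes whose mask bit is set get counts + 1; the rest keep counts. */
        counts = _mm512_mask_add_epi32(counts, k, counts, one);
    }
    /* Per-lane counters are 32-bit, so this assumes they do not overflow. */
    total = (size_t)_mm512_reduce_add_epi32(counts);
#endif
    for (; i < n; i++)
        total += (a[i] < b[i]);
    return total;
}
```

In the counting version the mask feeds the add directly, so the extra vpmovm2d is not needed at all.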

Of course, for 256-bit vectors, the AVX2 compare instructions are still usable. It's probably not worth unpacking the halves of a 512-bit vector, but it is sometimes worth avoiding 512-bit vectors entirely while still using AVX-512 (e.g. to avoid clock-speed penalties on some CPUs, and also to avoid the shutdown of the vector ALU on port 1). That way you still get the useful new instructions in AVX-512, plus the extra registers (x/ymm16..31) for operands that don't need to be used with VEX-coded AVX1/AVX2-only instructions.
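For comparison, a minimal sketch of the 256-bit route, where the compare intrinsic returns a 0/-1 vector directly and the count uses the subtract trick. The function name count_lt_256 is an assumption made for this example. The vector path is only compiled when the compiler targets AVX2, and which encoding the compiler actually emits for the compare when AVX-512VL is also enabled may vary.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: count elements with a[i] < b[i] using 256-bit vectors. */
size_t count_lt_256(const float *a, const float *b, size_t n)
{
    size_t i = 0, total = 0;
#if defined(__AVX2__)
    __m256i counts = _mm256_setzero_si256();
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        /* _mm256_cmp_ps returns a vector: all-ones where true, zero where false. */
        __m256i m = _mm256_castps_si256(_mm256_cmp_ps(va, vb, _CMP_LT_OQ));
        /* Subtracting -1 adds 1 to each matching lane. */
        counts = _mm256_sub_epi32(counts, m);
    }
    /* Horizontal sum via a small buffer; assumes 32-bit lanes do not overflow. */
    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, counts);
    for (int j = 0; j < 8; j++)
        total += (size_t)lanes[j];
#endif
    for (; i < n; i++)            /* scalar tail, or full scalar fallback */
        total += (a[i] < b[i]);
    return total;
}
```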

Still, there are cases where it might be worthwhile to just accept the penalty of needing to turn a mask back into a vector in order to use 512-bit vectors.

Peter Cordes answered Jan 21 '26 17:01



