Is there a way using AVX/SSE to take a vector of floats, round-down and produce a vector of ints? All the floor intrinsic methods seem to produce a final vector of floating point, which is odd because rounding produces an integer!
SSE has conversion from FP to integer with your choice of truncation (toward zero) or the current rounding mode (normally the IEEE default mode: round to nearest, with ties rounding to even). That default behaves like nearbyint(), unlike round(), where ties round away from 0. If you need round()'s tiebreak behavior on x86, you have to emulate it, perhaps with truncation as a building block.
The relevant instructions are CVTPS2DQ and CVTTPS2DQ to convert packed single-precision floats to signed doubleword integers.  The version with the extra T in the mnemonic does Truncation instead of the current rounding mode.
; xmm0 is assumed to be packed float input vector
cvttps2dq xmm0, xmm0
; xmm0 now contains the packed integer vector (truncated toward zero)
Or with intrinsics, __m128i _mm_cvt[t]ps_epi32(__m128 a)
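To make the difference between the two conversions concrete, here is a minimal sketch using the SSE2 intrinsics named above. The helper names convert_nearest and convert_truncate are made up for illustration, and the sketch assumes the MXCSR rounding mode is still at its default (round to nearest, ties to even).

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32 */
#include <stdint.h>

/* Hypothetical helper: convert 4 floats to 4 int32 using the current MXCSR
   rounding mode (nearest-even unless the program changed it). */
static void convert_nearest(const float in[4], int32_t out[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(v));   /* CVTPS2DQ */
}

/* Hypothetical helper: convert 4 floats to 4 int32, truncating toward zero. */
static void convert_truncate(const float in[4], int32_t out[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(v));  /* CVTTPS2DQ */
}
```

With the input {2.5f, -2.5f, 1.7f, -1.7f}, convert_nearest should give {2, -2, 2, -2} (ties go to the even integer) and convert_truncate should give {2, -2, 1, -1}. Neither one is a floor for negative inputs, which is why the ROUNDPS step is needed.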
For the other two rounding modes x86 provides in hardware, floor (toward -Inf) and ceil (toward +Inf), a simple way is to use the SSE4.1/AVX ROUNDPS instruction before converting to integer.
The code would look like this:
roundps  xmm0, xmm0, 1    ; nearest=0, floor=1,  ceil=2, trunc=3
cvtps2dq xmm0, xmm0       ; or cvttps2dq, doesn't matter
; xmm0 now contains the floored packed integer vector
For AVX ymm vectors prefix the instructions with 'V' and change the xmm's to ymm's.
ROUNDPS works like this:
Round packed single precision floating-point values in xmm2/m128 and place the result in xmm1. The rounding mode is determined by imm8.
The rounding mode (the immediate, i.e. the third operand) can have the following values (taken from Table 4-15, "Rounding Modes and Encoding of Rounding Control (RC) Field", of the current Intel docs):
Rounding Mode               RC Field Setting   Description
----------------------------------------------------------
Round to nearest (even)     00B                Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (i.e., the integer value with the least-significant bit of zero).
Round down (toward −∞)      01B                Rounded result is closest to but no greater than the infinitely precise result.
Round up (toward +∞)        10B                Rounded result is closest to but no less than the infinitely precise result.
Round toward 0 (truncate)   11B                Rounded result is closest to but no greater in absolute value than the infinitely precise result.
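As a rough illustration of the four encodings in the table, the sketch below runs the same input through _mm_round_ps with each of the _MM_FROUND_TO_* constants from smmintrin.h. The function name round_all_modes and the TARGET_SSE41 macro are made up for this example. The target attribute is a GCC/Clang-specific assumption that lets the file build without -msse4.1; the code still needs a CPU with SSE4.1 to run.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_round_ps and the _MM_FROUND_* constants */

/* Assumption: GCC/Clang per-function target attribute; other compilers
   may need no annotation or a command-line flag instead. */
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: round 4 floats with each of the four RC encodings.
   The results are still floats, exactly as ROUNDPS produces them. */
TARGET_SSE41
static void round_all_modes(const float in[4],
                            float nearest[4], float down[4],
                            float up[4], float toward_zero[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(nearest,     _mm_round_ps(v, _MM_FROUND_TO_NEAREST_INT)); /* RC = 00B */
    _mm_storeu_ps(down,        _mm_round_ps(v, _MM_FROUND_TO_NEG_INF));     /* RC = 01B */
    _mm_storeu_ps(up,          _mm_round_ps(v, _MM_FROUND_TO_POS_INF));     /* RC = 10B */
    _mm_storeu_ps(toward_zero, _mm_round_ps(v, _MM_FROUND_TO_ZERO));        /* RC = 11B */
}
```

For the input {2.5f, -2.5f, 1.2f, -1.2f} this should give nearest = {2, -2, 1, -1}, down = {2, -3, 1, -2}, up = {3, -2, 2, -1} and toward_zero = {2, -2, 1, -1}.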
The rounding operation probably returns a float vector rather than an int vector so that further operations can stay float operations (on rounded values); a conversion to int is then trivial, as shown.
The corresponding intrinsics are found in the referenced docs. An example of transforming the above code to intrinsics is shown here (_mm_floor_ps selects its rounding mode through the immediate, so the result does not depend on the MXCSR Rounding Control (RC) field):
__m128i dst = _mm_cvtps_epi32( _mm_floor_ps(src) );   /* src is a __m128 */
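If SSE4.1 is not available, a floor-to-int can also be built from the truncating conversion alone. This is only a sketch of one common workaround, not something taken from the Intel docs: the name floor_to_int_sse2 is made up, and it assumes every input fits in the int32 range (out-of-range values and NaN produce the 0x80000000 "integer indefinite" value from CVTTPS2DQ).

```c
#include <emmintrin.h>  /* SSE2 only */
#include <stdint.h>

/* Hypothetical helper: floor each float and return it as int32, SSE2 only. */
static __m128i floor_to_int_sse2(__m128 x)
{
    __m128i t    = _mm_cvttps_epi32(x);   /* truncate toward zero */
    __m128  back = _mm_cvtepi32_ps(t);    /* truncated value, back as float */
    /* Truncation rounds negative non-integers up, so wherever back > x the
       compare yields an all-ones lane, which is -1 when read as int32. */
    __m128i adjust = _mm_castps_si128(_mm_cmpgt_ps(back, x));
    return _mm_add_epi32(t, adjust);      /* subtract 1 in those lanes */
}
```

For example, the input {1.5f, -1.5f, -2.0f, 0.0f} should come out as {1, -2, -2, 0}. The ROUNDPS version is shorter when SSE4.1 can be assumed.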