 

round much slower than floor/ceil/int in LLVM

I was benchmarking some essential routines by executing cycles such as:

float *src, *dst;  // both point to arrays of at least cnt floats
for (int i=0; i<cnt; i++) dst[i] = round(src[i]);

All with an AVX2 target and the newest Clang. Interestingly, floor(x), ceil(x), int(x) and so on all seem fast, but round(x) seems extremely slow, and the disassembly shows some weird spaghetti code instead of the newer SSE or AVX rounding instructions. Even when I prevent the loops from being vectorized by introducing a dependency, round is about 10x slower. For floor etc. the generated code uses vroundss; for round there's the spaghetti code... Any ideas?
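For reference, the SSE4.1 ROUNDPS/ROUNDSS instructions (VROUNDPS/VROUNDSS under AVX) take the rounding mode as an immediate operand, which is why floor, ceil and truncation can each compile to a single instruction. The sketch below is x86-only and uses hypothetical names (round4, RoundMode, check_round4); it applies each of the four explicit modes to a few values through the _mm_round_ps intrinsic. The target attribute is a GCC/Clang extension that lets the function use SSE4.1 without extra -m flags.

```cpp
#include <array>
#include <cassert>
#include <immintrin.h>

using Float4 = std::array<float, 4>;

// The four explicit ROUNDPS rounding modes (SSE4.1). None of them is
// "round halfway cases away from zero", which is what round() requires.
enum class RoundMode { Nearest, Down, Up, TowardZero };

// x86 only: compiled for SSE4.1 via the GCC/Clang target attribute.
// The ROUNDPS immediate must be a compile-time constant, hence the switch
// instead of passing `mode` straight to the intrinsic.
__attribute__((target("sse4.1")))
Float4 round4(const Float4& in, RoundMode mode) {
    const __m128 v = _mm_loadu_ps(in.data());
    __m128 r;
    switch (mode) {
    case RoundMode::Nearest:  // ties go to the even neighbour (like rint)
        r = _mm_round_ps(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        break;
    case RoundMode::Down:     // floor
        r = _mm_round_ps(v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
        break;
    case RoundMode::Up:       // ceil
        r = _mm_round_ps(v, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);
        break;
    default:                  // trunc, the rounding an int(x) conversion uses
        r = _mm_round_ps(v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
        break;
    }
    Float4 out;
    _mm_storeu_ps(out.data(), r);
    return out;
}

// Halfway inputs expose the gap: std::round would give 3 and -3 for 2.5 and -2.5.
void check_round4() {
    const Float4 in = {2.5f, -2.5f, 1.5f, 0.7f};
    assert((round4(in, RoundMode::Nearest) == Float4{2.0f, -2.0f, 2.0f, 1.0f}));
    assert((round4(in, RoundMode::Down) == Float4{2.0f, -3.0f, 1.0f, 0.0f}));
    assert((round4(in, RoundMode::Up) == Float4{3.0f, -2.0f, 2.0f, 1.0f}));
    assert((round4(in, RoundMode::TowardZero) == Float4{2.0f, -2.0f, 1.0f, 0.0f}));
}
```

Calling check_round4() on an SSE4.1-capable CPU runs the assertions; no mode produces 3 for 2.5 together with -3 for -2.5.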

Edit: I'm using -ffast-math, -mfpmath=sse, -fno-math-errno, -O3, -std=c++17, -march=core-avx2, -mavx2, -mfma
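A minimal, self-contained harness for the loop in the question, assuming std::chrono wall-clock timing is precise enough; apply_all and time_ms are hypothetical names. A single pass is noisy, so in practice one would repeat each measurement, take the minimum, and check the disassembly to make sure the optimizer has not dropped or merged the loops.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// The loop from the question, with the rounding function passed in as a callable.
template <typename F>
void apply_all(const float* src, float* dst, int cnt, F f) {
    for (int i = 0; i < cnt; i++) dst[i] = f(src[i]);
}

// Wall-clock time of one pass over src, in milliseconds.
template <typename F>
double time_ms(const std::vector<float>& src, std::vector<float>& dst, F f) {
    assert(dst.size() >= src.size());  // dst must have room for every result
    auto t0 = std::chrono::steady_clock::now();
    apply_all(src.data(), dst.data(), static_cast<int>(src.size()), f);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Usage: compare `time_ms(src, dst, [](float x) { return std::round(x); })` with the same call using `std::floor`, compiled with the flags listed above.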

Vojtěch Melda Meluzín asked Oct 27 '25 16:10



1 Answer

The problem is that none of the SSE4.1 rounding modes (the ones roundss/vroundss can encode) matches the rounding that round requires. From the round(3) man page:

These functions round x to the nearest integer, but round halfway cases away from zero (regardless of the current rounding direction, see fenv(3)), instead of to the nearest even integer like rint(3).
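A quick check of the behaviour the man page describes, using the float overloads of std::round and std::rint and assuming the default floating-point environment (round-to-nearest-even) is in effect; check_round_vs_rint is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// round(): halfway cases go away from zero, whatever the rounding mode.
// rint():  uses the current rounding mode; by default ties go to the even integer.
void check_round_vs_rint() {
    assert(std::round(0.5f) == 1.0f && std::rint(0.5f) == 0.0f);
    assert(std::round(1.5f) == 2.0f && std::rint(1.5f) == 2.0f);
    assert(std::round(2.5f) == 3.0f && std::rint(2.5f) == 2.0f);
    assert(std::round(-2.5f) == -3.0f && std::rint(-2.5f) == -2.0f);
    // Away from exact halves the two functions agree.
    assert(std::round(2.4f) == 2.0f && std::rint(2.4f) == 2.0f);
    assert(std::round(2.6f) == 3.0f && std::rint(2.6f) == 3.0f);
}
```

Only exact halves differ, so switching from round to rint changes results for inputs such as 0.5 and 2.5 and leaves everything else unchanged.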

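If you want faster code, try rint instead of round: it rounds according to the current rounding direction (round-to-nearest-even by default), which is a mode SSE does support directly. Keep in mind that the results then differ from round on exact halfway cases such as 2.5.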
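If round's halfway-away-from-zero results must be kept, one option (not something the answer proposes) is to build round from truncation, which SSE does support, plus a compare and a select. The sketch below uses hypothetical names (round_half_away, round_all, check_round_half_away). Whether Clang turns this into vroundps plus a blend depends on the compiler version and flags, so check the disassembly; with -ffast-math (which implies -ffinite-math-only) the handling of NaN and infinity is not guaranteed.

```cpp
#include <cassert>
#include <cmath>

// Halfway-away-from-zero rounding built from trunc, like std::round.
inline float round_half_away(float x) {
    float t = std::trunc(x);  // drop the fractional part (rounding toward zero)
    float frac = x - t;       // exact: t is x with its fractional bits cleared
    // Step one unit away from zero when the dropped fraction was at least 1/2.
    return std::fabs(frac) >= 0.5f ? t + std::copysign(1.0f, x) : t;
}

// The question's loop using the replacement.
void round_all(const float* src, float* dst, int cnt) {
    for (int i = 0; i < cnt; i++) dst[i] = round_half_away(src[i]);
}

// Compares against std::round on halfway, near-halfway and large inputs.
void check_round_half_away() {
    const float inputs[] = {0.5f, 1.5f, 2.5f, -2.5f, -0.5f, 0.49999997f,
                            -0.49999997f, 0.7f, -0.7f, 8388609.0f, 123.456f};
    for (float x : inputs) assert(round_half_away(x) == std::round(x));
}
```

The 0.49999997f case is the one a naive floor(x + 0.5f) gets wrong (it returns 1); comparing the truncated remainder avoids that addition.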

Chris Dodd answered Oct 30 '25 06:10
