Let's say I treat my integer as if it has 4 fraction bits. And now 0 is zero, 16 is one, 32 is two, and it goes on. When rounding, numbers in range [-7, 7] becomes 0, [8, 23] becomes 16.
My code is this:
std::int64_t my_round(std::int64_t n) {
auto q = n / 16;
auto r = n % 16;
if (r >= 0) {
if (r >= 8) {
++q;
}
} else {
if (r <= -8) {
--q;
}
}
return q * 16;
}
It's a lot of code for such a simple task. I wonder if there is a faster way to do it. I only need to support 64bit signed integer.
Edit: There was a comment (I don't remember whom made that comment) suggested adding 15 and masking the lower bits. It didn't work. But with some trial and error, I come up with this.
std::int64_t my_round2(std::int64_t n) {
if (n >= 0) {
n += 8;
}
else {
n += 7;
}
return n & (~15ll);
}
I have no idea. but my_round2 seems to give the same result as my_round and is 20 times faster. If there is a way to remove the branch it would be much better.
With
return (n + 8 + (n>>63)) & (~15ll);
one can shave off the branch from my_round2(), and ensure the original symmetry at zero. The idea is that signed type >> (sizeof(signed type) * 8 - 1) is -1 for negative values and 0 for positive values.
Clang is able to produce branchless code for the original my_round2() but that is still one instruction longer than the proposed routine here. With arm64 the savings are even greater.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With