Since the fsin instruction for computing sin(x) on x86 dates back to the Pentium era, and apparently it doesn't even use SSE registers, I was wondering whether there is a newer and better set of instructions for computing trigonometric functions.
I'm used to coding in C++ and doing some asm optimizations, so anything that fits in a pipeline going from C++, to C, to asm will do for me.
Thanks.
I'm on 64-bit Linux for now, with gcc and clang (even though clang doesn't really offer any FPU-related optimizations, AFAIK).
EDIT
My sin function is usually 2 times faster than std::sin, even with SSE on. fsin is usually more accurate, but considering that fsin never outperforms my sin implementation, I'll keep my sin for now; also, my sin is totally portable, whereas fsin is x86-only.

If you need an approximation of sine optimized for absolute accuracy over [-π … π], use:
x * (1 + x * x * (-0.1661251158026961831813227851437597220432 + x * x * (8.03943560729777481878247432892823524338e-3 + x * x * -1.4941402004593877749503989396238510717e-4)))
It can be implemented with:
/* Horner evaluation of the odd polynomial: x + x^3 * (c1 + x^2 * (c2 + x^2 * c3)) */
float xx = x * x;
float s = x + (x * xx) * (-0.16612511580269618f + xx * (8.0394356072977748e-3f + xx * -1.49414020045938777495e-4f));
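For completeness, a minimal self-contained sketch wrapping the snippet above (the function name sin_poly is mine, not from the original post), with a quick comparison against the standard sinf() at a few sample points:

#include <math.h>
#include <stdio.h>

/* Degree-7 odd-polynomial approximation of sin(x), valid for x in [-pi, pi]. */
static float sin_poly(float x)
{
    float xx = x * x;
    return x + (x * xx) * (-0.16612511580269618f
                         + xx * (8.0394356072977748e-3f
                         + xx * -1.49414020045938777495e-4f));
}

int main(void)
{
    /* Compare against libm's sinf over the definition interval. */
    for (float x = -3.14159265f; x <= 3.14159265f; x += 0.5f)
        printf("x=% .4f  poly=% .7f  sinf=% .7f  diff=% .2e\n",
               x, sin_poly(x), sinf(x), sin_poly(x) - sinf(x));
    return 0;
}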
It can perhaps be optimized further depending on the characteristics of your target architecture. Also, a point not noted in the linked blog post: if you are implementing this in assembly, do use the FMADD instruction. If you are implementing it in C or C++ and use, say, the C99 standard function fmaf(), make sure that an actual FMADD instruction is generated. The emulated version is much more expensive than a multiplication and an addition, because what fmaf() computes is not exactly equivalent to a multiplication followed by an addition (the intermediate product is not rounded), so it would be incorrect to simply implement it as one.
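As a sketch, the same evaluation written with fmaf() (the function name sin_poly_fma is mine; on x86, compile with, e.g., gcc -O2 -mfma and check the generated assembly for vfmadd instructions rather than calls to fmaf):

#include <math.h>

/* Same polynomial, expressed with fmaf() so the compiler can map each
   step to a single FMADD when the target supports it (e.g. -mfma on x86). */
static float sin_poly_fma(float x)
{
    float xx = x * x;
    float p = fmaf(xx, -1.49414020045938777495e-4f, 8.0394356072977748e-3f);
    p = fmaf(xx, p, -0.16612511580269618f);
    return fmaf(x * xx, p, x);  /* x + x^3 * p */
}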
The difference between sin(x) and the above polynomial over [-π … π] graphs like so:

[graph: sin(x) minus the polynomial over -π … π]
The polynomial is optimized to minimize the difference between it and sin(x) over [-π … π]; it is not just something that someone thought was a good idea.
If you only need the [-1 … 1] definition interval, then the polynomial can be made more accurate on that interval by ignoring the rest. Running the optimization algorithm again for this definition interval produces:
x * (1 + x * x * (-1.666659904470566774477504230733785739156e-1 + x * x * (8.329797530524482484880881032235130379746e-3 + x * x * (-1.928379009208489415662312713847811393721e-4))))
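As with the first polynomial, this can be implemented the same way (a sketch; coefficients truncated to what float can hold):

float xx = x * x;
float s = x + (x * xx) * (-1.6666599044705668e-1f + xx * (8.3297975305244825e-3f + xx * -1.9283790092084894e-4f));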
The absolute error graph:

[graph: absolute error of the [-1 … 1] polynomial against sin(x)]
If that is too accurate for you, it is possible to optimize a polynomial of lower degree for the same objective. The absolute error will then be larger, but you will save a multiplication or two.
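For illustration only, a degree-5 version has the shape below; the coefficients shown are just the plain Taylor-series ones (-1/6 and 1/120), not re-optimized ones, so rerun the optimization algorithm to get coefficients worth using. Compared with the degree-7 version it saves one multiplication and one addition:

float xx = x * x;
/* Taylor placeholders, NOT minimax-optimized: -1/6 and 1/120. */
float s = x + (x * xx) * (-1.6666667e-1f + xx * 8.3333333e-3f);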