I need an acos() function with double precision within a compute shader. Since there is no built-in function of acos() in GLSL with double precision, I tried to implement my own.
At first, I implemented a Taylor series like the equation from Wiki - Taylor series with precalculated faculty values. But that seems to be inaccurate around 1. The maximum error was something about 0.08 with 40 iterations.
I also implemented this method which works very well on CPU with a maximum error of -2.22045e-16, but I have some troubles to implement this within the shader.
Currently, I am using an acos() approximation function from here where somebody posted his approximation functions on this site. I am using the most accurate function of this site and now I get a maximum error of -7.60454e-08, but also that error is too high.
My code of this function is:
double myACOS(double x)
{
    double part[4];
    part[0] = 32768.0/2835.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+sqrt(2.0+2.0*x))));
    part[1] = 256.0/135.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+2.0*x)));
    part[2] = 8.0/135.0*sqrt(2.0-sqrt(2.0+2.0*x));
    part[3] = 1.0/2835.0*sqrt(2.0-2.0*x);
    return (part[0]-part[1]+part[2]-part[3]);
}
Does anybody know another implementation method of acos() which is very accurate and -if possible- easy to implement in a shader?
Some system information:
The NVIDIA GT 555M GPU is a device with compute capability 2.1, so there is native hardware support for basic double-precision operations, including fused multipy-add (FMA). As with all NVIDIA GPUs, the square root operation is emulated. I am familiar with CUDA, but not GLSL. According to version 4.3 of the GLSL specification, it exposes double-precision FMA as a function fma() and provides a double-precision square root, sqrt(). It is not clear whether the sqrt() implementation is correctly rounded according to IEEE-754 rules. I will assume it is, by analogy with CUDA.
Instead of using a Taylor series, one would want to use a polynomial minimax approximation, thereby reducing the number of terms required. Minimax approximations are typically generated using a variant of the Remez algorithm. To optimize both speed and accuracy, the use of FMA is essential. Evaluation of the polynomial with the Horner scheme is condusive to high accruacy. In the code below, a second order Horner scheme is utilized. As in DanceIgel's answer, acos is conveniently computed using an asin approximation as the basic building block in conjunction with standard mathematical identities.
With 400M test vectors, the maximum relative error seen with the code below was 2.67e-16, while the maximum ulp error observed is 1.442 ulp.
/* compute arcsin (a) for a in [-9/16, 9/16] */
double asin_core (double a)
{
    double q, r, s, t;
    s = a * a;
    q = s * s;
    r =             5.5579749017470502e-2;
    t =            -6.2027913464120114e-2;
    r = fma (r, q,  5.4224464349245036e-2);
    t = fma (t, q, -1.1326992890324464e-2);
    r = fma (r, q,  1.5268872539397656e-2);
    t = fma (t, q,  1.0493798473372081e-2);
    r = fma (r, q,  1.4106045900607047e-2);
    t = fma (t, q,  1.7339776384962050e-2);
    r = fma (r, q,  2.2372961589651054e-2);
    t = fma (t, q,  3.0381912707941005e-2);
    r = fma (r, q,  4.4642857881094775e-2);
    t = fma (t, q,  7.4999999991367292e-2);
    r = fma (r, s, t);
    r = fma (r, s,  1.6666666666670193e-1);
    t = a * s;
    r = fma (r, t, a);
    return r;
}
/* Compute arccosine (a), maximum error observed: 1.4316 ulp
   Double-precision factorization of π courtesy of Tor Myklebust
*/
double my_acos (double a)
{
    double r;
    r = (a > 0.0) ? -a : a; // avoid modifying the "sign" of NaNs
    if (r > -0.5625) {
        /* arccos(x) = pi/2 - arcsin(x) */
        r = fma (9.3282184640716537e-1, 1.6839188885261840e+0, asin_core (r));
    } else {
        /* arccos(x) = 2 * arcsin (sqrt ((1-x) / 2)) */
        r = 2.0 * asin_core (sqrt (fma (0.5, r, 0.5)));
    }
    if (!(a > 0.0) && (a >= -1.0)) { // avoid modifying the "sign" of NaNs
        /* arccos (-x) = pi - arccos(x) */
        r = fma (1.8656436928143307e+0, 1.6839188885261840e+0, -r);
    }
    return r;
}
My current accurate shader implementation of 'acos()' is a mix out of the usual Taylor series and the answer from Bence. With 40 iterations I get an accuracy of 4.44089e-16 to the 'acos()' implementation from math.h. Maybe it is not the best, but it works for me:
And here it is:
double myASIN2(double x)
{
    double sum, tempExp;
    tempExp = x;
    double factor = 1.0;
    double divisor = 1.0;
    sum = x;
    for(int i = 0; i < 40; i++)
    {
        tempExp *= x*x;
        divisor += 2.0;
        factor *= (2.0*double(i) + 1.0)/((double(i)+1.0)*2.0);
        sum += factor*tempExp/divisor;
    }
    return sum;
}
double myASIN(double x)
{
    if(abs(x) <= 0.71)
        return myASIN2(x);
    else if( x > 0)
        return (PI/2.0-myASIN2(sqrt(1.0-(x*x))));
    else //x < 0 or x is NaN
        return (myASIN2(sqrt(1.0-(x*x)))-PI/2.0);
}
double myACOS(double x)
{
    return (PI/2.0 - myASIN(x));
}
Any comments, what could be done better? For example using a LUT for the values of factor, but in my shader 'acos()' is just called once so there is no need for it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With