I was wondering if there is a numerically accurate way to compute
sign(a² - b * c) * sqrt(abs(a² - b * c))
with floating-point arithmetic, because it suffers from the following problems (ordered from most to least problematic):
- a² ~= b * c which makes both the sign- and the sqrt-factor unstable
- a² or b * c dominate due to the square/product amplifying any differences between the two terms

If possible, I would like to stick with common floating point data types like 32- and 64-bit float and not resort to higher precision data types (like decimal) or arbitrary precision libraries (like Python mpmath).
While working on this scipy issue, I found a part of the code that looks numerically unstable and even states so in the comments:
        # Distinguish between
        #    r1norm = ||b - Ax|| and
        #    r2norm = rnorm in current code
        #           = sqrt(r1norm^2 + damp^2*||x - x0||^2).
        #    Estimate r1norm from
        #    r1norm = sqrt(r2norm^2 - damp^2*||x - x0||^2).
        # Although there is cancellation, it might be accurate enough.
        if damp > 0:
            r1sq = rnorm**2 - dampsq * xxnorm
            r1norm = sqrt(abs(r1sq))
            if r1sq < 0:
                r1norm = -r1norm
This code basically aims to compute sign(a² - b * c) * sqrt(abs(a² - b * c)) with
`rnorm -> a`
`dampsq -> b`
`xxnorm -> c`
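To make the cancellation concrete, here is a minimal standalone sketch of the same computation. The function name `signed_sqrt_naive` and the test inputs are made up for illustration and are not part of scipy. The inputs are chosen so that the true value of a² - b * c is 2**-60, yet the direct evaluation in 64-bit floats loses it entirely.

```python
import math


def signed_sqrt_naive(a, b, c):
    """Direct evaluation of sign(a*a - b*c) * sqrt(|a*a - b*c|)."""
    r1sq = a * a - b * c
    # copysign plays the role of the `if r1sq < 0` branch in the snippet above.
    return math.copysign(math.sqrt(abs(r1sq)), r1sq)


# Illustrative inputs: exactly, a*a - b*c == 2**-60, so the true result is 2**-30.
a = 1.0 + 2.0**-30
b = 1.0
c = 1.0 + 2.0**-29

# a*a rounds to 1 + 2**-29, which equals b*c, so the difference vanishes.
print(signed_sqrt_naive(a, b, c))  # 0.0 in IEEE double precision
print(2.0**-30)                    # the value one would like to get
```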
This problem looks pretty similar to the accurate computation of hypot(a, b) = sqrt(a² + b²) for which compensated fast algorithms exist, e.g., as proposed in
Borges C.F., Fast Compensated Algorithms for the Reciprocal Square Root, the Reciprocal Hypotenuse, and Givens Rotations, arXiv:2103.08694
The computation exploits fused multiply-add (fma) operations to compensate for floating-point errors.
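For what it's worth, the same compensation idea can be sketched for a² - b * c directly. This is not the algorithm from the Borges paper, only a rough compensated difference of two products: each product is split into its rounded value plus its exact rounding error, and the errors are added back after the subtraction. The helper names (`two_product`, `signed_sqrt_compensated`) are hypothetical. The error term is obtained with Dekker/Veltkamp splitting so that it runs without an fma; on Python 3.13+ `math.fma(x, y, -p)` should give the same error term. The sketch assumes 64-bit floats and no overflow or underflow in the intermediate products.

```python
import math

_SPLITTER = 134217729.0  # 2**27 + 1, Veltkamp splitting constant for 64-bit floats


def _split(x):
    """Split x into hi + lo, where each part fits in about half the significand."""
    t = _SPLITTER * x
    hi = t - (t - x)
    return hi, x - hi


def two_product(x, y):
    """Return (p, e) with p = fl(x*y) and p + e == x*y, barring overflow/underflow."""
    p = x * y
    xh, xl = _split(x)
    yh, yl = _split(y)
    e = xl * yl - (((p - xh * yh) - xl * yh) - xh * yl)
    return p, e


def signed_sqrt_compensated(a, b, c):
    """Hypothetical helper: sign(a*a - b*c) * sqrt(|a*a - b*c|) with compensation."""
    p, ep = two_product(a, a)   # p + ep == a*a
    q, eq = two_product(b, c)   # q + eq == b*c
    # Subtract the rounded products first, then add back the rounding errors.
    r1sq = (p - q) + (ep - eq)
    return math.copysign(math.sqrt(abs(r1sq)), r1sq)


# Here a*a - b*c is exactly 2**-60, which a plain a*a - b*c evaluates to 0.0.
print(signed_sqrt_compensated(1.0 + 2.0**-30, 1.0, 1.0 + 2.0**-29))
print(2.0**-30)
```

I have not worked out a rigorous error bound for this, so treat it as a starting point rather than a proven replacement.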
However, I'm not particularly familiar with this deep theory on floating point numerical mathematics, so I was not able to translate the algorithms to the computation of sign(a² - b * c) * sqrt(abs(a² - b * c)).
Besides, the problem at hand comes with
r1sq = a² - b * c is a numeric problem when a² ≈ b * c.
Without resorting to wider types, one improvement when b * c >= 0 is to determine d = sqrt(b*c). This at least avoids the rounding that occurs with a*a.
Then form the product r1sq = (a-d)*(a+d).
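A minimal sketch of that idea in Python, assuming 64-bit floats. The function name `signed_sqrt_factored` is made up for illustration, and the `b * c < 0` branch is an addition of this sketch: there the two terms have the same sign once subtracted, so nothing cancels and the direct formula is used.

```python
import math


def signed_sqrt_factored(a, b, c):
    """Sketch: sign(a*a - b*c) * sqrt(|a*a - b*c|) using (a - d) * (a + d)."""
    bc = b * c
    if bc < 0:
        # a*a - bc is a sum of two non-negative terms here, so no cancellation.
        return math.sqrt(a * a - bc)
    d = math.sqrt(bc)
    # a*a is never formed; the cancellation happens in a - d instead.
    r1sq = (a - d) * (a + d)
    return math.copysign(math.sqrt(abs(r1sq)), r1sq)


print(signed_sqrt_factored(5.0, 3.0, 3.0))   # 4.0
print(signed_sqrt_factored(3.0, 5.0, 5.0))   # -4.0
print(signed_sqrt_factored(1.0, -1.0, 3.0))  # 2.0
```

Note that d is itself rounded (once in b*c and once in the sqrt), so when a and d agree to nearly all digits the result is only as good as d.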
The code could use additional tricks if range were important too, yet it sounds like OP is concerned about precision.
I've used this with the quadratic equation from time to time, yet find wider math more performant.