My question is about how division should be handled when you have to divide by a large power of 2. This is likely a trivial matter, but I didn't find any helpful material. I am basically asking which (if either) of the two methods proposed at the end is more sensible, and whether there is a better way to achieve my goal.
The context is that in a project I am working on, I have to at one point compute a sum of type double numbers w of the form w = (c_1/2)*(c_2/2)*...*(c_n/2), with c_1,...,c_n some other type double numbers. While optimizing the code, I at first figured that it might be a good idea to compute the product of c_1,...,c_n first, and then divide that product by 2^n. For reasons I cannot really go into in this post, the n in the product defining w might get quite large, possibly even around 70 or 80. Right now I effectively compute the final value of w by i) computing the aforementioned product of c_1,...,c_n and storing it in w, ii) computing and storing the value of 2^n in a type long long variable a, iii) setting w = w/a.
To my knowledge the largest power of 2 that can be stored in a (signed) type long long variable is 2^62, so I was wondering whether it would be better to divide w in a for loop, like
for (int k { 0 }; k < n; ++k) {
    w /= 2;
}
to avoid storing too large a value in the aforementioned variable a. I am aware that this can be reduced to logarithmic time by repeatedly dividing by suitable powers of 2, but the point still stands.
Alternatively, I could simply not factor the 1/2s out of the product and instead compute w the old way. However, with this approach it is not clear to me how much worse the numerical stability would be, or whether there is anything to be concerned about in the first place.
The answer was among the comments, but didn't get proper attention. I am not going to discuss the accuracy/precision concerns, because anyone dealing with floating point must already be aware of and take responsibility for that part. The question at hand has a one-liner answer in the library function ldexp:
#include <algorithm>  // std::ranges::fold_left (C++23)
#include <functional> // std::multiplies
#include <ranges>
#include <cmath>

// assume the c_i are modeled by a C++ range c_range:
auto c = std::ranges::fold_left(c_range, 1.0L, std::multiplies<>{});
auto n = -static_cast<int>(std::ranges::distance(c_range));
auto res = std::ldexp(c, n);
// res == c * pow(2, n);
The commented line might be less precise and slower than the suggested ldexp call.
(moving from a comment)
Ignoring subnormals and infinities, I don't see any numerical-stability advantage in postponing the division by a power of 2: since double values are internally stored as an integral mantissa multiplied by an integral base-2 exponent, every division by 2 boils down to adjusting the exponent part of the double; the mantissa remains intact, and there is no precision loss.
Given that after this division by 2 you multiply all these numbers together, this again does not affect the precision of the final result, since in a double multiplication the exponent parts are simply summed (again, exactly).
Hence, multiplying the c_i/2 numbers together, or multiplying the c_i and dividing by exp2(n) (which, again, returns an exact double for integral n, so there is no need to accumulate the power of 2 in a long long variable), ends up producing the exact same value: the only difference is whether you adjust the exponent before or after, and in both cases that is an "exact" operation.
You can test this by yourself with some fuzzing:
#include <stdio.h>
#include <math.h>
#include <stdint.h>

struct XorShift64Star {
    /// PRNG state
    uint64_t y;
    /// Initializes with seed
    explicit XorShift64Star(uint64_t seed = 0) : y(seed) {
        if (y == 0) y = 0x159a55e5075bcd15ULL;
    }
    /// Returns a value in the range [1, 1<<64)
    uint64_t operator()() {
        y ^= (y >> 12);
        y ^= (y << 25);
        y ^= (y >> 27);
        return y * 0x2545F4914F6CDD1DULL;
    }
    /// 53-bit resolution double in the range [0, 2^63)
    double gen_big_double() {
        uint64_t x = (*this)();
        double gexp = x & 0x3f;      // exponent magnitude in [0, 63]
        if (x & 0x40) gexp = -gexp;  // random sign
        return (x >> 11) / 9007199254740992.0 * exp2(gexp);  // mantissa/2^53
    }
};
int main() {
    XorShift64Star gen;
    double res_half = 1.;  // multiplicative identity
    double res_full = 1.;
    long long iters;
    for (iters = 0; res_half == res_full / exp2(iters); ++iters) {
        double val = gen.gen_big_double();
        res_half *= val / 2.;
        res_full *= val;
    }
    printf("%lld %0.18g %0.18g\n", iters, res_half, res_full / exp2(iters));
    return 0;
}
I left this running for over an hour, and it still hasn't quit; I expect this to actually terminate only if one of the intermediate results hits the extreme limits of the double exponent (roughly 2^±1022), after which we get into infinities (if going above) or subnormal numbers, where the mantissa starts working in a different way.