Assume that <code>t</code>, <code>a</code>, and <code>b</code> are all double (IEEE Std 754) variables, and that neither <code>a</code> nor <code>b</code> is <code>NaN</code> (though either may be <code>Inf</code>). After <code>t = a - b</code>, do I necessarily have <code>a == b + t</code>?

IEEE Std 754 Floating-Point: let t := a - b, does the standard guarantee that a == b + t?

1 Answer

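Absolutely not. One obvious case is <code>a = DBL_MAX</code>, <code>b = -DBL_MAX</code>. Then <code>t</code> overflows to <code>INFINITY</code>, so <code>b + t</code> is also <code>INFINITY</code>, which is not equal to <code>a</code>.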

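What may be more surprising is that there are cases where this happens without any overflow. Basically, they're all cases where <code>a - b</code> is inexact. For example, if <code>a</code> is <code>DBL_EPSILON/4</code> and <code>b</code> is <code>-1</code>, then <code>a - b</code> rounds to 1 (assuming the default rounding mode), so <code>t</code> is 1 and <code>b + t</code> is 0, not <code>DBL_EPSILON/4</code>.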

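The reason I mention this second example is that this is the canonical way of forcing rounding to a particular precision in IEEE arithmetic. For instance, if you have a number in the range [0,1), adding and then subtracting <code>0x1p49</code> rounds it to the nearest multiple of 2^-3, because doubles in [2^49, 2^50) are spaced 2^-3 apart. A smaller constant gives a finer grid: <code>0x1p48</code> rounds to a multiple of 2^-4, i.e. 4 fractional bits.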

answered Oct 13 '22 16:10

R.. GitHub STOP HELPING ICE

Tags:

c++

c

floating-point

ieee-754

updogliu
