 

When is integer to floating point conversion lossless?

In particular, I'm interested in whether int32_t is always losslessly converted to double.

Does the following code always return true?

#include <stdint.h>

int is_lossless(int32_t i)
{
    double   d = i;
    int32_t i2 = d;
    return (i2 == i);
}

What about int64_t?

asked Oct 20 '25 05:10 by anton_rh


1 Answer

When is integer to floating point conversion lossless?

When the floating point type has enough precision and range to encode all possible values of the integer type.
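The rule can be written as a small check. This is a sketch, not code from the answer: `int_type_fits_double` is a hypothetical name, `value_bits` is the number of value bits of a signed type (31 for int32_t, 63 for int64_t), and it assumes FLT_RADIX == 2 so that DBL_MANT_DIG counts binary digits.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper: can every value of a signed integer type with
 * `value_bits` value bits be stored exactly in a double?
 * Assumes a binary double (FLT_RADIX == 2). */
static bool int_type_fits_double(int value_bits)
{
    /* Range: the largest magnitude is 2^value_bits (the minimum value),
     * and 2^(DBL_MAX_EXP - 1) is the largest finite power of 2. */
    bool range_ok = value_bits < DBL_MAX_EXP;
    /* Precision: integers up to 2^value_bits - 1 need at most value_bits
     * significand bits; -2^value_bits is a power of 2, so it is exact too. */
    bool precision_ok = value_bits <= DBL_MANT_DIG;
    return FLT_RADIX == 2 && range_ok && precision_ok;
}
```

With a common IEEE-754 double, `int_type_fits_double(31)` (int32_t) is true and `int_type_fits_double(63)` (int64_t) is false.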

Does the following int32_t code always return true? --> Yes.
Does the following int64_t code always return true? --> No.

As DBL_MAX is at least 1E+37, the range is sufficient for at least int122_t. Let us look at precision.

With the common double (base 2, a sign bit, a 53-bit significand, and an exponent), all values of int54_t, with its 53 value bits, can be represented exactly. INT54_MIN (-2^53) is also representable. This double has DBL_MANT_DIG == 53, which here is the number of base-2 digits in the floating-point significand.
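A quick way to see this, assuming a typical IEEE-754 double: the extremes of a 54-bit signed range, held in an int64_t, survive the round trip. int54_t does not exist in <stdint.h>, so `MY_INT54_MAX`, `MY_INT54_MIN` and `round_trips_small_i64` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the limits of a 54-bit signed type. */
#define MY_INT54_MAX (((int64_t)1 << 53) - 1)   /* 2^53 - 1 */
#define MY_INT54_MIN (-((int64_t)1 << 53))      /* -2^53    */

/* Round-trips an int64_t through double. Intended for values well inside
 * the int64_t range, so that converting back is always defined. */
static bool round_trips_small_i64(int64_t i)
{
    double d = (double)i;       /* exact when |i| <= 2^53 */
    int64_t back = (int64_t)d;  /* d is in range, so this is defined */
    return back == i;
}
```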

The smallest-magnitude non-representable value would be INT54_MAX + 2, that is 2^53 + 1. Types int55_t and wider have values that are not exactly representable as a double.
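A sketch of that gap, again assuming a typical IEEE-754 double (the names `first_gap` and `survives_double` are hypothetical). Between 2^53 and 2^54 consecutive doubles are 2 apart, so 2^53 + 1 converts to either 2^53 or 2^53 + 2; both are inexact whichever way the implementation rounds, while 2^53 + 2 itself is exact.

```c
#include <stdbool.h>
#include <stdint.h>

/* INT54_MAX + 2 == 2^53 + 1: the smallest-magnitude integer a typical
 * IEEE-754 double cannot hold. */
static const int64_t first_gap = ((int64_t)1 << 53) + 1;

static bool survives_double(int64_t i)
{
    double d = (double)i;   /* 2^53 + 1 becomes 2^53 or 2^53 + 2 */
    return (int64_t)d == i; /* d stays far below 2^63, so this is defined */
}
```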

With uintN_t types, there is one more value bit than in the same-width intN_t. The typical double can then encode all values of uint53_t and narrower.
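The unsigned case can be checked the same way with uint64_t, assuming a typical IEEE-754 double; `u64_survives_double` is a hypothetical name. UINT53_MAX (2^53 - 1) fits, while UINT54_MAX (2^54 - 1) lies between two doubles that are 2 apart and cannot round-trip.

```c
#include <stdbool.h>
#include <stdint.h>

static bool u64_survives_double(uint64_t u)
{
    double d = (double)u;
    /* Converting back is defined only while d < 2^64, which holds for
     * inputs below 2^63 such as the ones used here. */
    return (uint64_t)d == u;
}
```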


With other possible double encodings, since C specifies DBL_DIG >= 10, all values of int34_t can round-trip.

So the code always returns true with int32_t, regardless of the double encoding.
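The question's function can be exercised directly. This sketch is only a spot check, not a proof: `spot_check_int32` is a hypothetical helper that tests the extremes plus every 65537th int32_t value, and the static assertion merely documents the DBL_DIG >= 10 guarantee the argument relies on.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Guaranteed by the C standard; stated here to document the assumption. */
_Static_assert(DBL_DIG >= 10, "C requires DBL_DIG >= 10");

/* The function from the question. */
static int is_lossless(int32_t i)
{
    double d = i;
    int32_t i2 = d;
    return i2 == i;
}

/* Hypothetical spot check over a sample of int32_t values. */
static bool spot_check_int32(void)
{
    if (!is_lossless(INT32_MIN) || !is_lossless(INT32_MAX))
        return false;
    for (int64_t v = INT32_MIN; v <= INT32_MAX; v += 65537) {
        if (!is_lossless((int32_t)v))
            return false;
    }
    return true;
}
```

Looping over all 2^32 values instead of a stride works too; it just takes longer.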


What about int64_t?

With int64_t, there is potential for undefined behavior.

In the conversion double d = i; with an int64_t i, an inexact result is an implementation-defined choice between the two nearest representable values. This is often round-to-nearest, so values of i near INT64_MAX can convert to a double one more than INT64_MAX, that is 2^63.
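The overshoot can be observed safely, because comparing two doubles is always defined; only converting such a d back to int64_t is not. This sketch assumes a typical IEEE-754 double with the default round-to-nearest mode, where `int64_max_rounds_up` (a hypothetical name) returns true. `TWO_POW_63` is built the same way as the answer's INT64_MAX_P1 macro: INT64_MAX / 2 + 1 is 2^62, and multiplying by 2.0 gives 2^63 exactly.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2^63 == INT64_MAX + 1, computed exactly as a double. */
#define TWO_POW_63 ((INT64_MAX / 2 + 1) * 2.0)

/* Reports whether converting INT64_MAX to double landed outside the
 * int64_t range. Does not convert d back, so there is no UB here. */
static bool int64_max_rounds_up(void)
{
    double d = (double)INT64_MAX;
    return d == TWO_POW_63;
}
```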

With int64_t i2 = d;, the conversion of the double value one more than INT64_MAX to int64_t is undefined behavior (UB).

A simple test, done before converting back, detects this:

/* INT64_MAX + 1 (2^63), computed exactly as a double */
#define INT64_MAX_P1 ((INT64_MAX/2 + 1) * 2.0)
if (d == INT64_MAX_P1) return false;  // not lossless; converting d back would be UB
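Put together, an int64_t version of the question's function might look like the sketch below. `is_lossless64` is a hypothetical name, and the guard uses >= rather than ==: since the two candidates for an inexact conversion never exceed 2^63, this is the same test as above, written defensively. On the negative side no guard is needed, because INT64_MIN (-2^63) is exactly representable.

```c
#include <stdbool.h>
#include <stdint.h>

/* INT64_MAX + 1 (2^63), computed exactly as a double */
#define INT64_MAX_P1 ((INT64_MAX / 2 + 1) * 2.0)

/* int64_t counterpart of the question's is_lossless(), with the range
 * check placed before the conversion back. */
static bool is_lossless64(int64_t i)
{
    double d = i;            /* may round, possibly up to 2^63 */
    if (d >= INT64_MAX_P1)   /* d is at most 2^63, so this means d == 2^63 */
        return false;        /* not lossless; converting d back would be UB */
    int64_t i2 = (int64_t)d; /* d is now in range: well defined */
    return i2 == i;
}
```

Near INT64_MAX the function returns false either way: with round-to-nearest the guard catches 2^63, and with rounding down the converted value simply differs from i.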
answered Oct 21 '25 20:10 by chux - Reinstate Monica