Converting floating point ">=" to ">" and "<=" to "<"

I am looking for a way in Delphi to get the smallest Single and Double floating point value that I can add to or subtract from my number so that it compares as different in floating point comparisons. Alternatively, I would like a way to get the next floating point number that is smaller than my number and the next one that is larger. From a floating point standpoint, I would like to convert this:

if (A >= B) or (C <= D) then

To

if (A > newnumber1) or (C < newnumber2) then

so that both versions produce the same results in floating point. newnumber1 and newnumber2 would obviously be different for Single and Double. I either need some value that I can subtract from B and add to D to get newnumber1 and newnumber2, or I need some other way of getting to these numbers from B and D.
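Stated precisely (a framing added here, not part of the original question): assuming A, B, C and D are finite, non-NaN values of the same floating point type, no representable value lies strictly between nextDown(B) and B, or between D and nextUp(D), so the two forms are equivalent. Here nextDown and nextUp are notation for the adjacent representable values below and above their argument.

```latex
A \ge B \iff A > \operatorname{nextDown}(B), \qquad
C \le D \iff C < \operatorname{nextUp}(D)
```

So newnumber1 = nextDown(B) and newnumber2 = nextUp(D), computed in the same precision as A and C.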

In C++11 there is a function, std::nextafter, referenced in the following question, that looks like it would be sufficient:

Finding the closest floating point value less than a specific integer value in C++?

Context

I am doing vector operations and I need the equivalent of a greater-than-or-equal-to comparison. The easiest way to accomplish this is to take a slightly smaller number and use that with a greater-than comparison. If at all possible, I would prefer not to pick an arbitrary value that merely seems to work.

The vector operation that I am using is ippsThreshold_LTValGTVal_32s from:

https://software.intel.com/en-us/node/502143

The library doesn't support a >= operation, which is not really practical for floating point values anyway. To create an equivalent function, I need to slightly increase the less-than level and slightly decrease the greater-than level, and then use the library's less-than and greater-than operations.

For Example

If I have an array of 5 values [99.4, 20, 19.9, 99, 80], the ippsThreshold_LTValGTVal_32s vector operation lets me replace specific values in the vector with my own replacement values. In this example, I would like to replace all values >= 99 and all values <= 20 with 0. Because the operation only uses strict comparisons, I have to replace the 99 with something marginally smaller and the 20 with something marginally bigger.

The function signature looks like this:

ippsThreshold_LTValGTVal_32s(..., ..., ..., levelLT, valueLT, levelGT, valueGT);

My call would be something like this:

ippsThreshold_LTValGTVal_32s(..., ..., ..., 20.00000001, 0, 98.99999, 0);

This would then include the 20 in the less-than operation and the 99 in the greater-than operation, and give me a vector that looks like [0, 0, 0, 0, 80].

I need to find out what to use in place of 20.00000001 and 98.99999. I would like the difference between these values and the original values to be as small as possible while still being large enough for the original values to be caught by the < and > operations.
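To see why the levels have to be nudged, here is a minimal sketch of the per-element rule in plain Delphi. ThresholdValue is a hypothetical name, not the IPP routine, and it assumes each value is tested against levelLT before levelGT. Passing 20 and 99 unchanged replaces 99.4 and 19.9 but leaves the boundary values themselves untouched:

```pascal
program ThresholdSketch;

{$APPTYPE CONSOLE}

// Hypothetical stand-in for the per-element rule described above (not the
// IPP routine itself): values strictly below LevelLT become ValueLT, values
// strictly above LevelGT become ValueGT, everything else is kept.
function ThresholdValue(X, LevelLT, ValueLT, LevelGT, ValueGT: Double): Double;
begin
  if X < LevelLT then
    Result := ValueLT
  else if X > LevelGT then
    Result := ValueGT
  else
    Result := X;
end;

var
  Values: array[0..4] of Double = (99.4, 20, 19.9, 99, 80);
  K: Integer;
begin
  // Boundaries passed unchanged: the strict comparisons leave exactly 20
  // and exactly 99 in place, giving [0, 20, 0, 99, 80].
  for K := Low(Values) to High(Values) do
    Values[K] := ThresholdValue(Values[K], 20, 0, 99, 0);
  Assert((Values[0] = 0) and (Values[1] = 20) and (Values[2] = 0) and
         (Values[3] = 99) and (Values[4] = 80));
end.
```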

Graymatter asked Oct 17 '25 15:10


1 Answer

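By design, for IEEE 754 data types you can reinterpret the bits of the value as an integer of the same size and increment that integer to get the next larger value, or decrement it if the value is negative. Zero, infinities and NaN need separate handling.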
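A minimal sketch of that idea, deliberately restricted to positive, finite, non-zero Doubles (the full functions that follow handle zero, negatives, infinities and NaN). StepPositive is a hypothetical helper name; the sketch applies it to the 20 and 99 from the question and checks that the strict comparisons then agree with the original <= and >= ones:

```pascal
program BitStepSketch;

{$APPTYPE CONSOLE}

// Minimal sketch, valid ONLY for positive, finite, non-zero Doubles:
// reinterpret the 64 bits as an Int64, add Delta, reinterpret back.
// Delta = 1 gives the next larger Double, Delta = -1 the next smaller one.
function StepPositive(const D: Double; Delta: Int64): Double;
var
  I: Int64;
begin
  I := PInt64(@D)^;
  Inc(I, Delta);
  Result := PDouble(@I)^;
end;

var
  Values: array[0..4] of Double = (99.4, 20, 19.9, 99, 80);
  LevelLT, LevelGT: Double;
  K: Integer;
begin
  LevelLT := StepPositive(20, 1);  // smallest Double greater than 20
  LevelGT := StepPositive(99, -1); // largest Double less than 99
  for K := Low(Values) to High(Values) do
  begin
    // The strict comparisons agree with the original <= and >= ones.
    Assert((Values[K] < LevelLT) = (Values[K] <= 20));
    Assert((Values[K] > LevelGT) = (Values[K] >= 99));
  end;
end.
```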

// Requires System.SysUtils in the uses clause (TFloatSpecial and the Double record helper)
function NextDoubleGreater(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
  fsZero,fsNZero:
    // special handling needed around 0 and -0
    I := 1;
  fsInf, fsNInf, fsNaN:
    I := PInt64(@D)^; // return the original value
  fsDenormal, fsNDenormal, fsPositive, fsNegative:
    begin
      I := PInt64(@D)^;
      if I >= 0 then begin
        inc(I);
      end else begin
        dec(I);
      end;
    end;
  end;
  Result := PDouble(@I)^;
end;

And similarly in the opposite direction:

function NextDoubleLess(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
  fsZero,fsNZero:
    // special handling needed around 0 and -0
    I := Int64($8000000000000001); // bit pattern of the negative denormal closest to zero
  fsInf, fsNInf, fsNaN:
    I := PInt64(@D)^; // return the original value
  fsDenormal, fsNDenormal, fsPositive, fsNegative:
    begin
      I := PInt64(@D)^;
      if I >= 0 then begin
        dec(I);
      end else begin
        inc(I);
      end;
    end;
  end;
  Result := PDouble(@I)^;
end;
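The question also asks about Single. Here is a sketch of Single counterparts, adapting the same approach (these are not from the answer itself): they use a 32-bit Integer in place of Int64 and assume the TSingleHelper record helper from System.SysUtils for SpecialType. They also show why hand-picked constants are risky: adjacent Single values near 20 are 2^-19 (about 1.9e-6) apart, so a literal like 20.00000001 rounds to exactly 20.

```pascal
program NextSingleSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // TFloatSpecial and the TSingleHelper record helper

// Next representable Single above S (Single counterpart of NextDoubleGreater).
function NextSingleGreater(const S: Single): Single;
var
  I: Integer; // Integer is 32 bits on every Delphi platform, like Single
begin
  case S.SpecialType of
    fsZero, fsNZero:
      I := 1; // smallest positive denormal
    fsInf, fsNInf, fsNaN:
      I := PInteger(@S)^; // return the original value
  else
    // finite, non-zero: positive bit patterns step up, negative ones step down
    I := PInteger(@S)^;
    if I >= 0 then
      Inc(I)
    else
      Dec(I);
  end;
  Result := PSingle(@I)^;
end;

// Next representable Single below S (Single counterpart of NextDoubleLess).
function NextSingleLess(const S: Single): Single;
var
  I: Integer;
begin
  case S.SpecialType of
    fsZero, fsNZero:
      I := Low(Integer) + 1; // bit pattern $80000001: negative denormal closest to zero
    fsInf, fsNInf, fsNaN:
      I := PInteger(@S)^; // return the original value
  else
    I := PInteger(@S)^;
    if I >= 0 then
      Dec(I)
    else
      Inc(I);
  end;
  Result := PSingle(@I)^;
end;

var
  S: Single;
begin
  // 20.00000001 is far closer to 20 than to the next Single, so it rounds to 20.
  S := 20.00000001;
  Assert(S = 20);
  // The actual neighbours of 20 in single precision.
  Assert(NextSingleLess(20) < 20);
  Assert(NextSingleGreater(20) > 20);
  Assert(NextSingleLess(NextSingleGreater(20)) = 20);
end.
```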

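It's no coincidence that the format is laid out this way. Because the bit patterns of non-negative values, read as integers, sort in the same order as the values themselves, implementing the floating point comparison operators is almost trivial.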
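A small sketch that checks this ordering property on a few sample values (Bits is a hypothetical helper that only reinterprets the 64 bits, with no numeric conversion). It also shows that for negative values the integer order is reversed, which is why the functions above decrement the bit pattern of a negative number to move toward +infinity:

```pascal
program BitOrderSketch;

{$APPTYPE CONSOLE}

// Reinterpret the 64 bits of a Double as an Int64 (no numeric conversion).
function Bits(const D: Double): Int64;
begin
  Result := PInt64(@D)^;
end;

const
  // Non-negative, strictly increasing sample values.
  Samples: array[0..4] of Double = (0.0, 1.0e-300, 19.9, 20.0, 99.4);
var
  K: Integer;
begin
  // For non-negative values, the integer order of the bit patterns matches
  // the floating point order.
  for K := Low(Samples) to High(Samples) - 1 do
    Assert((Samples[K] < Samples[K + 1]) = (Bits(Samples[K]) < Bits(Samples[K + 1])));
  // For negative values the order is reversed: -2.0 < -1.0, but its
  // Int64 bit pattern is the larger of the two.
  Assert(Bits(-2.0) > Bits(-1.0));
end.
```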

Reference: How to alter a float by its smallest increment (or close to it)?

David Heffernan answered Oct 19 '25 12:10


