Converting floating point ">=" to ">" and "<=" to "<"

Question

I am looking for a way in Delphi to get the smallest Single and Double floating point value that I can add to or subtract from my number to make the number different for floating point comparisons. Alternatively, it would be enough to get the next floating point number that is smaller or larger than my number. From a floating point standpoint I would like to convert this:

if (A >= B) or (C <= D) then

To

if (A > newnumber1) or (C < newnumber2) then

Where both forms produce the same results in floating point. newnumber1 and newnumber2 would obviously be different for Single and Double. I either need some value that I can subtract from my A and add to my C values to get the newnumber1 and newnumber2, or I need a way of getting to these numbers from B and D.

In C++11 there is a function, std::nextafter, that looks like it would be sufficient. It is referenced in this question:

Finding the closest floating point value less than a specific integer value in C++?

Context

I am doing vector operations and I need to do the equivalent of a greater than or equal to. The easiest way to accomplish this is to take a slightly smaller number and use that with a greater than operation. I would prefer not to thumb suck a value that seems to work, if at all possible.

The vector operation that I am using is ippsThreshold_LTValGTVal_32s from:

https://software.intel.com/en-us/node/502143

The library obviously doesn't support a >= operation, since that is not practical in a floating point sense. To create an equivalent function I need to increase or decrease my comparison values to compensate, and then use the greater than and less than operations.

For Example

If I have an array of 5 values [99.4, 20, 19.9, 99, 80], the ippsThreshold_LTValGTVal_32s vector operation will let me replace specific values in the vector with my own replacement values. In this example, I would like to replace all values >= 99 or <= 20 with 0. Because the operation only compares with < and >, I have to replace the 99 with something marginally smaller and the 20 with something marginally bigger.

The function signature looks like this:

ippsThreshold_LTValGTVal_32s(..., ..., ..., levelLT, valueLT, levelGT, valueGT);

My call would be something like this:

ippsThreshold_LTValGTVal_32s(..., ..., ..., 20.00000001, 0, 98.99999, 0);

This would then include the 20 for the less than operation and the 99 for the greater than operation and give me a vector that looks like [0, 0, 0, 0, 80].

I need to find out what to use in place of the 20.00000001 and 98.99999 in the call above. I would like to have the difference between these values and the original values be as small as possible while still being significant enough to include the values in the > and < operations.

David Heffernan · Accepted Answer

By design, for IEEE 754 data types, you can simply reinterpret the bits of the value as an integer and increment that integer to get the next larger value, or decrement it if the value is negative.

function NextDoubleGreater(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
  fsZero,fsNZero:
    // special handling needed around 0 and -0
    I := 1;
  fsInf, fsNInf, fsNaN:
    I := PInt64(@D)^; // return the original value
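  fsDenormal, fsNDenormal, fsPositive, fsNegative:
    begin
      I := PInt64(@D)^;
      if I >= 0 then begin
        inc(I); // positive: the next bit pattern up is the next larger value
      end else begin
        dec(I); // negative: shrinking the magnitude moves towards zero
      end;
    end;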
  end;
  Result := PDouble(@I)^;
end;

And similarly in the opposite direction:

function NextDoubleLess(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
  fsZero,fsNZero:
    // special handling needed around 0 and -0
    I := $8000000000000001;
  fsInf, fsNInf, fsNaN:
    I := PInt64(@D)^; // return the original value
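  fsDenormal, fsNDenormal, fsPositive, fsNegative:
    begin
      I := PInt64(@D)^;
      if I >= 0 then begin
        dec(I); // positive: the next bit pattern down is the next smaller value
      end else begin
        inc(I); // negative: growing the magnitude moves away from zero
      end;
    end;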
  end;
  Result := PDouble(@I)^;
end;
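
The question also asks about Single, and the levels 20 and 99 from its example are the kind of values this would be applied to. What follows is only a minimal sketch, assuming the same bit-pattern approach carries over unchanged to the 32-bit format with Integer in place of Int64. NextSingleGreater and NextSingleLess are hypothetical names that do not appear in the answer above, and S.SpecialType assumes the Single record helper from System.SysUtils is available:

```delphi
program NextSingleSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical Single counterpart of NextDoubleGreater (an assumption,
// not part of the original answer).
function NextSingleGreater(const S: Single): Single;
var
  I: Integer;
begin
  case S.SpecialType of
  fsZero, fsNZero:
    I := 1; // bit pattern of the smallest positive denormal
  fsInf, fsNInf, fsNaN:
    I := PInteger(@S)^; // return the original value
  else
    begin
      I := PInteger(@S)^;
      if I >= 0 then
        Inc(I)  // positive: next bit pattern up
      else
        Dec(I); // negative: smaller magnitude, closer to zero
    end;
  end;
  Result := PSingle(@I)^;
end;

// Hypothetical Single counterpart of NextDoubleLess (also an assumption).
function NextSingleLess(const S: Single): Single;
var
  I: Integer;
begin
  case S.SpecialType of
  fsZero, fsNZero:
    I := Low(Integer) + 1; // bit pattern $80000001, the negative denormal nearest zero
  fsInf, fsNInf, fsNaN:
    I := PInteger(@S)^; // return the original value
  else
    begin
      I := PInteger(@S)^;
      if I >= 0 then
        Dec(I)  // positive: next bit pattern down
      else
        Inc(I); // negative: larger magnitude, further from zero
    end;
  end;
  Result := PSingle(@I)^;
end;

var
  LevelLT, LevelGT: Single;
begin
  LevelLT := NextSingleGreater(20); // stands in for 20.00000001
  LevelGT := NextSingleLess(99);    // stands in for 98.99999

  // The new levels are the immediate neighbours of 20 and 99.
  Assert(LevelLT > 20);
  Assert(NextSingleLess(LevelLT) = 20);
  Assert(LevelGT < 99);
  Assert(NextSingleGreater(LevelGT) = 99);

  Writeln('levelLT = ', LevelLT:0:8);
  Writeln('levelGT = ', LevelGT:0:8);
end.
```

With these levels, X < LevelLT holds exactly when X <= 20, and X > LevelGT holds exactly when X >= 99, for any Single X that is not a NaN, because no Single value lies strictly between a number and its immediate neighbour.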

It's no coincidence that the format is this way. Implementation of floating point comparison operators is trivial because of this design.
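
The ordering property behind that remark can be checked directly. This is a small sketch rather than anything from the original answer: BitsOf is a hypothetical helper, and the values 1.5 and 2.25 are arbitrary examples.

```delphi
program BitOrderSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: reinterprets the bits of a Double as an Int64.
function BitsOf(const D: Double): Int64;
begin
  Result := PInt64(@D)^;
end;

var
  A, B: Double;
begin
  A := 1.5;
  B := 2.25;

  // Non-negative values: integer order of the bit patterns matches
  // the floating point order of the values.
  Assert((A < B) = (BitsOf(A) < BitsOf(B)));

  // Negative values: the format is sign-magnitude, so a larger bit
  // pattern means a larger magnitude and therefore a smaller value.
  Assert((-A > -B) = (BitsOf(-A) < BitsOf(-B)));

  Writeln('bit pattern order is consistent with value order');
end.
```

This is why incrementing the integer moves a positive value up while the same step moves a negative value down, which is the sign test in the functions above.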

Reference: How to alter a float by its smallest increment (or close to it)?

Tags:

simd

delphi

delphi-xe4

Graymatter
