I'm trying to write a function in the D programming language to replace the calls to C's strtold. (Rationale: To use strtold from D, you have to convert D strings to C strings, which is inefficient. Also, strtold can't be executed at compile time.) I've come up with an implementation that mostly works, but I seem to lose some precision in the least significant bits.
The code for the interesting part of the algorithm is below. I can see where the precision loss comes from, but I don't know how to get rid of it. (I've left out the parts of the code that aren't relevant to the core algorithm to save people reading.) What string-to-float algorithm will guarantee that the result is as close as possible on the IEEE number line to the value represented by the string?
real currentPlace = 10.0L ^^ (pointPos - ePos + 1 + expon);
real ans = 0;
for(int index = ePos - 1; index > -1; index--) {
    if(str[index] == '.') {
        continue;
    }
    if(str[index] < '0' || str[index] > '9') {
        err();
    }
    auto digit = cast(int) str[index] - cast(int) '0';
    ans += digit * currentPlace;
    currentPlace *= 10;
}
return ans * sign;
Also, I'm using the unit tests for the old version, which did things like:
assert(to!(real)("0.456") == 0.456L);
Is it possible that the answers being produced by my function are actually more accurate than the representation the compiler produces when parsing a floating point literal, but the compiler (which is written in C++) always agrees exactly with strtold because it uses strtold internally for parsing floating point literals?
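One way to settle which of two results is closer is to compare both against the exact decimal value using rational arithmetic. The sketch below is an illustration in Python rather than D, so it works with 64-bit doubles and not 80-bit reals. The names naive_parse and exact_error are hypothetical helpers: the first mirrors the right-to-left loop above (without sign or exponent handling), the second measures the exact distance from the decimal string.

```python
from fractions import Fraction


def naive_parse(text: str) -> float:
    """Accumulate digits right to left, like the loop in the question."""
    frac_digits = len(text) - text.index(".") - 1 if "." in text else 0
    place = 10.0 ** -frac_digits  # negative powers of ten are usually inexact
    total = 0.0
    for ch in reversed(text):
        if ch == ".":
            continue
        total += (ord(ch) - ord("0")) * place  # multiply and add may each round
        place *= 10  # this step may round as well
    return total


def exact_error(candidate: float, text: str) -> Fraction:
    """Exact distance between a binary float and the decimal value of text."""
    return abs(Fraction(candidate) - Fraction(text))


for text in ("0.456", "0.1", "123.456"):
    naive = exact_error(naive_parse(text), text)
    reference = exact_error(float(text), text)
    print(text, "naive error:", float(naive), "reference error:", float(reference))
```

Because every multiply and add in the loop can round, the accumulated result can end up one or more units in the last place away from the nearest representable value. A conversion that rounds only once, at the very end, avoids that.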
In Python, the built-in float() function converts a string to a floating-point number. For most objects other than strings, float() calls the object's __float__() method.
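A minimal sketch of both behaviours follows. The Celsius class is a made-up example type, used only to show how __float__() is picked up by float().

```python
class Celsius:
    """Hypothetical wrapper type used only for illustration."""

    def __init__(self, degrees: str):
        self.degrees = degrees

    def __float__(self) -> float:
        # float() calls this method when given a Celsius instance.
        return float(self.degrees)


print(float("0.456"))           # parsing a decimal string
print(float("  1e3 "))          # surrounding whitespace is accepted
print(float(Celsius("21.5")))   # goes through Celsius.__float__()

try:
    float("abc")
except ValueError as exc:
    print("not a number:", exc)
```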
The atof() function converts a character string to a double-precision floating-point value. The input string is a sequence of characters that can be interpreted as a numeric value of the specified return type.
Every data type has a limited number of bits, and those bits cannot exactly represent a value that needs more of them. The float data type has a 24-bit significand, which is equivalent to only about 7 significant decimal digits.
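The 24-bit limit can be observed by round-tripping values through a 32-bit float. This is a small sketch using Python's struct module; to_float32 is a hypothetical helper name.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 24 bits of significand correspond to roughly 7.2 decimal digits.
print(24 * math.log10(2))

# 2**24 fits exactly, but 2**24 + 1 needs 25 bits and gets rounded.
print(to_float32(16777216.0))
print(to_float32(16777217.0))

# A short decimal such as 0.1 also picks up a visible error.
print(to_float32(0.1))
```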
Clinger and Steele & White developed fine algorithms for reading and writing floating-point numbers.
There's a retrospective here along with some references to implementations.
David Gay's paper improving Clinger's work, and Gay's implementation in C are great. I have used them in embedded systems, and I believe Gay's dtoa made its way into many libc's.
Honestly, this is one of those things that you really ought not be doing if you don't already know how to do it. It's full of pitfalls, and even if you manage to get it right, it will likely be tremendously slow if you don't have expertise in analyzing low-level numerics performance.
That said, if you're really determined to write your own implementation, the best reference for correctness is David Gay's "Correctly Rounded Binary-Decimal and Decimal-Binary Conversions" (postscript version). You should also study his reference implementations (in C), which are available on Netlib.
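To make the idea of "correctly rounded" concrete, here is a deliberately simple sketch that does the whole conversion in exact integer arithmetic and rounds once at the end, ties to even. It is not Gay's algorithm, which adds fast paths and avoids most of the big-integer work. It is written in Python for doubles only, and it assumes a well-formed input whose value lies in the normal range (no subnormals, overflow, infinities, or NaN). The name nearest_double is hypothetical.

```python
import math


def nearest_double(text: str) -> float:
    """Correctly rounded decimal-to-double conversion via exact integers."""
    sign = -1.0 if text.startswith("-") else 1.0
    mantissa, _, exponent = text.lstrip("+-").lower().partition("e")
    whole, _, frac = mantissa.partition(".")
    digits = int(whole + frac)
    e10 = (int(exponent) if exponent else 0) - len(frac)
    if digits == 0:
        return sign * 0.0

    # The exact value is num / den, with both held as big integers.
    num, den = (digits * 10**e10, 1) if e10 >= 0 else (digits, 10 ** (-e10))

    # Scale by a power of two so the quotient has 53 significant bits.
    e2 = num.bit_length() - den.bit_length() - 53
    if e2 >= 0:
        den <<= e2
    else:
        num <<= -e2
    q, r = divmod(num, den)
    if q >= 1 << 53:  # the estimate was one bit short, so rescale
        den <<= 1
        e2 += 1
        q, r = divmod(num, den)

    # Single rounding step: round to nearest, ties to even.
    if 2 * r > den or (2 * r == den and q & 1):
        q += 1
    return sign * math.ldexp(float(q), e2)


for text in ("0.456", "1e23", "9007199254740993", "-3.14"):
    print(text, nearest_double(text) == float(text))
```

The comparison against float() works as a cross-check because CPython's own string-to-float conversion is correctly rounded. The same structure would carry over to an 80-bit real by replacing 53 with 64 significand bits, although that variant is untested here.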