
Trivial C program yields different result in clang/macOS/arm64 and clang/macOS/x86_64

Tags:

c

macos

clang

arm64

I ran into problems while porting some complex code to macOS/arm64 and ended up with the following trivial program to exhibit the behavior difference w.r.t. macOS/x86_64 (using the native osx-arm64 clang version 14.0.6 from conda-forge, and cross-compiling for x86_64):

#include <assert.h>
#include <stdio.h>
int main()
{
    double y[2] = {-0.01,0.9};
    double r;
    r = y[0]+0.03*y[1];
    printf("r = %24.26e\n",r);
    assert(r == 0.017);
}

The result on arm64 is

$ clang -arch arm64 test.c -o test; ./test
Assertion failed: (r == 0.017), function main, file test.c, line 9.
r = 1.69999999999999977517983751e-02
zsh: abort      ./test

while the result on x86_64 is

$ clang -arch x86_64 test.c -o test; ./test
r = 1.70000000000000012212453271e-02
$       

The test program has also been compiled and run on an x86_64 machine; it yields the same result as above (i.e. the same as the binary cross-compiled on arm64 and run under Rosetta).

To be clear, the problem is not that the arm64 result is not bitwise equal to 0.017 parsed and stored as an IEEE 754 number, but rather that the value of the expression differs from the x86_64 one.

Update 1:

To rule out different floating-point conventions (e.g. rounding mode), the following program has been compiled and run on both platforms:

#include <iostream>
#include <limits>

#define LOG(x) std::cout << #x " = " << x << '\n'

int main()
{
    using l = std::numeric_limits<double>;
    LOG(l::digits);
    LOG(l::round_style);
    LOG(l::epsilon());
    LOG(l::min());

    return 0;
}

It yields the same output on both platforms:

l::digits = 53
l::round_style = 1
l::epsilon() = 2.22045e-16
l::min() = 2.22507e-308

hence the problem seems to be elsewhere.

Update 2:

If it helps: under arm64, the result of the expression is the same as the one obtained by calling the reference BLAS ddot with the vectors {1, 0.03} and y.

Update 3:

The toolchain seems to be the cause. Using the default toolchain of macOS 11.6.1:

mottelet@portmottelet-cr-1 ~ % clang -v
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: arm64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

gives the same results for both architectures! So the problem seems to lie in the toolchain I am actually using: version 1.5.2 of the conda package cxx-compiler (I need conda as a package manager because the application I am building has a lot of dependencies that conda provides).

Using -v shows a bunch of compilation flags; which one could be the culprit?

asked Sep 17 '25 by Stéphane Mottelet

2 Answers

The results differ in the least significant bit due to different rounding on the two compiler/architecture combinations. You can use %a to see all of the bits of the double in hex. Then you get on arm64:

0x1.16872b020c49bp-6

and on x86_64:

0x1.16872b020c49cp-6

The IEEE 754 standard by itself does not guarantee exactly the same results across conforming implementations, in particular due to destination accuracy, decimal conversions, and instruction choices. Variations in the least significant bit, or more with multiple operations, can and should be expected.

In this case, the arm64 code uses the fmadd instruction, which does the multiply and the add in a single operation with a single rounding. That gives a different result than the separate multiply and add XMM instructions used on x86_64, which round twice.

In the comments, Eric points out the C library function fma() to do a combined multiply-add. Indeed, if I use that call on the x86_64 architecture (as well as on arm64), I get the arm64 fmadd result.

You could also get different behavior on the same architecture if the compiler optimizes the operation away, as it should in this example. Then the compiler is doing the computation. The compiler could very well use separate multiply and add operations at compile time, giving a different result on arm64 than the fmadd instruction when the computation is not optimized out. And if you are cross-compiling, the constant-folded calculation could depend on the architecture of the machine you are compiling on, as opposed to the one you are running on.

Comparison for exact equality of floating point values is fraught with peril. Whenever you see yourself attempting that, you need to think more deeply about your intent.

answered Sep 19 '25 by Mark Adler


It appears that clang behavior changed between 13.x and 14.x. When using -O, the result is computed at compile time and the target's floating point has nothing to do with it, so this is strictly a compiler issue.

Try on godbolt

The difference is easier to see in hex float output. clang 13 and earlier compute the value 0x1.16872b020c49cp-6, which is slightly greater than 0.017; clang 14 and later compute 0x1.16872b020c49bp-6, which is slightly less (different by 1 in the least significant bit).

The same discrepancy exists between the two compiler versions whether targeting arm64 or x86_64.

I am not sure offhand which one is better or worse. I guess you could git bisect clang if you really care, look at the rationale for the corresponding commit, and see whether it seems correct. For comparison, gcc in all versions tested gives the "old clang" value 0x1.16872b020c49cp-6.

answered Sep 19 '25 by Nate Eldredge