Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compute (a*b)%n FAST for 64-bit unsigned arguments in C(++) on x86-64 platforms?

I'm looking for a fast method to efficiently compute  (ab) modulo n  (in the mathematical sense of that) for a, b, n of type uint64_t. I could live with preconditions such as n!=0, or even a<n && b<n.

Notice that the C expression (a*b)%n won't cut it, because the product is truncated to 64 bits. I'm looking for (uint64_t)(((uint128_t)a*b)%n) except that I do not have a uint128_t (that I know, in Visual C++).

I'm in for a Visual C++ (preferably) or GCC/clang intrinsic making best use of the underlying hardware available on x86-64 platforms; or if that can't be done for a portable inline function.

like image 848
fgrieu Avatar asked Sep 06 '25 02:09

fgrieu


1 Answers

Ok, how about this (not tested)

modmul:
; rcx = a
; rdx = b
; r8 = n
mov rax, rdx
mul rcx
div r8
mov rax, rdx
ret

The precondition is that a * b / n <= ~0ULL, otherwise there will be a divide error. That's a slightly less strict condition than a < n && m < n, one of them can be bigger than n as long as the other is small enough.

Unfortunately it has to be assembled and linked in separately, because MSVC doesn't support inline asm for 64bit targets.

It's also still slow, the real problem is that 64bit div, which can take nearly a hundred cycles (seriously, up to 90 cycles on Nehalem for example).

like image 192
harold Avatar answered Sep 07 '25 21:09

harold