Compute (a*b)%n FAST for 64-bit unsigned arguments in C(++) on x86-64 platforms?

Question

I'm looking for a fast method to efficiently compute (a⋅b) modulo n (in the mathematical sense of that) for a, b, n of type uint64_t. I could live with preconditions such as n!=0, or even a<n && b<n.

Notice that the C expression (a*b)%n won't cut it, because the product is truncated to 64 bits. I'm looking for (uint64_t)(((uint128_t)a*b)%n) except that I do not have a uint128_t (that I know, in Visual C++).

I'm in for a Visual C++ (preferably) or GCC/clang intrinsic making best use of the underlying hardware available on x86-64 platforms; or if that can't be done for a portable inline function.

harold · Accepted Answer

Ok, how about this (not tested)

modmul:
; rcx = a
; rdx = b
; r8 = n
mov rax, rdx
mul rcx
div r8
mov rax, rdx
ret

The precondition is that a * b / n <= ~0ULL, otherwise there will be a divide error. That's a slightly less strict condition than a < n && m < n, one of them can be bigger than n as long as the other is small enough.

Unfortunately it has to be assembled and linked in separately, because MSVC doesn't support inline asm for 64bit targets.

It's also still slow, the real problem is that 64bit div, which can take nearly a hundred cycles (seriously, up to 90 cycles on Nehalem for example).

Compute (a*b)%n FAST for 64-bit unsigned arguments in C(++) on x86-64 platforms?

Tags:

c++

c

x86-64

visual-c++

128-bit

fgrieu

1 Answers

harold

Recent Activity

Donate For Us

Compute (a*b)%n FAST for 64-bit unsigned arguments in C(++) on x86-64 platforms?

Tags:

c++

c

x86-64

visual-c++

128-bit

fgrieu

1 Answers

harold

Related questions

Recent Activity

Donate For Us