I'm looking for a fast method to efficiently compute (a
⋅b
) modulo n
(in the mathematical sense of that) for a
, b
, n
of type uint64_t
. I could live with preconditions such as n!=0
, or even a<n && b<n
.
Notice that the C expression (a*b)%n
won't cut it, because the product is truncated to 64 bits. I'm looking for (uint64_t)(((uint128_t)a*b)%n)
except that I do not have a uint128_t
(that I know, in Visual C++).
I'm in for a Visual C++ (preferably) or GCC/clang intrinsic making best use of the underlying hardware available on x86-64 platforms; or if that can't be done for a portable inline
function.
Ok, how about this (not tested)
modmul:
; rcx = a
; rdx = b
; r8 = n
mov rax, rdx
mul rcx
div r8
mov rax, rdx
ret
The precondition is that a * b / n <= ~0ULL
, otherwise there will be a divide error. That's a slightly less strict condition than a < n && m < n
, one of them can be bigger than n
as long as the other is small enough.
Unfortunately it has to be assembled and linked in separately, because MSVC doesn't support inline asm for 64bit targets.
It's also still slow, the real problem is that 64bit div
, which can take nearly a hundred cycles (seriously, up to 90 cycles on Nehalem for example).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With