Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

my mersenne-prime generator is stopping at 2^31 - 1

i have a code that creates primes turns them in to Mersenne numbers and then prime checks it again to see if it is a Mersenne prime. it is working fine apart form that fact that it stops at 2^31 - 1... (2147483647)

i was using a function that produces infinite numbers:

def infinity():
    i = 0
    while True:
        i += 1
        yield i

and then changed it to a while True loop with a i += 1 at the end but it still does not work.

def isPrime(i):
    isprime = True
    for j in range(2, int(math.sqrt(i) + 1)):
        if i % j == 0:
            return False

    if isprime and i!=1:
        return True

i=1 
while True:
    isprime = True
    for j in range(2, int(math.sqrt(i) + 1)):
        if i % j == 0:
            isprime = False
            break

    if isprime and i!=1:
        test = (2**i)-1
        result = isPrime(test)
        if result:
            print (test)
    i+=1
like image 569
Ethan2001 Avatar asked Oct 28 '25 09:10

Ethan2001


1 Answers

Your algorithm looks good and working to me. I decided to make some improvements though, and also implemented faster algorithms of finding Mersenne Primes. Just for reference, all world-known Mersenne Primes are listed here.

In all next steps first goes description, then code, and after code you can see console output with running times.

Because in my answer I have a lot of code snippets and many are large, you can use nice Chrome extension (Clipboardy) (described how to install in this answer) which allows to easily copy to clipboard StackOverflow code snippets.


Step 1. Algorithm Version 0, Python-only.

First, I improved your algorithm a bit, also a bit beautified your code. One easy improvement is that when checking is_prime you can check only odd divisors, this will double the speed. This is what I got:

Try it online!

import math, time

def print_result(exp, *, tb = time.time()):
    print(exp, 'at', round(time.time() - tb, 3), 'secs', flush = True, end = ',   ')
    
def mersennes_v0():
    def is_prime(n):
        if n <= 2:
            return n == 2
        if n & 1 == 0:
            return False
        for j in range(3, math.floor(math.sqrt(n + 0.1)) + 1, 2):
            if n % j == 0:
                return False
        return True

    for i in range(2, 63):
        if is_prime(i) and is_prime((1 << i) - 1):
            print_result(i)

mersennes_v0()

Output:

2 at 0.0 secs,   3 at 0.0 secs,   5 at 0.001 secs,   7 at 0.001 secs,
13 at 0.001 secs,   17 at 0.001 secs,   19 at 0.002 secs,   31 at 0.009 secs,
61 at 243.5412 secs,

Step 2. Algorithm Version 0, plus Numba.

Then without change of algorithm I made very simple improvement, I used Numba JIT code optimizer/compiler! To use it in your program you need to install PIP module one time through python -m pip install numba.

In code above I just did 3 little changes to use Numba - 1) did import numba 2) added function decorator @numba.njit 3) add line with numba.objmode():.

Using Numba improved AlgoV0 running time from 243 seconds to 15 seconds i.e. boosted speed 16x times !!!

Important note - unlike regular Python Numba doesn't support big integer arithmetic, it means only primes fitting 64-bit integer can be tested!

Try it online!

import math, time
import numba

def print_result(exp, *, tb = time.time()):
    print(exp, 'at', round(time.time() - tb, 3), 'secs', flush = True, end = ',   ')
    
@numba.njit
def mersennes_v0():
    def is_prime(n):
        if n <= 2:
            return n == 2
        if n & 1 == 0:
            return False
        for j in range(3, math.floor(math.sqrt(n + 0.1)) + 1, 2):
            if n % j == 0:
                return False
        return True

    for i in range(2, 63):
        if is_prime(i) and is_prime((1 << i) - 1):
            with numba.objmode():
                print_result(i)

mersennes_v0()

Output:

2 at 1.257 secs,   3 at 1.257 secs,   5 at 1.257 secs,   7 at 1.257 secs,
13 at 1.257 secs,   17 at 1.257 secs,   19 at 1.258 secs,   31 at 1.258 secs,
61 at 15.052 secs,

Step 3. Algorithm Version 1, Python-only.

Then I decided to implement Lucas Lehmer Primality Test, which will solve same task much faster. As it is totally different algorithm I called it Version-1 algorithm.

Numba can't help us much here, because it doesn't have long arithmetics, so I didn't use it, I used built-in default Python long arithmetics. Python default long arithmetics implementation is good enough at speed to be used as it is. Also main running time of next code is within low level C-based functions of long arithmetics, so next code is very efficient variant of Lucas Lehmer Test.

Notice that Lucas Lehmer Test is valid for exponents starting from 3, so exponent 2 is not shown in console output.

So next code uses just raw Python with standard modules, and doesn't need installing any PIP modules.

Try it online!

import math, time

def print_result(exp, *, tb = time.time()):
    print(exp, 'at', round(time.time() - tb, 3), 'secs', flush = True, end = ',   ')
    
def mersennes_v1():
    def is_prime(n):
        if n <= 2:
            return n == 2
        if n & 1 == 0:
            return False
        for j in range(3, math.floor(math.sqrt(n + 0.1)) + 1, 2):
            if n % j == 0:
                return False
        return True

    def lucas_lehmer_test(p):
        # Lucas Lehmer Test https://en.wikipedia.org/wiki/Lucas%E2%80%93Lehmer_primality_test

        mask = (1 << p) - 1
        
        def mer_rem(x, bits):
            # Below is same as:   return x % mask
            while True:
                r = 0
                while x != 0:
                    r += x & mask
                    x >>= bits
                if r == mask:
                    r = 0
                if r < mask:
                    return r
                x = r

        s = 4
        for k in range(2, p):
            s = mer_rem(s * s - 2, p)
        return s == 0
        
    for p in range(3, 1 << 30):
        if is_prime(p) and lucas_lehmer_test(p):
            print_result(p)

mersennes_v1()

Output:

3 at 0.0 secs,   5 at 0.001 secs,   7 at 0.001 secs,   13 at 0.001 secs,
17 at 0.001 secs,   19 at 0.001 secs,   31 at 0.002 secs,   61 at 0.003 secs,
89 at 0.004 secs,   107 at 0.005 secs,   127 at 0.006 secs,   521 at 0.062 secs,
607 at 0.087 secs,   1279 at 0.504 secs,   2203 at 2.338 secs,   2281 at 2.634 secs,
3217 at 7.819 secs,   4253 at 20.578 secs, 4423 at 23.196 secs,
9689 at 386.877 secs,   9941 at 419.243 secs,   11213 at 590.477 secs,

Step 4. Algorithm Version 1, plus GMPY2.

Next possible improvement of algorithm above is to use GMPY2 Python library. It is highly-optimized library of long arithmetics, considerably faster than regular Python implementation of long arithmetics.

But GMPY2 can be quite challenging to install. Most recent precompiled versions for Windows are available by this link. Download .whl file for your version of Python and install in through pip, e.g. for my Windows 64-bit Python 3.7 I downloaded and installed pip install gmpy2-2.0.8-cp37-cp37m-win_amd64.whl. For Linux it is easiest to install through sudo apt install -y python3-gmpy2.

To use gmpy2 in code from previous Step, I added line import gmpy2 and used gmpy2.mpz(...) in several places.

Usage of gmpy2 compared to Python-only version improved speed by around 5.2x times for exponent 11213, and for larger exponents this speed gain will be even more, probably due to use of Fast Fourier Transform multiplication algorithms inside GMPY2 library.

Try it online!

import math, time
import gmpy2

def print_result(exp, *, tb = time.time()):
    print(exp, 'at', round(time.time() - tb, 3), 'secs', flush = True, end = ',   ')
    
def mersennes_v1():
    def is_prime(n):
        if n <= 2:
            return n == 2
        if n & 1 == 0:
            return False
        for j in range(3, math.floor(math.sqrt(n + 0.1)) + 1, 2):
            if n % j == 0:
                return False
        return True

    def lucas_lehmer_test(p):
        # Lucas Lehmer Test https://en.wikipedia.org/wiki/Lucas%E2%80%93Lehmer_primality_test
        
        mask = gmpy2.mpz((1 << p) - 1)

        def mer_rem(x, bits):
            # Below is same as:   return x % mask
            while True:
                r = gmpy2.mpz(0)
                while x != 0:
                    r += x & mask
                    x >>= bits
                if r == mask:
                    r = gmpy2.mpz(0)
                if r < mask:
                    return r
                x = r

        s = 4
        s, p = gmpy2.mpz(s), gmpy2.mpz(p)
        for k in range(2, p):
            s = mer_rem(s * s - 2, p)
        return s == 0
        
    for p in range(3, 1 << 30):
        if is_prime(p) and lucas_lehmer_test(p):
            print_result(p)

mersennes_v1()

Output:

3 at 0.0 secs,   5 at 0.0 secs,   7 at 0.001 secs,   13 at 0.001 secs,
17 at 0.001 secs,   19 at 0.001 secs,   31 at 0.002 secs,   61 at 0.002 secs,
89 at 0.004 secs,   107 at 0.005 secs,   127 at 0.006 secs,   521 at 0.055 secs,
607 at 0.073 secs,   1279 at 0.315 secs,   2203 at 1.055 secs,   2281 at 1.16 secs,
3217 at 2.637 secs,   4253 at 5.674 secs,   4423 at 6.375 secs,
9689 at 72.535 secs,   9941 at 79.605 secs,   11213 at 114.375 secs,
19937 at 796.705 secs,   21701 at 1080.045 secs,   23209 at 1421.392 secs,   

Step 5. Algorithm Version 1, and GMPY2 + Multi-Core

One more obvious improvement over previous step would be to use multi-core CPU capabilities, i.e. to parallelize Lucas-Lehmer Test on all available CPU cores.

For that we use standard Python's multiprocessing module.

On my 4-core CPU speed boost for exponent 23209 is 2.65x times compared to single-core version.

Try it online!

import math, time, multiprocessing
import gmpy2

def print_result(exp, *, tb = time.time()):
    print(exp, 'at', round(time.time() - tb, 3), 'secs', flush = True, end = ',   ')

def lucas_lehmer_test(p):
    # Lucas Lehmer Test https://en.wikipedia.org/wiki/Lucas%E2%80%93Lehmer_primality_test

    def num(n):
        return gmpy2.mpz(n)
    
    mask = num((1 << p) - 1)
    
    def mer_rem(x, bits):
        # Below is same as:   return x % mask
        while True:
            r = num(0)
            while x != 0:
                r += x & mask
                x >>= bits
            if r == mask:
                r = num(0)
            if r < mask:
                return r
            x = r

    s, p, two = num(4), num(p), num(2)
    for k in range(2, p):
        s = mer_rem(s * s - two, p)
    return p, s == 0
    
def mersennes_v1():
    def is_prime(n):
        if n <= 2:
            return n == 2
        if n & 1 == 0:
            return False
        for j in range(3, math.floor(math.sqrt(n + 0.1)) + 1, 2):
            if n % j == 0:
                return False
        return True

    print('Num Cores Used:', multiprocessing.cpu_count())

    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        for p, _ in filter(lambda e: e[1], pool.imap(lucas_lehmer_test, filter(is_prime, range(3, 1 << 30)))):
            print_result(p)

if __name__ == '__main__':
    mersennes_v1()

Output:

3 at 0.746 secs,   5 at 0.747 secs,   7 at 0.747 secs,   13 at 0.747 secs,
17 at 0.747 secs,   19 at 0.748 secs,   31 at 0.748 secs,   61 at 0.748 secs,
89 at 0.749 secs,   107 at 0.749 secs,   127 at 0.75 secs,   521 at 0.782 secs,
607 at 0.79 secs,   1279 at 0.876 secs,   2203 at 1.09 secs, 2281 at 1.122 secs,
3217 at 1.529 secs,   4253 at 2.329 secs,   4423 at 2.529 secs,
9689 at 25.208 secs,   9941 at 29.454 secs,   11213 at 50.498 secs,
19937 at 337.926 secs,   21701 at 441.264 secs,   23209 at 537.84 secs,

Step 6. Advanced Code, with Trial Division Test

Providing next code just for reference, it is much more complex than previous step. Main difference is added Trial Division Test (to existing Lucas-Lehmer Test).

To use code you need to pip install numpy colorama, also gmpy2 needs to be installed as described in previous steps. Also unlike previous steps this one needs at least Python 3.8 to work (due to shared memory capability).

At the beginning of mersennes_v2() function you may setup some configurational variables. One important variable there is prime_bits, it should be set to as much as possible for your computer, but not more than 32, this variable controls size of primes table, 32 meaning all 32-bit primes are generated. This prime table is needed for trial division test. If you generate more then is higher chance of this test success hence more time saved instead of using heavy Lucas-Lehmer. If you have not much memory then set prime_bits to small value, but not less than 24, for 24 only 5MB of memory will be used, while for 32 will be used around 650MB.

Another added things are 1) generation of primes table using Sieve of Eratosthenes 2) usage of shared memory 3) fancy console output with statistics.

Timings provided at the end are not precise (don't compare them to previous steps) because my CPU today is overheated and gives twice less speed.

In console output tboost signifies amount of boost given by extra trial division test, i.e. how much on average less time is spent with using both TrialDivision+LucasLehmer compared to just LucasLehmer. If tboost value is bigger than 1.0x then it means that extra TrialDivision was benefitial, otherwise (if less than 1.0) it means slowdown, which can happen on some systems then code needs extra tweaking. In console tbits:X signifies that trial division tests are done for all prime numbers below 2^X.

Try it online!

import math, time, multiprocessing, multiprocessing.pool, multiprocessing.shared_memory, sys, os, faulthandler, traceback, secrets
faulthandler.enable()
import numpy as np, gmpy2, colorama
colorama.init()

def is_mersenne(args, *, data = {'tf': 0}):
    try:
        def lucas_lehmer_test(exp):
            # Lucas Lehmer Test https://en.wikipedia.org/wiki/Lucas%E2%80%93Lehmer_primality_test

            def num(n):
                return gmpy2.mpz(n)
            
            mask = num((1 << exp) - 1)
            
            def mer_rem(x, bits):
                # Below is same as:   return x % mask
                while True:
                    r = num(0)
                    while x != 0:
                        r += x & mask
                        x >>= bits
                    if r == mask:
                        r = num(0)
                    if r < mask:
                        return r
                    x = r

            tb = time.time()
            
            s, exp, two = num(4), num(exp), num(2)
            for k in range(2, exp):
                s = mer_rem(s * s - two, exp)

            #print('ll', '-+'[s==0], '_', int(exp), '_', round(time.time() - tb, 3), sep = '', end = ' ', flush = True)
            return s == 0

        def trial_div_test(exp):
            import numpy as np
            if 'ptaba' not in data:
                data['shp'] = multiprocessing.shared_memory.SharedMemory(args['all_primes_shname'])
                data['ptaba'] = np.ndarray((args['all_primes_size'] // 4,), dtype = np.uint32, buffer = data['shp'].buf)
            if exp <= 32:
                return True, {'bits': 0}
            bits = max(16, min(-19 + math.floor(math.log(exp) / math.log(1.25)), math.ceil(args['prime_bits'])))
            if ('ptabb', bits) not in data:
                cnt = min(np.searchsorted(data['ptaba'], 1 << bits), len(data['ptaba']))
                data[('ptabb', bits)] = np.ndarray((cnt,), dtype = np.uint32, buffer = data['shp'].buf)
            sptab = data[('ptabb', bits)]
            tb, bl, probably_prime = time.time(), 1 << 13, True
            spows = np.empty((bl,), dtype = np.uint64)
            for i in range(0, sptab.size, bl):
                ptab = sptab[i : i + bl]
                pows = spows if spows.size == ptab.size else spows[:ptab.size]
                pows[...] = 2
                for b in bin(exp)[3:]:
                    pows *= pows
                    if b == '1':
                        pows <<= 1
                    pows %= ptab
                if np.count_nonzero(pows == 1) > 0:
                    probably_prime = False
                    break
            #print('td', '-+'[probably_prime], '_', int(exp), '_', round(time.time() - tb, 3), sep = '', end = ' ', flush = True)
            return probably_prime, {'bits': bits}

        p, stats = args['p'], {'tt': time.time()}
        r, tinfo = trial_div_test(p)
        stats['tt'], stats['tr'], stats['tte'], stats['tinfo'] = time.time() - stats['tt'], r, time.time(), tinfo
        if not r:
            return p, r, stats
        stats['lt'] = time.time()
        r = lucas_lehmer_test(p)
        stats['lt'], stats['lte'] = time.time() - stats['lt'], time.time()
        return p, r, stats
    except:
        return None, True, '', traceback.format_exc()
    
def mersennes_v2():
    prime_bits = 28
    num_cores = multiprocessing.cpu_count()
    console_width = os.get_terminal_size().columns

    def gen_primes(stop, *, dtype = 'int64'):
        # https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
        import math, numpy as np
        if stop < 2:
            return np.zeros((0,), dtype = dtype)
        primes = np.ones((stop >> 1,), dtype = np.uint8)
        primes[0] = False
        for p in range(3, math.floor(math.sqrt(stop)) + 1, 2):
            if primes[p >> 1]:
                primes[(p * p) >> 1 :: p] = False
        return np.concatenate((np.array([2], dtype = dtype), (np.flatnonzero(primes).astype(dtype) << 1) + 1))
        
    def RoundFix(x, n):
        s = str(round(x, n))
        return s + '0' * (n - len(s) + ('.' + s).rfind('.'))
        
    prime_bits = max(24, min(prime_bits, math.log(math.sqrt((1 << 63) - 1)) / math.log(2)))
    print(f'Generating {round(prime_bits, 2)}-bits primes...', end = ' ', flush = True)
    tb = time.time()
    all_primes = gen_primes(math.floor(2 ** prime_bits), dtype = 'uint32')
    print(f'{round(time.time() - tb)} secs.', flush = True)
    all_primes_size = len(all_primes.data) * all_primes.dtype.itemsize
    all_primes_shname = secrets.token_hex(16).upper()
    shp = multiprocessing.shared_memory.SharedMemory(
        name = all_primes_shname, create = True, size = all_primes_size)
    shp.buf[:all_primes_size] = all_primes.tobytes()
    del all_primes
    all_primes = np.ndarray((all_primes_size // 4,), dtype = np.uint32, buffer = shp.buf)

    print('Using', num_cores, 'cores.', flush = True)
    print('\n\n')
    offx, tstart, tlast, ptimes = 0, time.time(), time.time(), []
    tstats = {'tt': 0.0, 'tc': 0, 'tts': [], 'tta': 0.0, 'lt': 0.0, 'lc': 0, 'lts': [], 'lta': 0.0, 'st': [(0, 0, time.time())], 'tbits': []}
    with multiprocessing.Pool(num_cores) as pool:
        for e in pool.imap(is_mersenne, ({
            'p': int(e), 'all_primes_size': all_primes_size, 'all_primes_shname': all_primes_shname, 'prime_bits': prime_bits,
        } for e in all_primes)):
            if e[0] is None:
                pool.terminate()
                print('!!!Exception!!!\n', e[3], sep = '', flush = True)
                break
            stats = e[2]
            def fill(p):
                tstats[f'{p}t'] += stats[f'{p}t']
                tstats[f'{p}ts'] += [(stats[f'{p}t'], stats[f'{p}te'])]
                while len(tstats[f'{p}ts']) > 20 and tstats[f'{p}ts'][-1][1] - tstats[f'{p}ts'][0][1] > 120:
                    tstats[f'{p}ts'] = tstats[f'{p}ts'][1:]
                tstats[f'{p}c'] += 1
                tstats[f'{p}ta'] = sum(e[0] for e in tstats[f'{p}ts']) / len(tstats[f'{p}ts'])
                if p == 't':
                    tstats['st'] += [(stats['tt'] + stats.get('lt', 0), stats.get('lt', tstats['st'][-1][1]), stats.get('lte', stats['tte']))]
                    while len(tstats['st']) > 50 and tstats['st'][-1][2] - tstats['st'][0][2] > 300:
                        tstats['st'] = tstats['st'][1:]
                    tstats['sta'] = sum(e[1] for e in tstats['st']) / max(0.001, sum(e[0] for e in tstats['st']))
                    tstats['tbits'] = (tstats['tbits'] + [stats['tinfo']['bits']])[-20:]
                    tstats['tbitsa'] = sum(tstats['tbits']) / len(tstats['tbits'])
            fill('t')
            if 'lt' in stats:
                fill('l')
            if not e[1]:
                s0 = f'{str(e[0]).rjust(6)}| trial:{RoundFix(stats["tt"], 3)}/lucas:' + (f'{RoundFix(stats["lt"], 3)}' if 'lt' in stats else '-' * len(RoundFix(tstats['lta'], 3))) + f' secs (avg t:{RoundFix(tstats["tta"], 3)}/l:{RoundFix(tstats["lta"], 3)})    '
                s1 = f'{"".rjust(6)}| cnt t:{tstats["tc"]}({tstats["tc"] - tstats["lc"]})/l:{tstats["lc"]}, tboost:{RoundFix(tstats["sta"], 3)}x tbits:{RoundFix(tstats["tbitsa"], 2)}     '
                print('\033[2A' + s0[:console_width - 4] + '\n' + s1[:console_width - 4] + '\n', end = '', flush = True)
            else:
                s = str(e[0]).rjust(6) + ' at ' + str(round(time.time() - tstart)).rjust(6) + ' secs, '
                if offx + len(s) <= console_width - 4:
                    print('\033[3A\033[' + str(offx) + 'C' + s + '\n\n\n', end = '', flush = True)
                else:
                    print('\033[2A' + (' ' * (console_width - 4)) + '\n\033[1A' + s + '\n\n\n', end = '', flush = True)
                    offx = 0
                offx += len(s)
            tlast = time.time()
                
if __name__ == '__main__':
    mersennes_v2()
Generating 28-bits primes... 4 secs.
Using 8 cores.
     3 at      1 secs,      5 at      1 secs,      7 at      1 secs,
    13 at      1 secs,     17 at      1 secs,     19 at      1 secs,
    31 at      1 secs,     61 at      1 secs,     89 at      1 secs,
   107 at      1 secs,    127 at      1 secs,    521 at      1 secs,
   607 at      1 secs,   1279 at      1 secs,   2203 at      2 secs,
  2281 at      2 secs,   3217 at      3 secs,   4253 at      4 secs,
  4423 at      5 secs,   9689 at     57 secs,   9941 at     68 secs,
 11213 at    106 secs,  19937 at    832 secs,  21701 at   1145 secs,
 23209 at   1448 secs,
 13327| trial:0.599/lucas:5.510 secs (avg t:0.290/l:3.592)
      | cnt t:1582(490)/l:1092, tboost:1.348x tbits:23.00

Step 7. C++ Code.

Although the question is about Python, I implemented C++ solution.

Main benefit of C++ over Python is because same algorithm in pure C++ compared to pure Python is sometimes even 50x-200x faster, so much!

I used GMP library for implementing big integer arithmetic. Also made partial support of Boost Multiprecision library.

Try it online!

CODE HERE. Unfortunately my code together with post appears to be bigger than 30K symbols, which is more than StackOverflow limit of 30K per post. That's why I'm posting my code here as Github Gist (link below). Also, check out Try it online! link above, it also has full code, but, Note, it is with some limitations of exponents size and console verbosity.

Github Gist full code

Code output:

Pre-Computing Table Primes till 2^28.0... 4 sec
     3 at      0 sec,      5 at      0 sec,      7 at      0 sec,
    13 at      0 sec,     17 at      0 sec,     19 at      0 sec,
    31 at      0 sec,     61 at      0 sec,     89 at      1 sec,
   107 at      1 sec,    127 at      1 sec,    521 at      8 sec,
   607 at      9 sec,   1279 at     21 sec,   2203 at     37 sec,
  2281 at     38 sec,   3217 at     57 sec,   4253 at     79 sec,
  4423 at     82 sec,   9689 at    259 sec,   9941 at    273 sec,
 11213 at    343 sec,  19937 at   1351 sec,  21701 at   1926 sec,
 23209 at   2499 sec,



Exp 23565, Time 2629, TrialCnt 11783, LLCnt 4980, TrialBoost 1.932x,
    TrialTime% 18%, Threads 2, TrialPrimes 2^10.73
like image 68
Arty Avatar answered Oct 29 '25 22:10

Arty