Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there md5 collisions for integers in range(1,1000000000)

I want to generate a uuid in python using the md5 hash of a preexisting unique identifier ranged between 1 and 999999999

It seems obvious that on numbers so small there couldn't be any worry... but it made me think, do we know the smallest two integers that have the same md5 hash ?

like image 940
Vincent Chalmel Avatar asked Oct 28 '25 01:10

Vincent Chalmel


1 Answers

I tested, and the answer is no.

counting with set() will trigger MemoryError on my 64GB memory box, so I write hexlify hash to disk:

import hashlib
from multiprocessing import Pool
import os

def f(args):
    s, e = args
    l = []
    for i in xrange(s, e):
        h = hashlib.md5(str(i)).hexdigest()
        l.append(h)
    l.sort()
    fn = '/data/tmp/%s_%s' % (s, e)
    with open(fn, 'w') as f:
        for h in l:
            f.write('%s\n' % (h,))

def main():
    def gen():
        end = 1000000000
        step = 5000000
        s = 0
        while s < end:
            yield s, s+step
            s += step

    pool = Pool(processes=16)
    res = pool.imap_unordered(f, gen())
    list(res)

then counting with sort(1):

sort -mu /data/tmp/* | wc -l 

yields:

1000000000

note that I encoded integer into ASCII string.

like image 110
georgexsh Avatar answered Oct 29 '25 16:10

georgexsh



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!