I want to generate a uuid in python using the md5 hash of a preexisting unique identifier ranged between 1 and 999999999
It seems obvious that on numbers so small there couldn't be any worry... but it made me think, do we know the smallest two integers that have the same md5 hash ?
I tested, and the answer is no.
counting with set() will trigger MemoryError on my 64GB memory box, so I write hexlify hash to disk:
import hashlib
from multiprocessing import Pool
import os
def f(args):
s, e = args
l = []
for i in xrange(s, e):
h = hashlib.md5(str(i)).hexdigest()
l.append(h)
l.sort()
fn = '/data/tmp/%s_%s' % (s, e)
with open(fn, 'w') as f:
for h in l:
f.write('%s\n' % (h,))
def main():
def gen():
end = 1000000000
step = 5000000
s = 0
while s < end:
yield s, s+step
s += step
pool = Pool(processes=16)
res = pool.imap_unordered(f, gen())
list(res)
then counting with sort(1):
sort -mu /data/tmp/* | wc -l
yields:
1000000000
note that I encoded integer into ASCII string.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With