With Python 2.7, the following code computes the MD5 hexdigest of the content of a file.
(EDIT: well, not really, as the answers have shown; I just thought so.)
    import hashlib

    def md5sum(filename):
        f = open(filename, mode='rb')
        d = hashlib.md5()
        for buf in f.read(128):
            d.update(buf)
        return d.hexdigest()

Now if I run that code using Python 3, it raises a TypeError exception:
    d.update(buf)
    TypeError: object supporting the buffer API required

I figured out that I could make that code run with both Python 2 and Python 3 by changing it to:
    def md5sum(filename):
        f = open(filename, mode='r')
        d = hashlib.md5()
        for buf in f.read(128):
            d.update(buf.encode())
        return d.hexdigest()

Now I still wonder why the original code stopped working. It seems that when a file is opened with the binary mode modifier, iterating over what it returns yields integers instead of strings encoded as bytes (I say that because type(buf) returns int). Is this behavior explained somewhere?
    # Import hashlib library (md5 method is part of it)
    import hashlib

    # File to check
    file_name = 'filename.exe'

    # Correct original md5 goes here
    original_md5 = '5d41402abc4b2a76b9719d911017c592'

    # Open, close, read file and calculate MD5 on its contents
    with open(file_name, 'rb') as file_to_check:
        # read contents of the ...
MD5, defined in RFC 1321, is a hash algorithm that turns an input of any length into a fixed-length 128-bit (16-byte) hash value. Note: MD5 is not collision-resistant; two different inputs that produce the same hash value can be constructed in practice. In Python, we can use hashlib.
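As a minimal illustration of the fixed output length (a sketch, with `b'hello'` chosen only as an arbitrary sample input), `hashlib.md5` reports a 16-byte digest, which `hexdigest()` renders as 32 hexadecimal characters:

```python
import hashlib

# Hash a small in-memory bytes value; b'hello' is an arbitrary sample input.
digest = hashlib.md5(b'hello')

print(digest.digest_size)   # digest length in bytes (16 bytes = 128 bits)

# The hex form uses two characters per byte.
hello_md5 = digest.hexdigest()
print(len(hello_md5), hello_md5)
```

Note that `hashlib.md5` accepts bytes, not text, so a `str` has to be encoded first.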
I think you wanted the for-loop to make successive calls to f.read(128).  That can be done using iter() and functools.partial():
    import hashlib
    from functools import partial

    def md5sum(filename):
        with open(filename, mode='rb') as f:
            d = hashlib.md5()
            for buf in iter(partial(f.read, 128), b''):
                d.update(buf)
        return d.hexdigest()

    print(md5sum('utils.py'))

    for buf in f.read(128):
        d.update(buf)

.. updates the hash sequentially with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects, you get the following calls, which cause the error you encountered in Python 3.
    d.update(97)
    d.update(98)
    d.update(99)
    d.update(100)

which is not what you want.
Instead, you want:
    def md5sum(filename):
        with open(filename, mode='rb') as f:
            d = hashlib.md5()
            while True:
                buf = f.read(4096)  # 128 is smaller than the typical filesystem block
                if not buf:
                    break
                d.update(buf)
            return d.hexdigest()
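The difference between the two loops can be checked directly. The sketch below assumes Python 3; the helper name `md5_chunks` and the sample data are made up for illustration, and an in-memory `io.BytesIO` stands in for a real file. It shows that iterating over a bytes object yields int values, while passing whole chunks to `update()` gives the same digest as hashing everything at once.

```python
import hashlib
import io

data = b'abcd' * 1000  # arbitrary sample content standing in for a file

# Iterating over bytes yields integers in Python 3, not 1-byte strings.
first_items = list(data[:4])

def md5_chunks(fileobj, chunk_size=4096):
    """Hash a binary file object by reading it in fixed-size chunks."""
    d = hashlib.md5()
    while True:
        buf = fileobj.read(chunk_size)
        if not buf:          # b'' signals end of file
            break
        d.update(buf)        # buf is a bytes chunk, which update() accepts
    return d.hexdigest()

chunked = md5_chunks(io.BytesIO(data))
one_shot = hashlib.md5(data).hexdigest()
print(first_items, chunked == one_shot)
```

Because `update()` can be called repeatedly, the chunk size affects only memory use and read efficiency, not the resulting digest.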