I'm not a Python programmer, but I'm trying to translate some Python code to R. The piece of Python code I'm having trouble with is:
hashlib.sha256(x).hexdigest()
My interpretation of this code is that the function is going to calculate the hash of x using the sha256 algorithm and return the value in hex.
Given that interpretation, I am using the following R function:
digest(x, algo="sha256", raw=FALSE)
Based on my admittedly limited knowledge of R and on what I have read online about Python's hashlib module, the two functions should produce identical results, but they do not.
Am I missing something, or am I using the wrong R function?
Using Python hashlib to Implement SHA256. Python has a built-in library, hashlib, that is designed to provide a common interface to different secure hashing algorithms. The module provides constructor methods for each type of hash. For example, the .sha256() constructor is used to create a SHA256 hash.
Source code: Lib/hashlib.py. This module implements a common interface to many different secure hash and message digest algorithms.
hexdigest(): Returns the digest as a string of hexadecimal digits.
The hashlib module includes the FIPS secure hash algorithms SHA1, SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA's MD5 algorithm (defined in Internet RFC 1321).
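As a rough illustration of the interface described above, here is a minimal sketch in Python 3 that uses only standard hashlib calls. The sample data and variable names are my own, chosen for the example.

```python
import hashlib

data = b"hello world"

# Named constructor: hashlib provides one per algorithm.
h = hashlib.sha256(data)

# Generic constructor: the algorithm is selected by name.
h_by_name = hashlib.new("sha256", data)

# update() feeds data in pieces; the result equals hashing the concatenation.
h_incremental = hashlib.sha256()
h_incremental.update(b"hello ")
h_incremental.update(b"world")

raw = h.digest()          # the digest as raw bytes
hex_text = h.hexdigest()  # the same digest as hexadecimal text

print(len(raw), len(hex_text))  # prints: 32 64
print(hex_text == h_by_name.hexdigest() == h_incremental.hexdigest())  # prints: True
```

The hexadecimal form is twice as long as the raw form because each byte is written as two hex characters.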
Yes, both the Python and the R sample code return a hexadecimal representation of a SHA256 hash digest for the data passed in.
You do need to switch off serialisation in R; otherwise digest() first serialises the string and hashes that serialised form, rather than hashing the character data alone. Set serialize to FALSE:
> digest('', algo="sha256", serialize=FALSE)
[1] "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
> digest('hello world', algo="sha256", serialize=FALSE)
[1] "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"
These match their Python equivalents:
>>> import hashlib
>>> # bytes literals work on Python 2.6+ and Python 3; Python 3 rejects a plain str here
>>> hashlib.sha256(b'').hexdigest()
'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
>>> hashlib.sha256(b'hello world').hexdigest()
'b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'
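On Python 3, hashlib.sha256() accepts only bytes, so a text string has to be encoded before hashing. Here is a minimal sketch; sha256_hex is a hypothetical helper name, and UTF-8 is an assumption, since the encoding has to match whatever bytes R actually holds for the string.

```python
import hashlib


def sha256_hex(text, encoding="utf-8"):
    """Encode a text string and return the SHA256 hex digest of the bytes.

    Hypothetical helper for illustration; the default encoding is an assumption.
    """
    return hashlib.sha256(text.encode(encoding)).hexdigest()


print(sha256_hex(""))
print(sha256_hex("hello world"))

# For non-ASCII text the chosen encoding changes the bytes, and so the hash.
print(sha256_hex("é", "utf-8") == sha256_hex("é", "latin-1"))  # prints: False
```

For pure ASCII data such as the two samples above the encoding rarely matters, but for accented or non-Latin characters a mismatch between the R and Python encodings is enough to produce different digests.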
If your hashes still differ between R and Python, then your data is different. The difference could be as subtle as a newline at the end of the line or a byte order mark at the start.
In Python, inspect the output of print(repr(x)) to see the data as a Python string literal; this shows non-printable characters as escape sequences. I'm sure R has similar debugging tools. Both R and Python echo string values as representations when using their interactive modes.
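To make that concrete, here is a small Python 3 sketch showing how a trailing newline or a UTF-8 byte order mark changes the digest, and how repr() exposes the extra bytes. The sample values and the helper name sha256_hex are made up for illustration.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA256 hex digest of a bytes value (illustrative helper)."""
    return hashlib.sha256(data).hexdigest()


clean = b"hello world"
with_newline = b"hello world\n"          # trailing newline, e.g. from reading a file line
with_bom = b"\xef\xbb\xbfhello world"    # UTF-8 byte order mark at the start

for sample in (clean, with_newline, with_bom):
    # repr() shows the otherwise invisible bytes as escape sequences.
    print(repr(sample), sha256_hex(sample))
```

All three values look the same when printed as text, yet each yields a different digest, so comparing the repr() output on the Python side with the raw data on the R side is usually the quickest way to find the mismatch.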