How do I convert a string, e.g. a user ID plus salt, to a random-looking but deterministically repeatable uniform probability in the semi-open range [0.0, 1.0)? That is, the output must be ≥ 0.0 and < 1.0, and its distribution must be uniform irrespective of the input distribution. For example, if the input string is 'a3b2Foobar', the output probability could repeatably be 0.40341504.
Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function unless there is a better way. Here is what I have:
>>> import hashlib
>>> in_str = 'a3b2Foobar'
>>> (int(hashlib.sha256(in_str.encode()).hexdigest(), 16) % 1e8) / 1e8
0.40341504
I'm using the latest stable Python 3. Please note that this question is similar, but not identical, to the related question about converting an integer to a random but deterministically repeatable choice.
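A side note on the attempt above: `% 1e8` is a float operation, so the 256-bit integer is first rounded to a double, whose value is then a multiple of a large power of two. Since 10**8 = 2**8 * 5**8, the remainder is always a multiple of 256 (indeed 40341504 = 256 * 157584), which leaves at most 390,625 distinct outputs. The sketch below checks this on a batch of sample strings; the helper names and the `user{i}` inputs are illustrative only, and exact integer modulo `% 10**8` is shown just for comparison.

```python
import hashlib

def attempt_from_question(in_str):
    # The expression from the question: int % float converts the int to a float first
    return int(hashlib.sha256(in_str.encode()).hexdigest(), 16) % 1e8

def exact_integer_modulo(in_str):
    # Hypothetical comparison: keep the modulo in exact integer arithmetic
    return int(hashlib.sha256(in_str.encode()).hexdigest(), 16) % 10**8

samples = [f'user{i}' for i in range(1000)]
float_rems = [attempt_from_question(s) for s in samples]
int_rems = [exact_integer_modulo(s) for s in samples]

# Every float remainder is a whole number divisible by 256
print(all(r % 256 == 0 for r in float_rems))  # True
# The exact integer remainders are not restricted that way
print(sum(r % 256 == 0 for r in int_rems), 'of', len(int_rems))
```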
A cryptographic hash can be assumed to be a uniformly distributed integer in the range [0, MAX_HASH]. Accordingly, it can be scaled to a floating-point number in the range [0, 1) by dividing it by MAX_HASH + 1.
import hashlib

Hash = hashlib.sha512
MAX_HASH_PLUS_ONE = 2**(Hash().digest_size * 8)

def str_to_probability(in_str):
    """Return a reproducible uniformly random float in the interval [0, 1) for the given string."""
    seed = in_str.encode()
    hash_digest = Hash(seed).digest()
    hash_int = int.from_bytes(hash_digest, 'big')  # Uses explicit byteorder for system-agnostic reproducibility
    return hash_int / MAX_HASH_PLUS_ONE  # Float division
>>> str_to_probability('a3b2Foobar')
0.3659629991207491
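One caveat seems worth checking: `hash_int / MAX_HASH_PLUS_ONE` is correctly rounded to the nearest double, so a hash within roughly 2**-54 of the maximum rounds up to exactly 1.0, which is outside [0, 1). The odds are astronomically small, but a minimal sketch that keeps only the top 53 bits of the same SHA-512 digest avoids the problem, because every k / 2**53 with k < 2**53 is exactly representable. The name `str_to_probability_53` is hypothetical, and its results can differ from `str_to_probability` in the last bit, so the two are not interchangeable for reproducibility purposes.

```python
import hashlib

def str_to_probability_53(in_str):
    """Hypothetical variant: map a string to a float in [0, 1) using the top 53 bits of its SHA-512 hash."""
    digest = hashlib.sha512(in_str.encode()).digest()
    hash_int = int.from_bytes(digest, 'big')  # same big-endian interpretation as above
    top_53_bits = hash_int >> (len(digest) * 8 - 53)  # keep only the 53 most significant bits
    return top_53_bits / 2**53  # exact division; the largest possible result is 1 - 2**-53

# Worst case for each approach
print((2**512 - 1) / 2**512)  # 1.0 (full-width division rounds up)
print((2**53 - 1) / 2**53)    # 0.9999999999999999 (53-bit division stays below 1.0)
print(str_to_probability_53('a3b2Foobar'))
```

Taking exactly 53 bits matches the precision of an IEEE 754 double, so no rounding happens in the final division.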
Here is a real-world usage example.
Notes:
- The built-in hash() function must not be used because it can preserve the input's distribution, e.g. with hash(123). Also, it can return values that differ when Python is restarted, e.g. with hash('123').
- The random module can be used with in_str as its seed, using a dedicated random.Random instance to address concerns about both thread safety and the continuity of the shared module-level generator. With this approach, however, not only is cross-language reproducibility a concern, but reproducibility across future versions of Python could also be a concern. It is therefore not recommended.
import random

def str_to_probability(in_str):
    """Return a reproducible uniformly random float in the interval [0, 1) for the given seed."""
    return random.Random(in_str).random()
>>> str_to_probability('a3b2Foobar')
0.4662507245848473
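Both notes can be spot-checked with a small sketch. It assumes a fresh interpreter can be launched via `sys.executable` with different `PYTHONHASHSEED` values, which is the documented way to control the per-process randomization of str hashes; the helper `hash_in_fresh_process` is hypothetical. It also checks that a dedicated `random.Random` instance leaves the module-level generator's state untouched.

```python
import os
import random
import subprocess
import sys

def hash_in_fresh_process(text, hash_seed):
    """Hypothetical helper: compute hash(text) in a new interpreter with the given PYTHONHASHSEED."""
    env = {**os.environ, 'PYTHONHASHSEED': str(hash_seed)}
    result = subprocess.run(
        [sys.executable, '-c', f'print(hash({text!r}))'],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

# Small ints hash to themselves, so the input distribution passes straight through
print(hash(123) == 123)  # True

# str hashes depend on the per-process hash seed
print(hash_in_fresh_process('123', 1) == hash_in_fresh_process('123', 2))  # almost certainly False

# A dedicated Random instance does not touch the shared module-level generator
state_before = random.getstate()
random.Random('a3b2Foobar').random()
print(random.getstate() == state_before)  # True
```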