I have a date range, say between 1925-01-01 and 1992-01-01. I'd like to generate a list of x dates within that range, with the generated dates following a 'normal' (bell curve) distribution.
There are many answers on Stack Overflow about doing this with integers (using numpy, scipy, etc.), but I can't find a solid example with dates.

As per @sascha's comment, a conversion from the dates to a time value does the job:
#!/usr/bin/env python3
import time

import numpy

_DATE_RANGE = ('1925-01-01', '1992-01-01')
_DATE_FORMAT = '%Y-%m-%d'
# Standard deviation as a fraction of the range width (chosen empirically).
_EMPIRICAL_SCALE_RATIO = 0.15
_DISTRIBUTION_SIZE = 1000


def main():
    # Convert both end dates to timestamps (seconds since the epoch).
    time_range = tuple(time.mktime(time.strptime(d, _DATE_FORMAT))
                       for d in _DATE_RANGE)
    # Sample timestamps from a normal distribution centred on the midpoint.
    distribution = numpy.random.normal(
        loc=(time_range[0] + time_range[1]) * 0.5,
        scale=(time_range[1] - time_range[0]) * _EMPIRICAL_SCALE_RATIO,
        size=_DISTRIBUTION_SIZE
    )
    # Sort the samples and convert them back to date strings.
    date_range = tuple(time.strftime(_DATE_FORMAT, time.localtime(t))
                       for t in numpy.sort(distribution))
    print(date_range)


if __name__ == '__main__':
    main()
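A plain normal distribution is unbounded, so a few samples can land outside the date range. As a rough check, the snippet below estimates how often that happens for a scale ratio of 0.15, assuming the mean sits at the midpoint as in the script; the names `z_edge` and `outside` are only illustrative.

```python
import math

scale_ratio = 0.15  # same value as _EMPIRICAL_SCALE_RATIO in the script

# Each end of the range lies half the range width from the mean,
# i.e. 0.5 / scale_ratio standard deviations away.
z_edge = 0.5 / scale_ratio

# Two-sided tail probability of a standard normal beyond +/- z_edge.
outside = math.erfc(z_edge / math.sqrt(2))

print(f"range edges at +/-{z_edge:.2f} sigma, P(outside) = {outside:.5f}")
```

With 1000 samples this works out to roughly one out-of-range date per run on average, so it is easy to miss in casual testing.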
Note that instead of tuning _EMPIRICAL_SCALE_RATIO by hand, you could (should?) use scipy.stats.truncnorm to generate a truncated normal distribution, which keeps every sample inside the range.
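If scipy is not available, a similar effect can be sketched with the standard library alone. The example below is not scipy.stats.truncnorm: it uses simple rejection sampling (redrawing any value that falls outside the range) and works on day ordinals from `datetime.date`, which also avoids relying on `time.mktime` for dates before 1970. The function name `truncated_normal_dates` and its parameters are hypothetical, chosen only for this illustration.

```python
import random
from datetime import date


def truncated_normal_dates(start, end, count, scale_ratio=0.15, seed=None):
    """Draw `count` dates in [start, end], normally distributed around
    the midpoint, by redrawing any sample that falls outside the range.

    Assumes start < end so that the standard deviation is positive.
    """
    rng = random.Random(seed)
    lo, hi = start.toordinal(), end.toordinal()
    mean = (lo + hi) / 2
    sigma = (hi - lo) * scale_ratio

    dates = []
    while len(dates) < count:
        day = round(rng.gauss(mean, sigma))
        if lo <= day <= hi:  # reject out-of-range draws
            dates.append(date.fromordinal(day))
    return sorted(dates)


sample = truncated_normal_dates(date(1925, 1, 1), date(1992, 1, 1), 1000, seed=42)
print(sample[0], sample[len(sample) // 2], sample[-1])
```

Rejection sampling is cheap here because very few draws are rejected at this scale ratio; with a much larger ratio the loop would redraw more often, and a dedicated truncated-normal sampler would be the better choice.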