
Generate a normal distribution of dates within a range

I have a date range, say from 1925-01-01 to 1992-01-01. I'd like to generate a list of x dates within that range so that the generated dates follow a 'normal' (bell curve, see image) distribution.

There are many answers on Stack Overflow about doing this with integers (using numpy, scipy, etc.), but I can't find a solid example with dates.

[image: bell curve]

asked Oct 23 '25 06:10 by Jeff


1 Answer

As per @sascha's comment, a conversion from the dates to a time value does the job:

#!/usr/bin/env python3

import time
import numpy

_DATE_RANGE = ('1925-01-01', '1992-01-01')
_DATE_FORMAT = '%Y-%m-%d'
# Standard deviation expressed as a fraction of the range width.
_EMPIRICAL_SCALE_RATIO = 0.15
_DISTRIBUTION_SIZE = 1000

def main():
    # Convert both range bounds to timestamps (seconds, local time).
    time_range = tuple(time.mktime(time.strptime(d, _DATE_FORMAT))
                       for d in _DATE_RANGE)
    # Sample timestamps from a normal distribution centred on the
    # midpoint of the range.
    distribution = numpy.random.normal(
        loc=(time_range[0] + time_range[1]) * 0.5,
        scale=(time_range[1] - time_range[0]) * _EMPIRICAL_SCALE_RATIO,
        size=_DISTRIBUTION_SIZE
    )
    # Convert the sorted timestamps back to date strings.
    dates = tuple(time.strftime(_DATE_FORMAT, time.localtime(t))
                  for t in numpy.sort(distribution))
    print(dates)

if __name__ == '__main__':
    main()

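Note that _EMPIRICAL_SCALE_RATIO only makes out-of-range dates unlikely: at 0.15 the range bounds lie about 3.3 standard deviations from the mean, so an occasional sample can still fall outside the range. To guarantee that every date stays within the range, you could (should?) use scipy.stats.truncnorm to generate a truncated normal distribution instead.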
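For illustration, here is a minimal sketch of the scipy.stats.truncnorm approach. It is not part of the original answer: the function name truncated_normal_dates and its scale_ratio and seed parameters are hypothetical, and it works on day ordinals (datetime.date.toordinal) rather than timestamps, which is a swapped-in choice to sidestep time zones and pre-1970 timestamps. Note that truncnorm expects its bounds a and b in standard-deviation units relative to loc, not as raw values.

```python
from datetime import date

import numpy as np
from scipy.stats import truncnorm


def truncated_normal_dates(start, end, size, scale_ratio=0.15, seed=None):
    """Draw `size` sorted dates in [start, end] from a truncated normal.

    `scale_ratio` (an assumed default, taken from the answer above) is the
    standard deviation expressed as a fraction of the range width.
    """
    lo, hi = start.toordinal(), end.toordinal()
    loc = (lo + hi) / 2
    scale = (hi - lo) * scale_ratio
    # truncnorm takes its bounds in units of `scale`, relative to `loc`.
    a, b = (lo - loc) / scale, (hi - loc) / scale
    rng = np.random.default_rng(seed)
    days = truncnorm.rvs(a, b, loc=loc, scale=scale, size=size,
                         random_state=rng)
    # Round to whole days; lo and hi are integers, so rounding stays in range.
    return [date.fromordinal(int(d)) for d in np.sort(np.rint(days))]


if __name__ == '__main__':
    sample = truncated_normal_dates(date(1925, 1, 1), date(1992, 1, 1),
                                    size=10, seed=0)
    print([d.isoformat() for d in sample])
```

Unlike the plain normal draw, every returned date is guaranteed to lie within the range, so the scale ratio can be chosen for the shape you want rather than tuned to keep samples inside the bounds.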

answered Oct 24 '25 22:10 by ChristopherC


