I have code that reads vast numbers of dates in 'YYYY-MM-DD' format. Parsing each date so that I can add one, two, or three days and then write it back in the same format is slowing things down considerably.
 3214657   14.330    0.000  103.698    0.000 trade.py:56(effective)
 3218418   34.757    0.000   66.155    0.000 _strptime.py:295(_strptime)
 day = datetime.datetime.strptime(endofdaydate, "%Y-%m-%d").date()
Any suggestions how to speed it up a bit (or a lot)?
DESCRIPTION. The strptime() function converts the character string pointed to by buf to values which are stored in the tm structure pointed to by tm, using the format specified by format. The format is composed of zero or more directives.
strptime is short for "parse time", while strftime is short for "format time". That is, strptime is the opposite of strftime, though they conveniently use the same formatting specification.
%B - full month name
%c - preferred date and time representation
%C - century number (the year divided by 100, range 00 to 99)
%d - day of the month (01 to 31)
strptime() isn't considered thread-safe, so it can raise exceptions when it is run inside a thread.
Is a factor of 7 enough?
datetime.datetime.strptime(a, '%Y-%m-%d').date()       # 8.87us
datetime.date(*map(int, a.split('-')))                 # 1.28us
EDIT: great idea with explicit slicing:
datetime.date(int(a[:4]), int(a[5:7]), int(a[8:10]))   # 1.06us
That makes it a factor of 8.
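One caveat worth checking before swapping strptime out: the faster variants validate less. The sketch below wraps the three one-liners from this answer in functions (the names parse_strptime, parse_split, parse_slice and rejects are made up for illustration) and feeds them a string with the wrong separator. It assumes the input is otherwise exactly 10 characters long.

```python
import datetime

def parse_strptime(text):
    # Format-driven parsing: checks digits and separators.
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()

def parse_split(text):
    # Splits on '-', so a wrong separator leaves a piece int() cannot convert.
    return datetime.date(*map(int, text.split("-")))

def parse_slice(text):
    # Fixed positions only: the separator characters are never inspected.
    return datetime.date(int(text[:4]), int(text[5:7]), int(text[8:10]))

def rejects(parser, text):
    # Hypothetical helper: True if the parser raises ValueError for text.
    try:
        parser(text)
    except ValueError:
        return True
    return False

good = "2014-03-07"
assert parse_strptime(good) == parse_split(good) == parse_slice(good)

bad = "2014/03/07"  # wrong separator
print(rejects(parse_strptime, bad))  # True
print(rejects(parse_split, bad))     # True
print(rejects(parse_slice, bad))     # False: silently parsed as 2014-03-07
```

If the input is machine-generated and trusted, the missing check probably doesn't matter; if it isn't, the split version keeps most of the speed-up while still failing on a bad separator.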
fromisoformat()
Since Python 3.7, the datetime class has a method fromisoformat. It applies to this question as well, because 'YYYY-MM-DD' is an ISO 8601 date.
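For the task in the question (parse, add a few days, write back in the same format), a minimal sketch could look like the following. The function name shift_iso_date is made up for illustration, and it uses date.fromisoformat rather than datetime.fromisoformat on the assumption that only the date part matters. Requires Python 3.7 or later.

```python
from datetime import date, timedelta

def shift_iso_date(text, days):
    """Return the 'YYYY-MM-DD' string that lies `days` days after `text`."""
    # isoformat() on a date writes 'YYYY-MM-DD' again, so no strftime is needed.
    return (date.fromisoformat(text) + timedelta(days=days)).isoformat()

print(shift_iso_date("2020-02-28", 2))  # 2020-03-01 (2020 is a leap year)
print(shift_iso_date("2019-12-31", 1))  # 2020-01-01
```

Using isoformat() for the write-back avoids the format-string machinery in the output direction too.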
strptime()
Explicit string slicing may give you about a 9x increase in performance compared to normal strptime, but you can get about a 90x increase with the built-in fromisoformat method!
%timeit isofmt(datelist)
569 µs ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit slice2int(datelist)
5.51 ms ± 48.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit normalstrptime(datelist)
52.1 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
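from datetime import datetime, timedelta

# Build 10000 consecutive 'YYYY-MM-DD' strings; strftime drops the time part of base.
base, n = datetime(2000, 1, 1, 1, 2, 3, 420001), 10000
datelist = [(base + timedelta(days=i)).strftime('%Y-%m-%d') for i in range(n)]

def isofmt(l):
    # Built-in ISO 8601 parser (Python 3.7+).
    return list(map(datetime.fromisoformat, l))

def slice2int(l):
    # Fixed-position slicing: year, month, day.
    def slicer(t):
        return datetime(int(t[:4]), int(t[5:7]), int(t[8:10]))
    return list(map(slicer, l))

def normalstrptime(l):
    # Baseline: format-driven parsing with strptime.
    return [datetime.strptime(t, '%Y-%m-%d') for t in l]

# Sanity check: all three parsers return the same value for the first entry.
print(isofmt(datelist[0:1]))
print(slice2int(datelist[0:1]))
print(normalstrptime(datelist[0:1]))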
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
Python 3.8.3rc1 x64 / Win10
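The %timeit lines above are IPython magics. To rerun the comparison from a plain script, something like the sketch below should work with the standard timeit module. The function names, the sample size of 1000 and the repeat counts are arbitrary choices for illustration, and it returns date objects rather than datetime objects. Absolute numbers will differ by machine and Python version, so none are shown here.

```python
import timeit
from datetime import date, datetime, timedelta

# 1000 consecutive 'YYYY-MM-DD' strings to parse.
start = date(2000, 1, 1)
samples = [(start + timedelta(days=i)).isoformat() for i in range(1000)]

def parse_iso(items):
    return [date.fromisoformat(s) for s in items]

def parse_slice(items):
    return [date(int(s[:4]), int(s[5:7]), int(s[8:10])) for s in items]

def parse_strptime(items):
    return [datetime.strptime(s, "%Y-%m-%d").date() for s in items]

results = {}
for func in (parse_iso, parse_slice, parse_strptime):
    # Best of 3 runs, each run making 10 passes over the samples.
    best = min(timeit.repeat(lambda: func(samples), number=10, repeat=3))
    results[func.__name__] = best / 10  # seconds per pass
    print(f"{func.__name__}: {results[func.__name__] * 1e3:.3f} ms per pass")
```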