Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge two lists of dates into a series of date intervals?

I have list of start dates, and a list of end dates. They're sorted...

start_dates = [
    datetime.date(2009, 11, 5), datetime.date(2009, 11, 13), 
    datetime.date(2009, 11, 25), datetime.date(2009, 11, 26), 
    datetime.date(2009, 12, 4), datetime.date(2009, 12, 7), 
    datetime.date(2009, 12, 29), datetime.date(2009, 12, 30)]

end_dates = [
    datetime.date(2009, 10, 1), datetime.date(2009, 10, 2), 
    datetime.date(2009, 10, 9), datetime.date(2009, 10, 12), 
    datetime.date(2009, 11, 4), datetime.date(2009, 12, 14), 
    datetime.date(2009, 12, 15)]

The start dates represent dates on which we received advice to buy a share. The end dates are dates on which which we received advice to sell it. The sources of advice are different and we are backtesting what would happen if we used the buy advice from one source, but the sell advice from another. Thus we have two sequences of dates which we want to resolve into pairs - or intervals - over which we would have held a position in the stock.

Thus we'll take a date from the start_dates to decide when to buy the stock: on Nov 5 we buy a position. Then we cycle through the end_dates looking for the first advice to sell it: Dec 14. And repeat, buying when we don't hold a position when advised to from one source, and selling when we do hold a position from the other source.

You might say we want to flap between which of two lists we're looping over.

Thus the input above generates:

result = (
  (datetime.date(2009, 11, 5), datetime.date(2009, 12, 14)),
  (datetime.date(2009, 12, 29), None)
)

I'm using for-loops inside for-loops and wonder if there is not a better way. Performance is of interest since it will be applied to thousands of scenarios over a 40 year span; some of the lists involve thousands of dates.

like image 622
John Mee Avatar asked Sep 07 '25 04:09

John Mee


2 Answers

I've finally nailed it down to:

    trades = []
    enddate = datetime.date(1900, 1, 1)
    try:
        for startdate in startdates:
            if enddate <= startdate:
                enddate = enddates.next()
                while enddate <= startdate:
                    enddate = enddates.next()
                trades.append((startdate, enddate))
    except StopIteration:
        trades.append((startdate, None))

Thanks to those who asked questions and answered. For no rational reason this little puzzle became a fixation for me but I think finally I have done this to death and should move on with my life. It's really very simple in the end - astonishing how much work it took to make it this plain!

like image 50
John Mee Avatar answered Sep 10 '25 05:09

John Mee


Edit

This should scale with len(start_dates)+len(end_dates):

def date_range(start_dates, end_dates):
    result = []

    start_i = 0
    end_i = 0

    while start_i<len(start_dates):
        while end_i<len(end_dates) and start_dates[start_i]>end_dates[end_i]:
            end_i += 1
        if end_i == len(end_dates):
            result.append((start_dates[start_i], None))
            break
        result.append((start_dates[start_i], end_dates[end_i]))
        while start_i<len(start_dates) and start_dates[start_i]<=end_dates[end_i]:
            start_i += 1
        end_i += 1

    return result

Usage:

In  : start_dates = [
   ....:     datetime.date(2009, 11, 5), datetime.date(2009, 11, 13),
   ....:     datetime.date(2009, 11, 25), datetime.date(2009, 11, 26),
   ....:     datetime.date(2009, 12, 4), datetime.date(2009, 12, 7),
   ....:     datetime.date(2009, 12, 29), datetime.date(2009, 12, 30)]

In : end_dates = [
   ....:     datetime.date(2009, 10, 1), datetime.date(2009, 10, 2),
   ....:     datetime.date(2009, 10, 9), datetime.date(2009, 10, 12),
   ....:     datetime.date(2009, 11, 4), datetime.date(2009, 12, 14),
   ....:     datetime.date(2009, 12, 15)]

In : date_range(start_dates, end_dates)
Out:
[(datetime.date(2009, 11, 5), datetime.date(2009, 12, 14)),
 (datetime.date(2009, 12, 29), None)]

In : start_dates = [
   ....:     datetime.date(2009, 11, 5), datetime.date(2009, 11, 13),
   ....:     datetime.date(2009, 11, 25), datetime.date(2009, 11, 26),
   ....:     datetime.date(2009, 12, 4), datetime.date(2009, 12, 7),
   ....:     datetime.date(2009, 12, 29), datetime.date(2009, 12, 30)]

In : end_dates = [
   ....:     datetime.date(2009, 10, 1), datetime.date(2009, 10, 2),
   ....:     datetime.date(2009, 10, 9), datetime.date(2009, 10, 12),
   ....:     datetime.date(2009, 11, 7), datetime.date(2009, 12, 14), # changed (2009, 11, 4) -> (2009, 11, 7)
   ....:     datetime.date(2009, 12, 15)]

In : date_range(start_dates, end_dates)
Out:
[(datetime.date(2009, 11, 5), datetime.date(2009, 11, 7)),
 (datetime.date(2009, 11, 13), datetime.date(2009, 12, 14)),
 (datetime.date(2009, 12, 29), None)]
like image 31
Avaris Avatar answered Sep 10 '25 03:09

Avaris