
Sum based on date range in two separate columns

I want to sum the values in one column over the date ranges defined by two other columns:

Start_Date  Value_to_sum  End_date
2017-12-13    2          2017-12-13
2017-12-13    3          2017-12-16 
2017-12-14    4          2017-12-15
2017-12-15    2          2017-12-15

A simple groupby won't do it, since it would only add up the values for rows that share one specific date and would ignore ranges that span that date.

We could use a nested for loop, but it would take forever to run:

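import pandas as pd
from tqdm import tqdm

unique_date = data.Start_Date.unique()
carry = pd.DataFrame({'Date': unique_date})
carry['total'] = 0
for n in tqdm(range(len(carry))):
    # Rows whose range has started on or before this date
    tr = data.loc[data['Start_Date'] <= carry['Date'][n]]
    for i in tr.index:
        # Count the row only if its range has not ended before this date
        if carry['Date'][n] <= tr['End_date'][i]:
            carry.loc[n, 'total'] += tr['Value_to_sum'][i]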

Something like that would work but, like I said, would take forever.

The expected output is each unique date with its total for that day.

Here it would be

2017-12-13 = 5, 2017-12-14 = 7, 2017-12-15 = 9.

How do I compute the sum based on the date ranges?

Nico Coallier asked Jan 29 '26 17:01


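1 Answer

First, group by ["Start_Date", "End_date"] and sum, so that rows sharing the same date range are expanded only once. Then add each group's value to every day in its range: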

from collections import Counter

import pandas as pd

c = Counter()
# Merge rows that share the same (Start_Date, End_date) range
df_g = df.groupby(["Start_Date", "End_date"]).sum().reset_index()

def my_counter(row):
    s, v, e = row.Start_Date, row.Value_to_sum, row.End_date
    if s == e:
        # Single-day range: add the value to that day only
        # (the freq argument of pd.Timestamp was removed in pandas 2.0)
        c[pd.Timestamp(s)] += v
    else:
        # Add the value to every day from s through e, inclusive
        c.update({date: v for date in pd.date_range(s, e)})

df_g.apply(my_counter, axis=1)
print(c)
"""
Counter({Timestamp('2017-12-15 00:00:00'): 9,
     Timestamp('2017-12-14 00:00:00'): 7,
     Timestamp('2017-12-13 00:00:00'): 5,
     Timestamp('2017-12-16 00:00:00'): 3})
"""
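
If expanding every range day by day is still too slow on large data, one possible alternative is sketched below. It is not the method used in this answer: it swaps in a running-total (cumulative sum) technique, where each row adds its value on Start_Date and removes it the day after End_date. The sketch assumes both date columns are already datetime64 and rebuilds the question's sample as `df`; the names `added`, `removed`, `change`, `daily_total`, and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (assumed to be parsed as datetimes).
df = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2017-12-13", "2017-12-13", "2017-12-14", "2017-12-15"]),
    "Value_to_sum": [2, 3, 4, 2],
    "End_date": pd.to_datetime(["2017-12-13", "2017-12-16", "2017-12-15", "2017-12-15"]),
})

# Each row adds its value on Start_Date and removes it the day after End_date.
added = df.groupby("Start_Date")["Value_to_sum"].sum()
removed = df.groupby(df["End_date"] + pd.Timedelta(days=1))["Value_to_sum"].sum()
change = added.sub(removed, fill_value=0)

# Fill in days with no change, then accumulate to get the total active on each day.
all_days = pd.date_range(change.index.min(), change.index.max())
daily_total = change.reindex(all_days, fill_value=0).cumsum().astype(int)

# Keep only the days that appear as a start date, as the question asks.
result = daily_total[daily_total.index.isin(df["Start_Date"])]
print(result)
```

On the sample this gives 5, 7, and 9 for 2017-12-13 through 2017-12-15, matching the expected output in the question. The work is done by two groupbys and one cumulative sum, so the cost no longer grows with the length of each range.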

Tools used:

Counter.update([iterable-or-mapping]): Elements are counted from an iterable or added-in from another mapping (or counter). Like dict.update() but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs. -- Cited from Python 3 Documentation
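
A small illustration of the difference the quote describes, using made-up string keys rather than the Timestamps above: Counter.update adds to existing counts, while dict.update overwrites them.

```python
from collections import Counter

totals = Counter({"2017-12-13": 2})
# Counter.update adds the new values to the existing counts.
totals.update({"2017-12-13": 3, "2017-12-14": 3})

plain = {"2017-12-13": 2}
# dict.update replaces the existing value instead.
plain.update({"2017-12-13": 3, "2017-12-14": 3})

print(totals)  # Counter({'2017-12-13': 5, '2017-12-14': 3})
print(plain)   # {'2017-12-13': 3, '2017-12-14': 3}
```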

pandas.date_range
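
As a quick check of how pandas.date_range behaves here: with only a start and an end it defaults to daily frequency and includes both endpoints, so a range whose start equals its end yields exactly one day. The variable names below are only for illustration.

```python
import pandas as pd

# Daily frequency by default; both endpoints are included.
days = pd.date_range("2017-12-13", "2017-12-16")
print(list(days.strftime("%Y-%m-%d")))
# ['2017-12-13', '2017-12-14', '2017-12-15', '2017-12-16']

# A range that starts and ends on the same day contains that single day.
single = pd.date_range("2017-12-15", "2017-12-15")
print(len(single))  # 1
```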

Tai answered Jan 31 '26 08:01