I want to sum all the values in one column based on a date range defined by two other columns:
Start_Date  Value_to_sum  End_date
2017-12-13  2             2017-12-13
2017-12-13  3             2017-12-16
2017-12-14  4             2017-12-15
2017-12-15  2             2017-12-15
A simple groupby won't do it, since it would only add the values for a single specific date.
We could use nested for loops, but that would take forever to run:
import pandas as pd
from tqdm import tqdm

unique_date = data.Start_Date.unique()
carry = pd.DataFrame({'Date': unique_date})
carry['total'] = 0
for n in tqdm(range(len(carry))):
    # keep only the rows whose range starts on or before this date
    tr = data.loc[data['Start_Date'] <= carry['Date'][n]]
    for i in tr.index:
        # ...and whose range ends on or after it
        if carry['Date'][n] <= tr['End_date'][i]:
            carry.loc[n, 'total'] += tr['Value_to_sum'][i]
Something like that would work, but as I said, it would take forever.
The expected output is each unique date with the total for that day.
Here it would be
2017-12-13 = 5, 2017-12-14 = 7, 2017-12-15 = 9.
How do I compute the sum based on the date ranges?
First, group by ["Start_Date", "End_date"] to save some operations.
from collections import Counter
import pandas as pd

c = Counter()

# Collapse duplicate (Start_Date, End_date) pairs before expanding the ranges.
df_g = df.groupby(["Start_Date", "End_date"]).sum().reset_index()

def my_counter(row):
    s, v, e = row.Start_Date, row.Value_to_sum, row.End_date
    if s == e:
        # single-day range: add the value to that one date
        c[pd.Timestamp(s, freq="D")] += v
    else:
        # multi-day range: add the value to every date the range covers
        c.update({date: v for date in pd.date_range(s, e)})

df_g.apply(my_counter, axis=1)
print(c)
"""
Counter({Timestamp('2017-12-15 00:00:00', freq='D'): 9,
         Timestamp('2017-12-14 00:00:00', freq='D'): 7,
         Timestamp('2017-12-13 00:00:00', freq='D'): 5,
         Timestamp('2017-12-16 00:00:00', freq='D'): 3})
"""
Tools used:
Counter.update([iterable-or-mapping]): Elements are counted from an iterable or added-in from another mapping (or counter). Like dict.update() but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs. -- Cited from Python 3 Documentation
pandas.date_range
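Both tools are easy to check in isolation. The standalone snippet below (not part of the solution itself) shows that Counter.update adds counts rather than replacing them, and that pandas.date_range includes both endpoints by default, which is what makes the per-day expansion work:

from collections import Counter
import pandas as pd

c = Counter({"a": 1})
c.update({"a": 2, "b": 5})        # counts are added, not replaced
print(c)                          # Counter({'b': 5, 'a': 3})

print(pd.date_range("2017-12-13", "2017-12-16"))
# DatetimeIndex(['2017-12-13', '2017-12-14', '2017-12-15', '2017-12-16'],
#               dtype='datetime64[ns]', freq='D')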