
Create new column in pandas based on whether datetime values are within an hour

If I have a dataframe in pandas like:

StartDate,           EndDate,              uniqueid
2015-07-20 15:04:13, 2015-07-20 17:56:00,  1
2015-07-20 20:32:16, 2015-07-20 20:56:11,  3
2015-07-20 22:35:59, 2015-07-20 22:51:10,  11
2015-07-20 18:00:51, 2015-07-20 18:42:01,  12       

I want to create a new column that holds an indicator value: 1 if any row in the dataframe has a StartDate less than an hour after this row's EndDate, and 0 otherwise. So an example output would look like:

StartDate,           EndDate,              uniqueid, WithinHour
2015-07-20 15:04:13, 2015-07-20 17:56:00,  1,        1
2015-07-20 20:32:16, 2015-07-20 20:56:11,  3,        0   
2015-07-20 22:35:59, 2015-07-20 22:51:10,  11,       0
2015-07-20 18:00:51, 2015-07-20 18:42:01,  12,       0

Because row 1 has

EndDate 2015-07-20 17:56:00 

and row 4 has

StartDate 2015-07-20 18:00:51

which is within 1 hour.

There should not be any overlap between the start and end dates of different rows, so I think I should sort by StartDate, create a new column holding (StartDate of the next row) - (EndDate of this row), and then set the new column to 1 if that difference is less than an hour and 0 if it is greater. I'm not sure how to do this, though.

Josh Kidd asked Jan 22 '26 05:01



1 Answer

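Here is one way. For each EndDate, test every StartDate against the one-hour window that follows it, then use any() to reduce the result to a single boolean: the value is True if at least one row matches. This assumes StartDate and EndDate are already datetime64 columns (convert them with pd.to_datetime otherwise). The result goes into a new WithinHour column so that uniqueid is not overwritten; append .astype(int) if you need 1/0 instead of True/False:

import pandas as pd

# For each EndDate, check whether any StartDate falls in [EndDate, EndDate + 1 hour]
df['WithinHour'] = df.EndDate.apply(lambda dt: (((df.StartDate - pd.Timedelta(1, 'h')) <= dt)
                                               & (df.StartDate >= dt)).any())

df.WithinHour
Out[57]: 
0     True
1    False
2    False
3    False
Name: WithinHour, dtype: bool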
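
The sort-and-shift idea from the question can also be sketched directly. This is only a rough version under the question's own assumption that the intervals never overlap, so the only candidate for each row is the next row in StartDate order. The names ordered, gap and within are just illustrative, and the frame is rebuilt from the sample data so the snippet runs on its own. Unlike the apply version above, it uses a strict "less than an hour" comparison:

```python
import pandas as pd

# Sample data from the question, parsed into datetime64 columns.
df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-07-20 15:04:13", "2015-07-20 20:32:16",
        "2015-07-20 22:35:59", "2015-07-20 18:00:51",
    ]),
    "EndDate": pd.to_datetime([
        "2015-07-20 17:56:00", "2015-07-20 20:56:11",
        "2015-07-20 22:51:10", "2015-07-20 18:42:01",
    ]),
    "uniqueid": [1, 3, 11, 12],
})

# Sort by StartDate so each row's successor is the next interval in time.
ordered = df.sort_values("StartDate")

# Gap between the next row's StartDate and this row's EndDate
# (NaT for the last row, which has no successor).
gap = ordered["StartDate"].shift(-1) - ordered["EndDate"]

# True when the next interval starts less than an hour after this one ends.
# Comparisons against NaT are False, so the last row ends up as 0.
within = (gap >= pd.Timedelta(0)) & (gap < pd.Timedelta(hours=1))

# Assignment aligns on the original index, so df keeps its original row order.
df["WithinHour"] = within.astype(int)
```

On the sample data this marks only the first row (uniqueid 1) with 1, matching the expected output in the question. It avoids the row-by-row apply, which compares every EndDate against every StartDate, but it gives wrong answers if intervals can overlap.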
Zeugma answered Jan 23 '26 20:01
