
Is there a function to find the difference between datetimes?

Tags: python, pandas

I have multiple dataframes whose timestamps include milliseconds and can coincide (give or take 1 second). Once they are all combined in a new dataframe, I want to keep only the rows whose timestamps are more than 1 second apart from each other.

Is there a function similar to dftogether['unique'] = np.ediff1d(dftogether['DateTime']) that works with timestamps?

My current solution works, but I am looking for a proper way to do it. Let's say I have 3 dataframes: df1, df2 and df3. For each dataframe I do this:

df1['DateTime'] = df1['DateTime'].apply(lambda x: x.strftime('%Y%d%m%H%M%S'))
df1['DateTime'] = df1['DateTime'].astype(np.int64)

This turns my DateTime into an int, so I can do this:

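# z is not defined in the question; presumably it is the list of dataframes, e.g. [df1, df2, df3]
dftogether = pd.concat(z, sort=True)
dftogether = dftogether.sort_values('DateTime')
# True where the int value does not increase compared to the previous row
dftogether['unique'] = np.ediff1d(dftogether['DateTime'], to_begin=20181211150613411) < 1
dftogether = dftogether[dftogether.unique == False]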

And then I convert the int back to datetime:

dftogether['DateTime'] = dftogether['DateTime'].apply(lambda x: pd.to_datetime(str(x), format='%Y%d%m%H%M%S'))
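For comparison, here is a minimal sketch of the same filter done directly on timestamps, without the int round trip. It is not the code from the question: it assumes the DateTime column is already a datetime64 column, uses a small hypothetical frame built from the sample values, and relies on Series.dt.floor and Series.diff, which return timestamps and Timedelta values respectively.

```python
import pandas as pd

# Hypothetical combined frame; the values are taken from the samples in the question.
dftogether = pd.DataFrame(
    {"DateTime": [
        "2018-12-18 12:37:22.841",
        "2018-12-18 12:37:23.144",
        "2018-12-18 12:37:23.448",
        "2018-12-18 12:38:33.248",
        "2018-12-18 12:38:33.553",
        "2018-12-18 12:38:34.759",
    ]},
    index=[739, 740, 741, 788, 789, 790],
)
dftogether["DateTime"] = pd.to_datetime(dftogether["DateTime"])
dftogether = dftogether.sort_values("DateTime")

# Truncate to whole seconds, which the strftime round trip does implicitly.
seconds = dftogether["DateTime"].dt.floor("s")

# diff() yields Timedelta values. The first row has no predecessor (NaT),
# and a comparison against NaT is False, so the first row is kept.
dftogether["unique"] = seconds.diff() < pd.Timedelta(seconds=1)

# Keep only the rows that are not in the same second as the previous row.
result = dftogether[~dftogether["unique"]]
print(result)
```

Unlike the int version, the milliseconds survive in the DateTime column here, because only the helper series is truncated.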

I couldn't figure out how to create sample data for the timestamps, so I will just copy-paste parts of the dataframes.

df1

737    2018-12-18 12:37:19.717
738    2018-12-18 12:37:21.936
739    2018-12-18 12:37:22.841
740    2018-12-18 12:37:23.144
877    2018-12-18 12:40:53.268
878    2018-12-18 12:40:56.597
879    2018-12-18 12:40:56.899
880    2018-12-18 12:40:57.300
968    2018-12-18 12:43:31.411
969    2018-12-18 12:43:36.150
970    2018-12-18 12:43:36.452

df2

691    2018-12-18 12:35:23.612
692    2018-12-18 12:35:25.627
788    2018-12-18 12:38:33.248
789    2018-12-18 12:38:33.553
790    2018-12-18 12:38:34.759
866    2018-12-18 12:40:29.487
867    2018-12-18 12:40:31.199
868    2018-12-18 12:40:32.206

df3

699    2018-12-18 12:35:42.452
701    2018-12-18 12:35:45.081
727    2018-12-18 12:36:47.466
730    2018-12-18 12:36:51.796
741    2018-12-18 12:37:23.448
881    2018-12-18 12:40:57.603
910    2018-12-18 12:42:02.904
971    2018-12-18 12:43:37.361
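In case it helps anyone reproduce this, samples like the ones above could be turned into dataframes roughly as follows. This is only a sketch: the column name DateTime comes from the question, but which rows to include and the use of the printed row numbers as the index are assumptions.

```python
import pandas as pd

# A few of the df1 rows pasted above; the index holds the original row numbers.
df1 = pd.DataFrame(
    {"DateTime": [
        "2018-12-18 12:37:19.717",
        "2018-12-18 12:37:21.936",
        "2018-12-18 12:37:22.841",
        "2018-12-18 12:37:23.144",
    ]},
    index=[737, 738, 739, 740],
)
# Parse the strings into real timestamps (milliseconds are preserved).
df1["DateTime"] = pd.to_datetime(df1["DateTime"])

# Two of the df3 rows pasted above.
df3 = pd.DataFrame(
    {"DateTime": ["2018-12-18 12:37:23.448", "2018-12-18 12:40:57.603"]},
    index=[741, 881],
)
df3["DateTime"] = pd.to_datetime(df3["DateTime"])

# Combine and sort chronologically, as in the question.
dftogether = pd.concat([df1, df3]).sort_values("DateTime")
print(dftogether)
```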

I want dftogether to look like this, but with timestamps instead of ints:

   Unique  DateTime
 737    False  20181812123719
 738    False  20181812123721
 739    False  20181812123722
 741    False  20181812123723
 742     True  20181812123723
 740     True  20181812123723
 785    False  20181812123830
 786    False  20181812123831
 787    False  20181812123832
 787     True  20181812123832
 788    False  20181812123833

so that I can drop the rows where Unique == True:

 785    False 2018-12-18 12:38:30
 786    False 2018-12-18 12:38:31
 787    False 2018-12-18 12:38:32
 788    False 2018-12-18 12:38:33
 790    False 2018-12-18 12:38:34
 812    False 2018-12-18 12:39:10
 813    False 2018-12-18 12:39:11

On a separate note: where can I give feedback on the new Stack Overflow "Ask a Question" page? IMO it is really awful: it keeps scrolling up, entering or copy-pasting code is really confusing now, and all the "For Example" hints are really distracting. It took me more than 30 minutes to write this question.

Martijn van Amsterdam asked Dec 15 '25 05:12


1 Answer

I concatenated your df1 and df2 into a single df and created a list of dates like this:

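from datetime import datetime

df = pd.concat([df1,df2]).sort_values('DateTime').reset_index(drop=True)

# strptime needs strings, so this assumes the DateTime column holds strings
date_list = [datetime.strptime(i, '%Y-%m-%d %H:%M:%S.%f') for i in df.DateTime.tolist()]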

Then I get the desired output with a one-liner:

df[[x>1 for x in [0]+[(j-i).total_seconds() for i,j in zip(date_list, date_list[1:])]]]

To understand how it works, first check the output of:

[x>1 for x in [0]+[(j-i).total_seconds() for i,j in zip(date_list, date_list[1:])]]
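The same mask can also be sketched with pandas alone, which may be easier to read than the nested list comprehension. This is an alternative under assumptions, not the one-liner above: it assumes DateTime has been parsed with pd.to_datetime, and it uses a small hypothetical df built from the question's sample values.

```python
import pandas as pd

# Hypothetical combined and sorted frame, using values from the question's df1 and df2.
df = pd.DataFrame({"DateTime": [
    "2018-12-18 12:37:21.936",
    "2018-12-18 12:37:22.841",
    "2018-12-18 12:37:23.144",
    "2018-12-18 12:38:33.248",
    "2018-12-18 12:38:33.553",
    "2018-12-18 12:38:34.759",
]})
df["DateTime"] = pd.to_datetime(df["DateTime"])
df = df.sort_values("DateTime").reset_index(drop=True)

# Gap between each row and the previous one, as Timedelta values.
gaps = df["DateTime"].diff()

# True where a row is more than 1 second after the previous row.
# The first row has no predecessor (NaT) and compares as False, which
# matches the [0] placeholder used in the list comprehension.
mask = gaps > pd.Timedelta(seconds=1)

print(mask.tolist())
print(df[mask])
```

Note that both versions drop the first row; if it should be kept, something like mask | gaps.isna() would be one way to do that.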

Hope this helps. Cheers.

Neo answered Dec 16 '25 20:12


