Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detecting missing timestamp

I have following dataframe having a timestamp and value. Timestamp increases by 5 seconds and notice that there are missing records between 23:02:02 & 23:06:32.

Is there a simple way to detect if there are missing records between timestamps?

timestamp   value
23:01:27    2915
23:01:32    2916
23:01:37    2919
23:01:42    2924
23:01:47    2926
23:01:52    2928
23:01:57    2933
23:02:02    2937 # <- missing timestamp
23:06:32    3102 # <- between these lines
23:06:37    3109
23:06:42    3114
23:06:47    3122
23:06:52    3126
23:06:57    3129
like image 344
B_B Avatar asked Oct 29 '25 02:10

B_B


1 Answers

If your goal is to indicate where you are missing timestamps, you can convert to datetime and use diff to see the time difference between rows, then use >'00:00:05' to see if the gap is greater than 5 seconds:

>>> pd.to_datetime(df['timestamp']).diff() > '00:00:05'
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
13    False
Name: timestamp, dtype: bool

This indicates you are missing records above index 8

If your goal is simply to see whether you are missing timestamps, use any:

>>> (pd.to_datetime(df['timestamp']).diff() > '00:00:05').any()
True

Indicating that you are indeed missing timestamps somewhere

[EDIT] as per @JoranBeasley's suggestion, you can also use the mode of your time differences to infer the desired frequency:

d = pd.to_datetime(df['timestamp']).diff()

>>> (d > d.mode()[0])
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
13    False
Name: timestamp, dtype: bool

Because d.mode()[0] will return the most common frequency observed:

>>> d.mode()[0]
Timedelta('0 days 00:00:05')
like image 51
sacuL Avatar answered Oct 30 '25 18:10

sacuL