Here is my some Dataset which having Time,Temperature1,Temperature2
Timestamp. Temperature1. Temperature2
09/01/2016 00:00:08 53.4. 45.5
09/01/2016 00:00:38. 53.5. 45.2
09/01/2016 00:01:08. 54.6. 43.2
09/01/2016 00:01:38. 55.2. 46.3
09/01/2016 00:02:08. 54.5. 45.5
09/01/2016 00:04:08. 54.2. 35.5
09/01/2016 00:05:08. 52.4. 45.7
09/01/2016 00:05:38. 53.4. 45.2
My data is coming in every 30 second..
This is my dataset here..some Timestamp is missing..bcoz. every 30 second my data is coming..so some data points are missing.. How to find that data points..and insert there data as NAN ... Please help me..
you can use resample('30S', base=8) method:
In [20]: x.resample('30S', base=8).mean()
Out[20]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
The solution above assumes that the Timestamp is of datetime dtype and that it has been set as index.
If Timestamp is a regular column (not index), then starting from Pandas 0.19.0 we can resample on regular columns (it must be of datetime dtype), using on='column_name' parameter:
In [26]: x.resample('30S', on='Timestamp', base=8).mean()
Out[26]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
if you need to find your base value dynamically you can do it this way:
In [21]: x.index[0].second
Out[21]: 8
from docs:
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for
5minfrequency, base could range from0through4.Defaults to
0
Assuming that the timestamps have been converted to datetime, if you set the index to the timestamp column and then reindex with a date range then the missing values will show up:
In [94]:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
df
Out[94]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
In [96]:
df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq='30s'))
Out[96]:
Temperature1 Temperature2
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
This assumes that the timestamps are regular, here we construct a date range using the timestamp first and last values with a frequency of 30 seconds:
In [97]:
pd.date_range(start=df.index[0], end=df.index[-1], freq='30s')
Out[97]:
DatetimeIndex(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
'2016-09-01 00:01:08', '2016-09-01 00:01:38',
'2016-09-01 00:02:08', '2016-09-01 00:02:38',
'2016-09-01 00:03:08', '2016-09-01 00:03:38',
'2016-09-01 00:04:08', '2016-09-01 00:04:38',
'2016-09-01 00:05:08', '2016-09-01 00:05:38'],
dtype='datetime64[ns]', freq='30S')
When you reindex with this, any missing index labels become NaN values
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With