
Time Offsetting in pandas using multiple DataFrames

Tags:

python

pandas

I have two pandas objects.

The first is a DataFrame of floats called sdtarray, where each value is a number of seconds:

    z1  z2  z3  ...
0   NaN NaN NaN
1   2.6 3.4 63.0
2   NaN NaN NaN
3   0.1 1.1 60.7
4   4.7 5.2 64.9
5   0.1 0.6 61.1
...
[33945 rows x 95 columns]

The second is a Series of datetimes called thenewtime:

0    2014-09-01 05:22:00
1    2014-09-01 05:38:00
2    2014-09-01 06:08:00
3    2014-09-01 06:27:00
4    2014-09-01 06:37:00
5    2014-09-01 06:57:00
...
Name: thenewtime, dtype: datetime64[ns]

What is the best way to offset each row of the float DataFrame (sdtarray) by the corresponding date (the same row index in thenewtime), ending up with a DataFrame of dates?

Example output would be:

    z4                  z5                  z6              …
0   NaN                 NaN                 NaN 
1   01/09/2014 05:38:02 01/09/2014 05:38:03 01/09/2014 05:39:03 
2   NaN                 NaN                 NaN 
3   01/09/2014 06:27:00 01/09/2014 06:27:01 01/09/2014 06:28:00 
4   01/09/2014 06:37:04 01/09/2014 06:37:05 01/09/2014 06:38:04 
5   01/09/2014 06:57:00 01/09/2014 06:57:00 01/09/2014 06:58:01
…               

I am using pandas 0.13.1, which I know doesn't help things, but I am stuck with it because it has to be compatible with numpy 1.7.1 due to ArcGIS requirements.

I managed to get the right output using itertuples on each row (and using a timedelta), but it's incredibly slow for large data (34k rows by 100 cols). There must be a more efficient way that doesn't rely on examining each row / column in a loop.

Any help and guidance would be appreciated :)

PaulB asked May 16 '26 20:05


1 Answer

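Below, I convert each value in the DataFrame of seconds (df1, standing in for sdtarray) into a datetime.timedelta object, after first checking that it is not NaN. I then add these values to the dates in df2 (standing in for thenewtime), which are tiled across as many columns as df1 has.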

Tested under Pandas 0.13.1.

import datetime as dt
import numpy as np
import pandas as pd

# Seconds to add; None becomes NaN in the float columns.
df1 = pd.DataFrame({'z1': [None, 2.6, None, 0.1, 4.7, 0.1],
                    'z2': [None, 3.4, None, 1.1, 5.2, 0.6],
                    'z3': [None, 63, None, 60.7, 64.9, 61.1]})
# Base timestamps, one per row of df1.
df2 = pd.DataFrame({'Datetime': ['2014-09-01 05:22', '2014-09-01 05:38', '2014-09-01 06:08',
                                 '2014-09-01 06:27', '2014-09-01 06:37', '2014-09-01 06:57']})
df2['Datetime'] = pd.to_datetime(df2.Datetime)

# Parentheses let the expression continue onto the next line.
# np.tile repeats the single datetime column once per column of df1.
result = (df1.applymap(lambda x: dt.timedelta(seconds=x) if not np.isnan(x) else None)
          + np.tile(df2.values, (1, df1.shape[1])))

>>> pd.DataFrame(result)
                          z1                         z2                         z3
0                        NaT                        NaT                        NaT
1 2014-09-01 05:38:02.600000 2014-09-01 05:38:03.400000        2014-09-01 05:39:03
2                        NaT                        NaT                        NaT
3 2014-09-01 06:27:00.100000 2014-09-01 06:27:01.100000 2014-09-01 06:28:00.700000
4 2014-09-01 06:37:04.700000 2014-09-01 06:37:05.200000 2014-09-01 06:38:04.900000
5 2014-09-01 06:57:00.100000 2014-09-01 06:57:00.600000 2014-09-01 06:58:01.100000
Alexander answered May 19 '26 09:05
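
As a possible alternative that avoids calling a Python function on every cell, the conversion can be done one column at a time with pd.to_timedelta. This is only a sketch: it is written against a recent pandas release and has not been checked on 0.13.1, so whether it runs unchanged there is an assumption. The helper name offset_by_seconds is made up for illustration; sdtarray and thenewtime follow the names in the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (seconds to add, NaN where missing).
sdtarray = pd.DataFrame({
    "z1": [np.nan, 2.6, np.nan, 0.1, 4.7, 0.1],
    "z2": [np.nan, 3.4, np.nan, 1.1, 5.2, 0.6],
    "z3": [np.nan, 63.0, np.nan, 60.7, 64.9, 61.1],
})
thenewtime = pd.Series(
    pd.to_datetime([
        "2014-09-01 05:22", "2014-09-01 05:38", "2014-09-01 06:08",
        "2014-09-01 06:27", "2014-09-01 06:37", "2014-09-01 06:57",
    ]),
    name="thenewtime",
)


def offset_by_seconds(seconds, base):
    """Hypothetical helper: add each column of `seconds` to `base`, row by row.

    Each column is converted to timedeltas in one vectorized call, so the
    Python-level loop runs once per column rather than once per cell.
    NaN seconds become NaT in the result.
    """
    return seconds.apply(lambda col: base + pd.to_timedelta(col, unit="s"))


shifted = offset_by_seconds(sdtarray, thenewtime)
print(shifted)
```

Both objects need to share the same row index for the addition to line up, which is the case here because both use the default 0..n-1 index.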