I have two pandas DataFrames.
One, called sdtarray, holds floats (representing seconds):
z1 z2 z3 ...
0 NaN NaN NaN
1 2.6 3.4 63.0
2 NaN NaN NaN
3 0.1 1.1 60.7
4 4.7 5.2 64.9
5 0.1 0.6 61.1
...
[33945 rows x 95 columns]
and the other, thenewtime, holds formatted dates:
0 2014-09-01 05:22:00
1 2014-09-01 05:38:00
2 2014-09-01 06:08:00
3 2014-09-01 06:27:00
4 2014-09-01 06:37:00
5 2014-09-01 06:57:00
...
Name: thenewtime, dtype: datetime64[ns]
What is the best way to offset each row in the float DataFrame (sdtarray) by the corresponding date (the same row index in thenewtime), ending up with a DataFrame of dates?
Example output would be:
z4 z5 z6 …
0 NaN NaN NaN
1 01/09/2014 05:38:02 01/09/2014 05:38:03 01/09/2014 05:39:03
2 NaN NaN NaN
3 01/09/2014 06:27:00 01/09/2014 06:27:01 01/09/2014 06:28:00
4 01/09/2014 06:37:04 01/09/2014 06:37:05 01/09/2014 06:38:04
5 01/09/2014 06:57:00 01/09/2014 06:57:00 01/09/2014 06:58:01
…
I am using pandas 0.13.1, which I know doesn't help things, but I am stuck with it because it has to be compatible with numpy 1.7.1 due to ArcGIS requirements.
I managed to get the right output using itertuples on each row (and using a timedelta), but it's incredibly slow for large data (34k rows by 100 columns), and there must be a more efficient way that doesn't rely on examining each row / column in a loop.
Any help and guidance would be appreciated :)
Below, I convert the dataframe of seconds (df1) into datetime.timedelta objects after first checking that they are not NaNs. I then add these values to the dates in df2.
Tested under Pandas 0.13.1.
import datetime as dt
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'z1': [None, 2.6, None, 0.1, 4.7, 0.1],
                    'z2': [None, 3.4, None, 1.1, 5.2, 0.6],
                    'z3': [None, 63, None, 60.7, 64.9, 61.1]})
df2 = pd.DataFrame({'Datetime': ['2014-09-01 05:22', '2014-09-01 05:38', '2014-09-01 06:08',
                                 '2014-09-01 06:27', '2014-09-01 06:37', '2014-09-01 06:57']})
df2['Datetime'] = pd.to_datetime(df2.Datetime)
# Convert each non-NaN number of seconds to a timedelta, then add the dates
# repeated across all columns. The parentheses keep this a single expression.
result = (df1.applymap(lambda x: dt.timedelta(seconds=x) if not np.isnan(x) else None)
          + np.tile(df2.values, (1, df1.shape[1])))
>>> pd.DataFrame(result)
z1 z2 z3
0 NaT NaT NaT
1 2014-09-01 05:38:02.600000 2014-09-01 05:38:03.400000 2014-09-01 05:39:03
2 NaT NaT NaT
3 2014-09-01 06:27:00.100000 2014-09-01 06:27:01.100000 2014-09-01 06:28:00.700000
4 2014-09-01 06:37:04.700000 2014-09-01 06:37:05.200000 2014-09-01 06:38:04.900000
5 2014-09-01 06:57:00.100000 2014-09-01 06:57:00.600000 2014-09-01 06:58:01.100000
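For comparison, here is a minimal sketch of a column-wise variant that avoids the per-cell lambda. It assumes a reasonably recent pandas in which pd.to_timedelta accepts a float Series with unit='s'; I have not checked it against 0.13.1, so treat it as an illustration rather than a drop-in replacement. The sample values are copied from the question, and the names sdtarray and thenewtime follow the question's naming.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (seconds as floats, NaN for missing rows).
sdtarray = pd.DataFrame({'z1': [np.nan, 2.6, np.nan, 0.1, 4.7, 0.1],
                         'z2': [np.nan, 3.4, np.nan, 1.1, 5.2, 0.6],
                         'z3': [np.nan, 63.0, np.nan, 60.7, 64.9, 61.1]})
thenewtime = pd.Series(pd.to_datetime(['2014-09-01 05:22', '2014-09-01 05:38',
                                       '2014-09-01 06:08', '2014-09-01 06:27',
                                       '2014-09-01 06:37', '2014-09-01 06:57']),
                       name='thenewtime')

# For each column: convert the seconds to timedeltas (NaN becomes NaT) and add
# the datetime Series. The addition aligns on the row index, so the only Python
# loop is over the columns, not over individual cells.
result = sdtarray.apply(lambda col: thenewtime + pd.to_timedelta(col, unit='s'))
```

With 95 columns this performs 95 vectorised additions instead of roughly 3 million Python-level calls, which is where I would expect most of the speed-up to come from.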