I might be doing this wrong, or there might be a much better way, as I am still new to Python. Apologies upfront for any obvious mistakes.
I have a Pandas DataFrame with a STR column that holds a date and time. It is STR because the times are "Broadcast" formatted, which means the hours in a day run up to 29, so we will see dates like 01/Jan/2018 29:59:59. Add 1 second to that and it is 02/Jan/2018 06:00:00.
My goal here is to convert this data to a real time, which means any hour between 24 and 29 requires a date shift too. I have already split the STR into 2 new columns, ['Dt'] and ['Ti']; from ['Ti'], I pulled out the hour into a new column ['Hr'] and made it an INT.
I then applied a pd.to_datetime to the ['Dt'] and added a rule.
df['Dt'] = np.where(df['Hr'] > 23, df['Dt']+pd.DateOffset(1),df['Dt']+pd.DateOffset(0) )
This works perfectly.
I now need to change the hour to be real time, e.g. 24 = 00, 25 = 01, etc.
I thought the best way was to use a DICT and map it, so I made a DICT,
HourMap = {'24':'00','25':'01','26':'02','27':'03','28':'04','29':'05','30':'06'}
Then wrote this
df['Hr1'] = np.where(df['Hr'] > 23, df.replace({'Hr':HourMap}),df['Hr'])
But I get a "ValueError"
ValueError: operands could not be broadcast together with shapes (273,) (273,29) (273,)
I have looked at those rows in the dataframe and they are just normal INTs. On testing I can apply maths to them (e.g. df['Test'] = df['Hr'] + 1).
I did convert them to STR and try the same rules, but got the same error.
Am I just crazy?
Thanks,
I believe you need to change:
df.replace({'Hr':HourMap})
to map, and if some values are not matched and come back as NaN, replace them with the original values using fillna:
df['Hr'].map(HourMap).fillna(df['Hr'])
#alternative solution if performance is not important in large df
#df['Hr'].replace(HourMap)
because df.replace returns all columns of the DataFrame (with the Hr column replaced), not just a single column.
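As a rough illustration of the difference, here is a minimal sketch on made-up sample data (the 'Other' column and the name hour_map are hypothetical). It assumes integer keys in the dictionary: the HourMap in the question uses string keys, which would not match an integer 'Hr' column, so map would leave every value unchanged after fillna.

```python
import pandas as pd

# Hypothetical sample data; 'Hr' is an integer column as in the question.
df = pd.DataFrame({'Hr': [6, 23, 24, 25, 29], 'Other': list('abcde')})

# Integer keys (an assumption), so they match the integer values in 'Hr'.
hour_map = {24: 0, 25: 1, 26: 2, 27: 3, 28: 4, 29: 5}

# df.replace(...) returns the whole DataFrame (2 columns here), which is
# why np.where could not broadcast it against the single 'Hr' column.
replaced_shape = df.replace({'Hr': hour_map}).shape
print(replaced_shape)  # (5, 2)

# Series.map returns a single column; unmatched hours become NaN and are
# filled back in from the original values, then cast back to int.
df['Hr1'] = df['Hr'].map(hour_map).fillna(df['Hr']).astype(int)
print(df['Hr1'].tolist())  # [6, 23, 0, 1, 5]
```

The astype(int) at the end is only there because the intermediate NaNs turn the column into floats.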
You really shouldn't be using a dictionary here, and you don't even need the np.where. Use the modulo operator:
In [1]: import numpy as np
In [2]: np.arange(31)%24
Out[2]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6], dtype=int32)
You have numbers that 'wrap around' at 24; this is the textbook use case for modulo. So the full code just becomes:
df['Hr1'] = df['Hr'] % 24
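Also, by the same token, you can add to your dates without np.where by making use of integer division. pd.DateOffset does not take a Series, so convert the day count to a timedelta instead:
df['Dt'] = df['Dt'] + pd.to_timedelta(df['Hr'] // 24, unit='D')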
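For completeness, here is a minimal end-to-end sketch of the modulo and integer-division approach on made-up sample rows. The column names 'Raw' and 'Real' and the helper variables are hypothetical. The day shift uses pd.to_timedelta, which accepts a whole Series, and the month parsing with %b assumes an English locale.

```python
import pandas as pd

# Hypothetical sample rows in the broadcast format described in the question.
df = pd.DataFrame({'Raw': ['01/Jan/2018 06:00:00',
                           '01/Jan/2018 23:59:59',
                           '01/Jan/2018 24:00:00',
                           '01/Jan/2018 29:59:59']})

# Split into date and time parts, then pull out the broadcast hour as an int.
parts = df['Raw'].str.split(' ', expand=True)
df['Dt'] = pd.to_datetime(parts[0], format='%d/%b/%Y')
df['Hr'] = parts[1].str[:2].astype(int)

# Integer division gives the number of days to shift (0 or 1 here);
# modulo gives the real clock hour.
df['Dt'] = df['Dt'] + pd.to_timedelta(df['Hr'] // 24, unit='D')
df['Hr1'] = df['Hr'] % 24

# Rebuild a real timestamp from the shifted date, the real hour and ':MM:SS'.
min_sec = parts[1].str[2:]
clock = df['Hr1'].astype(str).str.zfill(2) + min_sec
df['Real'] = df['Dt'] + pd.to_timedelta(clock)
print(df['Real'])
```

With this data the last row, 01/Jan/2018 29:59:59, should come out as 2018-01-02 05:59:59.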