I have a data frame in pandas that is organized like so:
btc_price['btc_price'] = pd.to_numeric(btc_price['btc_price'].str.replace(',', ''))
btc_price.head(n=120)
Out[4]: 
                      btc_price
time                           
2017-08-27 22:50:00  4,389.6113
2017-08-27 22:51:00  4,389.0850
2017-08-27 22:52:00  4,388.8625
2017-08-27 22:53:00  4,389.7888
2017-08-27 22:56:00  4,389.9138
2017-08-27 22:57:00  4,390.1663
2017-08-27 22:58:00  4,390.2600
2017-08-27 22:59:00  4,392.4013
2017-08-27 23:00:00  4,391.6588
2017-08-27 23:01:00  4,391.9213
2017-08-27 23:02:00  4,394.0113
2017-08-27 23:03:00  4,396.9713
2017-08-27 23:04:00  4,397.3350
2017-08-27 23:05:00  4,397.0700
2017-08-27 23:06:00  4,398.6188
2017-08-27 23:07:00  4,398.5725
2017-08-27 23:08:00  4,397.4713
2017-08-27 23:09:00  4,398.0938
2017-08-27 23:10:00  4,398.7775
2017-08-27 23:11:00  4,398.0200
2017-08-27 23:12:00  4,397.9513
2017-08-27 23:13:00  4,398.0613
2017-08-27 23:14:00  4,398.0900
2017-08-27 23:15:00  4,398.0063
2017-08-27 23:16:00  4,397.6088
2017-08-27 23:17:00  4,394.3763
2017-08-27 23:46:00  4,389.1100
2017-08-27 23:48:00  4,390.6763
2017-08-27 23:49:00  4,392.5388
2017-08-27 23:49:00  4,392.5388
                        ...
2017-08-28 00:51:00  4,367.5738
2017-08-28 00:51:00  4,367.5738
2017-08-28 00:52:00  4,367.7888
2017-08-28 00:53:00  4,368.4188
2017-08-28 00:54:00  4,368.8225
2017-08-28 00:55:00  4,368.7438
2017-08-28 00:57:00  4,368.4700
2017-08-28 00:58:00  4,367.9963
2017-08-28 00:59:00  4,366.4750
2017-08-28 01:00:00  4,359.1988
2017-08-28 01:01:00  4,355.2825
2017-08-28 01:02:00  4,352.3675
2017-08-28 01:03:00  4,354.2188
2017-08-28 01:04:00  4,353.5263
2017-08-28 01:05:00  4,354.2488
2017-08-28 01:06:00  4,358.8063
2017-08-28 01:07:00  4,359.5738
2017-08-28 01:08:00  4,361.7313
2017-08-28 01:09:00  4,360.8638
2017-08-28 01:10:00  4,363.0750
2017-08-28 01:11:00  4,362.3375
2017-08-28 01:12:00  4,362.3338
2017-08-28 01:13:00  4,358.8000
2017-08-28 01:14:00  4,354.0463
2017-08-28 01:15:00  4,356.1950
2017-08-28 01:16:00  4,359.5975
2017-08-28 01:17:00  4,360.1588
2017-08-28 01:18:00  4,362.2338
2017-08-28 01:19:00  4,363.7900
2017-08-28 01:20:00  4,362.6150
I would like to create a new column change that has a value of -1,0,1. These should correspond to a 5% decrease in price over the past hour (-1), "no change" (0), and a 5% increase in price over the past hour(1). Also, the value of one hour should be mutable so I could change it to one day or 30 minutes for example, as I see fit. 
I found similar questions here and here but I am new to python and am not really sure how to apply these solutions specifically to my problem.
The other option is to calculate the avg price for each hour and then calculate the % change on an hourly basis, but I would prefer to be able to use a rolling timeframe.
I have also tried doing this in R with no luck. Please help.
I started by trying:
btc_price['change'] = btc_price.pct_change(periods=60, fill_method='pad', limit=None, freq=None) 
This works but does not quite give me what I am looking for, I would like to compare each value to the min and max values for the past "time frame" and calculate the % change based on this value instead of simply comparing two rows.
What I want to end up with is something like this (incomplete):
# Calculate the % change in btc_price
def calc_change(df):
    array = df.values
    a = array[:,1]
# Apply % change comparison to timeframe 
def rolling(df, period, func, min_periods = None):
        if min_periods is None:
            min_periods = period
        result = pd.Series(np.nan, index = df.index)
        for i in range(1, len(df) + 1):
            sub_df = df.iloc[max(i)]
I believe I could use something like df.rolling() found here but am not quite sure if this is exactly what I want because I dont quite understand how it works. An explanation would be great. 
pd.read_clipboardpd.to_numeric.After having valid data, you can do:
In [59]: df.head()
Out[59]: 
                     btc_price
time                          
2017-09-07 22:50:00  4389.6113
2017-09-07 22:51:00  4389.0850
2017-09-07 22:52:00  4388.8625
2017-09-07 22:53:00  4389.7888
2017-09-07 22:56:00  4389.9138
In [60]: df = df.resample('1MIN').ffill(); df.head(10)
Out[60]: 
                     btc_price
time                          
2017-09-07 22:50:00  4389.6113
2017-09-07 22:51:00  4389.0850
2017-09-07 22:52:00  4388.8625
2017-09-07 22:53:00  4389.7888
2017-09-07 22:54:00  4389.7888
2017-09-07 22:55:00  4389.7888
2017-09-07 22:56:00  4389.9138
2017-09-07 22:57:00  4390.1663
2017-09-07 22:58:00  4390.2600
2017-09-07 22:59:00  4392.4013
In [61]: WINDOW = 5  # 5 minutes, you can change to any window you want. Has to match resolution from resample
In [63]: df['change'] = df['btc_price'].pct_change(periods=WINDOW); df.head(10)
Out[63]: 
                     btc_price    change
time                                    
2017-09-07 22:50:00  4389.6113       NaN
2017-09-07 22:51:00  4389.0850       NaN
2017-09-07 22:52:00  4388.8625       NaN
2017-09-07 22:53:00  4389.7888       NaN
2017-09-07 22:54:00  4389.7888       NaN
2017-09-07 22:55:00  4389.7888  0.000040
2017-09-07 22:56:00  4389.9138  0.000189
2017-09-07 22:57:00  4390.1663  0.000297
2017-09-07 22:58:00  4390.2600  0.000107
2017-09-07 22:59:00  4392.4013  0.000595
In [64]: import numpy as np
]n [67]: df['change_label'] = pd.cut(df['change'], [np.NINF, -0.05, 0.05, np.PINF], labels=['below 5%', 'around 0%', 'above 5%']) 
In [69]: df.head(10)
Out[69]: 
                     btc_price        change change_label
time                                                     
2017-09-07 22:50:00  4389.6113           NaN          NaN
2017-09-07 22:51:00  4389.0850           NaN          NaN
2017-09-07 22:52:00  4388.8625           NaN          NaN
2017-09-07 22:53:00  4389.7888           NaN          NaN
2017-09-07 22:54:00  4389.7888           NaN          NaN
2017-09-07 22:55:00  4389.7888  4.043638e-05    around 0%
2017-09-07 22:56:00  4389.9138  1.888321e-04    around 0%
2017-09-07 22:57:00  4390.1663  2.970701e-04    around 0%
2017-09-07 22:58:00  4390.2600  1.073400e-04    around 0%
2017-09-07 22:59:00  4392.4013  5.951311e-04    around 0%
Feels that you need to:
Resample in order to get predictable resolutionFFill in order to not have holes. Or handle that in other way that makes sense in your case.pct_change.pd.cut. Also, simple df['change'].map(lamba v: # here logic) would work.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With