I have a DataFrame and need to calculate the percent change compared to the beginning of the year for each company. Is there any way to use pct_change() or another method to perform this task? Thanks!
df looks like
security date price
IBM 1/1/2016 100
IBM 1/2/2016 102
IBM 1/3/2016 108
AAPL 1/1/2016 1000
AAPL 1/2/2016 980
AAPL 1/3/2016 1050
AAPL 1/4/2016 1070
results I want
security date price change
IBM 1/1/2016 100 NA
IBM 1/2/2016 102 2%
IBM 1/3/2016 108 8%
AAPL 1/1/2016 1000 NA
AAPL 1/2/2016 980 -2%
AAPL 1/3/2016 1050 5%
AAPL 1/4/2016 1070 7%
Sounds like you are looking for an expanding-window version of pct_change(). This doesn't exist out of the box AFAIK, but you could roll your own:
df.groupby('security')['price'].apply(lambda x: x.div(x.iloc[0]).subtract(1).mul(100))
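In case a complete, copy-pasteable version helps, here is a minimal sketch of that one-liner on the sample data from the question; the explicit sort and the group_keys=False argument are my additions, so that the result keeps the original index and can be assigned back:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'security': ['IBM', 'IBM', 'IBM', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
    'date': ['1/1/2016', '1/2/2016', '1/3/2016',
             '1/1/2016', '1/2/2016', '1/3/2016', '1/4/2016'],
    'price': [100, 102, 108, 1000, 980, 1050, 1070],
})

# Make sure each security is ordered by date, so iloc[0] really is the
# first price of the year within each group
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['security', 'date'])

# Percent change of each price relative to the first price in its group
df['change'] = df.groupby('security', group_keys=False)['price'].apply(
    lambda x: x.div(x.iloc[0]).subtract(1).mul(100)
)
print(df)
```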
The following also works, assuming the rows are already ordered by date within each group:
def pct_change(df):
    # percent change of each price relative to the group's first price
    df['pct'] = 100 * (df.price / df.iloc[0].price - 1)
    return df

df.groupby('security').apply(pct_change)
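As a side note (my own addition, not part of the answer above): the same calculation can also be written with transform, which keeps the original index, so the result can be assigned straight back without mutating inside apply:

```python
# transform returns a Series aligned with df's index
df['pct'] = df.groupby('security')['price'].transform(
    lambda s: 100 * (s / s.iloc[0] - 1)
)
```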
Late to the party, but I faced a similar problem and I'd like to share my solution with you, in case it's helpful for anybody.
import numpy as np
import pandas as pd

def get_variation(values: pd.Series) -> np.float64:
base = values.iloc[0] # first element in window iteration
current = values.iloc[-1] # last element in window iteration
return (current - base) / base if base else 0 # avoid ZeroDivisionError
variations = df.groupby('security')['price'].expanding(min_periods=2).apply(get_variation)
df = df.assign(change=variations.droplevel(0))
| | security | date | price | change |
|---|---|---|---|---|
| 0 | IBM | 1/1/2016 | 100 | NaN |
| 1 | IBM | 1/2/2016 | 102 | 0.02 |
| 2 | IBM | 1/3/2016 | 108 | 0.08 |
| 3 | AAPL | 1/1/2016 | 1000 | NaN |
| 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
You can do what you are asking by:

1. Defining a custom variation function.
2. Grouping the pd.DataFrame by the security column.
3. Applying an expanding window operation to the price column.
4. Calling apply on the expanding window and passing the custom variation function from step 1 as an argument.
5. Assigning the result back into the original pd.DataFrame.

Optionally, you can replace the expanding window operation in step 3 with a rolling window operation by calling .rolling(window=2, min_periods=2) to get a step-by-step variation of each security price: window=2 sets the window size so that each iteration receives two elements, and min_periods=2 sets the minimum amount of data needed for a calculation (the result is NaN otherwise). A sketch of this variant is shown right after this list.
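A minimal sketch of the optional rolling variant, reusing the get_variation function from the summary above (the step_change column name is just for illustration):

```python
# Step-by-step variation: each window holds two consecutive prices of the
# same security, so each price is compared to the previous one instead of
# the first price of the year.
step_variations = (
    df.groupby('security')['price']
      .rolling(window=2, min_periods=2)
      .apply(get_variation)
      .droplevel(0)  # drop the group-key level so the index matches df
)
df_step = df.assign(step_change=step_variations)
```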
Step 1: define a custom variation function. Your variation function should accept a pd.Series as an argument and use the first and last items in the series to calculate the variation. Here, I'm using a standard variation formula used in finance. The last line has an if/else statement to avoid a ZeroDivisionError.
def get_variation(values: pd.Series) -> np.float64:
base = values.iloc[0]
current = values.iloc[-1]
return (current - base) / base if base else 0
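A quick sanity check on a toy Series (my own example, not part of the original data):

```python
import pandas as pd

# (100 -> 108) should give a variation of 0.08, i.e. 8%
print(get_variation(pd.Series([100, 102, 108])))  # 0.08
```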
Step 2: group the pd.DataFrame by the security column. Call .groupby('security') to group the dataframe and prepare the data for per-group calculations.
grouped_df = df.groupby('security')
| security | | security | date | price | change |
|---|---|---|---|---|---|
| AAPL | 3 | AAPL | 1/1/2016 | 1000 | NaN |
| | 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| | 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| | 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
| IBM | 0 | IBM | 1/1/2016 | 100 | NaN |
| | 1 | IBM | 1/2/2016 | 102 | 0.02 |
| | 2 | IBM | 1/3/2016 | 108 | 0.08 |
Step 3: apply an expanding window operation to the price column. Call .expanding(min_periods=2) on the price groups. This iterates over each price group and, on each iteration, hands you all of the data up to the current row as a pd.Series. min_periods=n sets the minimum number of observations required to return a value on each iteration (NaN otherwise). In your case, since you want NaN on the first record because there is nothing to compare it against, pass min_periods=2; if you'd rather have 0 as the result instead, pass min_periods=1.
windows = grouped_df['price'].expanding(min_periods=2)
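To illustrate the min_periods choice described above, here is a small comparison sketch (both lines reuse grouped_df and get_variation from the previous steps):

```python
# min_periods=2: the first row of each group has no earlier price to
# compare against, so it comes out as NaN
grouped_df['price'].expanding(min_periods=2).apply(get_variation)

# min_periods=1: the first window contains a single price, so
# get_variation compares it with itself and returns 0.0 instead of NaN
grouped_df['price'].expanding(min_periods=1).apply(get_variation)
```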
Step 4: call apply on the expanding window, passing the custom variation function from step 1 as an argument. Calling .apply(get_variation) applies your custom variation formula to each resulting window and returns the result.
grouped_variations = windows.apply(get_variation)
| security | | |
|---|---|---|
| AAPL | 3 | NaN |
| 4 | -0.02 | |
| 5 | 0.05 | |
| 6 | 0.07 | |
| IBM | 0 | NaN |
| 1 | 0.02 | |
| 2 | 0.08 |
As you can see in the output of step 4, the data is presented with a MultiIndex. We get rid of the outer index level ('AAPL', 'IBM') by calling .droplevel(0), to prepare the data for merging it correctly into the original dataframe.
variations = grouped_variations.droplevel(0)
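For illustration, this is what the index looks like before and after dropping the outer level (the printed values are what I would expect given the data above):

```python
# Before: MultiIndex of (security, original row label),
# e.g. ('AAPL', 3), ('AAPL', 4), ..., ('IBM', 0), ...
print(grouped_variations.index)

# After: only the original row labels remain, e.g. [3, 4, 5, 6, 0, 1, 2],
# so the Series lines up with df's index again
print(variations.index)
```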
Step 5: assign the result back into the original pd.DataFrame. Finally, we assign the price variations to the original dataframe by calling df.assign. The data is joined into the destination on its index.
df = df.assign(change=variations)
| | security | date | price | change |
|---|---|---|---|---|
| 0 | IBM | 1/1/2016 | 100 | NaN |
| 1 | IBM | 1/2/2016 | 102 | 0.02 |
| 2 | IBM | 1/3/2016 | 108 | 0.08 |
| 3 | AAPL | 1/1/2016 | 1000 | NaN |
| 4 | AAPL | 1/2/2016 | 980 | -0.02 |
| 5 | AAPL | 1/3/2016 | 1050 | 0.05 |
| 6 | AAPL | 1/4/2016 | 1070 | 0.07 |
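If you want the change column rendered as whole percentages like in the question's expected output (2%, -2%, ...), one possible final formatting step is the following; rounding to whole percents is my assumption:

```python
# Format the fractional change as a percentage string, keeping NaN as-is
df['change'] = df['change'].map(
    lambda v: f'{v:.0%}' if pd.notna(v) else v
)
print(df)
```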