I'm looking for a way to take a pandas Series and return new Series representing the number of prior, consecutive values that are higher/lower than each row in the Series:
a = pd.Series([30, 10, 20, 25, 35, 15])
...should output:
Value  Higher than streak  Lower than streak
   30                   0                  0
   10                   0                  1
   20                   1                  0
   25                   2                  0
   35                   4                  0
   15                   0                  3
This will allow someone to identify how important each "regional max/min" value is in a time series.
Thanks in advance.
Since you're looking backwards at the previous values to count a consecutive run, you're going to have to work with indices somehow. This solution first compares every value before the current index against the current value, then drops everything up to the last failed comparison, so that only the unbroken run of passing values immediately before the current row is counted. It avoids an explicit loop over the rows, although map still calls the function once per element, so a speed-up on larger datasets is not guaranteed.
import pandas as pd
from operator import gt, lt

a = pd.Series([30, 10, 20, 25, 35, 15])

def consecutive_run(op, ser, i):
    """
    Count the uninterrupted run of values immediately before index i for which
    op(previous value, ser[i]) is True. Assumes the default 0..n-1 index.
    """
    thresh_all = op(ser[:i], ser[i])
    # find the prior positions where the comparison failed
    non_passing = thresh_all[~thresh_all]
    start_idx = 0
    if not non_passing.empty:
        # if there was a failure, there was a break in the consecutive truth values,
        # so get the final False position. Starting index will be False, but it
        # will either be at the end of the series selection and will sum to zero
        # or will be followed by all successive True values afterwards
        start_idx = non_passing.index[-1]
    # count the consecutive run by summing from the start index onwards
    return thresh_all[start_idx:].sum()

# lt: the previous values are lower, so the current value is higher than the streak
# gt: the previous values are higher, so the current value is lower than the streak
res = pd.concat([a, a.index.to_series().map(lambda i: consecutive_run(lt, a, i)),
                 a.index.to_series().map(lambda i: consecutive_run(gt, a, i))],
                axis=1)
res.columns = ['Value', 'Higher than streak', 'Lower than streak']
print(res)

Result:
   Value  Higher than streak  Lower than streak
0     30                   0                  0
1     10                   0                  1
2     20                   1                  0
3     25                   2                  0
4     35                   4                  0
5     15                   0                  3
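For longer series, slicing all prior rows for every row does work proportional to the row's position each time. As an alternative sketch (a different technique from the answer above: a monotonic stack, as in the classic "stock span" problem), the streaks expected in the question can be computed in a single pass. The function name streaks and its beats parameter are made up for illustration, and ties are assumed to break a streak, matching the strict comparisons used above.

```python
from operator import gt, lt

def streaks(values, beats):
    """For each position, count the unbroken run of immediately preceding
    values v for which beats(current, v) is True."""
    counts = []
    stack = []  # indices of earlier values that no later value has beaten yet
    for i, current in enumerate(values):
        # Anything the current value beats can never block a later streak
        # that reaches past the current position, so discard it.
        while stack and beats(current, values[stack[-1]]):
            stack.pop()
        blocker = stack[-1] if stack else -1  # nearest earlier value not beaten
        counts.append(i - blocker - 1)
        stack.append(i)
    return counts

data = [30, 10, 20, 25, 35, 15]
print(streaks(data, gt))  # higher-than streaks: [0, 0, 1, 2, 4, 0]
print(streaks(data, lt))  # lower-than streaks:  [0, 1, 0, 0, 0, 3]
```

With a pandas Series a, the lists can be wrapped back up with something like pd.Series(streaks(a.tolist(), gt), index=a.index).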
import pandas as pd

value = pd.Series([30, 10, 20, 25, 35, 15])

# Note: these count ALL earlier values that are higher/lower than value[x],
# not only the unbroken run immediately before it.
Lower = [(value[x] < value[:x]).sum() for x in range(len(value))]
Higher = [(value[x] > value[:x]).sum() for x in range(len(value))]

df = pd.DataFrame({"value": value, "Higher": Higher, "Lower": Lower})
print(df)

   value  Higher  Lower
0     30       0      0
1     10       0      1
2     20       1      1
3     25       2      1
4     35       4      0
5     15       1      4
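The list comprehensions above count every earlier value that is higher or lower, which is why rows 2, 3 and 5 differ from the output requested in the question. To restrict the count to the unbroken run, one possible tweak (a sketch, not the answerer's code) is to reverse the prior values so the nearest one comes first and take a cumulative product, which drops to zero at the first failed comparison. The helper name run_length is hypothetical.

```python
import pandas as pd

value = pd.Series([30, 10, 20, 25, 35, 15])

def run_length(mask):
    # mask is ordered nearest-first; after the first False the cumulative
    # product stays at 0, so the sum is the length of the leading True run
    return int(mask.astype(int).cumprod().sum())

# value.iloc[:x][::-1] holds the prior values, nearest first
higher = [run_length(value.iloc[:x][::-1] < value.iloc[x]) for x in range(len(value))]
lower = [run_length(value.iloc[:x][::-1] > value.iloc[x]) for x in range(len(value))]

df = pd.DataFrame({"value": value, "Higher": higher, "Lower": lower})
print(df)
```

This still does one slice per row, so it is a fix for correctness rather than for speed.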