pandas

Question

I'm looking for a way to take a pandas Series and return a new Series representing the number of prior, consecutive values that are higher/lower than each row in the Series:

a = pd.Series([30, 10, 20, 25, 35, 15])

...should output:

Value   Higher than streak  Lower than streak
30      0                   0
10      0                   1
20      1                   0
25      2                   0
35      4                   0
15      0                   3

This will allow someone to identify how important each "regional max/min" value is in a time series.

Thanks in advance.

benjwadams · Accepted Answer

Since you're looking backwards at previous values to find consecutive runs, you're going to have to interact with indices somehow. This solution first compares every value before the current index against the current value, then finds the last position where that comparison was False and sums only from there onwards, so just the unbroken run of True values directly before the current row is counted. It also avoids creating row iterators over the data (it maps a function over the index instead), which may speed up operations for larger datasets.

import pandas as pd
from operator import gt, lt

a = pd.Series([30, 10, 20, 25, 35, 15])

def consecutive_run(op, ser, i):
    """
    Count the unbroken run of values immediately before index i for which
    op(previous_value, ser[i]) is True.

    Assumes ser has the default integer index, so labels and positions coincide.
    """
    thresh_all = op(ser[:i], ser[i])
    # find the earlier rows where the comparison failed
    non_passing = thresh_all[~thresh_all]
    start_idx = 0
    if not non_passing.empty:
        # a failure breaks the run, so start from the last False position.
        # That first element is False and adds nothing to the sum; everything
        # after it (if anything) is True.
        start_idx = non_passing.index[-1]
    # count the run by summing the booleans from the start index onwards
    return thresh_all[start_idx:].sum()


res = pd.concat(
    [a,
     a.index.to_series().map(lambda i: consecutive_run(gt, a, i)),
     a.index.to_series().map(lambda i: consecutive_run(lt, a, i))],
    axis=1)
res.columns = ['Value', 'Higher than streak', 'Lower than streak']
print(res)

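Result (note that "Higher than streak" here counts the prior consecutive values that are higher than the current row, so the two columns are swapped relative to the table in the question):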

   Value  Higher than streak  Lower than streak
0     30                   0                  0
1     10                   1                  0
2     20                   0                  1
3     25                   0                  2
4     35                   0                  4
5     15                   3                  0
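
As a follow-up to the approach above, here is a minimal sketch of a position-based variant. It is not the code from this answer: it swaps in a stack of earlier positions so each row is pushed and popped at most once, and it labels the columns the way the question's table does. The function name streak_lengths is made up for illustration, and the sketch assumes the comparison is a strict, transitive one such as operator.lt or operator.gt.

```python
from operator import gt, lt

import pandas as pd


def streak_lengths(ser, op):
    """For each row, count the unbroken run of immediately preceding values
    `prev` for which op(prev, current) is True.

    Hypothetical helper: it works on positions, so any index is fine.
    """
    values = ser.to_numpy()
    stack = []  # positions of earlier rows that can still end a later streak
    out = []
    for pos, cur in enumerate(values):
        # Earlier rows that satisfy op against `cur` belong to its streak.
        # Drop them: any later row either stops at `cur` or passes them too.
        while stack and op(values[stack[-1]], cur):
            stack.pop()
        # The streak reaches back to just after the nearest blocking row,
        # or to the start of the Series if nothing blocks it.
        out.append(pos - stack[-1] - 1 if stack else pos)
        stack.append(pos)
    return pd.Series(out, index=ser.index)


a = pd.Series([30, 10, 20, 25, 35, 15])
res = pd.DataFrame({
    "Value": a,
    # row is higher than this many consecutive prior values (prev < current)
    "Higher than streak": streak_lengths(a, lt),
    # row is lower than this many consecutive prior values (prev > current)
    "Lower than streak": streak_lengths(a, gt),
})
print(res)
```

With the sample Series this should reproduce the two columns from the question's table rather than the swapped ones printed above.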

2Obe · Answer

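import pandas as pd

value = pd.Series([30, 10, 20, 25, 35, 15])

# For each position x, count ALL earlier values that are greater than (Lower)
# or less than (Higher) value[x]. Note: this counts every prior value, not
# only the unbroken run immediately before x.
Lower = [(value[x] < value[:x]).sum() for x in range(len(value))]
Higher = [(value[x] > value[:x]).sum() for x in range(len(value))]

df = pd.DataFrame({"value": value, "Higher": Higher, "Lower": Lower})

print(df)

Result:
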
      Lower  Higher  value
0       0      0     30
1       1      0     10
2       1      1     20
3       1      2     25
4       0      4     35
5       4      1     15
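
The list comprehensions above count every earlier value, so rows 2, 3 and 5 differ from the streaks asked for in the question. One possible way to keep the same shape but stop at the first break is sketched below. It assumes NumPy is available, and the helper name leading_true_count is made up for illustration; it is a sketch, not part of the original answer.

```python
import numpy as np
import pandas as pd

value = pd.Series([30, 10, 20, 25, 35, 15])
vals = value.to_numpy()


def leading_true_count(mask):
    """Number of True entries before the first False in a boolean array."""
    # cumprod turns everything from the first False onwards into 0,
    # so the sum is the length of the leading run of True values
    return int(np.cumprod(mask).sum())


# Walk the prior values nearest-first (hence [::-1]) so the run starts
# right next to row x and stops at the first value that breaks it.
Higher = [leading_true_count(vals[:x][::-1] < vals[x]) for x in range(len(vals))]
Lower = [leading_true_count(vals[:x][::-1] > vals[x]) for x in range(len(vals))]

df = pd.DataFrame({"value": value, "Higher": Higher, "Lower": Lower})
print(df)
```

Here "Higher" means the row is higher than that many consecutive prior values, which is the orientation used in the question's table.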

pandas - Count streak of values higher/lower than current rows

Tags:

python

series

time-series

Bruno Vieira
