Suppose I have a time series of values called X.
I want to know the first index at which X reaches the values of another series Y. Put differently, for each index i, I want the first index j at which the line formed by X from j-1 to j intersects the value of Y at i.
Below is an example pair of X and Y series with the resulting values of Z. The series always have the same length:
X | Y | Z
2 | 3 | 2
2 | 3 | NaN
4 | 4.5 | 3
5 | 5 | NaN
4 | 5 | NaN
3 | 2 | 6
1 | 2 | NaN
Do pandas or NumPy offer something that would help with this? The function will run on large datasets, so I can't use Python loops.
Use NumPy broadcasting to compare every value of Y against each value of X and its shifted successor, then get the index of the first True in each row with DataFrame.idxmax. Two small additions: a NaN-labelled column that is always True, so rows with no other True return NaN, and a final step that masks duplicated values:
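import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [2, 2, 4, 5, 4, 3, 1],
                   'Y': [3, 3, 4.5, 5, 5, 2, 2]})
a = df['X']
b = df['Y']
a1 = a.values                       # X[j]
a2 = a.shift(-1).ffill().values     # X[j+1], last value repeated
b1 = b.values[:, None]              # Y as a column, for broadcasting
# arr[i, j] is True when Y[i] lies strictly between X[j] and X[j+1]
arr = (((a1 < b1) & (a2 > b1)) | ((a1 > b1) & (a2 < b1)))
df = pd.DataFrame(arr)
df[np.nan] = True                   # sentinel column: all-False rows return NaN
out = df.idxmax(axis=1) + 1         # first crossing, reported as the segment's end index
out = out.mask(out.duplicated())    # keep only the first occurrence of each value
print(out)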
0 2.0
1 NaN
2 3.0
3 NaN
4 NaN
5 6.0
6 NaN
dtype: float64
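One caveat for large inputs: the broadcast comparison builds an n × n boolean array, so memory grows quadratically with the series length. Below is a minimal sketch of the same strict "between" test done in blocks of rows, assuming plain NumPy arrays as input. The function name `first_crossing` and the `chunk` parameter are hypothetical names for illustration, not part of pandas or NumPy. It loops over blocks rather than over individual elements, and it does not mask duplicates.

```python
import numpy as np

def first_crossing(x, y, chunk=1024):
    """For each y[i], return the end index j+1 of the first segment
    x[j] -> x[j+1] that strictly straddles y[i], or NaN if there is none."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    if len(x) < 2:                      # no segments to test
        return out
    lo = np.minimum(x[:-1], x[1:])      # lower end of each segment
    hi = np.maximum(x[:-1], x[1:])      # upper end of each segment
    for start in range(0, len(y), chunk):
        yc = y[start:start + chunk, None]       # block of Y as a column
        hits = (lo < yc) & (yc < hi)            # shape (block, len(x) - 1)
        found = hits.any(axis=1)
        # argmax gives the first True per row; only keep rows that have one
        out[start:start + chunk][found] = hits.argmax(axis=1)[found] + 1
    return out

res = first_crossing([2, 2, 4, 5, 4, 3, 1], [3, 3, 4.5, 5, 5, 2, 2], chunk=3)
print(res)  # [ 2.  2.  3. nan nan  6.  6.]
```

To get the same result as above, wrap the result in a Series and apply the same `mask(duplicated())` step. Peak memory is roughly `chunk * len(x)` booleans per block, so `chunk` can be tuned to the available memory.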