I want to detect sign changes in my data using either pandas or NumPy. Specifically, I want to count the number of ids whose y changes sign between two consecutive TIMESTEP values (e.g. between TIMESTEPs 2800 and 2900, id 313 changes sign: y becomes negative). I tried the code below, which selects the rows with negative y and then drops duplicate ids, but it is neither efficient nor correct.
df_negatives0 = df0.query('y < 0')  # keep only rows where y is negative
df_nonduplicate0 = df_negatives0.drop_duplicates(subset=["id"])  # first negative row per id
My dataset:
TIMESTEP id mass y
0 42 0.755047 0.489375
0 245 0.723805 0.479446
0 344 0.675664 0.463363
...
...
2800 313 0.795699 0.00492984
2800 425 0.68311 0.282356
2900 42 0.755047 0.424421
2900 245 0.723805 0.0378489
2900 344 0.675664 0.127917
2900 313 0.795699 -0.0149792
2900 425 0.68311 0.273884
...
...
(continues up to TIMESTEP 10000000)
My desired output:
TIMESTEP id_count mass
2900 1 0.795699
...
...
500000 2 0.85245 + 0.54852 (when more than one id changes sign, I want the sum of their masses)
...
...
(continues up to TIMESTEP 10000000)
NumPy has a dedicated function, np.sign (thanks to @Asclepius for flagging an error in a previous version of this answer), and to detect a sign change from one row to the next you can use the .diff method:
from numpy import sign
from pandas import DataFrame
df = DataFrame([-2, 0, -1, 3, -2], columns=["x"])
# np.sign returns -1, 0 or 1 for each value of x
df["sign"] = sign(df["x"])
# difference between the signs of two consecutive rows:
# 0 means no change, +/-2 a flip between negative and positive,
# +/-1 a move to or from zero
print(df["sign"].diff())
# 0 NaN
# 1 1.0
# 2 -1.0
# 3 2.0
# 4 -2.0
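A minimal sketch of how the same np.sign/.diff idea could be applied to the question's data, assuming each id should be compared with its own previous TIMESTEP (so rows are sorted by id, then TIMESTEP, and the diff is taken within each id) and that a move to or from zero also counts as a change. The helper name sign_changes is hypothetical, and the sample frame contains only the rows actually shown in the question.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (the "..." rows are not reproduced)
df = pd.DataFrame(
    {
        "TIMESTEP": [0, 0, 0, 2800, 2800, 2900, 2900, 2900, 2900, 2900],
        "id": [42, 245, 344, 313, 425, 42, 245, 344, 313, 425],
        "mass": [0.755047, 0.723805, 0.675664, 0.795699, 0.68311,
                 0.755047, 0.723805, 0.675664, 0.795699, 0.68311],
        "y": [0.489375, 0.479446, 0.463363, 0.00492984, 0.282356,
              0.424421, 0.0378489, 0.127917, -0.0149792, 0.273884],
    }
)


def sign_changes(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per TIMESTEP, count the ids whose sign of y differs
    from that id's previous TIMESTEP, and sum their masses."""
    # Put each id's rows in time order so "previous row" means "previous TIMESTEP"
    frame = frame.sort_values(["id", "TIMESTEP"])
    signs = np.sign(frame["y"])
    # Difference of signs within each id; the first row of each id is NaN -> 0 (no change)
    step = signs.groupby(frame["id"]).diff().fillna(0)
    flipped = frame[step.ne(0)]
    # One row per TIMESTEP with the number of flipped ids and their total mass
    return (
        flipped.groupby("TIMESTEP")
        .agg(id_count=("id", "size"), mass=("mass", "sum"))
        .reset_index()
    )


result = sign_changes(df)
print(result)  # one row: TIMESTEP 2900, id_count 1, mass 0.795699
```

Everything here is vectorized (one groupby().diff() and one groupby().agg()), so it should scale to large frames much better than looping over ids. TIMESTEPs with no sign change simply do not appear in the result. To count only strict negative/positive flips and ignore moves to or from zero, replace `step.ne(0)` with `step.abs().eq(2)`.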