Lets suppose this dataframe, which I want to filter in such a way I iterate from the last index backwards until I find two consecutive 'a' = 0. Once that happens, the rest of the dataframe (including both zeros) shall be filtered:
a
1 6.5
2 0
3 0
4 4.0
5 0
6 3.2
Desired result:
a
4 4.0
5 0
6 3.2
My initial idea was ussing apply for filtering, and inside the apply function using shift(1) == 0 & shift(2) == 0, but based of that I could filter each row individually, but not returning false for the remaining rows after the double zero is found unless I use a global variable or something nasty like that.
Any smart way of doing this?
You could do that with sort_index with ascending=False, cumsum and dropna:
In [89]: df[(df.sort_index(ascending=False) == 0).cumsum() < 2].dropna()
Out[89]:
a
4 4.0
5 0.0
6 3.2
Step by step:
In [99]: df.sort_index(ascending=False)
Out[99]:
a
6 3.2
5 0.0
4 4.0
3 0.0
2 0.0
1 6.5
In [100]: df.sort_index(ascending=False) == 0
Out[100]:
a
6 False
5 True
4 False
3 True
2 True
1 False
In [101]: (df.sort_index(ascending=False) == 0).cumsum()
Out[101]:
a
6 0
5 1
4 1
3 2
2 3
1 3
In [103]: (df.sort_index(ascending=False) == 0).cumsum() < 2
Out[103]:
a
6 True
5 True
4 True
3 False
2 False
1 False
In [104]: df[(df.sort_index(ascending=False) == 0).cumsum() < 2]
Out[104]:
a
1 NaN
2 NaN
3 NaN
4 4.0
5 0.0
6 3.2
EDIT
IIUC you could use something like that using pd.rolling_sum and first_valid_index if your index started from 1:
df_sorted = df.sort_index(ascending=False)
df[df_sorted[(pd.rolling_sum((df_sorted==0), window=2) == 2)].first_valid_index()+1:]
With the @jezrael example:
In [208]: df
Out[208]:
a
1 6.5
2 0.0
3 0.0
4 7.0
5 0.0
6 0.0
7 0.0
8 4.0
9 0.0
10 0.0
11 3.2
12 5.0
df_sorted = df.sort_index(ascending=False)
In [210]: df[df_sorted[(pd.rolling_sum((df_sorted==0), window=2) == 2)].first_valid_index()+1:]
Out[210]:
a
11 3.2
12 5.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With