I have a pd.Series with duplicated index labels, where each label holds a sequence of booleans:
FA155    False
FA155    False
FA155    False
FA155    True
FA155    True
FA155    True
FA155    True
FA155    True
FA155    False
For each distinct index label, I want an efficient way to keep only the first and last True values of the sequence as True and set everything else to False. There can also be False values between the True ones.
So for this sample the result would be:
FA155    False
FA155    False
FA155    False
FA155    True
FA155    False
FA155    False
FA155    False
FA155    True
FA155    False
Any help would be very appreciated.
The groupby nth() function returns the nth row of each group. To get the first value in a group, pass 0 as the argument to nth().
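As a rough illustration of nth(), here is a minimal sketch on made-up data (the frame and the column names "i" and "v" are assumptions, chosen to resemble the question). The index of the result differs between pandas versions, so only the values are compared here.

```python
import pandas as pd

# Hypothetical sample data; "i" is the group key and "v" the value column.
df = pd.DataFrame({
    "i": ["FA155", "FA155", "FA155", "FA156", "FA156"],
    "v": [False, True, True, True, False],
})

# Row 0 of each group, i.e. the first value per group.
first_per_group = df.groupby("i")["v"].nth(0)

# Row -1 of each group, i.e. the last value per group.
last_per_group = df.groupby("i")["v"].nth(-1)

print(first_per_group.tolist())  # [False, True]
print(last_per_group.tolist())   # [True, False]
```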
Sort values in descending order with groupby: pass ascending=False to the sort_values() method. The head() function returns the first n rows, which is useful for quickly checking that your object holds the right type of data.
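A small sketch of combining the two, assuming a made-up frame with a numeric "score" column (not part of the question): sort first, then take the leading rows of each group.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "i": ["FA155", "FA156", "FA155", "FA156", "FA155"],
    "score": [3, 10, 7, 4, 5],
})

# Sort the whole frame from highest to lowest score.
ordered = df.sort_values("score", ascending=False)

# head(2) on the groupby keeps the first two rows of each group,
# which after sorting are the two highest scores per group.
top2 = ordered.groupby("i").head(2)

print(top2["score"].tolist())  # [10, 7, 5, 4]
```

The rows come back in the sorted order of the frame rather than grouped together, because head() on a groupby acts as a filter.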
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
agg is an alias for aggregate; use the alias. A user-defined function passed to it receives a Series to evaluate, and the aggregation is applied to each column.
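To make this concrete, a minimal sketch that mixes built-in aggregation names with a user-defined function. The data and the helper count_true are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical sample data resembling the question.
df = pd.DataFrame({
    "i": ["FA155", "FA155", "FA155", "FA156", "FA156"],
    "v": [False, True, True, True, False],
})

def count_true(values: pd.Series) -> int:
    # agg calls this once per group with that group's "v" values as a Series.
    return int(values.sum())

# One output column per aggregation; the function's name becomes the column name.
summary = df.groupby("i")["v"].agg(["first", "last", count_true])

print(summary["count_true"].tolist())  # [2, 1]
```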
You can use loc with idxmax on both your original df and a reversed copy of it.
This yields the index of the first and last True value in each group. Afterwards, just set every other row to False.
For example:
import numpy as np
import pandas as pd
from io import StringIO as sio

z = sio("""i    v
FA154    False
FA155    False
FA155    True
FA155    True
FA155    True
FA155    True
FA155    True
FA155    False
FA156    False
FA156    True
FA156    False
FA156    False
FA156    True""")
df = pd.read_table(z, sep=r"\s+")  # delim_whitespace=True is deprecated in recent pandas
    i       v
0   FA154   False
1   FA155   False
2   FA155   True
3   FA155   True
4   FA155   True
5   FA155   True
6   FA155   True
7   FA155   False
8   FA156   False
9   FA156   True
10  FA156   False
11  FA156   False
12  FA156   True
This is the same thing as taking your own data and calling reset_index. Then get the row indexes of the first (v1) and last (v2) True values in each group:
v1 = df.groupby("i").v.idxmax().values
v2 = df[::-1].groupby("i").v.idxmax().values
And use your logic:
df.loc[v1, "v"] = True & df.loc[v1, "v"]
df.loc[v2, "v"] = True & df.loc[v2, "v"]
df.loc[~df.index.isin(np.concatenate([v1,v2])), "v"] = False
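The idea behind using & is to keep each selected row's current value, so that no False value is accidentally set to True. This matters for a group with no True at all (such as FA154), where idxmax simply returns the group's first row.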
Result:
>>> df.set_index("i")
        v
i   
FA154   False
FA155   False
FA155   True
FA155   False
FA155   False
FA155   False
FA155   True
FA155   False
FA156   False
FA156   True
FA156   False
FA156   False
FA156   True
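For a Series that still has its duplicated index, as in the question, a different technique can be sketched: a running count of True values from each end of every group. This is not the idxmax approach above, and the helper name keep_first_last_true is made up for this example.

```python
import pandas as pd

def keep_first_last_true(s: pd.Series) -> pd.Series:
    """Keep only the first and last True per index label (hypothetical helper)."""
    flags = s.astype(bool).to_numpy()
    ints = s.astype(int)
    # Running count of True values from the start of each group.
    fwd = ints.groupby(level=0).cumsum().to_numpy()
    # Same count taken from the end of each group, flipped back to the original order.
    bwd = ints.iloc[::-1].groupby(level=0).cumsum().to_numpy()[::-1]
    # A True row is the first (or last) one when its forward (or backward) count is 1.
    keep = flags & ((fwd == 1) | (bwd == 1))
    return pd.Series(keep, index=s.index, name=s.name)

s = pd.Series(
    [False, False, False, True, True, True, True, True, False,
     False, True, False, False, True],
    index=["FA155"] * 9 + ["FA156"] * 5,
)
result = keep_first_last_true(s)
print(result.tolist())
```

Working on NumPy arrays for the final combination avoids any label alignment between Series that share a duplicated index.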
Filter the True values, then aggregate to find the position of the first and last True in each group. You can then use loc to update those rows in df.
Here df is your dataframe and col is the name of the column holding the True and False values.
df["nb"] = range(df.shape[0])
df.reset_index(inplace=True)
# Position of the first and last True value in each group
elem = df[df[col]].groupby("index")["nb"].agg(["first", "last"]).values
indexes_to_keep = elem.ravel()
# Set everything to False, then restore only the first and last True rows
df[col] = False
df.loc[indexes_to_keep, col] = True
Then you can drop the column nb and restore the original index if you wish.
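A self-contained run of this filter-then-aggregate idea might look like the sketch below. The sample Series and the column name "v" are assumptions; the Series index is left unnamed so that reset_index creates a column called "index". The sketch clears the column first and then marks the first and last True positions.

```python
import pandas as pd

# Hypothetical sample: a boolean Series with duplicated, unnamed index labels.
s = pd.Series(
    [False, True, True, True, False, False, True, False, False, True],
    index=["FA155"] * 5 + ["FA156"] * 5,
    name="v",
)
col = "v"

df = s.to_frame().reset_index()   # the old index becomes the column "index"
df["nb"] = range(len(df))         # running row position

# Keep only the True rows, then take the first and last position per group.
bounds = df[df[col]].groupby("index")["nb"].agg(["first", "last"])
keep = bounds.to_numpy().ravel()

# Clear the column, then mark only the first and last True rows.
df[col] = False
df.loc[keep, col] = True

# Drop the helper column and put the original labels back.
result = df.drop(columns="nb").set_index("index")[col]
print(result.tolist())
```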