Let's say I have the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'player': ['LBJ', 'LBJ', 'LBJ', 'Kyrie', 'Kyrie', 'LBJ', 'LBJ'],
                   'points': [25, 32, 26, 21, 29, 21, 35]})
How can I perform the opposite of ffill so that I get the following DataFrame:
df = pd.DataFrame({'player': ['LBJ', np.nan, np.nan, 'Kyrie', np.nan, 'LBJ', np.nan],
                   'points': [25, 32, 26, 21, 29, 21, 35]})
That is, I want to replace each value that directly repeats the one above it with NaN.
Here's what I have so far, but I'm hoping there's a built-in pandas method or a better approach:
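for i, (index, row) in enumerate(df.iterrows()):
    if i == 0:
        continue  # the first row has nothing before it
    go_back = 1
    while True:
        # df.ix was removed from pandas; use positional .iloc instead
        past_player = df['player'].iloc[i - go_back]
        if pd.isnull(past_player):
            go_back += 1  # skip rows that were already set to NaN
            continue
        if row['player'] == past_player:
            # df.set_value was removed from pandas; .at is the replacement
            df.at[index, 'player'] = np.nan
        break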
The pandas DataFrame.ffill() method replaces null values with the value from the previous row (or the previous column, if the axis parameter is set to 'columns').
'ffill' stands for 'forward fill': it propagates the last valid observation forward to fill missing values.
With fillna, method='ffill' propagates the last observed non-null value forward until another non-null value is encountered, and method='bfill' propagates the next observed non-null value backward until another non-null value is met. Recent pandas releases deprecate the method argument of fillna in favor of calling ffill() or bfill() directly.
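As a small illustration of the two fill directions described above, here is a minimal sketch on a made-up Series (the name s and its values are chosen only for this example):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with gaps, used only to show the fill directions
s = pd.Series(['LBJ', np.nan, np.nan, 'Kyrie', np.nan])

# Forward fill: each gap takes the last valid value above it
forward = s.ffill()

# Backward fill: each gap takes the next valid value below it;
# the trailing gap has nothing after it, so it stays missing
backward = s.bfill()

print(forward.tolist())
print(backward.tolist())
```

Note that a leading gap would stay missing under ffill for the same reason the trailing gap stays missing under bfill.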
The subtract() function computes the element-wise difference between a DataFrame and another object. It is equivalent to dataframe - other, but it accepts a fill_value argument to substitute for missing data in one of the inputs.
ffinv = lambda s: s.mask(s == s.shift())
df.assign(player=ffinv(df.player))
  player  points
0    LBJ      25
1    NaN      32
2    NaN      26
3  Kyrie      21
4    NaN      29
5    LBJ      21
6    NaN      35
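To see why the shift/mask approach works, it may help to break it into steps. The sketch below is self-contained and uses the DataFrame from the question; the intermediate names is_repeat, sparse and restored are introduced only for illustration. It assumes the column has no missing values to begin with, since ffill could not tell pre-existing NaNs apart from the ones inserted here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'player': ['LBJ', 'LBJ', 'LBJ', 'Kyrie', 'Kyrie', 'LBJ', 'LBJ'],
                   'points': [25, 32, 26, 21, 29, 21, 35]})

# shift() moves every value down one row, so this comparison is True
# exactly where a value equals the one directly above it
is_repeat = df['player'] == df['player'].shift()

# mask() replaces the flagged positions with NaN and keeps the rest
sparse = df.assign(player=df['player'].mask(is_repeat))

# ffill() reverses the operation and recovers the original column
restored = sparse.assign(player=sparse['player'].ffill())

print(sparse)
print(restored['player'].equals(df['player']))
```

The first row is never masked because shift() puts NaN above it, and a comparison with NaN is always False.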
A working, though probably not the most efficient, alternative is to use itertools.groupby and itertools.chain:
>>> import itertools
>>> df['player'] = list(itertools.chain.from_iterable(
...     [key] + [float('nan')] * (len(list(val)) - 1)
...     for key, val in itertools.groupby(df['player'].tolist())))
>>> df
  player  points
0    LBJ      25
1    NaN      32
2    NaN      26
3  Kyrie      21
4    NaN      29
5    LBJ      21
6    NaN      35
To illustrate how it works, run the following on the original column (before the assignment above):
for key, val in itertools.groupby(df['player']):
    print([key] + [float('nan')]*(len(list(val))-1))
giving:
['LBJ', nan, nan]
['Kyrie', nan]
['LBJ', nan]
These lists are then "chained" together into a single flat list.