Let's say I have the following series.
s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
I can keep the first duplicate (for each duplicate value) of the series with the following:
s[s.duplicated(keep='first')]
I can keep the last duplicate (for each duplicate value) of the series with the following:
s[s.duplicated(keep='last')]
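For reference, here is a small sketch of what each `keep` option of `Series.duplicated` flags on this series. Indexing with any of these masks returns only the flagged rows, so the values that never repeat (0, 1, 2, 4 and 5) are filtered out in every case:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# keep='first' (the default): every occurrence except the first of each repeated value
first_mask = s.duplicated(keep='first')
# keep='last': every occurrence except the last of each repeated value
last_mask = s.duplicated(keep='last')
# keep=False: every occurrence of a repeated value
all_mask = s.duplicated(keep=False)

print(s[first_mask].index.tolist())  # [4, 5, 6, 10, 11, 13]
print(s[last_mask].index.tolist())   # [3, 4, 5, 9, 10, 12]
print(s[all_mask].index.tolist())    # [3, 4, 5, 6, 9, 10, 11, 12, 13]
```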
However, I'm looking to do the following:
1. Drop the first 3, but keep the other 3's. Keep all other remaining duplicates.
2. Keep the first 3, but drop all other 3's. Keep all other remaining duplicates.
I've been racking my brain using cumsum() and diff() to capture the change when a duplicate has been detected. I imagine a solution would involve these, but I can't seem to get a working solution. I've gone through too many truth tables right now...
ind = s[s.duplicated()].index[0]
gives you the label of the first row flagged as a duplicate. Pass it to drop:
In [45]: s.drop(ind)
Out[45]:
0 0
1 1
2 2
3 3
5 3
6 3
7 4
8 5
9 6
10 6
11 6
12 7
13 7
dtype: int64
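A self-contained sketch of part 1. Because `s.duplicated()` flags every occurrence after the first, `ind` is label 4 (the second 3). Since all the 3's are equal, dropping it leaves the same values as dropping label 3. If the row to remove should literally be the first 3 (label 3), one option, not part of the answer above, is to take the first label flagged with `keep=False` instead; `first_label` is just an illustrative name:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# Label of the first row flagged as a repeat (keep='first' is the default):
# this is the second 3, at label 4
ind = s[s.duplicated()].index[0]
part1 = s.drop(ind)

# Alternative: keep=False flags every occurrence, so its first label is
# the very first 3, at label 3
first_label = s[s.duplicated(keep=False)].index[0]
part1_alt = s.drop(first_label)

# Same values either way; only the surviving index labels differ
print(part1.tolist() == part1_alt.tolist())
```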
For part 2 there is probably a neater solution, but the only one I can think of is to create two boolean masks, one marking where the index does not equal ind and one marking where the value equals the value at ind, and combine them with np.logical_xor (with NumPy imported as np):
s[np.logical_xor(s.index != ind, s == s.loc[ind])]
Out[95]:
0 0
1 1
2 2
4 3
7 4
8 5
9 6
10 6
11 6
12 7
13 7
dtype: int64
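The XOR keeps a row when exactly one condition holds. The row at ind always has the repeated value, so this is the same as keeping every row whose value differs, plus the row at ind itself, which can be written with `|`. The sketch below checks that equivalence and also shows a different technique, `groupby().cumcount()`, which numbers each occurrence within its value group (a close cousin of the cumsum() idea in the question). The cumcount variant keeps the genuine first 3 at label 3 rather than label 4. Variable names are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])
ind = s[s.duplicated()].index[0]   # label of the first flagged repeat (4)
dup_value = s.loc[ind]             # the repeated value (3)

# XOR version from the answer: keep rows where exactly one condition is true
xor_result = s[np.logical_xor(s.index != ind, s == dup_value)]

# Equivalent mask: every row with a different value, plus the row at ind
or_result = s[(s != dup_value) | (s.index == ind)]

# Alternative: occurrence number within each value group (0 for the first),
# then drop only the later occurrences of dup_value
occurrence = s.groupby(s).cumcount()
cumcount_result = s[~((s == dup_value) & (occurrence > 0))]

print(or_result.index.tolist())
print(cumcount_result.index.tolist())
```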