
NumPy/Pandas: remove sequential duplicate values (equivalent of bash uniq without sort) [duplicate]

Given a Pandas Series (or numpy array) like this:

import pandas as pd
myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

Is there a good way to remove sequential duplicates, much like the Unix uniq tool does? The numpy/pandas unique() and pandas drop_duplicates() functions remove all duplicates (like Unix's | sort | uniq), but that is not what I want:

>>> print(myseries.unique())
[1 2 3 4]

I want this:

>>> print(myseries.my_mystery_function())
[1, 2, 3, 4, 3, 2, 3, 1]
DrAl asked Nov 16 '25 16:11


1 Answer

Compare the Series with a shifted copy of itself using ne (!=), then filter with boolean indexing:

myseries = myseries[myseries.ne(myseries.shift())].tolist()
print(myseries)
[1, 2, 3, 4, 3, 2, 3, 1]
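
To see why the one-liner works, it can help to look at the intermediate values. The sketch below uses the series from the question; the names shifted, mask and deduped are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

# shift() moves every value down one position; the first slot becomes NaN.
shifted = myseries.shift()

# ne() is element-wise !=. NaN compares unequal to every value,
# so the first element is always kept.
mask = myseries.ne(shifted)

# Boolean indexing keeps only the rows where a new run starts.
# The original index labels survive, so they show where each run began.
deduped = myseries[mask]
print(deduped.index.tolist())  # [0, 1, 2, 8, 12, 14, 16, 18]
print(deduped.tolist())        # [1, 2, 3, 4, 3, 2, 3, 1]
```

If the result should stay a Series with a fresh 0..n-1 index rather than become a list, calling reset_index(drop=True) on the filtered Series instead of tolist() is one option.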

If performance is important, use Divakar's solution.
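
The solution referred to here is not reproduced on this page. As a rough sketch of what a NumPy-only approach could look like (an assumption, not necessarily what Divakar wrote), each element can be compared with its predecessor directly on the underlying array. The helper name drop_consecutive_duplicates is hypothetical, and the sketch assumes a 1-D input.

```python
import numpy as np

def drop_consecutive_duplicates(values):
    """Collapse each run of equal neighbouring values to a single element.

    Hypothetical helper for illustration; assumes a 1-D array-like input.
    """
    arr = np.asarray(values)
    if arr.size == 0:
        return arr
    # keep[i] is True when arr[i] starts a new run.
    keep = np.empty(arr.shape[0], dtype=bool)
    keep[0] = True  # the first element always starts a run
    keep[1:] = arr[1:] != arr[:-1]
    return arr[keep]

data = np.array([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])
result = drop_consecutive_duplicates(data)
print(result)  # [1 2 3 4 3 2 3 1]
```

For a Series, passing myseries.to_numpy() should work the same way. This version skips the pandas index handling, which may make it faster on large inputs, but that is worth measuring on your own data. Note that consecutive NaN values are not collapsed, because NaN != NaN evaluates to True.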

jezrael answered Nov 19 '25 05:11