Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Identify duplicated values and append a unique string to them

Tags:

python

pandas

Let's say I have the following DataFrame:

df = pd.DataFrame({"my_col": ["one","two","two","one","two","one","one"]})

  my_col
0    one
1    two
2    two
3    one
4    two
5    one
6    one

I would like append a string on the duplicated values with their duplicate count. Here is what I mean:

   my_col
0  one_0
1  two_0
2  two_1
3  one_1
4  two_2
5  one_2
6  one_3

I know I could do something like df.groupby('my_col').apply(my_function_to_do_this) with something like this :

def my_function_to_do_this(group: pd.DataFrame) -> pd.DataFrame:
    str_to_append = pd.Series(range(group.shape[0]), index=group.index).astype(str)
    group["my_col"] += "_" + str_to_append
    return group

but that's quite slow on a large DataFrame with a lot of small groups of like 4 rows maximum.

I'm trying to find a faster approach if any.

Many thanks in advance for the help !

like image 887
Benjamin Carlier Avatar asked Dec 07 '25 10:12

Benjamin Carlier


1 Answers

Use GroupBy.cumcount for counter, convert to strings and add to original with Series.str.cat:

df['my_col'] = df['my_col'].str.cat(df.groupby('my_col').cumcount().astype(str), sep='_')
print (df)
  my_col
0  one_0
1  two_0
2  two_1
3  one_1
4  two_2
5  one_2
6  one_3

Or join by +:

df['my_col'] += '_' +  df.groupby('my_col').cumcount().astype(str)
#longer version
#df['my_col'] = df['my_col'] + '_' +  df.groupby('my_col').cumcount().astype(str)
like image 90
jezrael Avatar answered Dec 08 '25 23:12

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!