I have a dataframe, df
import pandas as pd
df = pd.DataFrame(
    {
        'Name':['foo','foo','foo','bar','bar','bar','baz','baz','baz'],
        'Color':['red','blue','red','green','green','blue','yellow','orange','red']
    }
)
  Name   Color
0  foo     red
1  foo    blue
2  foo     red
3  bar   green
4  bar   green
5  bar    blue
6  baz  yellow
7  baz  orange
8  baz     red
I would like to add an enumerating suffix to Name for every row whose (Name, Color) pair is duplicated. The desired output is:
pd.DataFrame(
    {
        'Name':['foo_1','foo','foo_2','bar_1','bar_2','bar','baz','baz','baz'],
        'Color':['red','blue','red','green','green','blue','yellow','orange','red']
    }
)
    Name   Color
0  foo_1     red
1    foo    blue
2  foo_2     red
3  bar_1   green
4  bar_2   green
5    bar    blue
6    baz  yellow
7    baz  orange
8    baz     red
As you can see, a suffix with an incremental count is added each time a Name has a repeated Color. If a Name has a Color only once, no suffix is added.
I was thinking of using .groupby() together with .value_counts() to get a total count per (Name, Color) pair, and then somehow using that count to assign the suffixes where needed. Here is an idea I had that seems very inefficient:
# group by Name and count how often each Color appears for that Name
gb = df.groupby('Name')['Color'].value_counts().rename('Color_count').reset_index()
# keep only counts that are > 1, i.e. pairs that need a suffix
gb = gb.loc[gb.Color_count > 1].copy()
# merge back to the original df; non-duplicated pairs get Color_count = 0
merged = df.merge(gb, on=['Name', 'Color'], how='left').fillna(0)
# from here, somehow start an incremental suffix for nonzero values of `Color_count`...
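If I keep going down that road, I think a cumcount within each (Name, Color) pair could supply the suffix number, something like the sketch below (still feels clunky, which is why I'm asking):
# number the rows within each (Name, Color) pair: 1, 2, ...
suffix = (merged.groupby(['Name', 'Color']).cumcount() + 1).astype(str)
# attach the suffix only where Color_count is nonzero, i.e. the pair is duplicated
merged['Name'] = merged['Name'].where(merged['Color_count'] == 0, merged['Name'] + '_' + suffix)
merged = merged.drop(columns='Color_count')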
Check for duplicates on your two columns with duplicated, then group that boolean result by the same columns and use cumsum to number the duplicates. Concatenate that running number to the Name column, and finally use where with the duplicate flag as a mask so non-duplicate rows keep their original value.
# flag every row whose (Name, Color) pair appears more than once
m = df.duplicated(subset=['Name', 'Color'], keep=False)
# running count of duplicates within each (Name, Color) pair (0 for unique rows)
n = m.groupby([df['Name'], df['Color']]).cumsum()
# append the count where the row is a duplicate, otherwise keep the original Name
df['New_Name'] = (df['Name'] + '_' + n.astype(str)).where(m, df['Name'])
End result:
 Name   Color New_Name
0  foo     red    foo_1
1  foo    blue      foo
2  foo     red    foo_2
3  bar   green    bar_1
4  bar   green    bar_2
5  bar    blue      bar
6  baz  yellow      baz
7  baz  orange      baz
8  baz     red      baz
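For reference, the two intermediate objects look like this when placed next to the original columns (the is_dup/dup_num labels are only for display):
pd.concat([df[['Name', 'Color']], m.rename('is_dup'), n.rename('dup_num')], axis=1)
  Name   Color  is_dup  dup_num
0  foo     red    True        1
1  foo    blue   False        0
2  foo     red    True        2
3  bar   green    True        1
4  bar   green    True        2
5  bar    blue   False        0
6  baz  yellow   False        0
7  baz  orange   False        0
8  baz     red   False        0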
Another possible solution:
g = df.groupby(['Name', 'Color'], sort=False)
df.assign(
    Name=lambda d: d['Name'].mask(
        g.transform('size').gt(1),
        d['Name'] + '_' + g.cumcount().add(1).astype(str)
    )
)
The code groups rows by (Name, Color) with groupby(sort=False) to preserve the original order, then uses assign to rebuild Name. transform('size').gt(1) flags groups where the (Name, Color) pair occurs more than once. mask keeps the original value when the flag is False and, when it is True, replaces it by concatenating the base name, an underscore, and a 1-based index computed via cumcount().add(1).astype(str). The string cast ensures safe concatenation, yielding foo_1, foo_2, etc., while unique pairs remain unchanged.
Output:
    Name   Color
0  foo_1     red
1    foo    blue
2  foo_2     red
3  bar_1   green
4  bar_2   green
5    bar    blue
6    baz  yellow
7    baz  orange
8    baz     red
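Note that assign returns a new DataFrame rather than modifying df in place, so capture the result if you want to keep it. A quick sanity check against the desired frame from the question could look like this (the names out and expected are just for illustration):
out = df.assign(
    Name=lambda d: d['Name'].mask(
        g.transform('size').gt(1),
        d['Name'] + '_' + g.cumcount().add(1).astype(str)
    )
)
expected = pd.DataFrame(
    {
        'Name': ['foo_1', 'foo', 'foo_2', 'bar_1', 'bar_2', 'bar', 'baz', 'baz', 'baz'],
        'Color': ['red', 'blue', 'red', 'green', 'green', 'blue', 'yellow', 'orange', 'red']
    }
)
print(out.equals(expected))  # True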