I want to find the first (primary) coordinator for each tag from a list of coordinator names, but I also need to keep all the other coordinators listed for that tag. As you can guess, the same tag appears on several rows:
     Tags         Name
0  333000         Lala
1  333000        Dipsy
2  333000          Poe
3  111111  Tinky Winky
Therefore, in my own dataframe I would like to return:
     Tags      Primary      Others
0  333000         Lala  Dipsy, Poe
1  111111  Tinky Winky         NaN
While I am able to return Primary fine with this code:
df['Primary'] = df.join(coordinator_df.groupby(['Tags']).nth(0)['Name'], on='Tags')['Name']
my attempt for Others returns an error:
df['Others'] = df.join(coordinator_df.groupby(['Tags']).nth([0, 1, 2])['Name'], on='Tags')['Name']
Error:
ValueError: cannot reindex from a duplicate axis
I would appreciate help either with this specific error, or any other approach.
Try this:
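def Others(x):
    # Join every name after the first one in the group
    return ', '.join(x.iloc[1:])

# A list (rather than a set) keeps the column order predictable
df.groupby('Tags')['Name'].agg(['first', Others])
Output:
              first      Others
Tags
111111  Tinky Winky
333000         Lala  Dipsy, Poe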
Note that a tag with no other coordinators gets an empty string instead of NaN.
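If you need NaN and the Primary/Others column names from the question, here is a minimal sketch of one way to get there. It assumes the data lives in coordinator_df as in the question; the helper name others and the sample frame are just for illustration, and named aggregation needs pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
coordinator_df = pd.DataFrame({
    "Tags": [333000, 333000, 333000, 111111],
    "Name": ["Lala", "Dipsy", "Poe", "Tinky Winky"],
})

def others(names):
    """Join every name after the first; return NaN when there are none."""
    rest = names.iloc[1:]
    return ", ".join(rest) if len(rest) else np.nan

result = (
    coordinator_df
    .groupby("Tags", sort=False)["Name"]      # sort=False keeps first-seen tag order
    .agg(Primary="first", Others=others)      # named aggregation sets the column names
    .reset_index()
)
print(result)
```

Because there is exactly one row per tag in result, joining or merging it back onto another dataframe on Tags should not run into the duplicate-axis problem.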
I would collect all the names for each tag into a list after grouping, and then split that list column into two new columns. Note that Others ends up as a list here rather than a comma-separated string.
import numpy as np
df = df.groupby(['Tags']).agg(lambda x: list(x)).reset_index()
df['Primary'] = df['Name'].apply(lambda x: x[0])
df['Others'] = df['Name'].apply(lambda x: x[1:] if len(x) > 1 else np.nan)