Returning first values, then all the other values, in two separate columns in Pandas

Question

I want to find the first (primary) coordinator for each tag from a list of coordinator names, while also keeping all the other coordinators listed for that tag. As you can guess, a tag can appear more than once in the list, once per coordinator:

      Tags                   Name
0     333000                 Lala
1     333000                Dipsy
2     333000                  Poe
3     111111          Tinky Winky

Therefore, in my own dataframe I would like to return:

       Tags                Primary                              Others
0    333000                   Lala                          Dipsy, Poe
1    111111            Tinky Winky                                 NaN

While I am able to return Primary fine with this code:

df['Primary'] = df.join(coordinator_df.groupby(['Tags']).nth(0)['Name'], on='Tags')['Name']

my attempt for Others returns an error:

df['Others'] = df.join(coordinator_df.groupby(['Tags']).nth([0, 1, 2])['Name'], on='Tags')['Name']

Error: ValueError: cannot reindex from a duplicate axis

I would appreciate help either with this specific error, or any other approach.

Quang Hoang · Accepted Answer

Try this:

def Others(x):
    return ', '.join(x.iloc[1:])

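df.groupby('Tags')['Name'].agg(['first', Others])

Output:

              first      Others
Tags                           
111111  Tinky Winky            
333000         Lala  Dipsy, Poe

Passing the functions as a list (rather than a set) keeps the column order fixed. Note that a tag with a single coordinator gets an empty string in Others instead of NaN.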
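For reference, here is a self-contained sketch of the same groupby idea that can be run as-is. It assumes the question's sample data lives in coordinator_df, that NaN (not an empty string) is wanted for tags with only one coordinator, and that pandas 0.25 or later is available for named aggregation. The helper name others is purely illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (coordinator_df is the name used there).
coordinator_df = pd.DataFrame({
    "Tags": [333000, 333000, 333000, 111111],
    "Name": ["Lala", "Dipsy", "Poe", "Tinky Winky"],
})

def others(names):
    """Join every name after the first; return NaN if there are none."""
    rest = names.iloc[1:]
    return ", ".join(rest) if len(rest) else np.nan

# sort=False keeps the tags in order of first appearance,
# which matches the layout asked for in the question.
result = (
    coordinator_df.groupby("Tags", sort=False)["Name"]
    .agg(Primary="first", Others=others)
    .reset_index()
)
print(result)
```

If the result has to end up in a separate dataframe that also has a Tags column, a left merge such as df.merge(result, on="Tags", how="left") should attach both columns in one step.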

eva-vw · Answer

I would store all the values for each tag in a list after grouping, and then split that list column into two new columns.

import numpy as np

# Collect the names for each tag into a list
df = df.groupby(['Tags']).agg(list).reset_index()

# The first name is the primary; the rest (if any) are the others
df['Primary'] = df['Name'].apply(lambda x: x[0])
df['Others'] = df['Name'].apply(lambda x: x[1:] if len(x) > 1 else np.nan)
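A runnable variation on this list-based approach, as a sketch only: it assumes the question's sample data is in coordinator_df and that Others should be a comma-separated string (as in the desired output) rather than a list. The intermediate name grouped is made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
coordinator_df = pd.DataFrame({
    "Tags": [333000, 333000, 333000, 111111],
    "Name": ["Lala", "Dipsy", "Poe", "Tinky Winky"],
})

# One row per tag, with that tag's names collected into a Python list.
grouped = (
    coordinator_df.groupby("Tags", sort=False)["Name"]
    .agg(list)
    .reset_index()
)

# Split the list column: first element vs. everything after it.
grouped["Primary"] = grouped["Name"].apply(lambda names: names[0])
grouped["Others"] = grouped["Name"].apply(
    lambda names: ", ".join(names[1:]) if len(names) > 1 else np.nan
)

# The helper list column is no longer needed.
grouped = grouped.drop(columns="Name")
print(grouped)
```

Keeping Others as a list (as in the snippet above this one) may be the better choice if the names need further processing; joining them is only for display.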

Tags: python, pandas

Christina Zhou
