Interchanging two substrings in a pandas string column

I'd like to move part of a string to the end of that same string in a pandas DataFrame column.

Example:

Change MSc Joe L. Scott to Joe L. Scott MSc

So only MSc needs to be moved. I can do this with a regex on a single string, but I don't know how to apply it to a pandas DataFrame column:

result = re.sub(r'(MSc)(.*)' , r'\2 \1',s)

I was thinking of something like this (but what's to_replace and value here?):

df['Name_modified'].replace(regex=True, inplace=True, to_replace=??, value=??)

Or using DataFrame.sub()

But despite reading the documentation, I can't get it to work.

Asked Nov 24 '25 at 08:11 by John Doe


1 Answer

As a contrived example, consider

import pandas as pd

df = pd.DataFrame({'Name' : ['MSc Joe L. Scott', 'BSc J. Doe']})
df
               Name
0  MSc Joe L. Scott
1        BSc J. Doe

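You can use str.replace here with backreferences. Joining the designations with | lets one pattern handle several of them. Pass regex=True explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default.

designations = ['MSc', 'BSc']
# Group 1 is the leading designation, group 2 is the rest of the name.
df['Name_modified'] = df['Name'].str.replace(
    rf"^({'|'.join(designations)})\s(.*)$", r"\2 \1", regex=True)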

df
               Name     Name_modified
0  MSc Joe L. Scott  Joe L. Scott MSc
1        BSc J. Doe        J. Doe BSc

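str.replace returns a new Series rather than modifying df in place, so assign the result back to a column (a new one as above, or 'Name' itself to overwrite).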
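For completeness, here is a self-contained sketch that also fills in the to_replace and value arguments from the question. The helper name move_designation, the extra 'Jane Roe' row, and the 'Name_replaced' column are made up for illustration. It assumes the designation always comes first and is followed by whitespace.

```python
import re

import pandas as pd


def move_designation(names, designations):
    """Move a leading designation (e.g. 'MSc') to the end of each name."""
    # re.escape protects designations that contain regex metacharacters, e.g. 'Ph.D.'
    alternatives = "|".join(re.escape(d) for d in designations)
    pattern = rf"^({alternatives})\s+(.*)$"
    # regex=True is needed on pandas >= 2.0, where str.replace is literal by default
    return names.str.replace(pattern, r"\2 \1", regex=True)


df = pd.DataFrame({"Name": ["MSc Joe L. Scott", "BSc J. Doe", "Jane Roe"]})
df["Name_modified"] = move_designation(df["Name"], ["MSc", "BSc"])

# Same idea with Series.replace: to_replace is the pattern, value is the
# replacement template containing the backreferences.
df["Name_replaced"] = df["Name"].replace(
    to_replace=r"^(MSc|BSc)\s+(.*)$", value=r"\2 \1", regex=True
)

print(df)
```

Names without a listed designation do not match the pattern, so they pass through unchanged.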

Answered Nov 26 '25 at 22:11 by cs95


