I'd like to replace a string with a part of itself in a Pandas DataFrame.
Example:
Change MSc Joe L. Scott to Joe L. Scott MSc
So only MSc needs to be moved. I can do this with a regex on a plain string, but I don't know how to do it with a Pandas DataFrame:
result = re.sub(r'(MSc)(.*)' , r'\2 \1',s)
I was thinking of something like this (but what's to_replace and value here?):
df['Name_modified'].replace(regex=True, inplace=True, to_replace=??, value=??)
Or using DataFrame.sub()?
Despite reading the documentation, I can't get it to work.
As a contrived example, consider
df = pd.DataFrame({'Name' : ['MSc Joe L. Scott', 'BSc J. Doe']})
df
               Name
0  MSc Joe L. Scott
1        BSc J. Doe
You can use str.replace here with backreferences. This can easily handle multiple different designations.
designations = ['MSc', 'BSc']
# regex=True must be passed explicitly in pandas 2.0+, where the default is False
df['Name_modified'] = df['Name'].str.replace(
    rf"^({'|'.join(designations)})\s(.*)$", r"\2 \1", regex=True)
df
               Name     Name_modified
0  MSc Joe L. Scott  Joe L. Scott MSc
1        BSc J. Doe        J. Doe BSc
To overwrite the original column instead of adding a new one, assign the result back to df['Name'].
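For completeness, here is a minimal sketch of how the same move could be written with Series.replace, which is what the question was originally reaching for: to_replace takes the pattern and value takes the replacement with backreferences. The extra 'Jane Roe' row and the variable names pattern, via_replace and via_str are illustrative additions, not part of the original post, and re.escape is used on the assumption that some designations might contain regex metacharacters such as a dot.

```python
import re

import pandas as pd

# Sample data; 'Jane Roe' is an added row showing that names without a
# leading designation are left untouched.
df = pd.DataFrame({'Name': ['MSc Joe L. Scott', 'BSc J. Doe', 'Jane Roe']})
designations = ['MSc', 'BSc']

# Escape each designation so that characters such as '.' match literally.
pattern = rf"^({'|'.join(map(re.escape, designations))})\s+(.*)$"

# Series.replace: to_replace is the regex, value is the replacement string.
via_replace = df['Name'].replace(to_replace=pattern, value=r'\2 \1', regex=True)

# Equivalent call through the .str accessor, with regex=True passed explicitly.
via_str = df['Name'].str.replace(pattern, r'\2 \1', regex=True)

print(via_replace.tolist())
```

Both calls return a new Series rather than modifying df, so the result still has to be assigned to a column to keep it.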