Say I have df as follows:
MyCol
Red Motor
Green Taxi
Light blue small Taxi
Light blue big Taxi
I would like to split the color and the vehicle into two columns. I used the command below to split off the last word, but sometimes there is a 'big' or 'small' attached to the vehicle name. How can I do the splitting with conditions?
df[['color','vehicle']] = df.MyCol.str.rsplit(pat=' ', n=1, expand=True)
I think the best approach is to use extract with a regex pattern (written as a raw string so that \s and \w are not treated as string escapes):
df['MyCol'].str.extract(r'^(.*?)\s((?:small|big)?\s?\w+)$')
            0           1
0         Red       Motor
1       Green        Taxi
2  Light blue  small Taxi
3  Light blue    big Taxi
Regex details:
^ : matches the start of the string
(.*?) : first capturing group
  .*? : matches any character zero or more times, but as few times as possible (lazy match)
\s : matches a whitespace character
((?:small|big)?\s?\w+) : second capturing group
  (?:small|big)? : matches small or big zero or one time
  \s? : matches a whitespace character zero or one time
  \w+ : matches word characters one or more times
$ : matches the end of the string
Series.str.extract is used here to extract two groups using a regular expression. The first group is the text before a whitespace character, and the second group is the text after it, which may start with the word "small" or "big". The call returns a new DataFrame with two columns containing the extracted groups.
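Putting the pieces together, here is a minimal end-to-end sketch. It rebuilds the sample data from the question and attaches the extracted groups as named columns. The names parts and result are just illustrative, and using join to attach the columns is one possible choice, not the only one.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "MyCol": [
        "Red Motor",
        "Green Taxi",
        "Light blue small Taxi",
        "Light blue big Taxi",
    ]
})

# Raw string so the regex escapes reach the regex engine unchanged
pattern = r"^(.*?)\s((?:small|big)?\s?\w+)$"

# extract returns a DataFrame with one column per capturing group
parts = df["MyCol"].str.extract(pattern)
parts.columns = ["color", "vehicle"]

# Attach the new columns next to the original one (aligned on the index)
result = df.join(parts)
print(result)
```

If other size words can appear (for example "medium"), they would need to be added to the (?:small|big) alternation. Rows that do not match the pattern at all come back as NaN in both columns.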