 

Splitting based on conditions

Say I have a DataFrame df as follows:

MyCol
Red Motor
Green Taxi
Light blue small Taxi
Light blue big Taxi

I would like to split the color and the vehicle into two columns. I used the command below to split off the last word, but sometimes a 'big' or 'small' comes before the vehicle name. How can I do the splitting with conditions?

df[['color','vehicle']] = df.MyCol.str.rsplit(pat=' ', n=1, expand=True)
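The attempt above can be reproduced with a small sketch on the sample data (assuming the values carry no trailing spaces). Because rsplit with n=1 splits only at the last space, the size word stays attached to the color instead of the vehicle.

```python
import pandas as pd

# Sample data from the question (assumed to have no trailing spaces)
df = pd.DataFrame({'MyCol': ['Red Motor', 'Green Taxi',
                             'Light blue small Taxi', 'Light blue big Taxi']})

# Split only at the last space: 'small'/'big' end up in the color column
df[['color', 'vehicle']] = df['MyCol'].str.rsplit(pat=' ', n=1, expand=True)
print(df)
```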
test tes asked Dec 06 '25 20:12


1 Answer

I think the best approach is to use str.extract with a regex pattern:

df['MyCol'].str.extract(r'^(.*?)\s((?:small|big)?\s?\w+)$')

            0           1
0         Red       Motor
1       Green        Taxi
2  Light blue  small Taxi
3  Light blue    big Taxi
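Since the question asks for columns named color and vehicle, one option (a sketch, not part of the original answer) is to name the groups with (?P<name>...). str.extract then uses those names as column labels, and the result can be joined back onto df by index.

```python
import pandas as pd

df = pd.DataFrame({'MyCol': ['Red Motor', 'Green Taxi',
                             'Light blue small Taxi', 'Light blue big Taxi']})

# Named groups (?P<name>...) become the column names of the extracted frame
pattern = r'^(?P<color>.*?)\s(?P<vehicle>(?:small|big)?\s?\w+)$'
parts = df['MyCol'].str.extract(pattern)

# Attach the new columns to the original frame (aligned on the index)
df = df.join(parts)
print(df)
```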

Regex details:

  • ^: Matches start of the string
  • (.*?): first capturing group
    • .*?: matches any character zero or more times but as few times as possible (lazy match)
  • \s: Matches a single whitespace character (the separator between color and vehicle)
  • ((?:small|big)?\s?\w+): Second capturing group
    • (?:small|big)? : matches small or big zero or one time
    • \s?: matches space zero or one time
    • \w+: matches word characters one or more times
  • $: matches end of the string
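To check the pattern outside pandas, a short sketch with Python's re module applies it to each sample string. It also compares a greedy (.*) variant, which is not part of the answer and is shown only to illustrate why the lazy .*? matters: the greedy version swallows the size word into the first group, just like the rsplit attempt.

```python
import re

samples = ['Red Motor', 'Green Taxi', 'Light blue small Taxi', 'Light blue big Taxi']

# Pattern from the answer: lazy first group
LAZY = re.compile(r'^(.*?)\s((?:small|big)?\s?\w+)$')
# Same pattern with a greedy first group, for comparison only
GREEDY = re.compile(r'^(.*)\s((?:small|big)?\s?\w+)$')

lazy_groups = [LAZY.match(s).groups() for s in samples]
greedy_groups = [GREEDY.match(s).groups() for s in samples]

for s, lazy, greedy in zip(samples, lazy_groups, greedy_groups):
    # Greedy keeps 'small'/'big' in the color; lazy moves it to the vehicle
    print(f'{s!r}: lazy={lazy} greedy={greedy}')
```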

Series.str.extract is used here to extract two groups with a regular expression. The first group is the text before the separating whitespace (the color), and the second group is the text after it (the vehicle), which may start with the word "small" or "big". The method returns a new DataFrame with one column per capture group.
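If more qualifiers than "small" and "big" need to be handled, the alternation can be built from a list. The sketch below assumes a hypothetical SIZE_WORDS list (the word "medium" is only an illustration) and also shows that a row the pattern cannot match, such as a single word with no space, comes back as NaN in both columns.

```python
import re
import pandas as pd

# Hypothetical list of size qualifiers; 'medium' is only an example
SIZE_WORDS = ['small', 'big', 'medium']

# Build "small|big|medium", escaping each word in case it contains regex metacharacters
sizes = '|'.join(map(re.escape, SIZE_WORDS))
pattern = rf'^(.*?)\s((?:{sizes})?\s?\w+)$'

s = pd.Series(['Red Motor', 'Light blue medium Taxi', 'Taxi'])
out = s.str.extract(pattern)  # the unmatched 'Taxi' row becomes NaN, NaN
print(out)
```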

Shubham Sharma answered Dec 08 '25 09:12


