Splitting based on conditions

Question

Say I have df as follows:

MyCol
Red Motor
Green Taxi
Light blue small Taxi
Light blue big Taxi

I would like to split the color and the vehicle into two columns. I used the command below to split off the last word, but sometimes there is a 'big' or 'small' attached to the vehicle name. How can I do the splitting with conditions?

df[['color','vehicle']] = df.MyCol.str.rsplit(pat=' ', n=1, expand=True)

Shubham Sharma · Accepted Answer

I think the best approach is to use str.extract with a regex pattern:

df['MyCol'].str.extract(r'^(.*?)\s((?:small|big)?\s?\w+)$')

            0           1
0         Red       Motor
1       Green        Taxi
2  Light blue  small Taxi
3  Light blue    big Taxi

Regex details:

^: matches the start of the string
(.*?): first capturing group
- .*?: matches any character zero or more times, but as few times as possible (lazy match)
\s: matches a single whitespace character (here, the space between the two parts)
((?:small|big)?\s?\w+): second capturing group
- (?:small|big)?: matches small or big zero or one time
- \s?: matches a whitespace character zero or one time
- \w+: matches word characters one or more times
$: matches the end of the string
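
The breakdown above can be checked outside pandas with Python's built-in re module. This is only a minimal sketch, assuming the four sample strings from the question; the names PATTERN and split_color_vehicle are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the answer, written as a raw string.
PATTERN = re.compile(r'^(.*?)\s((?:small|big)?\s?\w+)$')

def split_color_vehicle(text):
    """Return (color, vehicle), or None when the pattern does not match."""
    match = PATTERN.match(text)
    return match.groups() if match else None

samples = ["Red Motor", "Green Taxi", "Light blue small Taxi", "Light blue big Taxi"]
results = [split_color_vehicle(s) for s in samples]
for sample, result in zip(samples, results):
    print(sample, "->", result)
```

Note that a value with trailing whitespace, such as "Green Taxi ", does not match because \w+ must run up to the end of the string, so it is worth stripping the values first.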

Series.str.extract is used here to extract two groups with a regular expression. The first group is the text before the separating whitespace, and the second group is the text after it, which may begin with the word "small" or "big". The result is a new DataFrame with two columns containing the extracted groups.
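
To get the two columns the question asks for, the extracted groups can be assigned back to the original frame. A minimal sketch, assuming the sample data from the question; the str.strip step is an added precaution against stray whitespace and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"MyCol": ["Red Motor", "Green Taxi",
                             "Light blue small Taxi", "Light blue big Taxi"]})

# Strip stray whitespace first so the trailing $ anchor can match.
cleaned = df["MyCol"].str.strip()

# expand=True (the default) returns one column per capture group.
df[["color", "vehicle"]] = cleaned.str.extract(r'^(.*?)\s((?:small|big)?\s?\w+)$')
print(df)
```

Rows that do not match the pattern get NaN in both new columns, so a quick df["color"].isna().sum() is an easy way to spot values the regex does not cover.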
