I have a pandas DataFrame that looks like this:
0               UDP/ax/bsd
1     T Traffic/sa
2     ICMP/v/e,stuff hi/a/abc,ab/a
I want to replace everything from the first encountered / up to a comma or the end of the line. I first tried `df.col_A.replace('/.+','',regex=True)`, which only gave me the first word (everything up to the first slash).
To get comma separated words I attempted the following:
`df.col_A.replace('/.+[,$]',',',regex=True)` 
My logic was to replace everything from the slash up to [comma or EOL]. This didn't behave as expected. How do I amend this?
The expected output for row 2 (the third line) of the DataFrame is:
ICMP,stuff hi, ab
Note that I am trying to avoid using split as I think this may take longer since it stores the irrelevant pieces as well.
You can use:
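 >>> import re
 >>> print(re.sub(r'/[^,]*(,|$)', r'\1', 'ICMP/v/e,stuff hi/a/abc,ab/a'))
ICMP,stuff hi,ab
RegEx Demo
RegEx Breakup:
/       # match literal /
[^,]*   # match 0 or more of any character that is not comma
(,|$)   # Match comma or end of line and capture it as group #1
The replacement is r'\1', a back-reference to group #1, so each comma is kept and everything between the slash and that comma is dropped. It has to be a raw string: in a plain string, '\1' is the control character \x01, not a back-reference.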
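To check the pattern against all three rows from the question, here is a minimal sketch using only the standard `re` module. The `rows` list and the helper name `strip_paths` are made up for illustration; the same pattern and replacement should also work with `df.col_A.replace(..., regex=True)`.

```python
import re

# Sample rows copied from the question (the list itself is illustrative).
rows = [
    'UDP/ax/bsd',
    'T Traffic/sa',
    'ICMP/v/e,stuff hi/a/abc,ab/a',
]

# A slash, then any run of non-comma characters, then a comma or end of string.
PATTERN = re.compile(r'/[^,]*(,|$)')

def strip_paths(text):
    """Drop everything from each first slash up to the next comma, keeping the comma."""
    return PATTERN.sub(r'\1', text)

cleaned = [strip_paths(row) for row in rows]
print(cleaned)
```

Rows without a comma lose everything after the first slash, and rows with commas keep one comma between the surviving words.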
The construction [...] matches a set of characters, and inside it $ is a literal character, not an end-of-line anchor. Use the pipe (|) if you want to match alternative regular expressions (where $ is itself a regular expression). I also prefer \Z to $. And since the normal + operator eats as much as possible, you need +? to get the shortest match instead of eating the entire line. Put the captured comma back with \1, otherwise the separators are removed along with the rest:
df.col_A.replace('/.+?(,|$)',r'\1',regex=True)
However, since the + operator tries to match as much as possible, you can get away with this:
df.col_A.replace('/[^,]+','',regex=True)
Where [^,]+ means "as many characters as possible that are not a comma." The match stops right before each comma, so the commas are left in place.
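To see the difference between these patterns side by side, here is a small sketch with plain `re.sub` on the third row from the question. The variable names are only for illustration.

```python
import re

row = 'ICMP/v/e,stuff hi/a/abc,ab/a'

# Inside [...], $ is a literal dollar sign, so the match must end on a comma,
# and the greedy .+ runs all the way to the last comma in the row.
bracket = re.sub(r'/.+[,$]', ',', row)

# Lazy quantifier plus alternation: stop at the first comma or at the end,
# and put the captured comma back with \1.
lazy = re.sub(r'/.+?(,|$)', r'\1', row)

# Negated class: the match can never cross a comma, so no group is needed.
negated = re.sub(r'/[^,]+', '', row)

# A row without any comma is left untouched by the bracket version.
no_comma = re.sub(r'/.+[,$]', ',', 'UDP/ax/bsd')

print(bracket)   # ICMP,ab/a
print(lazy)      # ICMP,stuff hi,ab
print(negated)   # ICMP,stuff hi,ab
print(no_comma)  # UDP/ax/bsd
```

The negated class is usually the simplest choice here, since it needs neither a lazy quantifier nor a back-reference.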