I am looking for a regex that extracts words starting with a letter, digit, or underscore ('_'). A dot ('.') may appear between words but not at the end of a match, and all other special characters should be dropped. For example:
WARC-_Target-URI: http://www.allchocolate.com/health/basics/
should give
WARC, _Target, URI, http, www.allchocolate.com, health, basics
Any sort of help will be appreciated.
Here you are:
from re import findall
print(findall(r'\w[\w.]*\w', 'WARC-_Target-URI: http://www.allchocolate.com/health/basics/'))
['WARC', '_Target', 'URI', 'http', 'www.allchocolate.com', 'health', 'basics']
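This is not tied to the example you posted: it matches any run of letters, digits, underscores and dots that starts and ends with a word character. One limitation: the pattern needs at least two characters, so a one-character word such as 'a' is not matched.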
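If one-character words or doubled dots can show up in your input, here is a possible variant. It is only a sketch, not part of the answer above: the names TOKEN and extract_tokens are made up for illustration. The pattern takes one or more word characters, followed by any number of ".word" groups, so a dot is only kept when word characters sit on both sides of it.

```python
import re

# Hypothetical helper names, chosen for this sketch.
# \w+          one or more letters, digits or underscores
# (?:\.\w+)*   zero or more groups of a single dot followed by word characters
TOKEN = re.compile(r'\w+(?:\.\w+)*')

def extract_tokens(text):
    """Return all tokens in text; dots are kept only between word characters."""
    return TOKEN.findall(text)

line = 'WARC-_Target-URI: http://www.allchocolate.com/health/basics/'
print(extract_tokens(line))
# ['WARC', '_Target', 'URI', 'http', 'www.allchocolate.com', 'health', 'basics']
```

Note that in Python 3, \w also matches non-ASCII letters and digits for str patterns; pass re.ASCII to re.compile if only [a-zA-Z0-9_] should count.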