I am trying to use the following regular expression to extract a domain name from some text, but it produces nothing. What's wrong with it?
I don't know whether a "fix my code" question like this is appropriate here; maybe I should read more first.
I just want to save some time.
Thanks.
pat_url = re.compile(r'''
(?:https?://)*
(?:[\w]+[\-\w]+[.])*
(?P<domain>[\w\-]*[\w.](com|net)([.](cn|jp|us))*[/]*)
''')
print re.findall(pat_url,"http://www.google.com/abcde")
I want the output to be google.com.
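For reference, the pattern is written across several lines but compiled without the re.VERBOSE flag, so the newlines inside the triple-quoted string are treated as literal characters that must appear in the input, and nothing matches. A minimal sketch of a working variant follows. It assumes only the com/net and cn/jp/us suffixes from the pattern above matter, and the helper name extract_domain is made up for illustration. The trailing slash is left out of the named group, and the match is read with search() rather than findall(), which would return one tuple per match because the pattern contains groups.

```python
import re

# re.VERBOSE makes the regex engine ignore the whitespace and newlines
# used to lay the pattern out, and allows inline comments.
pat_url = re.compile(r'''
    (?:https?://)?                 # optional scheme
    (?:[\w-]+[.])*                 # optional subdomains such as www.
    (?P<domain>
        [\w-]+[.](?:com|net)       # name plus com or net
        (?:[.](?:cn|jp|us))?       # optional country suffix
    )
    ''', re.VERBOSE)


def extract_domain(text):
    """Return the first domain found in text, or None (hypothetical helper)."""
    match = pat_url.search(text)
    return match.group("domain") if match else None


print(extract_domain("http://www.google.com/abcde"))
```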
Don't use a regex for this. Use the urlparse module from the standard library instead (it is named urllib.parse in Python 3). It's far more straightforward and easier to read and maintain.
http://docs.python.org/library/urlparse.html
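A rough sketch of that approach is below. Note that urlparse only gives you the full host (www.google.com); trimming it down to google.com is an extra step. The helper name registered_domain and the suffix rule are assumptions made for this example: the rule simply mirrors the com/net plus cn/jp/us cases from the question and is not a general solution for every top-level domain.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

# Assumption: only these suffixes matter, as in the question's pattern.
GENERIC_SUFFIXES = {"com", "net"}
COUNTRY_SUFFIXES = {"cn", "jp", "us"}


def registered_domain(url):
    """Return e.g. google.com for http://www.google.com/abcde (hypothetical helper)."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Keep three labels for hosts like google.com.cn, otherwise two.
    if (len(labels) >= 3
            and labels[-1] in COUNTRY_SUFFIXES
            and labels[-2] in GENERIC_SUFFIXES):
        keep = 3
    else:
        keep = 2
    return ".".join(labels[-keep:])


print(registered_domain("http://www.google.com/abcde"))
```

The URL needs a scheme (or a leading //) for urlparse to recognize the host part; for a bare string such as www.google.com/abcde the hostname comes back empty and this sketch returns an empty string.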