I am using a regular expression pattern on my blog site to turn URL addresses into clickable links, which works great. The pattern has this format:
/(href=")?([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)?)/
But recently I found that this pattern also matches filenames, so when a user posts a filename in a comment, the system turns it into a link.

What I am trying to achieve is to match only the URL formats listed below, so that mysite.com or filename.php won't be highlighted.
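The effect can be reproduced with a short sketch. Python is used here only for illustration (the post does not say which language the blog runs on), and the helper name first_match and the sample sentences are made up for this example.

```python
import re

# The original pattern from the post, without the surrounding / delimiters.
ORIGINAL = re.compile(
    r'(href=")?([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b'
    r'(\/?[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)?)'
)

def first_match(text):
    """Return the first substring the pattern matches, or None."""
    m = ORIGINAL.search(text)
    return m.group(0) if m else None

# Hypothetical comment texts: a filename, a bare domain, and a www address.
for sample in ["see filename.php for details", "visit mysite.com", "www.mysite.com"]:
    print(sample, "->", first_match(sample))
```

All three samples produce a match, which is why filenames end up as links.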

Inputs that should be matched:
+--------------------------+------------------------------------------------------+
|         Example          |                     Explanation                      |
+--------------------------+------------------------------------------------------+
| http(s)://www.mysite.com | because it starts with http(s):// and has URL format |
| www.mysite.com           | because it starts with www. and has URL format       |
+--------------------------+------------------------------------------------------+
Inputs that shouldn't be matched:
+-------------------+--------------------------------------------------+
|      Example      |                    Explanation                   |
+-------------------+--------------------------------------------------+
| mysite.com        | because it doesn't start with http(s):// or www. |
|                   | even though it has URL format                    |
| http(s)://mytext  | because it doesn't have URL format               |
| http://localhost/ | because it doesn't have URL format               |
+-------------------+--------------------------------------------------+
What does the URL format look like?
For this case, we can specify the URL format with this pattern:
([-a-zA-Z0-9_.]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9:%_\+.~#?&\/=]+)?)
Examples:
google.com, google.co.uk, accounts.google.com, google.com/somepath/ ...
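As a quick check of the URL format on its own, here is a minimal Python sketch. It assumes the pattern is written with balanced parentheses and applied to the whole string; the helper name has_url_format is invented for this example.

```python
import re

# URL-format pattern from the post, with its parentheses balanced.
URL_FORMAT = re.compile(
    r'([-a-zA-Z0-9_.]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9:%_\+.~#?&\/=]+)?)'
)

def has_url_format(text):
    """True when the whole string fits the URL-format pattern."""
    return URL_FORMAT.fullmatch(text) is not None

# The examples listed above, plus two inputs without a dot-separated suffix.
for sample in ["google.com", "google.co.uk", "accounts.google.com",
               "google.com/somepath/", "localhost", "mytext"]:
    print(sample, "->", has_url_format(sample))
```

Note that filename.php also fits this format, so the format alone cannot tell a filename from a domain; the required prefix has to do that.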
I tried adding the www\. string to this pattern, but then nothing matched. So how can I edit this regex so that it matches URLs beginning with 'www' or 'http(s)://' and nothing else?
Thanks in advance.
This regexp is definitely not perfect, but it will do what you want:
(http[s]?:\/\/|www\.|ftp:\/\/){1,2}([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)
It can be tricked into matching non-URLs, but this can't be abused. Making the pattern smarter greatly increases its complexity.
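To see how this pattern behaves on the inputs from the question, here is a small Python sketch. It is only an illustration: the dot after www is written escaped, the function name linkable is made up, and the sample strings are the examples from the tables with http(s) expanded.

```python
import re

# Pattern from the answer; the dot after "www" is escaped here so that
# it only matches a literal ".".
PREFIXED_URL = re.compile(
    r'(http[s]?:\/\/|www\.|ftp:\/\/){1,2}'
    r'([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)'
)

def linkable(text):
    """Return the matched URL inside text, or None if nothing qualifies."""
    m = PREFIXED_URL.search(text)
    return m.group(0) if m else None

should_match = ["http://www.mysite.com", "https://www.mysite.com", "www.mysite.com"]
should_not_match = ["mysite.com", "filename.php", "http://mytext",
                    "https://mytext", "http://localhost/"]

for sample in should_match + should_not_match:
    print(sample, "->", linkable(sample))
```

One limitation shows up quickly: the host part does not allow dots, so an address with a subdomain other than www, such as http://accounts.google.com, is not matched.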