I'm building code that matches and replaces several types of patterns (BBCode). One of the matches I'm trying to make is [url=http:example.com], replacing each occurrence with an anchor link. I'm also trying to match and replace plain-text URLs with anchor links. The combination of these two is where I'm running into trouble.
Since my routine is recursive, matching and replacing the entire text on each run, I'm having trouble NOT replacing URLs that are already contained in anchors.
This is the recursive routine I'm running:
if(text.search(p.pattern) !== -1) {
text = text.replace(p.pattern, p.replace);
}
This is my regexp for plain urls so far:
/(?!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig
URLs can start with http, https, ftp or ftps and contain any text afterwards, ending at whitespace or at a punctuation mark (. ! ? ,).
Just to be absolutely clear, I'm using this as a test for matches:
Should match:
Should not match
I would really appreciate any help I can get here.
EDIT The first accepted solution by jkshah below does have some flaws. For instance, it will match
<img src="http://www.example.com/test.jpg">
The comments on Jerry's solution, however, made me want to try it again, and that solution solved this issue as well. I therefore accepted that solution instead. Thank you all for your kind help on this. :)
Maybe something like this?
/(?:(?:ht|f)tps?:\/\/|www)[^<>\]]+?(?![^<>\]]*([>]|<\/))(?=[\s!,?\]]|$)/gm
Then trim any dots at the end.
regex101 demo
If the link contains more punctuation, though, it might cause some issues. In that case I would suggest capturing the link first, then removing the trailing punctuation with a second replace.
[^<>\]]+ will match every character except <, > and ].
(?![^<>\]]*([>]|<\/)) prevents the matching of a link between HTML tags.
(?=[\s!,?\]]|$) is for the punctuation and whitespace.
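The "capture first, then trim with a second replace" idea could look roughly like the sketch below. The function name linkify and the generated markup are assumptions for illustration only; the pattern itself is the one given above, unchanged.

```javascript
// Pattern from the answer above, unchanged.
const urlPattern = /(?:(?:ht|f)tps?:\/\/|www)[^<>\]]+?(?![^<>\]]*([>]|<\/))(?=[\s!,?\]]|$)/gm;

// Hypothetical helper: wraps each matched URL in an anchor, keeping any
// trailing punctuation (. , ! ?) outside the link.
function linkify(text) {
  return text.replace(urlPattern, (match) => {
    // Second replace: strip punctuation the first pattern dragged along.
    const url = match.replace(/[.,!?]+$/, "");
    const trailing = match.slice(url.length);
    return '<a href="' + url + '">' + url + '</a>' + trailing;
  });
}

console.log(linkify("Visit http://example.com/page. Bye"));
// → Visit <a href="http://example.com/page">http://example.com/page</a>. Bye
```

Passing a function to replace keeps the trimming in one pass over the text, and the dot ends up after the closing tag instead of inside the href.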
The following regex should work. It gives the desired result on your sample inputs.
/((?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm
See it in action here
(?!.*<\/a>) is a negative lookahead for a closing anchor tag.
The matched content is stored in $1 and can be used in the replacement string.
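A minimal sketch of using $1 in the replacement string, assuming the pattern above as written; the function name linkifyPlainUrls and the output markup are made up for illustration.

```javascript
// Pattern from this answer, unchanged. The outer group is what $1 refers to.
const plainUrl = /((?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm;

// Hypothetical helper: wraps every match in an anchor via the $1 backreference.
function linkifyPlainUrls(text) {
  return text.replace(plainUrl, '<a href="$1">$1</a>');
}

console.log(linkifyPlainUrls("See http://example.com now"));
// → See <a href="http://example.com">http://example.com</a> now

// An anchor whose link text contains whitespace is left alone,
// because a closing </a> still follows on the same line.
console.log(linkifyPlainUrls('<a href="http://example.com">my link</a>'));
// → <a href="http://example.com">my link</a>
```

One caveat, as far as I can tell: [^\s,?!]+ also consumes quotes and angle brackets, so an anchor with no whitespace after the URL (for example when the link text is the URL itself) can be swallowed whole. It is worth checking against your own inputs.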
EDIT
To avoid matching content inside <img src ..., the following can be used:
(^(?!.*<img\s+src)(?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))
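Here is a small sketch of how this variant behaves. The g and m flags are an assumption carried over from the earlier pattern, since none are given here, and the function name linkifyLineStartUrls is made up.

```javascript
// Pattern from the EDIT above; the "gm" flags are assumed, not stated there.
const lineStartUrl = /(^(?!.*<img\s+src)(?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm;

// Hypothetical helper using the same $1 replacement as before.
function linkifyLineStartUrls(text) {
  return text.replace(lineStartUrl, '<a href="$1">$1</a>');
}

// The image tag is left untouched.
console.log(linkifyLineStartUrls('<img src="http://www.example.com/test.jpg">'));
// → <img src="http://www.example.com/test.jpg">

// A URL at the start of a line is linked.
console.log(linkifyLineStartUrls("http://example.com is nice"));
// → <a href="http://example.com">http://example.com</a> is nice

// Because of the leading ^, a URL later in the line is not matched.
console.log(linkifyLineStartUrls("See http://example.com"));
// → See http://example.com
```

Note that the ^ anchor ties the URL to the beginning of a line, so this version only helps if URLs sit on their own lines or start them.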