URL replace with anchor, not replacing existing anchors

Question

I'm writing code that matches and replaces several types of patterns (BBCode). One of the patterns I'm trying to match is [url=http:example.com], replacing every occurrence with an anchor link. I'm also trying to match plain-text URLs and replace them with anchor links, and the combination of the two is where I'm running into trouble.

Since my routine is recursive, matching and replacing against the entire text on each run, I'm having trouble NOT replacing URLs that are already contained in anchors.

This is the recursive routine I'm running:

if(text.search(p.pattern) !== -1) {
    text = text.replace(p.pattern, p.replace);
}
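To make the problem concrete, here is a minimal sketch of a routine like this. It assumes the rules are stored as { pattern, replace } objects applied in array order and that the BBCode form is [url=ADDRESS]TEXT[/url]; the rule list, the applyRules name and the naive plain-URL pattern are all hypothetical. It reproduces the failure described above: once the BBCode rule has produced an anchor, the plain-URL rule finds the address inside href and wraps it a second time.

```javascript
// Hypothetical rule list; the question only shows that each rule has
// p.pattern and p.replace. Rules are applied in array order.
const rules = [
  // Assumed BBCode syntax: [url=ADDRESS]TEXT[/url] -> <a href="ADDRESS">TEXT</a>
  { pattern: /\[url=([^\]]+)\]([^\[]*)\[\/url\]/g, replace: '<a href="$1">$2</a>' },
  // Naive plain-URL rule ($& is the whole match); it cannot tell it is inside a tag.
  { pattern: /(?:ht|f)tps?:\/\/[^\s<>"]+/g, replace: '<a href="$&">$&</a>' },
];

function applyRules(text) {
  for (const p of rules) {
    // Same guard as in the question; replace() alone would also leave
    // the text unchanged when there is no match.
    if (text.search(p.pattern) !== -1) {
      text = text.replace(p.pattern, p.replace);
    }
  }
  return text;
}

const result = applyRules('[url=http://example.com]Example[/url]');
console.log(result);
// <a href="<a href="http://example.com">http://example.com</a>">Example</a>
```

Any fix therefore has to make the plain-URL pattern skip text that is already part of a tag or an anchor.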

This is my regular expression for plain URLs so far:

/(?!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig
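A short sketch of why this pattern still matches inside anchors. The second pattern, lookbehindPattern, is a hypothetical alternative (not from the question) that uses an ES2018 lookbehind. It is only a heuristic: it would also skip a legitimate URL that directly follows a tag such as <b>, and it does not handle bare www. addresses.

```javascript
// The question's pattern, unchanged.
const questionPattern = /(?!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig;
const anchor = '<a href="http://www.example.com">http://www.example.com </a>';

// A match can only start where "ht" or "f" begins, so the text right after that
// position is never href=" or >, and the negative lookahead never rejects anything.
// The lazy .*? then runs on to the first whitespace, swallowing "> and the link text.
console.log(anchor.match(questionPattern));
// [ 'http://www.example.com">http://www.example.com' ]

// Hypothetical alternative: a lookbehind (ES2018+) rejects a URL directly preceded
// by a quote or '>', and [^\s<>"]+ stops the match at tag boundaries.
const lookbehindPattern = /(?<!["'>])(?:ht|f)tps?:\/\/[^\s<>"]+/gi;
console.log(anchor.match(lookbehindPattern)); // null
console.log('Go to http://example.com/test now'.match(lookbehindPattern));
// [ 'http://example.com/test' ]
```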

URLs can start with http, https, ftp or ftps, contain any text after that, and end with whitespace or a punctuation mark: a period, exclamation mark, question mark or comma.
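A possible encoding of these rules as a single regular expression, offered as a sketch: the name plainUrl and the choice to also accept a bare www. prefix (which the test list below uses) are assumptions. A lazy \S+? plus a lookahead leaves any run of trailing . ! ? , before whitespace or the end of the input out of the match. On its own it does not skip URLs that are already inside anchors.

```javascript
// Hypothetical pattern: a scheme (http, https, ftp, ftps) or "www.", then
// non-whitespace characters, stopping before a run of . ! ? , that is
// followed by whitespace or the end of the input.
const plainUrl = /(?:(?:ht|f)tps?:\/\/|www\.)\S+?(?=[.!?,]*(?:\s|$))/gi;

const sample = 'See http://example.com/test. Or www.example.com, or ftp://files.example.com!';
console.log(sample.match(plainUrl));
// [ 'http://example.com/test', 'www.example.com', 'ftp://files.example.com' ]
```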

To be absolutely clear, these are the test cases I'm using:

Should match:

http://www.example.com
http://www.example.com/test
http://example.com/test
www.example.com/test

Should not match:

<a href="http://www.example.com">http://www.example.com </a>
<a href="http://www.example.com/test">http://www.example.com/test </a>
<a href="http://example.com/test">http://example.com/test </a>
<a href="www.example.com/test">www.example.com/test </a>
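One way to run these test lines automatically is a small harness like the sketch below; shouldMatch, shouldNotMatch and check are hypothetical names. It is shown with the question's current pattern, which misses the bare www. line and wrongly matches the three anchors whose href starts with http://.

```javascript
// The question's test lines, copied verbatim (note the space before </a>).
const shouldMatch = [
  'http://www.example.com',
  'http://www.example.com/test',
  'http://example.com/test',
  'www.example.com/test',
];
const shouldNotMatch = [
  '<a href="http://www.example.com">http://www.example.com </a>',
  '<a href="http://www.example.com/test">http://www.example.com/test </a>',
  '<a href="http://example.com/test">http://example.com/test </a>',
  '<a href="www.example.com/test">www.example.com/test </a>',
];

// Report the lines a pattern gets wrong. search() always starts at index 0
// and restores lastIndex, so one global RegExp can be reused across lines.
function check(pattern) {
  return {
    missed: shouldMatch.filter(line => line.search(pattern) === -1),
    wronglyMatched: shouldNotMatch.filter(line => line.search(pattern) !== -1),
  };
}

// The question's current pattern, for comparison.
const report = check(/(?!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig);
console.log(report.missed);         // [ 'www.example.com/test' ]
console.log(report.wronglyMatched); // the three anchors whose href starts with http://
```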

I would really appreciate any help I can get here.

EDIT: The solution by jkshah below, which I accepted first, does have some flaws. For instance, it will match

<img src="http://www.example.com/test.jpg">

However, the comments on Jerry's solution made me try it again, and it solves this issue as well, so I accepted that solution instead. Thank you all for your kind help on this. :)

Jerry · Accepted Answer

Maybe something like this?

/(?:(?:ht|f)tps?:\/\/|www)[^<>\]]+?(?![^<>\]]*([>]|<\/))(?=[\s!,?\]]|$)/gm

Then trim any dots at the end.

regex101 demo

If the link contains other punctuation, though, this might cause issues. In that case I would suggest capturing the link first and then removing the trailing punctuation with a second replace.
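A sketch of that two-step approach, assuming the trailing punctuation to strip is . , ! ? and that bare www. addresses go into href unchanged (the question's own test data does the same); linkify is a hypothetical name. Jerry's pattern finds each candidate, and a replacement callback moves any trailing punctuation outside the anchor.

```javascript
// Jerry's pattern, unchanged.
const linkPattern = /(?:(?:ht|f)tps?:\/\/|www)[^<>\]]+?(?![^<>\]]*([>]|<\/))(?=[\s!,?\]]|$)/gm;

function linkify(text) {
  return text.replace(linkPattern, (match) => {
    // Second step: split off trailing punctuation (assumed set: . , ! ?)
    // so it stays outside the generated anchor.
    const url = match.replace(/[.,!?]+$/, '');
    const trailing = match.slice(url.length);
    return '<a href="' + url + '">' + url + '</a>' + trailing;
  });
}

console.log(linkify('Visit http://example.com/test. Thanks!'));
// Visit <a href="http://example.com/test">http://example.com/test</a>. Thanks!
```

Keeping the punctuation outside the anchor leaves the sentence intact; the character class can be widened if other trailing characters should be stripped.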

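[^<>\]]+? matches the link itself: one or more characters other than <, > and ], as few as possible.

(?![^<>\]]*([>]|<\/)) prevents matching a link inside an HTML tag or between tags: it rejects a position when the first <, > or ] after it is a > or the start of a closing tag </.

(?=[\s!,?\]]|$) ends the match just before whitespace, !, a comma, ? or ], or at the end of a line (the m flag makes $ match at line ends).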
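To see these parts at work, the sketch below runs Jerry's pattern over a few sample lines (illustrative, partly taken from the question). Note the last one: because <\/ accepts any closing tag, a URL directly followed by </b> is skipped too, not only one inside an anchor.

```javascript
// Jerry's pattern, unchanged.
const jerry = /(?:(?:ht|f)tps?:\/\/|www)[^<>\]]+?(?![^<>\]]*([>]|<\/))(?=[\s!,?\]]|$)/gm;

const samples = [
  'http://www.example.com',                                          // plain URL
  '<a href="http://example.com/test">http://example.com/test </a>', // already linked
  '<img src="http://www.example.com/test.jpg">',                     // inside a tag
  'see http://example.com, then reply',                              // ends before ","
  '<b>http://example.com</b>',                                       // followed by </b>
];

for (const s of samples) {
  console.log(JSON.stringify(s.match(jerry)));
}
// ["http://www.example.com"]
// null
// null
// ["http://example.com"]
// null
```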

jkshah · Answer

The following regex should work. It gives the desired result on your sample inputs.

/((?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm

See it in action here

(?!.*<\/a>) is a negative lookahead that rejects any candidate followed by </a> later on the same line.

The matched URL is captured in $1, which can be used in the replacement string.
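A short sketch of using the $1 capture in a replacement, together with the weakness reported in the question's EDIT; the sample strings are illustrative.

```javascript
// jkshah's pattern, unchanged; the whole URL is capture group 1.
const jkshah = /((?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm;

// $1 in the replacement string refers to the captured URL.
const plain = 'Go to http://example.com/test now';
console.log(plain.replace(jkshah, '<a href="$1">$1</a>'));
// Go to <a href="http://example.com/test">http://example.com/test</a> now

// The reported weakness: an <img> tag has no </a> after it, and " and > are
// not excluded by [^\s,?!], so the URL is matched together with the closing ">.
const img = '<img src="http://www.example.com/test.jpg">';
console.log(img.match(jkshah));
// [ 'http://www.example.com/test.jpg">' ]
```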

EDIT

To avoid matching content containing <img src, the following can be used:

(^(?!.*<img\s+src)(?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))
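A quick check of the edited pattern, assuming the g and m flags from the first version (the edit shows it without flags). The ^ anchor is what keeps the <img> line out, but it also means a URL is only matched at the very start of a line, so a URL in the middle of a sentence is no longer found.

```javascript
// jkshah's edited pattern; the g and m flags are an assumption.
const edited = /(^(?!.*<img\s+src)(?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm;

console.log('<img src="http://www.example.com/test.jpg">'.match(edited)); // null
console.log('http://example.com/test'.match(edited)); // [ 'http://example.com/test' ]

// Because of ^, a URL that does not start the line is not found.
console.log('see http://example.com/test'.match(edited)); // null
```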

Tags: javascript, regex

Øystein Amundsen

2 Answers

Jerry

jkshah