I have text made up of paragraphs with URLs interspersed. I would like to parse the string, creating HTML links from the URLs and using the text that follows each URL as the descriptive link text, i.e. turn
possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present
into
<a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>
The SO question "JS: Find URLs in Text, Make Links" is relevant to what I'm attempting, but its solution simply uses the URL itself as the text of the anchor element.
I am successfully matching the URL with
var urlRE= new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");
but am unsure how to then match the text that follows it.
I came across the post "Regex - Matching text AFTER certain characters", which seems applicable. I tried wrapping my RE as /(?<=my url pattern here).+/, but I get an error saying the group is invalid and that the result is an invalid regular expression.
In that post J-Law mentions that
Variable-length lookbehinds aren’t allowed
Is a variable-length lookbehind what I'm attempting here?
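Most likely yes: the URL pattern contains `+` quantifiers, so a lookbehind built from it has no fixed length. Many engines (Python's `re`, for example) reject that, and JavaScript engines before ES2018 did not support lookbehind at all, which would explain the "invalid group" error. ES2018 added lookbehind to JavaScript without a fixed-length restriction, so in a current engine (recent Node.js or browsers) a sketch like the one below should work. The simplified URL pattern `https?:\/\/\S+` is an assumption for illustration, not the pattern above.

```javascript
// Requires an ES2018+ engine, which supports (variable-length) lookbehind.
// Simplified URL pattern (an assumption for this sketch): "http://" or "https://"
// followed by non-whitespace, then a single space.
const afterUrlRE = /(?<=https?:\/\/\S+ ).*/;

const s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

// The lookbehind consumes nothing, so m[0] holds only the text after the URL.
const m = afterUrlRE.exec(s);
console.log(m ? m[0] : null);
// logs: descriptive text which may or may not be present
```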
Since I'm already matching the URL, I feel I could easily do some substring math to get the desired result.
I'm mainly using this as a chance to learn more about regex.
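The substring approach mentioned above could look like the following sketch, which keeps the question's `urlRE` unchanged. `exec` reports where the match starts in `match.index`, so the descriptive text is everything from `match.index + match[0].length` onward. Trimming the leading space is an assumption about the desired output.

```javascript
// The URL regex from the question, unchanged.
var urlRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");

var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

var match = urlRE.exec(s);
if (match !== null) {
  var url = match[0];
  // match.index is where the URL starts; the trailing text begins right after it.
  var rest = s.slice(match.index + url.length).trim();
  console.log(url);  // http://www.somewebsite.com/some/path/somepage.html
  console.log(rest); // descriptive text which may or may not be present
}
```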
Thanks
Just add another capturing group to capture everything after the URL, and make your inner groups non-capturing. Something like:
var urlRE= new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";
var match = urlRE.exec(s);
if (match !== null) {
  alert(match[0] + "\n\n" + match[1] + "\n\n" + match[2]);
}
// match now holds:
// [0] "http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present"
// [1] "http://www.somewebsite.com/some/path/somepage.html"
// [2] " descriptive text which may or may not be present"
I wrapped your entire regex in parentheses () to form the first capturing group, and inside it I made all your existing groups non-capturing with ?:. You don't strictly need to make them non-capturing, but it does simplify the output. Then I added one more group, (.*), to capture everything else up to the end of the string ($).
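To see how the non-capturing groups simplify the output, the sketch below runs both patterns on the same string and compares the result arrays. In the question's pattern every ( ) is a capturing group, so `exec` returns the full match plus five captures, and the repeated group `([^ ])+` keeps only its last iteration (a single character). The answer's pattern returns the full match plus just two captures.

```javascript
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

// Question's regex: five capturing groups.
var capturingRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");
// Answer's regex: inner groups made non-capturing with ?:, leaving two capturing groups.
var nonCapturingRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

var a = capturingRE.exec(s);
var b = nonCapturingRE.exec(s);

console.log(a.length); // 6: full match + 5 groups
console.log(a[5]);     // l: a repeated group keeps only its last iteration
console.log(b.length); // 3: full match + 2 groups
console.log(b[1]);     // http://www.somewebsite.com/some/path/somepage.html
```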
After .exec, if there is a match, the whole match is in [0], the URL part is in [1], and the rest of your text (including its leading space) is in [2]. This is why the inner groups are non-capturing: otherwise you'd get a number of extra captures that may or may not be useful, and the URL and the trailing text would end up at different indices.
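To finish the original task of turning the match into a link, the two captures can be placed into an `<a>` element. This is a minimal sketch under a few assumptions: the descriptive text is trimmed, the URL itself becomes the link text when nothing follows it, and both values are HTML-escaped. `toLink` and `escapeHtml` are hypothetical helper names, not part of any library.

```javascript
var urlRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

// Hypothetical helper: escape characters that are special in HTML text and attributes.
function escapeHtml(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build an anchor from the first URL in `s` and the text after it.
function toLink(s) {
  var match = urlRE.exec(s);
  if (match === null) {
    return null; // no URL found
  }
  var url = match[1];
  var text = match[2].trim() || url; // fall back to the URL when no text follows
  return '<a href="' + escapeHtml(url) + '">' + escapeHtml(text) + "</a>";
}

console.log(toLink("possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present"));
// <a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>
```

Because the pattern ends in (.*)$, only the first URL in the string is converted, and any text before it is not part of the returned link.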