
Regex find last body tag

I know a parser would be best suited for this, but in my current situation it has to be plain JavaScript.

I have a regex to find the closing body tag of an html doc.

var closing_body_tag = /(<\/body>)/i;

However, this fails when the source has more than one set of body tags. So I was thinking about going with something like this:

var last_closing_body_tag = /(<\/body>)$/gmi;

This works when multiple tags are found, but for some reason it fails on cases with just one set of tags.

Am I making a mistake that would cause mixed results for single tag cases?

Yes, I understand that more than one body tag is incorrect; however, we have to handle all bad source.

Adam asked Sep 18 '25 06:09


1 Answer

You can use this regex:

  /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i

(?![\s\S]*<\/body>[\s\S]*$) is a negative lookahead that ensures no further closing body tag appears between the match and the end of the string, so only the last </body> can match.
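To make the difference concrete, here is a minimal sketch comparing the lookahead pattern with the `$`-anchored pattern from the question. The input strings are made up for illustration. It suggests one plausible reason for the single-tag failures: with the `m` flag, `$` only matches at the end of a line, so `</body>` followed directly by `</html>` on the same line never matches.

```javascript
// Pattern from the answer: a closing body tag not followed by another one.
const lastBody = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
// Pattern from the question: requires </body> to sit at the end of a line.
const endOfLine = /(<\/body>)$/gmi;

// Hypothetical inputs, for illustration only.
const single = '<html><body>one</body></html>';
const multiple = '<body>one</body>\n<body>two</BODY>\n</html>';

// Index where each pattern first matches (-1 when there is no match).
const singleIndex = single.search(lastBody);
const multipleIndex = multiple.search(lastBody);
const questionIndex = single.search(endOfLine);

// The lookahead pattern lands on the only tag, and on the last of several.
console.log(singleIndex === single.indexOf('</body>'));
console.log(multipleIndex === multiple.lastIndexOf('</BODY>'));
// The end-of-line pattern misses a </body> that has text after it on the same line.
console.log(questionIndex);
```

If the source always had a line break right after `</body>`, the `$`-anchored version would match, which may explain the mixed results.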


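Sample code that replaces the last closing body tag with a new tag (use '<tag/>$&' as the replacement to insert the tag before it instead):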

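// Match only the last closing body tag (case-insensitive).
var re = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
// Sample input with two sets of body tags.
var str = '<html>\n<body>\n</body>\n</html>\n<html>\n<body>\n</body>\n</html>';
var subst = '<tag/>';
// Only the second </body> is replaced; the first one is left untouched.
var result = str.replace(re, subst);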
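
If the goal is to insert markup before the last closing body tag rather than replace it, the same regex can be wrapped in a small helper. This is only a sketch: `insertBeforeLastBody` is a hypothetical name, not something from the answer, and it assumes the whole document is available as one string.

```javascript
// Hypothetical helper: inserts `snippet` immediately before the last </body>.
function insertBeforeLastBody(html, snippet) {
  const re = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
  // A replacer function keeps the matched tag and avoids any special
  // treatment of "$" sequences that `snippet` might contain.
  return html.replace(re, (match) => snippet + match);
}

// Same sample input as above: two sets of body tags.
const doc = '<html>\n<body>\n</body>\n</html>\n<html>\n<body>\n</body>\n</html>';
const patched = insertBeforeLastBody(doc, '<tag/>');
console.log(patched);

// Input without any closing body tag comes back unchanged.
const untouched = insertBeforeLastBody('<p>no body here</p>', '<tag/>');
console.log(untouched);
```

Using a function as the replacement is a design choice here: a plain replacement string would interpret sequences such as `$&` or `$1` inside the inserted snippet, which is easy to trip over when the snippet contains script code.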

Wiktor Stribiżew answered Sep 20 '25 21:09