
Regular expression to remove <br> from <pre>

Tags: html, c#, regex

I am trying to remove the <br /> tags that appear between the <pre></pre> tags. My string looks like this, followed by the code I am using:

string str = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";

string temp = "`##`";
// Swap one <br/> per <pre>...</pre> match for a placeholder until no match remains.
while (Regex.IsMatch(str, @"\<pre\>(.*?)\<br\s*/?\>(.*?)\</pre\>", RegexOptions.IgnoreCase))
{
    str = System.Text.RegularExpressions.Regex.Replace(str, @"\<pre\>(.*?)\<br\s*/?\>(.*?)\</pre\>", "<pre>$1" + temp + "$2</pre>", RegexOptions.IgnoreCase);
}
// Turn the placeholders into real line breaks.
str = str.Replace(temp, System.Environment.NewLine);

But this replaces every <br/> tag between the first <pre> and the last </pre> in the whole text, not just the ones inside each <pre> block. Thus my final outcome is:

str = "Test<br/><pre>\r\nTest\r\n</pre>\r\nTest\r\n---\r\nTest\r\n<pre>\r\nTest\r\n</pre><br/>Test"

I expect my outcome to be

str = "Test<br/><pre>\r\nTest\r\n</pre><br/>Test<br/>---<br/>Test<br/><pre>\r\nTest\r\n</pre><br/>Test"

Ashish asked Aug 13 '10 06:08


1 Answer

If you are parsing whole HTML pages, RegEx is not a good choice - see here for a good demonstration of why.

Use an HTML parser such as the HTML Agility Pack for this kind of work. It also works with fragments like the one you posted.
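
As a rough illustration of the parser approach, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced. The class and method names (PreBreaks, ReplaceBreaksInPre) are made up for this example and are not part of the library. It selects only the <br> elements that sit inside a <pre> and swaps each one for a newline text node, so breaks outside <pre> are never matched.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class PreBreaks // hypothetical helper, named for this example only
{
    public static string ReplaceBreaksInPre(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: every <br> that is a descendant of a <pre>.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var breaks = doc.DocumentNode.SelectNodes("//pre//br");
        if (breaks == null)
        {
            return html;
        }

        // Copy to a list first so the tree is not walked while it is being changed.
        foreach (var br in breaks.ToList())
        {
            br.ParentNode.ReplaceChild(doc.CreateTextNode(Environment.NewLine), br);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling PreBreaks.ReplaceBreaksInPre(str) on the string from the question should leave the breaks outside <pre> in place. Note that the parser re-serializes the markup, so untouched tags may come back in a slightly different form (for example <br> rather than <br/>).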

Oded answered Sep 22 '22 02:09


