HtmlSanitizer: remove custom tags and keep their inner content

Tags: html, tags

How can I remove all unknown custom tags while keeping the HTML content inside them, as in the following example?

<div>
  <h1>my header</h1>
  <custom:p>
     <h2>my Title</h2>
  </custom:p>
  <anothercustom:p>
     <h3>my SubTitle</h3>
  </anothercustom:p>
</div>

I would like to get this back:

<div>
  <h1>my header</h1>
  <h2>my Title</h2>
  <h3>my SubTitle</h3>
</div>

Is there a way to do this with HtmlSanitizer?

Thanks for your help.

Laurent T. asked Sep 18 '25 20:09


2 Answers

I've been looking for the same thing. HtmlSanitizer has a KeepChildNodes option (available in version 3.4.156, which I'm using) that does exactly this: when it is set to true, a disallowed tag is removed but its child nodes are kept.

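var sanitizer = new HtmlSanitizer();
// Remove disallowed tags but keep their child nodes.
sanitizer.KeepChildNodes = true;
// Sanitize returns the cleaned markup, so capture the result.
var sanitized = sanitizer.Sanitize(html);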

Rahul Sekhar answered Sep 21 '25 11:09


You can use the HtmlSanitizer.RemovingTag event to keep the contents of the tag:

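var sanitizer = new HtmlSanitizer();

sanitizer.RemovingTag += (sender, args) =>
{
    // Replace the disallowed tag with the sanitized version of its inner HTML.
    args.Tag.OuterHtml = sanitizer.Sanitize(args.Tag.InnerHtml);
    // Cancel the default removal, since the tag has already been replaced.
    args.Cancel = true;
};

var sanitized = sanitizer.Sanitize("<unknown>this will not be removed</unknown>");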

Luna answered Sep 21 '25 12:09