
Remove all classes and ids from parsed HTML with HtmlAgilityPack

I use HtmlAgilityPack to parse an HTML page, and I extract tags from the page like this:

HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
var all_text = bodyContent.SelectNodes("//div | //ul | //p | //table");

In the returned HTML, each tag contains class and id attributes. I want to remove all ids and all classes. How can I do this?

Alex asked Oct 15 '25 13:10



1 Answer

Maybe you should check this link: link.

As far as I can tell, when you have an HtmlNode you can use its Attributes property. This collection has a Remove(string) method that takes the name of the attribute you want to remove. I used it like this in a small project, though I am not sure whether it will help in your case.

So basically:

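HtmlNode bodyContent = document.DocumentNode.SelectSingleNode("//body");
var all_text = bodyContent.SelectNodes("//div | //ul | //p | //table");

// SelectNodes returns null when nothing matches, so guard before iterating.
if (all_text != null)
{
    foreach (var node in all_text)
    {
        // Remove the named attribute from this node.
        node.Attributes.Remove("class");
        node.Attributes.Remove("id");
    }
}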
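
Note that this loop only touches the selected div, ul, p, and table nodes themselves. Elements nested inside them (span, li, td, and so on) keep their attributes. If the goal is to strip class and id from every element in the document, something along these lines might work. This is a minimal sketch, not a tested solution: it assumes the HtmlAgilityPack NuGet package is referenced, and the class name StripClassAndId and the sample HTML string are hypothetical, made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class StripClassAndId
{
    static void Main()
    {
        // Hypothetical sample input, for illustration only.
        var html = "<body><div id=\"a\" class=\"box\"><span class=\"x\">hi</span></div></body>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Select every element that carries a class or an id attribute,
        // regardless of tag name or nesting depth.
        var nodes = document.DocumentNode.SelectNodes("//*[@class or @id]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                node.Attributes.Remove("class");
                node.Attributes.Remove("id");
            }
        }

        // Print the cleaned markup.
        Console.WriteLine(document.DocumentNode.OuterHtml);
    }
}
```

The XPath predicate [@class or @id] keeps the loop from visiting elements that have nothing to remove. To limit the cleanup to the body, the same expression could be run from bodyContent with a leading dot, as in ".//*[@class or @id]".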
Ivan Vasiljevic answered Oct 18 '25 09:10



