I have built a web crawler using ASP.NET, and it works well. The problem comes when I want to extract content from the pages it fetches: some of the content is wrapped in HTML tags. I have a few candidate solutions for extracting it, but I don't know which one is best. It should perform well and be easy to implement.
Using Regex with many patterns to extract content.
Using Linq to XML to extract content.
Using XPath to extract content.
Could somebody please help me choose the best solution? I think I will go with XPath, but I am not sure whether its performance is better than Regex or LINQ to XML.
Many thanks for any ideas.
None of your solutions is particularly good.
Instead, you should use an HTML parsing library like the Html Agility Pack.
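As a rough sketch of what that could look like (assuming the HtmlAgilityPack NuGet package is installed; the `ContentExtractor` class, the `ExtractText` helper, the sample markup, and the XPath expression are all made up for illustration), the library lets you keep using XPath while it takes care of parsing the HTML:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is referenced

public static class ContentExtractor
{
    // Hypothetical helper: returns the trimmed, entity-decoded text of every
    // node matched by the given XPath expression.
    public static List<string> ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses HTML as HTML; it does not need to be valid XML

        var results = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText drops the tags; DeEntitize turns "&amp;" back into "&".
            results.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample page standing in for a crawled document.
        string html =
            "<html><body><div class='article'>" +
            "<p>First <b>&amp;</b> foremost</p><p>Second</p>" +
            "</div></body></html>";

        foreach (string text in ContentExtractor.ExtractText(html, "//div[@class='article']/p"))
        {
            Console.WriteLine(text);
        }
    }
}
```

In a crawler, the `html` string would be the response body you have already downloaded. The null check matters because a page that lacks the expected element would otherwise throw in the `foreach`.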
Neither. Use a proper HTML parser such as the HTML Agility Pack.