I am trying to build a simple search-engine using HtmlAgilityPack and Xpath with C# (.NET 4). I want to find every node containing a userdefined searchword, but I can't seem to get the XPath right. For Example:
<HTML>
 <BODY>
  <H1>Mr T for president</H1>
   <div>We believe the new president should be</div>
   <div>the awsome Mr T</div>
   <div>
    <H2>Mr T replies:</H2>
     <p>I pity the fool who doesn't vote</p>
     <p>for Mr T</p>
   </div>
  </BODY>
</HTML>
If the specified searchword is "Mr T" I'd want the following nodes: <H1>, The second <div>, <H2> and the second <p>.
I have tried numerous variants of doc.DocumentNode.SelectNodes("//text()[contains(., "+ searchword +")]"); but I always seem to wind up with every single node in the entire DOM.
Any hints to get me in the right direction would be very appreciated.
Use:
//*[text()[contains(., 'Mr T')]]
This selects all elements in the XML document that have a text-node child which contains the string 'Mr T'.
This can also be written shorter as:
//text()[contains(., 'Mr T')]/..
This selects the parent(s) of any text node that contains the string 'Mr T'.
According to Xpath, if you want to find a specific keyword you need to follow the format ("keyword" is the word you like to search) :
//*[text()[contains(., 'keyword')]]
You have to follow the same format as above in C#, keyword is the string variable you call:
doc.DocumentNode.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With