
Finding a keyword in a node and getting the node name in DOM

I want to search a DOM for a specific keyword, and when it is found, I want to know which Node in the tree it is from.

static void search(String segment, String keyword) {

    if (segment == null)
        return;

    Pattern p=Pattern.compile(keyword,Pattern.CASE_INSENSITIVE);
    StringBuffer test=new StringBuffer (segment);
    matcher=p.matcher(test);

    if(!matcher.hitEnd()){        
        total++;
        if(matcher.find())
        //what to do here to get the node?
    }
}

public static void traverse(Node node) {
    if (node == null || node.getNodeName() == null)
        return;

    search(node.getNodeValue(), "java");

    traverse(node.getFirstChild());

    System.out.println(node.getNodeValue() != null && 
                       node.getNodeValue().trim().length() == 0 ? "" : node);
    traverse(node.getNextSibling());
}
lonesome asked Dec 04 '25 18:12



1 Answer

Consider using XPath (API):

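// needs: java.io.StringReader, javax.xml.namespace.QName, javax.xml.xpath.*,
//        org.w3c.dom.Node, org.xml.sax.InputSource
// the XML & search term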
String xml = "<foo>" + "<bar>" + "xml java xpath" + "</bar>" + "</foo>";
InputSource src = new InputSource(new StringReader(xml));
final String term = "java";
// search expression and term variable resolver
String expression = "//*[contains(text(),$term)]";
final QName termVariableName = new QName("term");
class TermResolver implements XPathVariableResolver {
  @Override
  public Object resolveVariable(QName variableName) {
    return termVariableName.equals(variableName) ? term : null;
  }
}
// perform the search
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setXPathVariableResolver(new TermResolver());
Node node = (Node) xpath.evaluate(expression, src, XPathConstants.NODE);

If you want to do more complex matching via regular expressions, you can provide your own function resolver.
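As a rough sketch of that idea, the class below registers a custom XPath function that applies a case-insensitive regular expression. The class name RegexSearch, the namespace URI urn:example:search, the prefix ex and the function name matches are all invented for this illustration; only the javax.xml.xpath types are real API. Extension functions have to be namespace-qualified, so a NamespaceContext is needed as well, and a JDK configured for secure processing may refuse to call them.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RegexSearch {
    // Hypothetical namespace URI and prefix, chosen only for this sketch.
    static final String NS = "urn:example:search";
    static final String PREFIX = "ex";

    /** Returns the name of the first element whose first text node matches the regex, or null. */
    static String firstMatchingElement(String xml, String regex) {
        final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        // The custom function: its single argument arrives already evaluated.
        XPathFunction matches =
            args -> Boolean.valueOf(pattern.matcher(String.valueOf(args.get(0))).find());

        XPath xpath = XPathFactory.newInstance().newXPath();

        // Map the prefix used in the expression to the sketch's namespace URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return PREFIX.equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return NS.equals(uri) ? PREFIX : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                String p = getPrefix(uri);
                return p == null ? Collections.<String>emptyIterator()
                                 : Collections.singleton(p).iterator();
            }
        });

        // Hand out the function only for ex:matches called with one argument.
        xpath.setXPathFunctionResolver((name, arity) ->
            NS.equals(name.getNamespaceURI())
                && "matches".equals(name.getLocalPart())
                && arity == 1 ? matches : null);

        try {
            // string(text()) passes the first text node's value as a plain String.
            Node node = (Node) xpath.evaluate(
                "//*[ex:matches(string(text()))]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODE);
            return node == null ? null : node.getNodeName();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            firstMatchingElement("<foo><bar>xml Java xpath</bar></foo>", "j.va"));
    }
}
```

Wrapping the argument in string() keeps the function simple: it receives a String rather than a node list.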

Breakdown of the XPath expression //*[contains(text(),$term)]:

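  • //* the asterisk selects any element; the double-slash selects at any depth below the document root
  • [contains(text(),$term)] is a predicate that keeps only the elements whose text contains the term
  • text() is a node test that selects the element's text node children; when the result is passed to contains(), only the first text node is converted to a string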
  • $term is a variable; this can be used to resolve the term "java" via the variable resolver; a resolver is preferred to string concatenation to prevent injection attacks (similar to SQL injection issues)
  • contains(arg1,arg2) is a function that returns true if arg1 contains arg2; the comparison is case-sensitive
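One consequence of the breakdown is worth seeing in code: in XPath 1.0, a node-set passed to contains() is reduced to its first node, so contains(text(),$term) only inspects an element's first text node. The sketch below compares that expression with an alternative, //text()[contains(.,$term)]/.., which tests every text node and then steps up to its parent element. The class and method names are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextNodeSearch {
    /** Evaluates the expression with $term bound and returns the first node's name, or null. */
    static String firstName(String xml, String expression, final String term) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve the single variable $term; any other variable is left unresolved.
        xpath.setXPathVariableResolver(
            name -> "term".equals(name.getLocalPart()) ? term : null);
        try {
            Node node = (Node) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODE);
            return node == null ? null : node.getNodeName();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // "java" sits in the second text node of <p>, after the <b> child.
        String xml = "<p>intro <b>bold</b> java tail</p>";
        System.out.println(firstName(xml, "//*[contains(text(),$term)]", "java"));
        System.out.println(firstName(xml, "//text()[contains(.,$term)]/..", "java"));
    }
}
```

With mixed content like this, the first expression finds nothing while the second reports the p element.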

XPathConstants.NODE tells the API to select a single node; you could use NODESET to get all matches as a NodeList.
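A minimal sketch of the NODESET variant, collecting the name of every matching element. The class name AllMatches and the sample XML are made up for this example; the expression and the variable resolver follow the answer's approach.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AllMatches {
    /** Returns the names of all elements whose first text node contains the term. */
    static List<String> matchingElementNames(String xml, final String term) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $term through a resolver instead of concatenating it into the expression.
        xpath.setXPathVariableResolver(
            name -> "term".equals(name.getLocalPart()) ? term : null);
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                "//*[contains(text(),$term)]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo><bar>xml java xpath</bar><baz>no match</baz>"
                   + "<qux>java again</qux></foo>";
        System.out.println(matchingElementNames(xml, "java"));
    }
}
```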

McDowell answered Dec 06 '25 07:12



