Remove Empty Attributes from XML

Tags: java, xml, xpath, jaxp

I have a buggy XML document that contains empty attributes, and a parser that chokes on empty attributes. I have no control over how the XML is generated, nor over the parser. So I want a pre-processing step that simply removes all empty attributes.

I have managed to find the empty attributes, but now I don't know how to remove them:

   XPathFactory xpf = XPathFactory.newInstance();
   XPath xpath = xpf.newXPath();
   XPathExpression expr = xpath.compile("//@*");
   Object result = expr.evaluate(d, XPathConstants.NODESET);

   if (result != null) {
       NodeList nodes = (NodeList) result;
       for (int i = 0; i < nodes.getLength(); i++) {
           Node n = nodes.item(i);
           if (isEmpty(n.getTextContent())) {
               this.log.warn("Found empty attribute declaration " + n.toString());
               NamedNodeMap parentAttrs = n.getParentNode().getAttributes();
               parentAttrs.removeNamedItem(n.getNodeName());
           }
       }
   }

This code throws an NPE when accessing n.getParentNode().getAttributes(). How can I remove the empty attribute from an element when I cannot access the element?

asked Sep 06 '25 03:09 by er4z0r


2 Answers

To limit the selection to just the empty attributes, you can use this XPath:

//@*[.='']

To find attributes that are either empty or contain only whitespace:

//@*[normalize-space()='']

That way you select only the attributes you want to remove and don't have to loop over every single attribute to find the empty ones. The related expression //*[@*[.='']] selects the elements that carry an empty attribute, not the attributes themselves.
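
As for the removal itself: in the DOM, an attribute node is not a child of its element, so getParentNode() returns null for it, which would explain the NPE in the question. The owning element is reachable through Attr.getOwnerElement() instead. A minimal sketch of how that could look, assuming the whitespace-aware expression //@*[normalize-space()=''] is what you want; the class name EmptyAttributeRemover and the methods removeEmptyAttributes and clean are made up for this illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from the question.
class EmptyAttributeRemover {

    /** Removes attributes that are empty or whitespace-only; returns how many were removed. */
    static int removeEmptyAttributes(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList attrs = (NodeList) xpath.evaluate(
                "//@*[normalize-space()='']", doc, XPathConstants.NODESET);
        int count = attrs.getLength();
        for (int i = 0; i < count; i++) {
            Attr attr = (Attr) attrs.item(i);
            // Attr nodes have no parent node; go through the owner element instead.
            attr.getOwnerElement().removeAttributeNode(attr);
        }
        return count;
    }

    /** Convenience wrapper for the sketch: parse a string, clean it, serialize it back. */
    static String clean(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        removeEmptyAttributes(doc);

        // An identity Transformer is used here only to turn the DOM back into text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Attributes a and c should be gone from the printed result; b should remain.
        System.out.println(clean("<root a=\"\" b=\"x\"><child c=\"  \"/></root>"));
    }
}
```

If you already have the Document (d in the question), calling removeEmptyAttributes(d) before handing it to the picky parser should be enough; the clean method is only there to make the sketch runnable on its own.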

answered Sep 07 '25 21:09 by Mads Hansen


The following stylesheet copies all content in the source document except attributes that are empty or contain only whitespace. The first template is the identity transform: it copies everything, including empty attributes. The second template has a higher default priority than the first because its match pattern uses a predicate, so it is chosen over the more general template whenever an empty attribute is encountered. Since the second template has an empty body, it generates no output for that attribute.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> 
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*[normalize-space()='']"/>
</xsl:stylesheet>
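
Since the question is tagged jaxp, here is a rough sketch of how the stylesheet could be applied from Java with javax.xml.transform. The stylesheet is inlined as a string only to keep the example self-contained; the class name EmptyAttributeFilter and the method filter are invented for this illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; the names are illustrative only.
class EmptyAttributeFilter {

    // The stylesheet shown above, inlined so the sketch has no external files.
    private static final String XSLT =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"@*[normalize-space()='']\"/>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the given XML text and returns the filtered XML. */
    static String filter(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Attributes a and c should be dropped by the second template; b is copied through.
        System.out.println(filter("<root a=\"\" b=\"x\"><child c=\"  \"/></root>"));
    }
}
```

The transform works on streams, so the cleaned XML can be passed straight to the downstream parser without building a DOM first.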

answered Sep 07 '25 20:09 by Eamon Nerbonne