I need to remove cases like this:
<text> </text>
I have code that works when there is no whitespace, but what if the element contains only whitespace?
Code:
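from lxml import etree
doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")

def remove_empty_elements(doc):
    # Matches only elements with no child nodes at all (no text, no child elements)
    for element in doc.xpath('//*[not(node())]'):
        element.getparent().remove(element)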
I also need to do it with lxml and not BeautifulSoup.
This XPath,
//*[not(*)][not(normalize-space())]
will select all leaf elements (elements with no child elements) that are empty or contain only whitespace.
For your example specifically,
<root><a>1</a><b><c></c></b><d></d></root>
these elements will be selected: c and d.
For an example that also includes whitespace-only elements,
<root>
<a>1</a>
<b>
<c></c>
</b>
<d/>
<e> </e>
<f>
</f>
</root>
these elements will be selected: c, d, e, and f.
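To tie this back to the function in the question, here is a minimal sketch that plugs the XPath above into lxml. The constant name EMPTY_LEAVES, the returned list of tag names, and the guard for a parentless root are my own additions for illustration and are not part of the original code.

```python
from lxml import etree

# XPath from the answer: leaf elements that are empty or whitespace-only.
EMPTY_LEAVES = "//*[not(*)][not(normalize-space())]"


def remove_empty_elements(doc):
    """Remove matching leaf elements in place and return their tag names."""
    removed = []
    for element in doc.xpath(EMPTY_LEAVES):
        parent = element.getparent()
        if parent is None:
            # The root element has no parent, so it cannot be removed this way.
            continue
        removed.append(element.tag)
        # Note: lxml's remove() also discards the element's tail text.
        parent.remove(element)
    return removed


doc = etree.XML("""<root>
<a>1</a>
<b>
<c></c>
</b>
<d/>
<e> </e>
<f>
</f>
</root>""")

removed_tags = remove_empty_elements(doc)
remaining_tags = [child.tag for child in doc]
print(removed_tags)
print(remaining_tags)
```

One thing to be aware of: the XPath is evaluated once, before anything is removed, so b survives this pass even though it is left with nothing but whitespace once c is gone. If you want such parents removed as well, calling the function again (or looping until it returns an empty list) should do it.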