
How can I make lxml's parser preserve whitespace outside of the root element?

I am using lxml to manipulate some existing XML documents, and I want to introduce as little diff noise as possible. Unfortunately, by default, lxml.etree.XMLParser does not preserve whitespace before or after the root element of a document:

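>>> import lxml.etree
>>> xml = '\n    <etaoin>shrdlu</etaoin>\n'
>>> # encoding="unicode" makes tostring() return str rather than bytes
>>> lxml.etree.tostring(lxml.etree.fromstring(xml), encoding="unicode")
'<etaoin>shrdlu</etaoin>'
>>> lxml.etree.tostring(lxml.etree.fromstring(xml), encoding="unicode") == xml
False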

Is this possible using lxml? Is it supported by the underlying libxml2?

DanC asked Jan 22 '26 02:01


1 Answer

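I don't know of any XML library that will do this for you. But a regex sounds like a decent approach if you really need it: capture the whitespace before and after the document, parse and serialize as usual, then put the whitespace back.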

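>>> import re
>>> from lxml import etree
>>> xml = '\n    <etaoin>shrdlu</etaoin>\n'
>>> # Capture the whitespace before and after the document.
>>> head, tail = re.findall(r"^\s*|\s*$", xml)[:2]
>>> root = etree.fromstring(xml)
>>> # encoding="unicode" returns str, so it concatenates with head and tail.
>>> out = head + etree.tostring(root, encoding="unicode") + tail
>>> out == xml
True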
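
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: `split_outer_whitespace` and `roundtrip` are hypothetical names of my own, not part of lxml, and `process` stands for whatever parse-and-serialize step you already have. It only handles whitespace at the very start and end of the string.

```python
import re

# Leading whitespace, the document itself, trailing whitespace.
# DOTALL lets "." span newlines inside the document.
_OUTER_WS = re.compile(r"\A(\s*)(.*?)(\s*)\Z", re.DOTALL)


def split_outer_whitespace(xml):
    """Return (head, body, tail) for a document given as a str."""
    return _OUTER_WS.match(xml).groups()


def roundtrip(xml, process):
    """Run process(body) and reattach the original outer whitespace.

    process is any callable taking the document text and returning the
    serialized result as a str (hypothetical hook for your own code).
    """
    head, body, tail = split_outer_whitespace(xml)
    return head + process(body) + tail
```

With lxml, `process` could be something like `lambda s: etree.tostring(etree.fromstring(s), encoding="unicode")`, with your edits to the tree made in between.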
Filip Salomonsson answered Jan 25 '26 00:01


