Remove a root tag from xml/html using tostring() of lxml

Question

How can I produce HTML text without a root tag (usually <html></html>)? For example, for use in CDATA:

<![CDATA[<div class="foo"></div><p>bar</p>]]>

My code:

from lxml import etree

html = etree.Element('root')
etree.SubElement(html, 'div', attrib={'class':'foo'})
etree.SubElement(html, 'p').text='bar'

t = etree.tostring(html)
# b'<root><div class="foo"/><p>bar</p></root>'

I would rather not use a regex to remove the root tag.

Valentino · Accepted Answer

If you need the text representation of all subelements without the root element, you can do:

subels = ''.join([etree.tostring(el).decode('ascii') for el in html])

where html is the Element from your question. In this case, subels is the string:

'<div class="foo"/><p>bar</p>'

This can be further improved to get only specific tags using the iter method. For example:

subels = ''.join([etree.tostring(el).decode('ascii') for el in html.iter('div', 'p')])

will return only the 'div' and 'p' tags, so any other tags would be omitted.
You can use it to filter out unwanted tags, but be careful because it may break the document hierarchy: iter walks all descendants, so it still returns matching tags that are nested inside the undesired ones.

EDIT after comments

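If the root tag has a text attribute which you want to keep, just add it back. Note that html.text is None when the root has no leading text, so fall back to an empty string:

subels = ''.join([html.text or ''] + [etree.tostring(el).decode('ascii') for el in html])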

Tags:

python

cdata

lxml

bl79
