There are many ways to read XML, both all-at-once (DOM) and one-bit-at-a-time (SAX). I have used SAX or lxml to iteratively read large XML files (e.g. wikipedia dump which is 6.5GB compressed).
However after doing some iterative processing (in python using ElementTree) of that XML file, I want to write out the (new) XML data to another file.
Are there any libraries to iteratively write out XML data? I could create the XML tree and then write it out, but that is not possible without oodles of ram. Is there anyway to write the XML tree to a file iteratively? One bit at a time?
I know I could generate the XML myself with print "<%s>" % tag_name, etc., but that seems a bit... hacky.
Fredrik Lundh's elementtree.SimpleXMLWriter will let you write out XML incrementally. Here's the demo code embedded in the module:
from elementtree.SimpleXMLWriter import XMLWriter
import sys
w = XMLWriter(sys.stdout)
html = w.start("html")
w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()
w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")
w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")
w.close(html)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With