
Parsing XML in Python

I want to parse text from an XML file. Suppose I have some lines like this in file.xml:

<s id="1792387-2">Castro Verde is situated in the Baixo Alentejo Subregion within a territory known locally as the Campo Branco (English: White Plains).</s>

How can I extract the following text from the above line:

Castro Verde is situated in the Baixo Alentejo Subregion within a territory known locally as the Campo Branco (English: White Plains).

After making some changes to the text, I want to return the changed text wrapped in the same tag, like this:

<s id="1792387-2"> Changed Text </s>

Any suggestions, please? Thanks!

Blue Ice asked Mar 18 '26 21:03


2 Answers

lxml makes this particularly easy:

>>> from lxml import etree
>>> text = '''<s id="1792387-2">Castro Verde is situated in the Baixo Alentejo Subregion within a territory known locally as the Campo Branco (English: White Plains).</s>'''
>>> def edit(s):
...     return 'Changed Text'
... 
>>> t = etree.fromstring(text)
>>> t.text = edit(t.text)
>>> etree.tostring(t, encoding="unicode")
'<s id="1792387-2">Changed Text</s>'
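
If the file holds one `<s>` element per line with no enclosing root element, as the question seems to describe, the same parse, edit, serialize step can be applied line by line. The following is only a minimal sketch: it uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra packages, and the helper name `edit_line`, the sample lines, and the `str.upper` stand-in for the real edit are all hypothetical.

```python
import xml.etree.ElementTree as ET


def edit_line(line, edit):
    """Parse one single-element XML line, apply edit() to its text, re-serialize it."""
    elem = ET.fromstring(line)
    # elem.text is None for an empty element, so fall back to "".
    elem.text = edit(elem.text or "")
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(elem, encoding="unicode")


# Made-up sample data standing in for the lines of file.xml.
lines = [
    '<s id="1792387-2">Some text</s>',
    '<s id="1792387-3">Other stuff</s>',
]

# str.upper is a placeholder for whatever change is actually needed.
changed = [edit_line(line, str.upper) for line in lines]
for line in changed:
    print(line)
```

This assumes every line is a complete, well-formed element without nested child elements; the tag name and the `id` attribute are carried over unchanged because only `.text` is replaced.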
Fred Foo answered Mar 20 '26 10:03


The standard library offers a couple of ways to parse XML, but in general ElementTree is the simplest:

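from io import StringIO
from xml.etree import ElementTree

# Parse from an in-memory string; findall("s") matches the <s> children of the root.
doc = ElementTree.parse(StringIO("""<doc><s id="1792387-2">Castro…</s><s id="1792387-3">Other stuff</s></doc>"""))
for elem in doc.findall("s"):
    print("Text:", elem.text)
    elem.text = "new text"
    # tostring() returns the serialized element; dump() only prints it and returns None.
    print("New:", ElementTree.tostring(elem, encoding="unicode"))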

And if your XML is coming from a file, you can use:

# Binary mode lets the parser honor the encoding declared in the file itself.
with open("path/to/foo.xml", "rb") as f:
    doc = ElementTree.parse(f)
… use `doc` …
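
To get the changed text back into the file under the same tags, the parsed tree can be written out again after editing. The sketch below is one possible way to do that, not a definitive implementation: `rewrite_texts`, the temporary file, and its sample contents are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def rewrite_texts(path, edit, tag="s"):
    """Apply edit() to the text of every <tag> element in the file, in place."""
    tree = ET.parse(path)
    for elem in tree.iter(tag):
        elem.text = edit(elem.text or "")
    # Overwrite the original file; tags and attributes are preserved.
    tree.write(path, encoding="utf-8", xml_declaration=True)


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<doc><s id="1792387-2">Some text</s>'
                '<s id="1792387-3">Other stuff</s></doc>')

    rewrite_texts(path, lambda text: "Changed Text")

    with open(path, encoding="utf-8") as f:
        result = f.read()

print(result)
```

Writing to a separate output path instead of overwriting the input is the safer choice when the original file must be kept.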
David Wolever answered Mar 20 '26 09:03