I'm trying to pull the text out of an escaped node in an XML document. The raw text for the node looks like this:
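<Notes>{&amp;quot;Phase&amp;quot;: 0, &amp;quot;Flipper&amp;quot;: 0, &amp;quot;Guide&amp;quot;: 0,
&amp;quot;Sample&amp;quot;: 0, &amp;quot;Triangle8&amp;quot;: 0, &amp;quot;Triangle5&amp;quot;: 0,
&amp;quot;Triangle4&amp;quot;: 0, &amp;quot;Triangle7&amp;quot;: 0, &amp;quot;Triangle6&amp;quot;: 0,
&amp;quot;Triangle1&amp;quot;: 0, &amp;quot;Triangle3&amp;quot;: 0, &amp;quot;Triangle2&amp;quot;: 0}</Notes>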
I'm pulling the text out as follows:
import xml.etree.ElementTree as ET

infile = ET.parse("C:/userfiles/EXP011/SESAME_60/SESAME_60_runinfo.xml")
r = infile.getroot()
XMLNS = "{http://example.com/foo/bar/runinfo_v4_3}"
x = r.find(".//" + XMLNS + "Notes")
print(x.text)
I expected to get:
{"Phase": 0, "Flipper": 0, "Guide": 0,
"Sample": 0, "Triangle8": 0, "Triangle5": 0,
"Triangle4": 0, "Triangle7": 0, "Triangle6": 0,
"Triangle1": 0, "Triangle3": 0, "Triangle2": 0}
but, instead, I got:
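{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0,
&quot;Sample&quot;: 0, &quot;Triangle8&quot;: 0, &quot;Triangle5&quot;: 0,
&quot;Triangle4&quot;: 0, &quot;Triangle7&quot;: 0, &quot;Triangle6&quot;: 0,
&quot;Triangle1&quot;: 0, &quot;Triangle3&quot;: 0, &quot;Triangle2&quot;: 0}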
How do I get the unescaped string?
In Python 2, use HTMLParser.HTMLParser().unescape():
In [8]: import HTMLParser
In [11]: HTMLParser.HTMLParser().unescape('&quot;')
Out[11]: u'"'
saxutils.unescape handles &lt;, &gt; and &amp; by default, but it does not handle &quot;.
In [9]: import xml.sax.saxutils as saxutils
In [10]: saxutils.unescape('&quot;')
Out[10]: '&quot;'
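If you would rather stay with saxutils, its unescape function accepts an optional second argument: a dictionary of extra entities to replace. A minimal sketch, assuming you only care about the two quote entities (the names EXTRA_ENTITIES and unescape_with_quotes are made up for this example):

```python
from xml.sax.saxutils import unescape

# Extra entities that saxutils.unescape does not handle on its own.
# (Hypothetical name; only the two quote entities are covered here.)
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def unescape_with_quotes(text):
    """Unescape &lt;, &gt;, &amp; plus the quote entities listed above."""
    return unescape(text, EXTRA_ENTITIES)


result = unescape_with_quotes("{&quot;Phase&quot;: 0}")
print(result)
```

This only covers the entities you list, so for arbitrary HTML entities a general-purpose unescaper is likely the better fit.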
Since Python 3.4 you can use html.unescape:
>>> from html import unescape
>>> unescape('&quot;')
'"'
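Putting it together with the ElementTree code from the question might look like the sketch below. It assumes Python 3.4+, that the file stores the notes double-escaped (which is what would make ElementTree hand back &quot; instead of "), and that the notes are valid JSON. The RunInfo root element and the inline SAMPLE string are stand-ins for the real file, which I can't see.

```python
import html
import json
import xml.etree.ElementTree as ET

XMLNS = "{http://example.com/foo/bar/runinfo_v4_3}"

# Stand-in for the real file; the RunInfo root element is an assumption.
SAMPLE = (
    '<RunInfo xmlns="http://example.com/foo/bar/runinfo_v4_3">'
    "<Notes>{&amp;quot;Phase&amp;quot;: 0, &amp;quot;Flipper&amp;quot;: 0}</Notes>"
    "</RunInfo>"
)

root = ET.fromstring(SAMPLE)
node = root.find(".//" + XMLNS + "Notes")

# The XML parser undoes one level of escaping, leaving &quot; in the text.
raw = node.text

# html.unescape undoes the remaining level.
notes_text = html.unescape(raw)

# Assumption: the notes are JSON, so they can be loaded into a dict.
notes = json.loads(notes_text)
print(notes)
```

For the real file, swap ET.fromstring(SAMPLE) for ET.parse(path).getroot().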