I'm playing with a simple script to escape certain HTML characters, and am encountering a bug which seems to be caused by the order of elements in my list escape_pairs. I'm not modifying the lists during a loop, so I can't think of any Python/programming principles I'm overlooking here.
escape_pairs = [(">", "&gt;"),("<","&lt;"),('"',"&quot;"),("&","&amp;")]
def escape_html(s):
    for (i,o) in escape_pairs:
        s = s.replace(i,o)
    return s
print escape_html(">")
print escape_html("<")
print escape_html('"')
print escape_html("&")
returns
&amp;gt;
&amp;lt;
&amp;quot;
&amp;
However, when I switch the order of the elements in my escape_pairs list to the following, the bug disappears:
>>> escape_pairsMod = [("&","&amp;"),("<","&lt;"),('"',"&quot;"),(">", "&gt;")]
&gt;
&lt;
&quot;
&amp;
Yes, in your first implementation, it can.
Let's take the case of > and the list -
escape_pairs = [(">", "&gt;"),("<","&lt;"),('"',"&quot;"),("&","&amp;")]
When iterating through escape_pairs, you first get > and replace it with &gt; . This causes the string to become '&gt;' . Then you keep on iterating, and at the end you reach ("&","&amp;") , so the & that was just introduced by &gt; is itself replaced with &amp; , giving '&amp;gt;' , which is the result you get right now.
When you change the order of the list, you get the correct result, but only because & is now replaced first, before any of the other replacements has introduced new & characters into the string. The approach still depends on the order of the pairs.
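To make the double escaping visible, here is a minimal sketch that records the string after every replacement. The helper name trace_escape is hypothetical and only for illustration; the pairs are the ones from the question, in the original order.

```python
# Replacement pairs in the question's original order ("&" is handled last).
escape_pairs = [(">", "&gt;"), ("<", "&lt;"), ('"', "&quot;"), ("&", "&amp;")]

def trace_escape(s, pairs):
    """Apply each replacement in order and record the string after every step."""
    steps = [s]
    for old, new in pairs:
        s = s.replace(old, new)
        steps.append(s)
    return steps

# The last step re-escapes the "&" that the first step introduced.
for step in trace_escape(">", escape_pairs):
    print(step)
```

The final line printed is &amp;gt; because the & written by the first replacement is still in the string when the ("&","&amp;") pair runs.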
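You can use str.translate instead (shown here with Python 3's str.maketrans). It maps each character of the original string in a single pass according to a dictionary, so replacement text is never escaped a second time and the order of the pairs no longer matters. Example -
>>> escape_pairs = [(">", "&gt;"),("<","&lt;"),('"',"&quot;"),("&","&amp;")]
>>> escape_dict = dict(escape_pairs)
>>> t = str.maketrans(escape_dict)
>>> ">".translate(t)
'&gt;'
>>> "> & <".translate(t)
'&gt; &amp; &lt;'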
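As a rough sketch of how this could replace the question's escape_html, the function below wraps the translation table and checks that reversing the pair order gives the same result. The names ESCAPE_TABLE, REVERSED_TABLE and the table parameter are assumptions made for this illustration, and the code targets Python 3.

```python
# Same pairs as in the question; the order is irrelevant for str.translate.
escape_pairs = [(">", "&gt;"), ("<", "&lt;"), ('"', "&quot;"), ("&", "&amp;")]

# str.maketrans accepts a dict whose keys are single characters.
ESCAPE_TABLE = str.maketrans(dict(escape_pairs))
REVERSED_TABLE = str.maketrans(dict(reversed(escape_pairs)))

def escape_html(s, table=ESCAPE_TABLE):
    """Escape each character of s exactly once using a translation table."""
    return s.translate(table)

sample = '<a href="x">T&C</a>'
print(escape_html(sample))  # &lt;a href=&quot;x&quot;&gt;T&amp;C&lt;/a&gt;
# Both tables produce the same output, so this prints True.
print(escape_html(sample) == escape_html(sample, REVERSED_TABLE))
```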
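But if what you actually want is to HTML-escape a string, you should use the standard library rather than writing your own. On Python 2 that is cgi.escape (it was deprecated in Python 3.2 and removed in Python 3.8) -
>>> import cgi
>>> cgi.escape("< > &")
'&lt; &gt; &amp;'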
Also, if you are using Python 3.2+, you can use html.escape instead. Example -
>>> import html
>>> html.escape("< > &")
'&lt; &gt; &amp;'
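One difference from the hand-written list is worth a quick illustration: html.escape also escapes quote characters by default, controlled by its quote parameter. The sample string below is made up for the example.

```python
import html

text = '"a" \'b\' & <c>'

# Default (quote=True): double and single quotes are escaped as well.
with_quotes = html.escape(text)
# quote=False: only &, < and > are escaped.
without_quotes = html.escape(text, quote=False)

print(with_quotes)
print(without_quotes)
# html.unescape reverses the escaping.
print(html.unescape(with_quotes) == text)
```

If the escaped text ends up inside an HTML attribute value, leaving quote at its default is usually the safer choice.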