 

Why does Python output a string and a unicode of the same value differently?

Tags:

python

unicode

I'm using Python 2.6.5 and when I run the following in the Python shell, I get:

>>> print u'Andr\xc3\xa9'
AndrÃ©
>>> print 'Andr\xc3\xa9'
André
>>>

What's the explanation for the above? Given u'Andr\xc3\xa9', how can I display the value properly in an HTML page so that it shows André instead of AndrÃ©?

Thierry Lam asked Jan 19 '26 00:01


2 Answers

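'\xc3\xa9' is the UTF-8 encoding of the Unicode character u'\u00e9' (which can also be written as u'\xe9'). In a unicode literal, however, each \xNN escape is a separate character, so u'Andr\xc3\xa9' contains the two characters u'\xc3' (Ã) and u'\xa9' (©) rather than é. To get é, use u'Andr\u00e9' or u'Andr\xe9'.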
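As a quick check of this explanation, the following sketch (written to run on Python 2.6+ and 3.3+; the variable names are only illustrative) compares the characters in the question's value with the intended one:

```python
from __future__ import print_function

# In a unicode literal, each \xNN escape names one code point, not one byte.
garbled = u'Andr\xc3\xa9'  # 'Andr' + U+00C3 (Ã) + U+00A9 (©)
correct = u'Andr\xe9'      # 'Andr' + U+00E9 (é)

print([hex(ord(ch)) for ch in garbled[4:]])  # ['0xc3', '0xa9']
print([hex(ord(ch)) for ch in correct[4:]])  # ['0xe9']

# The UTF-8 encoding of U+00E9 is exactly the two bytes C3 A9.
print(correct.encode('utf-8') == b'Andr\xc3\xa9')  # True
```

So the question's unicode value is six characters long, while the intended name is five.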

You can convert from one to the other:

>>> 'Andr\xc3\xa9'.decode('utf-8')
u'Andr\xe9'
>>> u'Andr\xe9'.encode('utf-8')
'Andr\xc3\xa9'

Note that print 'Andr\xc3\xa9' gave you the expected result only because your terminal uses UTF-8: a byte string is written out unchanged, and the terminal happens to decode the bytes C3 A9 as é. For example, on Windows I get:

>>> print 'Andr\xc3\xa9'
Andr├⌐
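Which garbled form appears on Windows depends on the console's code page. The sketch below decodes the same UTF-8 bytes under a few encodings to show the effect; cp1252 and cp437 are assumed here only as typical Windows examples, not as what every console uses:

```python
from __future__ import print_function

# The same UTF-8 bytes, read back under different encodings.
data = b'Andr\xc3\xa9'

as_utf8 = data.decode('utf-8')     # u'Andr\xe9': displays as André
as_cp1252 = data.decode('cp1252')  # u'Andr\xc3\xa9': displays as AndrÃ©
as_cp437 = data.decode('cp437')    # u'Andr\u251c\u2310': displays as Andr├⌐

for codec, text in [('utf-8', as_utf8), ('cp1252', as_cp1252), ('cp437', as_cp437)]:
    print(codec, repr(text))
```

Only the UTF-8 reading recovers é; any other code page turns the two bytes into two unrelated characters.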

As for outputting HTML, it depends on which web framework you use and what encoding you output in the HTML page. Some frameworks (e.g. Django) will convert unicode values to the correct encoding automatically, while others will require you to do so manually.
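A framework-neutral sketch of the manual route, assuming the page is assembled as a unicode string and then written out as bytes. The names (`name`, `page`, `garbled`) are illustrative, and real code should also HTML-escape the text before inserting it. The last step applies only if the value really is UTF-8 bytes that were mis-decoded as Latin-1, which is one way to end up with u'Andr\xc3\xa9':

```python
from __future__ import print_function

name = u'Andr\xe9'  # correctly decoded text: one code point for é

# Option 1: build the page as text, declare UTF-8, then encode to UTF-8 bytes.
page = u'<meta charset="utf-8"><p>%s</p>' % name
utf8_bytes = page.encode('utf-8')

# Option 2: emit ASCII only; é becomes the character reference &#233;,
# which renders correctly in any ASCII-compatible page charset.
ascii_bytes = name.encode('ascii', 'xmlcharrefreplace')

# Repair (assumption: the value is UTF-8 bytes that were decoded as Latin-1).
garbled = u'Andr\xc3\xa9'
repaired = garbled.encode('latin-1').decode('utf-8')

print(repr(ascii_bytes))
print(repaired == name)  # True
```

Option 1 keeps the page source readable; option 2 is more defensive when you cannot control the charset the page is served with.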

interjay answered Jan 20 '26 16:01


Try this:

>>> unicode('Andr\xc3\xa9', 'utf-8')
u'Andr\xe9'
>>> print u'Andr\xe9'
André

That may answer your question.

EDIT: or see the above answer
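The `unicode(data, 'utf-8')` call in this answer does the same decoding as `data.decode('utf-8')`. A small sketch that checks this; the `text_type` alias is a hypothetical helper so the snippet also runs on Python 3, where `unicode` no longer exists and `str` plays its role:

```python
from __future__ import print_function

try:
    text_type = unicode  # Python 2 built-in used in this answer
except NameError:
    text_type = str      # Python 3: str is the text type

data = b'Andr\xc3\xa9'  # UTF-8 encoded bytes
via_constructor = text_type(data, 'utf-8')
via_method = data.decode('utf-8')

print(via_constructor == via_method)  # True
```

Either way, the result is the single-code-point value u'Andr\xe9', which prints as André.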

darelf answered Jan 20 '26 16:01


