Python encoding ISO to UTF8

I am trying to read my emails using a Python script (Python 2.5 and PyPy). Some of my results are not in ASCII, and I get strings like this:

=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=

Is there any way to decode it and convert it to UTF-8 so that I can process it? I tried .decode('ISO-8859-7'), but I got the same string back.
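A quick Python 3 check shows why that call appears to do nothing. (The question targets Python 2.5, where str.decode gives the same characters back for this input as a unicode object.) The raw header is pure ASCII, with the Greek bytes hidden inside the Base64 payload, and ISO-8859-7 decodes ASCII bytes unchanged. This is a minimal sketch; the variable names are illustrative only.

```python
# Python 3 sketch; the header value is copied from the question.
header = "=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?="

# Every character in the raw header is ASCII: the Greek text is carried
# inside the Base64 payload that follows "?B?" and ends at "?=".
raw = header.encode("ascii")

# ISO-8859-7 maps bytes 0x00-0x7F exactly like ASCII, so decoding these
# bytes "as Greek" gives back the same text instead of Greek letters.
same = raw.decode("iso-8859-7")
print(same == header)  # True
```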

asked Jun 11 '26 at 02:06 by PanosJee

2 Answers

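import email.header as eh

# decode_header() splits the raw header into (data, charset) pairs; each
# piece is decoded with its own charset, falling back to ASCII when None.
unicode_data = u''.join(
    str_data.decode(codec or 'ascii')
    for str_data, codec
    in eh.decode_header('=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?='))
# unicode_data now is u'Πεζοπορία στον Κιθαιρώνα'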

You should work with unicode_data from here on. However, if you (think you) need a UTF-8-encoded string, you can do:

utf8data = unicode_data.encode('utf-8')
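For readers on Python 3, the same loop needs one adjustment: decode_header returns bytes for encoded words but returns the header unchanged as a str when it contains no encoded words at all. A minimal sketch of a Python 3 port, where decode_mime_header is a hypothetical helper name:

```python
import email.header as eh


def decode_mime_header(value):
    """Python 3 port of the loop above: decode a raw header into str."""
    parts = []
    for data, codec in eh.decode_header(value):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus their charset; plain
            # runs in a mixed header come back as bytes with codec None.
            parts.append(data.decode(codec or "ascii"))
        else:
            # A header with no encoded words is returned as (str, None).
            parts.append(data)
    return "".join(parts)


unicode_data = decode_mime_header("=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=")
# unicode_data == 'Πεζοπορία στον Κιθαιρώνα'
utf8data = unicode_data.encode("utf-8")  # only if bytes are really needed
```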

Update: I changed the .decode call to handle cases where the codec is None (e.g. eh.decode_header('plain text'), which returns [('plain text', None)]).
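On Python 3 the standard library can also handle the None case itself: email.header.make_header accepts the (data, charset) pairs produced by decode_header, and str() on the resulting Header object joins them into one string. A minimal sketch; header_to_str is a hypothetical name:

```python
from email.header import decode_header, make_header


def header_to_str(value):
    # make_header() accepts the (data, charset) pairs produced by
    # decode_header(), including pairs whose charset is None, and str()
    # on the resulting Header joins the decoded chunks into one string.
    return str(make_header(decode_header(value)))


greek = header_to_str("=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=")
plain = header_to_str("plain text")  # 'plain text'
```

Compared with a bare ''.join, str() on a Header also applies RFC 2047's rules for whitespace between adjacent encoded and unencoded chunks.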

answered Jun 12 '26 at 14:06 by tzot


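Read up on MIME encoded-words (RFC 2047) and Base64 encoding. A header of the form =?charset?B?text?= carries text in the given charset, Base64-encoded (a Q in place of the B means a quoted-printable-style encoding instead). The base64 module will be useful.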
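A minimal Python 3 sketch of decoding such a header by hand with the base64 module. It assumes the header is exactly one B-encoded word; decode_b_word and the ENCODED_WORD pattern are hypothetical illustrations, and Q-encoded or multi-word headers are not handled (email.header.decode_header covers those cases).

```python
import base64
import re

# Hypothetical pattern, assuming a single RFC 2047 encoded-word of the
# form =?charset?B?payload?= (Base64 only; "Q" words are not handled).
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([Bb])\?([^?]*)\?=")


def decode_b_word(value):
    match = ENCODED_WORD.fullmatch(value.strip())
    if match is None:
        raise ValueError("not a single Base64 encoded-word: %r" % value)
    charset, _encoding, payload = match.groups()
    raw = base64.b64decode(payload)  # the bytes in the declared charset
    return raw.decode(charset)       # codec names are case-insensitive


text = decode_b_word("=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=")
utf8_bytes = text.encode("utf-8")  # two bytes per Greek letter in UTF-8
```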

answered Jun 12 '26 at 14:06 by Mark Ransom


