I have a CSV file saved with UTF-8 encoding.
It contains non-ASCII characters (umlauts).
I am reading the file using:
csv.DictReader(<file>,delimiter=<delimiter>).
My question is:
I noticed that in order to treat the strings as UTF-8 I need to call:
str.decode('utf-8')
Is there a better approach than reading the file in one encoding and then converting to another, i.e. UTF-8?
[Python version: 2.7]
In Python 2.7, the csv module does not apply any decoding: it reads from the file object you pass it (which should be opened in binary mode) and returns byte strings.
Use https://github.com/jdunck/python-unicodecsv, which decodes on the fly.
Use it like:
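import unicodecsv
with open("myfile.csv", 'rb') as my_file:
    r = unicodecsv.DictReader(my_file, encoding='utf-8')
    # iterate over r here, while the file is still open
Each row that r yields is a dict whose keys and values are unicode objects. It's important that the source file is opened in binary mode.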
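If installing a third-party package is not an option, the manual decode from the question can be wrapped once in a small generator, so the calling code never deals with byte strings. This is only a minimal sketch: the name unicode_dict_reader and the helper _to_text are made up for illustration, and it assumes well-formed rows (no extra or missing columns). On Python 3 the csv module already works with text, so that branch just opens the file with the right encoding.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def _to_text(value, encoding):
    # Hypothetical helper: decode byte strings, leave anything else untouched.
    return value.decode(encoding) if isinstance(value, bytes) else value


def unicode_dict_reader(path, encoding="utf-8", **kwargs):
    """Yield each CSV row as a dict of text (unicode) keys and values.

    Illustrative name, not part of the csv module or unicodecsv.
    Extra keyword arguments (e.g. delimiter) are passed to csv.DictReader.
    """
    if PY2:
        # Python 2: csv wants a binary file and returns byte strings,
        # so every key and value is decoded after parsing.
        with open(path, "rb") as f:
            for row in csv.DictReader(f, **kwargs):
                yield {_to_text(k, encoding): _to_text(v, encoding)
                       for k, v in row.items()}
    else:
        # Python 3: csv works on text, so decoding happens when the file is opened.
        with io.open(path, "r", encoding=encoding, newline="") as f:
            for row in csv.DictReader(f, **kwargs):
                yield dict(row)
```

Usage would look like: for row in unicode_dict_reader("myfile.csv", delimiter=";"), after which every value in row is already text. Decoding after parsing is safe for UTF-8 because the delimiter and quote characters are plain ASCII bytes and never occur inside a multi-byte sequence.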