Which encoding is in use by csv.DictReader when reading csv?

Question

I have a CSV file saved as UTF-8.

It contains non-ascii chars [umlauts].

I am reading the file using:

csv.DictReader(<file>,delimiter=<delimiter>).

My questions are:

In which encoding is the file being read?
I noticed that in order to get the strings as Unicode I need to call:
```
str.decode('utf-8')
```
Is there a better approach than reading the file in one encoding and then converting it to another, i.e. UTF-8?

[Python version: 2.7]

Alastair McCormack · Accepted Answer

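In Python 2.7, the csv module does not apply any decoding: it reads raw bytes from the file object you pass in (which should be opened in binary mode) and returns byte strings (str), so the file's bytes pass through unchanged.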

Use https://github.com/jdunck/python-unicodecsv, which decodes on the fly.

Use it like:

import unicodecsv

with open("myfile.csv", 'rb') as my_file:
    r = unicodecsv.DictReader(my_file, encoding='utf-8')

Each row that r yields is a dict of unicode strings. It's important that the source file is opened in binary mode.
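
If adding a dependency is not an option, the manual str.decode('utf-8') step from the question can be wrapped in a small helper and applied to each row. This is only a minimal sketch: decode_row is a hypothetical name, not part of the csv module, and it assumes every key and value in the row is a byte string (no missing or extra fields). The sample row stands in for what csv.DictReader yields in Python 2.7.

```python
def decode_row(row, encoding='utf-8'):
    """Return a copy of a row with byte-string keys and values decoded.

    Hypothetical helper; assumes every key and value is a byte string.
    """
    return dict(
        (key.decode(encoding), value.decode(encoding))
        for key, value in row.items()
    )


# Stand-in for one row as Python 2.7's csv.DictReader would yield it:
# UTF-8 encoded byte strings containing umlauts.
raw_row = {b'name': b'M\xc3\xbcller', b'city': b'K\xc3\xb6ln'}

decoded = decode_row(raw_row)
```

With a real file you would call decode_row(row) for each row from csv.DictReader(my_file), with my_file still opened in 'rb' mode.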
