My web pages are served by a script that dynamically imports a bunch of files with:
try:
    with open (filename, 'r') as f:
        exec(f.read())
except IOError: pass
(actually, can you suggest a better method of importing a file? I'm sure there is one.)
Sometimes the files have strings in different languages, like
# contents of language.ru
title = "Название"
Those were all saved as UTF-8 files. Python has no problem running the script from the command line or serving a page from my MacBook:
    OK: [server command line] python3.0 page.py /index.ru
    OK: http://whitebox.local/index.ru
but it throws an error when trying to serve a page from a server we just moved to:
      157     try:
      158         with open (filename, 'r') as f:
      159             exec(f.read())
      160     except IOError: pass
      161 
      /usr/local/lib/python3.0/io.py in read(self=, n=-1)
      ...
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 627: ordinal not in range(128) 
All the files were copied from my laptop where they were perfectly served by Apache. What is the reason?
Update: I found out that the default encoding for open() is platform-dependent, so it was UTF-8 on my laptop and ASCII on the server. Is there a per-program way to set it in Python 3? (sys.setdefaultencoding is used in the site module and then deleted from the namespace.)
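To check what the update describes, a short sketch like the following can be run on both machines. It prints the encoding that open() falls back to in text mode when no encoding argument is given; the variable name is only illustrative.

```python
import locale
import sys

# Encoding that open() uses for text files when none is passed explicitly.
default_text_encoding = locale.getpreferredencoding(False)
print("default for open():", default_text_encoding)

# This is a different setting: the encoding used for implicit
# str/bytes conversions, not the one open() uses for file contents.
print("sys.getdefaultencoding():", sys.getdefaultencoding())
```

If the first line prints something like ANSI_X3.4-1968 or US-ASCII when run by the web server, the server process is running in an ASCII locale, which would match the traceback above.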
Since Python 3.0, the language's str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.
In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.
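A small sketch of the str/bytes split, using the title string from the question. It reproduces the same kind of failure as the traceback: UTF-8 bytes decoded with the ascii codec.

```python
title = "Название"            # str: a sequence of Unicode characters
raw = title.encode("utf-8")   # bytes: what is actually stored in the file

print(len(title), len(raw))   # 8 characters, 16 bytes

try:
    raw.decode("ascii")       # what an ASCII-default open() effectively does
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("ascii decode failed:", exc.reason)

decoded = raw.decode("utf-8") # decoding with the right codec succeeds
print(decoded == title)
```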
Pass the encoding explicitly: open(filename, 'r', encoding='utf8'). Without it, open() falls back to the platform-dependent default.
See the Python 3 docs for open.
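Applied to the loader from the question, this might look like the sketch below. The helper name load_settings and the choice to exec into a separate dict (rather than the caller's namespace) are assumptions for illustration, not something the original script does.

```python
import os
import tempfile


def load_settings(filename, encoding="utf-8"):
    """Run a Python-syntax file and return the names it defines.

    Hypothetical helper: missing files are ignored, as in the question.
    """
    namespace = {}
    try:
        # Explicit encoding, so the result does not depend on the platform.
        with open(filename, "r", encoding=encoding) as f:
            source = f.read()
    except IOError:
        return namespace
    exec(compile(source, filename, "exec"), namespace)
    namespace.pop("__builtins__", None)  # exec() adds this key itself
    return namespace


# Demo with a temporary copy of the language.ru file from the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "language.ru")
    with open(path, "w", encoding="utf-8") as f:
        f.write('title = "Название"\n')
    settings = load_settings(path)
    missing = load_settings(os.path.join(tmp, "language.xx"))

print(settings["title"])
```

Collecting the names in a dict keeps the imported files from silently overwriting variables in the serving script, and passing the filename to compile() makes tracebacks point at the right file.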
Use the codecs library. I'm using Python 2.6.6, where the built-in open() does not accept an encoding argument, so I use codecs.open instead:
import codecs
f = codecs.open('filename', 'r', encoding='UTF-8')
If you need to read and rewrite a file without knowing its encoding, you can use something like
with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
    data = f.read()
# make changes to the string 'data'
with open(fname + '.new', 'w',
           encoding="ascii", errors="surrogateescape") as f:
    f.write(data)
More information is in the Python Unicode HOWTO.
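Why this works can be seen in a short in-memory sketch: with errors="surrogateescape", each non-ASCII byte is decoded to a lone surrogate code point, and encoding with the same handler turns it back into the original byte. The sample bytes are made up for the demonstration.

```python
# ASCII text followed by UTF-8 bytes that the ascii codec cannot decode.
raw = "title = ".encode("ascii") + "Название".encode("utf-8")

# Decoding does not raise: undecodable bytes become surrogate code points.
text = raw.decode("ascii", errors="surrogateescape")

# The ASCII part is ordinary text and can be inspected or edited.
starts_ok = text.startswith("title = ")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")
print(round_tripped == raw)
```

The decoded string is only safe to write back with the same error handler; it is not a correct decoding of the Russian text, so this suits byte-preserving edits rather than displaying the content.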