
Python: Overriding os.path.supports_unicode_filenames on Ubuntu

I am running a Python web app on an Ubuntu server, while I develop locally on OS X.

I use a lot of unicode strings for the Hebrew language, including manipulating filenames of images, so they will be saved on the filesystem with Hebrew characters.

My Ubuntu server is fully configured for UTF-8 - I have other images on the file system (outside of this app) with Hebrew names, in Hebrew named directories, etc.

However, my app returns errors when trying to save an image with a Hebrew filename on Ubuntu (but not on OS X).

The error being:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

After a lot of investigating, this is the last possible cause I can see:

# Inside my virtualenv, Mac OS X
>>> import os.path
>>> os.path.supports_unicode_filenames
True

# Inside my virtualenv, Ubuntu 12.04
>>> import os.path
>>> os.path.supports_unicode_filenames
False

And just for the curious, here are my Ubuntu locale settings:

locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Update: adding the code, and an example string:

# a string, of the type I would get for instance.product.name, as used below.
u'\u05e7\u05e8\u05d5\u05d1-\u05e8\u05d7\u05d5\u05e7'


#utils.py
# I get an image object from django, and I run this function so django 
# can use the generated filepath for the image.
def get_upload_path(instance, filename):

    tmp = filename.split('.')
    extension = '.' + tmp[-1]

    if instance.__class__.__name__ == 'MyClass':

        seo_filename = unislugify(instance.product.name)
        # unislugify takes a string and strips spaces, etc.
        value = IMAGES_PRODUCT_DIR + seo_filename + extension

    else:

        value = IMAGES_GENERAL_DIR + unislugify(filename)

    return value

Example stacktrace:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-66: ordinal not in range(128)

Stacktrace (most recent call last):

  File "django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "django/contrib/admin/options.py", line 366, in wrapper
    return self.admin_site.admin_view(view)(*args, **kwargs)

  File "django/utils/decorators.py", line 91, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "django/views/decorators/cache.py", line 89, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)

  File "django/contrib/admin/sites.py", line 196, in inner
    return view(request, *args, **kwargs)

  File "django/utils/decorators.py", line 25, in _wrapper
    return bound_func(*args, **kwargs)

  File "django/utils/decorators.py", line 91, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "django/utils/decorators.py", line 21, in bound_func
    return func(self, *args2, **kwargs2)

  File "django/db/transaction.py", line 209, in inner
    return func(*args, **kwargs)

  File "django/contrib/admin/options.py", line 1055, in change_view
    self.save_related(request, form, formsets, True)

  File "django/contrib/admin/options.py", line 733, in save_related
    self.save_formset(request, form, formset, change=change)

  File "django/contrib/admin/options.py", line 721, in save_formset
    formset.save()

  File "django/forms/models.py", line 497, in save
    return self.save_existing_objects(commit) + self.save_new_objects(commit)

  File "django/forms/models.py", line 628, in save_new_objects
    self.new_objects.append(self.save_new(form, commit=commit))

  File "django/forms/models.py", line 731, in save_new
    obj.save()

  File "django/db/models/base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)

  File "django/db/models/base.py", line 551, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)

  File "django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)

  File "django/db/models/query.py", line 1593, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)

  File "django/db/models/sql/compiler.py", line 909, in execute_sql
    for sql, params in self.as_sql():

  File "django/db/models/sql/compiler.py", line 872, in as_sql
    for obj in self.query.objs

  File "django/db/models/fields/files.py", line 249, in pre_save
    file.save(file.name, file, save=False)

  File "django/db/models/fields/files.py", line 86, in save
    self.name = self.storage.save(name, content)

  File "django/core/files/storage.py", line 44, in save
    name = self.get_available_name(name)

  File "django/core/files/storage.py", line 70, in get_available_name
    while self.exists(name):

  File "django/core/files/storage.py", line 230, in exists
    return os.path.exists(self.path(name))

  File "python2.7/genericpath.py", line 18, in exists
    os.stat(path)

1 Answer

os.path.supports_unicode_filenames is always False on POSIX systems except Darwin. That is because those systems don't really care about the encoding of a filename; to them it is simply a byte sequence. The locale settings specify how to interpret these bytes, which is why you can end up with broken characters in a terminal when the locale setting isn't right.
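
To illustrate the byte-sequence point, here is a minimal sketch. The helper name roundtrip_bytes_filename is made up for this example, and it assumes a POSIX filesystem that stores names exactly as given (ext4 on Ubuntu, for instance). It creates a file whose name is the UTF-8 encoding of the example string from the question and lists the directory with a byte-string path, so the name comes back as raw bytes and the locale is never consulted.

```python
import os
import shutil
import sys
import tempfile


def roundtrip_bytes_filename(text_name):
    """Create a file named with the UTF-8 bytes of text_name and list it back."""
    tmp_text = tempfile.mkdtemp()
    tmp_bytes = tmp_text
    if not isinstance(tmp_bytes, bytes):
        # Python 3 returns a text path; Python 2 already returns bytes.
        tmp_bytes = tmp_text.encode(sys.getfilesystemencoding())
    try:
        raw_name = text_name.encode('utf-8')
        with open(os.path.join(tmp_bytes, raw_name), 'wb'):
            pass  # an empty file is enough for the demonstration
        # A byte-string argument makes os.listdir return raw byte names.
        return os.listdir(tmp_bytes)
    finally:
        shutil.rmtree(tmp_text)


hebrew_name = u'\u05e7\u05e8\u05d5\u05d1-\u05e8\u05d7\u05d5\u05e7.jpg'
listed = roundtrip_bytes_filename(hebrew_name)
print(listed == [hebrew_name.encode('utf-8')])
print(os.path.supports_unicode_filenames)
```

The bytes only turn into Hebrew characters once something decodes them, which is the step where the locale (or an explicit .decode('utf-8')) comes in.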

How are you running your web app? If you're running it through a web server (Apache?) using CGI or WSGI, the locale may not be what you see in the shell, so this could be the reason why Python tries to use the ascii codec to encode the pathname.
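
One way to check this is to collect the relevant settings from inside the running app and compare them with what an interactive shell reports. The following is only a sketch; encoding_report is a hypothetical helper name, and where you log its output (a view, a startup hook) depends on your setup.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that decide how text paths are turned into bytes."""
    return {
        'filesystem_encoding': sys.getfilesystemencoding(),
        'default_encoding': sys.getdefaultencoding(),
        'preferred_encoding': locale.getpreferredencoding(),
        # Environment variables as seen by this process, not by your shell.
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
        'LC_CTYPE': os.environ.get('LC_CTYPE'),
    }


report = encoding_report()
for key in sorted(report):
    print('%s = %r' % (key, report[key]))
```

If the web process shows LANG unset or an ASCII filesystem encoding while the shell shows UTF-8, the difference in environment is the likely culprit.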

To make it work, you could manually encode the pathname as utf-8 when opening the file.
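
A minimal sketch of that idea, assuming UTF-8 is the encoding you want on disk. The names to_fs_bytes and path_exists are made up for this example and are not part of Django or the standard library.

```python
import os


def to_fs_bytes(path, encoding='utf-8'):
    """Return path as a byte string, encoding text explicitly."""
    if isinstance(path, bytes):
        return path  # already bytes (this includes Python 2 str)
    return path.encode(encoding)


def path_exists(path):
    """Like os.path.exists, but never relies on an implicit encoding."""
    return os.path.exists(to_fs_bytes(path))


print(to_fs_bytes(u'\u05e7') == b'\xd7\xa7')
print(path_exists(u'.'))
```

Because the conversion is explicit, the result no longer depends on the locale the process happens to run under.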

Edit:
So the failure is in the call to os.stat, which, when called with a unicode string, tries to convert it to a byte string according to the filesystem encoding (sys.getfilesystemencoding(), with sys.getdefaultencoding() as the fallback), which within a uWSGI environment always seems to be ascii when using Python 2. To fix this, make sure to encode any unicode string to UTF-8 before it is passed on to os.stat.
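
The conversion step can be imitated outside of os.stat. This sketch uses the example string from the question and an explicit .encode('ascii') as a stand-in for the implicit conversion; it does not call os.stat itself.

```python
name = u'\u05e7\u05e8\u05d5\u05d1-\u05e8\u05d7\u05d5\u05e7'

error = None
try:
    # Stand-in for the implicit conversion when the encoding is ascii.
    name.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc

print(error)
# 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

# Encoding explicitly to UTF-8 succeeds: 8 Hebrew letters at 2 bytes each plus '-'.
utf8_name = name.encode('utf-8')
print(len(utf8_name))  # 17
```

The message matches the first error in the question, which is consistent with the path being encoded with the ascii codec somewhere before it reaches the filesystem.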

answered Jan 22 '26 12:01 by mata