Encoding issue with subprocess.Popen args

Yet another encoding question about Python.

How can I pass non-ASCII characters as parameters on a subprocess.Popen call?

Unlike most other questions on StackOverflow, my problem is not with stdin/stdout but with passing those characters in the args parameter of Popen.

Python script used for testing:

import subprocess

cmd = 'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'

process = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
output, err = process.communicate()
result = process.wait()

print result, '-', output

For this example call, script.py receives the accented characters garbled instead of Testç on ã and ê. If I copy and paste the same command string into a CMD shell, it works fine.
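A quick way to see what kind of garbling is involved is to reproduce it in isolation. The sketch below assumes the ANSI code page is cp1252 (the usual setting on US and Western European Windows installs): it takes the UTF-8 bytes that a UTF-8 source file produces for a plain byte-string literal and reads them back as cp1252, which gives the typical "Ã§" pattern. It also shows the damage is reversible if nothing else touched the bytes. It runs on Python 2.7 and 3.

```python
# -*- coding: utf-8 -*-
# Illustration only; assumes the Windows ANSI code page is cp1252.
text = u"Test\xe7 on \xe3 and \xea"  # u"Testç on ã and ê"

# In a UTF-8 source file, a plain (byte) string literal holds these UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Reading the same bytes as cp1252 turns each accented letter into two characters:
# misread == u"Test\xc3\xa7 on \xc3\xa3 and \xc3\xaa", i.e. "TestÃ§ on Ã£ and Ãª"
misread = utf8_bytes.decode("cp1252")

# The mistake is reversible as long as no other conversion happened in between.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == text)
```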

What I've tried, besides what's described above:

  1. Checked whether all Python scripts are encoded in UTF-8. They are.
  2. Changed to unicode (cmd = u'...'), received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 5 (Popen call).
  3. Changed to cmd = u'...'.decode('utf-8'), received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 3 (decode call).
  4. Changed to cmd = u'...'.encode('utf8'): the accented characters still arrive garbled.
  5. Set the PYTHONIOENCODING=utf-8 environment variable, with no luck.

Looking at tries 2 and 3, it seems like Popen implicitly encodes the string as ASCII internally (both errors are UnicodeEncodeError), but I don't have enough experience with Python to make progress based on this suspicion.
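The error in tries 2 and 3 comes from the 'ascii' codec while encoding, which points at an implicit encode rather than a decode. In Python 2, calling .decode() on a unicode object first encodes it with the default ASCII codec; Popen on Windows appears to do the same when handed a unicode command line (an assumption based on the identical error message). The snippet below performs that ASCII encode explicitly so the failure can be inspected on Python 2 or 3; the string is shortened, so the reported position differs from the 128 in the question.

```python
# Reproduce the step tries 2 and 3 hit in Python 2: an encode with the "ascii" codec.
text = u"Test\xe7 on \xe3 and \xea"

try:
    # Python 2 does this implicitly when unicode reaches a bytes-only API,
    # and also inside unicode.decode() before the actual decoding.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; Python 3 deletes `exc` after the block

print("%s codec failed at position %d: %s" % (error.encoding, error.start, error.reason))
```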

Environment: Python 2.7.11 running on a Windows Server 2012 R2.

I've searched for similar problems but haven't found any solution. A similar question is asked in "What is the encoding of the subprocess module output in Python 2.7?", but no viable solution is offered there.

I read that Python 3 changed the way strings and encodings work, but upgrading to Python 3 is not currently an option.

Thanks in advance.

Asked by Dinei on Jun 24 '26 12:06

1 Answer

As noted in the comments, subprocess.Popen in Python 2 calls the Windows API function CreateProcessA, which accepts a byte string in the currently configured (ANSI) code page. Luckily, Python has an encoding named mbcs that stands in for the current code page.

cmd = u'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'.encode('mbcs')

Unfortunately, this can still fail if the string contains characters that cannot be encoded in the current code page.
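Building on the mbcs approach, a small helper can encode the command line and also detect the case where some characters have no equivalent in the current code page. to_code_page is a hypothetical name, and the fallback to locale.getpreferredencoding() on systems without the Windows-only mbcs codec is an assumption added so the sketch runs elsewhere. Encoding with errors="replace" substitutes '?' for unencodable characters, and the round-trip comparison reports whether that happened.

```python
# -*- coding: utf-8 -*-
import codecs
import locale


def to_code_page(command, encoding=None):
    """Encode a unicode command line for a byte-oriented API (hypothetical helper).

    Returns (encoded_bytes, lossless). lossless is False when some characters
    could not be represented and were replaced with '?'.
    """
    if encoding is None:
        try:
            codecs.lookup("mbcs")  # Windows-only codec for the current ANSI code page
            encoding = "mbcs"
        except LookupError:
            # Assumption: outside Windows, fall back to the locale's preferred encoding.
            encoding = locale.getpreferredencoding()
    encoded = command.encode(encoding, "replace")
    # If decoding does not give back the original text, something was replaced.
    lossless = encoded.decode(encoding, "replace") == command
    return encoded, lossless
```

Usage, with the same command as above: cmd, ok = to_code_page(u'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'), then call subprocess.Popen(cmd, ...) only if ok is true, and report the problem otherwise.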

Answered by Mark Ransom on Jun 26 '26 16:06