Yet another encoding question on Python.
How can I pass non-ASCII characters as parameters on a subprocess.Popen call?
My problem is not with stdin/stdout, as in the majority of other questions on StackOverflow, but with passing those characters in the args parameter of Popen.
Python script used for testing:
import subprocess
cmd = 'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'
process = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
output, err = process.communicate()
result = process.wait()
print result, '-', output
For this example call, script.py receives the argument garbled instead of as Testç on ã and ê. If I copy-paste the same command string into a CMD shell, it works fine.
What I've tried, besides what's described above:
1. cmd = u'...': received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 5 (Popen call).
2. cmd = u'...'.decode('utf-8'): received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 3 (decode call).
3. cmd = u'...'.encode('utf8'): the argument still arrives garbled instead of as Testç on ã and ê.
4. Setting the PYTHONIOENCODING=utf-8 environment variable: no luck.
Looking at tries 2 and 3, it seems like Popen issues a decode call internally, but I don't have enough experience in Python to get any further with this suspicion.
Environment: Python 2.7.11 running on Windows Server 2012 R2.
I've searched for similar problems but haven't found any solution. A similar question is asked in "What is the encoding of the subprocess module output in Python 2.7?", but no viable solution is offered there.
I read that Python 3 changed the way strings and encodings work, but upgrading to Python 3 is not an option currently.
Thanks in advance.
As noted in the comments, subprocess.Popen in Python 2 calls the Windows function CreateProcessA, which accepts a byte string in the currently configured code page. Luckily, Python has an encoding named mbcs that stands in for the current code page.
cmd = u'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'.encode('mbcs')
Unfortunately, this can still fail if the string contains characters that can't be encoded into the current code page.
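One way to catch that case before launching the process is a round-trip check: encode the command, decode it again, and compare with the original. The sketch below is only an illustration. The helper name encode_command is hypothetical, and the cp1252 code page in the usage lines is an assumption chosen so the example also runs outside Windows, where the mbcs codec does not exist.

```python
# -*- coding: utf-8 -*-

def encode_command(cmd, codec='mbcs'):
    """Return cmd encoded with codec, or None if the encoding is lossy.

    Hypothetical helper. 'mbcs' is only available on Windows; pass an
    explicit code page name such as 'cp1252' on other platforms.
    """
    try:
        encoded = cmd.encode(codec)
    except UnicodeEncodeError:
        # The codec rejected a character outright.
        return None
    # Some codecs substitute characters instead of raising, so confirm
    # that decoding gives back exactly the text we started with.
    if encoded.decode(codec) != cmd:
        return None
    return encoded


# Usage with an assumed Western European code page:
ok = encode_command(u'Test\xe7 on \xe3 and \xea', 'cp1252')
bad = encode_command(u'Test\u4e2d', 'cp1252')  # CJK character, not in cp1252
```

If the helper returns None, the command cannot be passed through CreateProcessA intact, so it is better to report the problem than to call Popen with a damaged argument.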