I have a Python script that calls another Python script using subprocess.Popen. I know the called script always returns 10, which means it failed.
My problem is that the caller only reads 10 about 75% of the time. The other 25% it reads 0 and mistakes the called program's failure code for a success. Same command, same environment, apparently random occurrences.
Environment: Python 2.7.10, Red Hat Linux 6.4. The code presented here is a (very) simplified version, but I can still reproduce the problem with it.
This is the called script, constant_return.py:
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Simplified called code
"""
import sys

if __name__ == "__main__":
    sys.exit(10)
This is the caller code:
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Simplified version of the calling code
"""
try:
    import sys
    import subprocess
    import threading
except Exception, eImp:
    print "Error while loading Python library: %s" % eImp
    sys.exit(100)


class BizarreProcessing(object):
    """
    Simplified caller class
    """

    def __init__(self):
        """
        Classic initialization
        """
        object.__init__(self)

    def logPipe(self, isStdOut_, process_):
        """
        Simplified log handler
        """
        try:
            if isStdOut_:
                output = process_.stdout
                logfile = open("./log_out.txt", "wb")
            else:
                output = process_.stderr
                logfile = open("./log_err.txt", "wb")
            # Read pipe content as long as the process is running
            while (process_.poll() == None):
                text = output.readline()
                if (text != '' and text.strip() != ''):
                    logfile.write(text)
            # When the process is finished, there might still be lines remaining in the pipe
            output.readlines()
            for oneline in output.readlines():
                if (oneline != None and oneline.strip() != ''):
                    logfile.write(text)
        finally:
            logfile.close()

    def startProcessing(self):
        """
        Launch process
        """
        # Simplified command line definition
        command = "/absolute/path/to/file/constant_return.py"
        # Execute command in a new process
        process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # Launch a thread to gather the called program's stdout and stderr
        # (this avoids a deadlock when a pipe fills up)
        stdoutTread = threading.Thread(target=self.logPipe, args=(True, process))
        stdoutTread.start()
        stderrThread = threading.Thread(target=self.logPipe, args=(False, process))
        stderrThread.start()
        # Wait for the end of the process and get the process result
        stdoutTread.join()
        stderrThread.join()
        result = process.wait()
        print("returned code: " + str(result))
        # Send it back to the caller
        return (result)


#
# Main
#
if __name__ == "__main__":
    # Execute caller code
    processingInstance = BizarreProcessing()
    aResult = processingInstance.startProcessing()
    # Return the code
    sys.exit(aResult)
Here is what I type in bash to execute the caller script:
for res in {1..100}
do
/path/to/caller/script.py
echo $? >> /tmp/returncodelist.txt
done
It seems to be somehow connected to the way I read the called program's output: when I create the subprocess with process = subprocess.Popen(command, shell=True, stdout=sys.stdout, stderr=sys.stderr) and remove all the thread handling, the caller reads the correct return code every time (but it no longer logs the way I want...).
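For reference, the thread-free variant I am describing looks roughly like this (no log files are written in this case, the child simply inherits my streams):

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Thread-free variant: the child writes straight to the caller's stdout/stderr.
"""
import subprocess
import sys

if __name__ == "__main__":
    command = "/absolute/path/to/file/constant_return.py"
    # Let the child inherit the caller's streams instead of using pipes
    process = subprocess.Popen(command, shell=True,
                               stdout=sys.stdout, stderr=sys.stderr)
    result = process.wait()
    print("returned code: " + str(result))
    sys.exit(result)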
Any idea what I did wrong?
Thanks a lot for your help
First, logPipe is checking whether the process is alive to determine whether there's more data to read. This is not correct: you should be checking whether the pipe has reached EOF, by looking for a zero-length read or by using output.readlines(). The I/O pipes may outlive the process.
This also simplifies logPipe significantly. Change logPipe as follows:
def logPipe(self, isStdOut_, process_):
    """
    Simplified log handler
    """
    try:
        if isStdOut_:
            output = process_.stdout
            logfile = open("./log_out.txt", "wb")
        else:
            output = process_.stderr
            logfile = open("./log_err.txt", "wb")
        # Read from the pipe until EOF, regardless of whether the process is still running
        with output:
            for text in output:
                if text.strip():  # ... checks that it's not an empty string
                    logfile.write(text)
    finally:
        logfile.close()
Second, don't join your logging threads until after process.wait(), for the same reason: the I/O pipes may outlive the process.
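A rough sketch of startProcessing with that ordering (everything else stays as in your original class, and only the main thread ever waits on the child):

def startProcessing(self):
    """
    Launch process; wait for it in the main thread, then join the readers
    """
    command = "/absolute/path/to/file/constant_return.py"
    process = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Reader threads drain the pipes until EOF (see logPipe above)
    stdoutThread = threading.Thread(target=self.logPipe, args=(True, process))
    stdoutThread.start()
    stderrThread = threading.Thread(target=self.logPipe, args=(False, process))
    stderrThread.start()
    # Wait for the child first; the readers finish once the pipes hit EOF
    result = process.wait()
    stdoutThread.join()
    stderrThread.join()
    print("returned code: " + str(result))
    return result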
What I think is happening under the covers is that there's a SIGPIPE being emitted and mishandled somewhere - possibly being misconstrued as the process termination condition. This is because the pipe is being closed on one end or the other without being flushed. SIGPIPE can sometimes be a nuisance in larger applications; it may be that the Python library swallows it or does something childish with it.
Edit: As @Blackjack points out, SIGPIPE is ignored by default in Python, so that rules out SIGPIPE malfeasance. A second theory, though: the documentation for Popen.poll() states:
Check if child process has terminated. Set and return returncode attribute.
If you strace this (e.g., strace -f -o strace.log ./caller.py), the polling appears to be done via wait4(WNOHANG). You've got two threads waiting with WNOHANG and one waiting normally, but only one of those calls will return with the process exit code. If there is no lock in the implementation of Popen.poll(), there is quite likely a race to assign process.returncode, or a potential failure to assign it correctly. Limiting your Popen.wait()/poll() calls to a single thread should be a good way to avoid this. See man waitpid.
Edit: As an aside, if you can hold all of your stdout/stderr data in memory, subprocess.communicate() is much easier to use and does not require logPipe or background threads at all.
https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
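A rough sketch of startProcessing using communicate(), reusing the command and log file names from the question:

def startProcessing(self):
    """
    Launch process and capture its output via communicate()
    """
    command = "/absolute/path/to/file/constant_return.py"
    process = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes to EOF (buffered in memory) and waits for the child
    out, err = process.communicate()
    with open("./log_out.txt", "wb") as logfile:
        logfile.write(out)
    with open("./log_err.txt", "wb") as logfile:
        logfile.write(err)
    print("returned code: " + str(process.returncode))
    return process.returncode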