I have a Python script that calls another Python script using subprocess.Popen. I know the called script always returns 10, which means it failed.
My problem is that the caller only reads 10 about 75% of the time. The other 25% it reads 0 and mistakes the called program's failure code for a success. Same command, same environment, apparently random occurrences.
Environment: Python 2.7.10, Red Hat Linux 6.4. The code presented here is a (very) simplified version, but I can still reproduce the problem with it.
This is the called script, constant_return.py:
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Simplified called code
"""
import sys

if __name__ == "__main__":
    sys.exit(10)
This is the caller code:
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Simplified version of the calling code
"""
try:
    import sys
    import subprocess
    import threading
except Exception, eImp:
    print "Error while loading Python library: %s" % eImp
    sys.exit(100)


class BizarreProcessing(object):
    """
    Simplified caller class
    """

    def __init__(self):
        """
        Classic initialization
        """
        object.__init__(self)

    def logPipe(self, isStdOut_, process_):
        """
        Simplified log handler
        """
        try:
            if isStdOut_:
                output = process_.stdout
                logfile = open("./log_out.txt", "wb")
            else:
                output = process_.stderr
                logfile = open("./log_err.txt", "wb")
            # Read pipe content as long as the process is running
            while (process_.poll() == None):
                text = output.readline()
                if (text != '' and text.strip() != ''):
                    logfile.write(text)
            # When the process is finished, there might still be lines remaining in the pipe
            output.readlines()
            for oneline in output.readlines():
                if (oneline != None and oneline.strip() != ''):
                    logfile.write(text)
        finally:
            logfile.close()

    def startProcessing(self):
        """
        Launch process
        """
        # Simplified command line definition
        command = "/absolute/path/to/file/constant_return.py"
        # Execute command in a new process
        process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # Launch a thread to gather the called program's stdout and stderr
        # (this avoids a deadlock when a pipe fills up)
        stdoutTread = threading.Thread(target=self.logPipe, args=(True, process))
        stdoutTread.start()
        stderrThread = threading.Thread(target=self.logPipe, args=(False, process))
        stderrThread.start()
        # Wait for the end of the process and get the process result
        stdoutTread.join()
        stderrThread.join()
        result = process.wait()
        print("returned code: " + str(result))
        # Send it back to the caller
        return (result)


#
# Main
#
if __name__ == "__main__":
    # Execute caller code
    processingInstance = BizarreProcessing()
    aResult = processingInstance.startProcessing()
    # Return the code
    sys.exit(aResult)
Here is what I type in bash to execute the caller script:
for res in {1..100}
do
/path/to/caller/script.py
echo $? >> /tmp/returncodelist.txt
done
It seems to be somehow connected to the way I read the called program's output: when I create the subprocess with process = subprocess.Popen(command, shell=True, stdout=sys.stdout, stderr=sys.stderr) and remove all the thread handling, the caller reads the correct return code every time (but it no longer logs the way I want...).
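For reference, the thread-free variant I am describing looks roughly like this (no log files are written in this case, the child simply inherits my streams):

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Thread-free variant: the child writes straight to the caller's stdout/stderr.
"""
import subprocess
import sys

if __name__ == "__main__":
    command = "/absolute/path/to/file/constant_return.py"
    # Let the child inherit the caller's streams instead of using pipes
    process = subprocess.Popen(command, shell=True,
                               stdout=sys.stdout, stderr=sys.stderr)
    result = process.wait()
    print("returned code: " + str(result))
    sys.exit(result)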
Any idea what I did wrong?
Thanks a lot for your help
First, logPipe is checking whether the process is alive to determine whether there's more data to read. This is not correct: you should be checking whether the pipe has reached EOF, by looking for a zero-length read or by using output.readlines(). The I/O pipes may outlive the process.
This also simplifies logPipe significantly. Change logPipe as follows:
def logPipe(self, isStdOut_, process_):
    """
    Simplified log handler
    """
    try:
        if isStdOut_:
            output = process_.stdout
            logfile = open("./log_out.txt", "wb")
        else:
            output = process_.stderr
            logfile = open("./log_err.txt", "wb")
        # Read from the pipe until EOF, regardless of whether the process is still running
        with output:
            for text in output:
                if text.strip():  # ... checks that it's not an empty string
                    logfile.write(text)
    finally:
        logfile.close()
Second, don't join your logging threads until after process.wait(), for the same reason: the I/O pipes may outlive the process.
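A rough sketch of startProcessing with that ordering (everything else stays as in your original class, and only the main thread ever waits on the child):

def startProcessing(self):
    """
    Launch process; wait for it in the main thread, then join the readers
    """
    command = "/absolute/path/to/file/constant_return.py"
    process = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Reader threads drain the pipes until EOF (see logPipe above)
    stdoutThread = threading.Thread(target=self.logPipe, args=(True, process))
    stdoutThread.start()
    stderrThread = threading.Thread(target=self.logPipe, args=(False, process))
    stderrThread.start()
    # Wait for the child first; the readers finish once the pipes hit EOF
    result = process.wait()
    stdoutThread.join()
    stderrThread.join()
    print("returned code: " + str(result))
    return result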
What I think is happening under the covers is that there's a SIGPIPE being emitted and mishandled somewhere - possibly being misconstrued as the process termination condition. This is because the pipe is being closed on one end or the other without being flushed. SIGPIPE can sometimes be a nuisance in larger applications; it may be that the Python library swallows it or does something childish with it.
Edit: As @Blackjack points out, SIGPIPE is ignored by default in Python, so that rules out SIGPIPE malfeasance. A second theory, though: the documentation for Popen.poll() states:
Check if child process has terminated. Set and return returncode attribute.
If you strace this (e.g., strace -f -o strace.log ./caller.py), the polling appears to be done via wait4(WNOHANG). You've got two threads waiting with WNOHANG and one waiting normally, but only one of those calls will return with the process exit code. If there is no lock in the implementation of Popen.poll(), there is quite likely a race to assign process.returncode, or a potential failure to assign it correctly. Limiting your Popen.wait()/poll() calls to a single thread should be a good way to avoid this. See man waitpid.
Edit: As an aside, if you can hold all of your stdout/stderr data in memory, subprocess.communicate() is much easier to use and does not require logPipe or background threads at all.
https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
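A rough sketch of startProcessing using communicate(), reusing the command and log file names from the question:

def startProcessing(self):
    """
    Launch process and capture its output via communicate()
    """
    command = "/absolute/path/to/file/constant_return.py"
    process = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes to EOF (buffered in memory) and waits for the child
    out, err = process.communicate()
    with open("./log_out.txt", "wb") as logfile:
        logfile.write(out)
    with open("./log_err.txt", "wb") as logfile:
        logfile.write(err)
    print("returned code: " + str(process.returncode))
    return process.returncode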