As part of a test suite written in Python 3[.4-.6] on Linux, I have to run a number 3rd-party tests. The 3rd-party tests are bash scripts. They are designed to run with Perl's prove TAP harness. One bash script can contain up to a few thousands of individual tests - and some of them can hang indefinitely. After a timeout, I want to kill the test script and gather some information on where it got stuck. 
Because the bash scripts create processes of their own, I try to isolate the entire prove process tree into a new process group, so I can eventually kill the entire process group as a whole if things go wrong. Because the tests must run with root privileges, I use sudo -b for creating a new process group with root privileges. This strategy (as opposed to using setsid in one way or the other) is a result of the comments I received on this question at SE Unix&Linux
The problem is that I loose all output from the prove TAP harness if I kill it 'prematurely' when launched with sudo -b through Python's subprocess.Popen. 
I isolated it into a simple test case. The following is a bash test script named job.t:
#!/bin/bash
MAXCOUNT=20
echo "1..$MAXCOUNT"
for (( i=1; i<=$MAXCOUNT; i++ ))
do
   echo "ok $i"
   sleep 1
done
Just for comparison, I also wrote a Python script named job.py producing more or less the same output and exhibiting the same behavior:
import sys
import time
if __name__ == '__main__':
    maxcount = 20
    print('1..%d' % maxcount)
    for i in range(1, maxcount + 1):
        sys.stdout.write('ok %d\n' % i)
        time.sleep(1)
Last but not least, the following is my stripped-down "Python test infrastructure" named demo.py:
import psutil # get it with "pip install psutil"
import os
import signal
import subprocess
def run_demo(cmd, timeout_after_seconds, signal_code):
    print('DEMO: %s' % ' '.join(cmd))
    proc = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
    try:
        outs, errs = proc.communicate(timeout = timeout_after_seconds)
    except subprocess.TimeoutExpired:
        print('KILLED!')
        kill_pid = _get_pid(cmd)
        subprocess.Popen(['sudo', 'kill', '-%d' % signal_code, '--', '-%d' % os.getpgid(kill_pid)]).wait()
        outs, errs = proc.communicate()
    print('Got our/err:', outs.decode('utf-8'), errs.decode('utf-8'))
def _get_pid(cmd_line_list):
    for pid in psutil.pids():
        proc = psutil.Process(pid)
        if cmd_line_list == proc.cmdline():
            return proc.pid
    raise # TODO some error ...
if __name__ == '__main__':
    timeout_sec = 5
    # Works, output is captured and eventually printed
    run_demo(['sudo', '-b', 'python', 'job.py'], timeout_sec, signal.SIGINT)
    # Failes, output is NOT captured (i.e. printed) and therefore lost
    run_demo(['sudo', '-b', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
When demo.py is launched, it runs the routine run_demo twice - with different configurations. Both times, a new process group with root privileges is started. Both times, a "test job" prints a new line (ok [line number]) once a second - theoretically for 20 seconds / 20 lines. However, there is a timeout of 5 seconds for both scripts and the entire process group is killed after this timeout.
When run_demo runs for the first time with my little Python script job.py, all output of that script all the way up to the point when it is killed is captured and printed successfully. When run_demo runs for the second time with the demo bash test script job.t on top of prove, no output is captured and only empty strings are printed. 
user@computer:~> python demo.py 
DEMO: sudo -b python job.py
KILLED!
Got our/err: 1..20
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
 Traceback (most recent call last):
  File "job.py", line 11, in <module>
    time.sleep(1)
KeyboardInterrupt
DEMO: sudo -b prove -v /full/path/to/job.t
KILLED!
Got our/err:  
user@computer:~>
What is happening here and how can I fix it?
I.e. how can I interrupt / terminate the bash test script running with prove (and its entire process group) in such a manner that I can capture its output?
EDIT: It was suggested in an answer that the observed behavior happens because of Perl buffering its output. Within an individual Perl script, this can be turned off. However, there is no apparent option allowing to switch off buffering for prove [-v]. How can I achieve this?
I can work around this issue by running my test job with bash directly. The following command has to be changed from 
run_demo(['sudo', '-b', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
to
run_demo(['sudo', '-b', 'bash', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
This way, I do not get the test statistics printed by prove, but I can generate them myself.
By default, the STDOUT of many programs (including perl) is line-buffered (flushed on newline) when the STDOUT is connected to a terminal, and block-buffered (flushed when the file's buffer is full) otherwise (e.g. when it's connected to a pipe).
You can fool such programs into using line-buffering by using a pseudo-tty (ptty) instead of a pipe. To that end, unbuffer is your friend. On Ubuntu, this is part of the expect package (sudo apt install expect).
From the docs:
unbuffer disables the output buffering that occurs when program output is redirected from non-interactive programs. For example, suppose you are watching the output from a fifo by running it through od and then more.
od -c /tmp/fifo | moreYou will not see anything until a full page of output has been produced.
You can disable this automatic buffering as follows:
unbuffer od -c /tmp/fifo | more
I tried your example code and got the same result as you describe (thanks to your Minimal, Complete, and Verifiable example!).
I then changed
run_demo(['sudo', '-b', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
to
run_demo(['sudo', '-b', 'unbuffer', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
That is: I simply prepended unbuffer to the prove command. The output then was:
DEMO: sudo -b python job.py
KILLED!
Got our/err: 1..20
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
 Traceback (most recent call last):
  File "job.py", line 8, in <module>
    time.sleep(1)
KeyboardInterrupt
DEMO: sudo -b unbuffer prove -v /home/dirk/w/sam/p/job.t
KILLED!
Got our/err: /home/dirk/w/sam/p/job.t .. 
1..20
ok 1
ok 2
ok 3
ok 4
ok 5
This is a start of an answer, it has more info than I could squeeze into a comment.
The issue you laid out isn't really related to bash, it's related to Perl.  On my system, which prove points to /usr/bin/prove, which is a perl script.  The real question here is generally about perl scripts, not even specific to prove.  I copied your files above and tested that I can reproduce what you're seeing, then I created a third test:
$ cat job.pl
#!/usr/bin/perl
foreach (1..20){
  print "$_\n";   
  sleep 1;
}
Cool, then I added this to the demo program:
(After importing shlex as well`):
cmdargs = shlex.split('sudo -b '+os.path.join(os.getcwd(), 'job.pl'))
run_demo(cmdargs, timeout_sec, signal.SIGINT)
And, to be sure, this simple perl script faile to produce output when killed.
$ python3 demo.py
...(output as you wrote above followed by)... 
DEMO: sudo -b /home/jawguychooser/job.pl
KILLED!
Got our/err:  
$
So, that means that your problem is really a specific instance of how to capture the output from a killed perl program running in the background controlled by a Python program.
As a next step, I set job.pl to unbuffer stdout:
$ cat job.pl
#!/usr/bin/perl
$| = 1;
foreach (1..20){
  print "$_\n"; 
  sleep 1;
}
And then, I rerun demo.py and voila!
$ python3 demo.py 
DEMO: sudo -b /home/jawguychooser/job.pl
KILLED!
Got our/err: 1
2
3
4
5
6
$ 
So, perhaps if you hack into the prove script and set things up to run unbuffered that will do what you want.  In any case, I think your question is now "how can I run prove -v in unbuffered mode".
I hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With