
Linux, forked process hangs immediately

Tags: c, linux, fork

I have a problem with fork that occurs only sporadically. It works almost all the time, but fails every once in a while on a test system.

My research didn't turn up anybody else mentioning a similar problem.

The problem occurs on an embedded Linux system. There is no swap partition available.

The process running has all signals blocked in all threads and handles them via sigtimedwait in a dedicated thread.

If I start a child process via fork:

  • The parent process continues with a return value > 0, so the fork itself works. fork does not return -1, so there is no error and the system is not out of memory. The parent then waits on the child process and never returns from the wait.
  • The child process never does anything observable. The very first thing the child should do is write a log message. This log message never appears. Then it should spawn two child processes, a time-out process and a worker process. These processes never appear.
  • If I check via ps on the command line, I can see that the child process exists. It is in state S (interruptible sleep, waiting for an event to complete). It never gets any CPU time and shows no CPU usage.
  • If I kill -9 the child process, the parent process finishes waiting and continues happily.

Pseudo code showing the problem:

const pid_t childPid = fork();
if(0 == childPid) {
    // child process
    LOG_MSG("Child process started."); // <- This never shows up in the syslog.

    // do some stuff

} else if(-1 == childPid) {
    // error
    LOG_MSG("Parent process: Error starting child process!");
    result = false;
} else {
    // parent process
    LOG_MSG("Parent process: Child process started. PID: %d.", static_cast<int>(childPid)); // <- This shows up in the syslog.

    // do some stuff
    int status = 0;
    const int options = 0;
    const auto waitResult = waitpid(childPid, &status, options);
    // more stuff
}

Questions:

  1. What could cause this hanging child process?
  2. What would happen if the new process ran out of memory in the LOG_MSG call that leads to syslog? Would this raise a signal (that could not be delivered because it is blocked)?
DrP3pp3r asked Oct 20 '25 15:10


1 Answer

I took the sample from Adrien Descamps' link (see also comments above) and C++-ified and modified it a little:

#include <atomic>
#include <chrono>   // std::chrono::milliseconds
#include <cstdlib>  // exit
#include <iostream>
#include <thread>

#include <unistd.h>
#include <syslog.h>
#include <sys/wait.h>


std::atomic<bool> go(true);


void syslogBlaster() {
   int j = 0;
   while(go) {
      for(int i = 0; i < 100; ++i) {
         syslog(LOG_INFO, "syslogBlaster: %d@%d", i, j);
      }
      ++j;

      std::this_thread::sleep_for(std::chrono::milliseconds(30));
   }
}

int main() {
   std::thread blaster(syslogBlaster);

   for(int i = 0; i < 1000; ++i) {
      const auto forkResult = fork();
      if(0 == forkResult) {
         syslog(LOG_INFO, "Child process: '%d'.", static_cast<int>(getpid()));
         exit(0);
      } else if(forkResult < 0) {
         std::cout << "fork() failed!" << std::endl;
      } else {
         syslog(LOG_INFO, "Parent process.");
         std::cout << "Waiting #" << i << "!" << std::endl;
         int status = 0;
         const int options = 0;
         const auto waitResult = waitpid(forkResult, &status, options);
         if(-1 == waitResult) {
            std::cout << "waitpid() failed!" << std::endl;
         } else {
            std::cout << "Bye zombie #" << i << "!" << std::endl;
         }
      }

      std::this_thread::sleep_for(std::chrono::milliseconds(28));
   }

   go = false;
   blaster.join();

   std::cout << "Wow, we survived!" << std::endl;
}

When I run this sample on my device, the process gets stuck somewhere between the first and the fifth iteration.

Explanation

syslog is the problem!

In general: non async-signal-safe functions are the problem!

As stated by Damian Pietras (see linked page):

"calling any function that is not async-safe (man 7 signal) in child process after fork() call in a multi-threaded program has undefined behaviour"

Technically, the undefined behavior arises in one of two ways. Either data guarded by a critical section is inconsistent in the child, because a thread other than the forking one was in the middle of updating it at the moment of the fork. Or, as in this case, a mutex that was locked in the parent at that moment stays locked forever in the child, because the thread that held it does not exist there and so can never unlock it.
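For illustration, here is a minimal sketch of a child that stays within the async-signal-safe set. It is not the code from my application. The function name runChildSafely is hypothetical, and writing the message to stderr with write() instead of calling syslog is an assumption made only to keep the example self-contained. The point is that the child calls nothing but write() and _exit(), both of which are on the async-signal-safe list, so it cannot block on a lock inherited from another thread of the parent.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that restricts itself to
// async-signal-safe calls. Returns the child's exit code, or -1 on failure.
int runChildSafely() {
    static const char msg[] = "Child process started.\n";

    const pid_t childPid = fork();
    if (0 == childPid) {
        // Child: only write() and _exit() are used here.
        // No syslog(), no iostream, no malloc(), no exit().
        const ssize_t written = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)written;  // nothing safe to do about a failed write here
        _exit(0);
    }
    if (childPid < 0) {
        return -1;  // fork() failed
    }

    // Parent: wait for the child, retrying if a signal interrupts the wait.
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(childPid, &status, 0);
    } while (-1 == waited && EINTR == errno);

    if (-1 == waited) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child needs real logging or any other non-trivial work, another common approach is to call one of the exec functions right after fork, since the new program image starts with fresh, unlocked library state.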

Credit for this answer goes to Adrien Descamps for finding the root cause (syslog), but also to PSkocik and Jan Spurny for detecting the source (LOG_MSG).

DrP3pp3r answered Oct 23 '25 05:10


