
Find the main thread while debugging core file

I have a program where the main thread creates lots of threads. It crashed, and I'm debugging the core file. The crash happened in one of the child threads. To find the cause, I need to know whether the main thread was still alive. Is there any way to find out which thread was the initial one?

user1289 asked Oct 17 '25 13:10


2 Answers

Is there any way to find out which thread was the initial one?

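When there are hundreds of threads, I use the following technique to look through them (on GDB 12 and later, set logging on is spelled set logging enabled on; the old form is deprecated):

(gdb) shell rm -f gdb.txt
(gdb) set pagination off
(gdb) set logging on   # GDB output will go to gdb.txt
(gdb) thread apply all where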

Now load gdb.txt into your editor or pager of choice, look for main, etc.

Employed Russian answered Oct 20 '25 14:10


As a general approach for UNIX-based systems, the accepted answer works as expected.

On Linux (and OSes that chose a similar POSIX threads implementation strategy), identifying the main thread can be much more straightforward, because the main thread's LWP ID equals the process ID. Typically, the file name of a core dump contains the PID of the faulting process (e.g. core.<pid>), unless the core pattern (/proc/sys/kernel/core_pattern) was changed. With that, you can determine the main thread using thread find <pid>, which matches its argument as a regular expression against each thread's name and target ID:

$ gdb executable core.24533
[...]
(gdb) thread find 24533
Thread 7 has target id 'Thread 0x7f8ae2169740 (LWP 24533)'
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f8ae2169740 (LWP 24533))]
#0  0x00007f8ae1d40017 in pthread_join (threadid=140234458433280, thread_return=0x0) at pthread_join.c:90
90      lll_wait_tid (pd->tid);
(gdb) bt
#0  0x00007f8ae1d40017 in pthread_join (threadid=140234458433280, thread_return=0x0) at pthread_join.c:90
#1  0x00007f8ae1ae40f7 in __gthread_join (__value_ptr=0x0, __threadid=<optimized out>)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/x86_64-redhat-linux/bits/gthr-default.h:668
#2  std::thread::join (this=this@entry=0x5595aac42990) at ../../../../../libstdc++-v3/src/c++11/thread.cc:107
#3  0x00005595a9681468 in operator() (t=..., __closure=<optimized out>) at segv.cxx:31
#4  for_each<__gnu_cxx::__normal_iterator<std::thread*, std::vector<std::thread> >, ThreadPool::wait()::__lambda1> (__last=..., __first=..., __f=...)
    at /usr/include/c++/4.8.2/bits/stl_algo.h:4417
#5  wait (this=0x7ffcac67d860) at segv.cxx:32
#6  main (argc=<optimized out>, argv=<optimized out>) at segv.cxx:75
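
Because thread find does a regular-expression match, a PID such as 24533 would also hit a thread with LWP 245330. If you capture the thread find output (for instance via GDB logging), an exact comparison is easy to script. This is a rough sketch; main_thread_number is a hypothetical helper and the line shape is assumed from the transcript above.

```python
import re

# Assumed line shape, as printed by `thread find`:
#   Thread 7 has target id 'Thread 0x7f8ae2169740 (LWP 24533)'
FIND_RE = re.compile(r"^Thread (\d+) has target id '.*\(LWP (\d+)\)")


def main_thread_number(find_output, pid):
    """Return the GDB thread number whose LWP equals pid exactly, or None."""
    for line in find_output.splitlines():
        match = FIND_RE.match(line)
        if match and int(match.group(2)) == pid:
            return int(match.group(1))
    return None
```

A result of None means no thread in the captured output carries the PID as its LWP, which on Linux suggests the main thread had already exited when the core was written.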

If the file name is missing the PID, it can be recovered from the core dump itself. The PID is stored in the note segment (PT_NOTE). Both NT_PRSTATUS and NT_PRPSINFO carry a pr_pid field. With multiple threads, there is one NT_PRSTATUS note per thread (including the main thread) and their order is unspecified. NT_PRPSINFO, on the other hand, exists only once.

The definition on Linux x86_64 (pr_pid is the field of interest):

struct elf_prpsinfo
{
        char    pr_state;       /* numeric process state */
        char    pr_sname;       /* char for pr_state */
        char    pr_zomb;        /* zombie */
        char    pr_nice;        /* nice val */
        unsigned long pr_flag;  /* flags */
        __kernel_uid_t  pr_uid;
        __kernel_gid_t  pr_gid;
        pid_t   pr_pid, pr_ppid, pr_pgrp, pr_sid;
        /* Lots missing */
        char    pr_fname[16];   /* filename of executable */
        char    pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
};

eu-readelf -n (provided by elfutils) can be used to extract the PID from NT_PRPSINFO:

$ eu-readelf -n core
[...]
  CORE                 136  PRPSINFO
    state: 2, sname: D, zomb: 0, nice: 0, flag: 0x0000000040402504
    uid: 0, gid: 0, pid: 24533, ppid: 17322, pgrp: 24533, sid: 17299
                         ^^^^^
    fname: segv, psargs: ./segv 2 
[...]
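
If elfutils is not at hand, the same field can be read with a few lines of Python. Treat this as a sketch under stated assumptions: prpsinfo_pid is a hypothetical helper, it handles only little-endian ELF64 cores, it does not handle cores with so many segments that e_phnum overflows, and the offset of 24 for pr_pid is derived from the x86_64 struct layout above (consistent with the 136-byte descriptor size eu-readelf reports).

```python
import struct

PT_NOTE = 4         # program header type of note segments
NT_PRPSINFO = 3     # note type of the process-wide info record
PR_PID_OFFSET = 24  # assumed offset of pr_pid in elf_prpsinfo on Linux x86_64


def prpsinfo_pid(core):
    """Return pr_pid from NT_PRPSINFO in a little-endian ELF64 core image, or None."""
    if core[:6] != b"\x7fELF\x02\x01":
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", core, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", core, 54)
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", core, ph)
        if p_type != PT_NOTE:
            continue
        (p_offset,) = struct.unpack_from("<Q", core, ph + 8)
        (p_filesz,) = struct.unpack_from("<Q", core, ph + 32)
        pos, end = p_offset, p_offset + p_filesz
        # Each note: namesz, descsz, type, then name and desc padded to 4 bytes.
        while pos + 12 <= end:
            namesz, descsz, ntype = struct.unpack_from("<III", core, pos)
            name_start = pos + 12
            desc_start = name_start + (namesz + 3) // 4 * 4
            name = core[name_start:name_start + namesz].rstrip(b"\0")
            if name == b"CORE" and ntype == NT_PRPSINFO:
                (pid,) = struct.unpack_from("<i", core, desc_start + PR_PID_OFFSET)
                return pid
            pos = desc_start + (descsz + 3) // 4 * 4
    return None
```

It takes the raw bytes of the core file, e.g. prpsinfo_pid(open("core", "rb").read()). Reading the whole file into memory is a simplification; for multi-gigabyte cores you would want to read only the headers and the note segment.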
horstr answered Oct 20 '25 14:10