I have a C application we have deployed to a customers site. It was compiled and runs on HP-UX. The user has reported a crash and we have obtained a core dump. So far, I've been unable to duplicate the crash in house.
As you would suspect, the core file/deployed executable is completely devoid of any sort of symbols. When I load it up in gdb and do a bt, the best I get is this:
(gdb) bt
#0  0xc0199470 in ?? ()
I can do a 'strings core' on the file, but my understanding is that all I get there is all the strings in the executable, so it seems semi-impossible to track down anything there.
I do have a debug version (compiled with -g) of the executable, which is unfortunately a couple of months newer than the released version. If I try to start gdb with that hub, I see this:
warning: exec file is newer than core file.
Core was generated by `program_name'.
Program terminated with signal 11, Segmentation fault.
__dld_list is not valid according to __dld_flags.
#0  0xc0199470 in ?? ()
(gdb) bt
#0  0xc0199470 in ?? ()
While it would be feasible to compile a debug version and deploy it at the customer's site and then wait for another crash, it would be relatively difficult and undesirable for a number of reasons.
I am quite familiar with the code and have a relatively good idea of where in code it is crashing based on the customer's bug report.
Is there ANY way I can glean any more information from this core dump? Via strings or another debugger or anything? Thanks.
Try running a "pmap" against the core file (if hp/ux has this tool). This should report the starting addresses of all modules in the core file. With this info, you should be able to take the address of the failure location and figure out what library crashed.
You just need a binary (with debugging symbols included) that is identical to the one that generated the core dump file. Then you can run gdb path/to/the/binary path/to/the/core/dump/file to debug it. When it starts up, you can use bt (for backtrace) to get a stack trace from the time of the crash.
This type of response from gdb:
(gdb) bt
#0  0xc0199470 in ?? ()
can also happen in the case that the stack was smashed by a buffer overrun, where the return address was overwritten in memory, so the program counter gets set to a seemingly random area.
This is one of the ways that even a build with a corresponding symbol database can cause a symbol lookup error (or strange looking backtraces). If you still get this after you have the symbol table, your problem is likely that your customer's data is causing some issues with your code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With