I have a device which has an library. Some of its functions are most awesomely ill-behaved, in the "occasionally hang forever" sense.
I have a program which uses this device. If/when it hangs, I need to be able to recover gracefully and reset it. The offending calls should return within milliseconds and are being called in a loop many many times per second.
My first question is: when a thread running the recalcitrant function hangs, what do I do? Even if I litter the thread with interruption points, this happens:
boost::this_thread::interruption_point(); // irrelevant, in the past
deviceLibrary.thatFunction(); // <-- hangs here forever
boost::this_thread::interruption_point(); // never gets here!
The only word I've read on what to do there is to modify the function itself, but that's out of the question for a variety of reasons -- not least of which is "this is already miles outside of my skill set".
I have tried asynchronous launching with C++11 futures:
// this was in a looping thread -- it does not work: wait_for sometimes never returns
std::future<void> future = std::async(std::launch::async, 
    [this] () { deviceLibrary.thatFunction(*data_ptr); }); 
if (future.wait_for(std::chrono::seconds(timeout)) == std::future_status::timeout) { 
    printf("no one will ever read this\n"); 
    deviceLibrary.reset(); // this would work if it ever got here
}
No dice, in that or a number of variations.
I am now trying boost::asio with a thread_group of a number of worker threads running io_service::run(). It works magnificently until the second time it times out. Then I've run out of threads, because each hanging thread eats up one of my thread_group and it never comes back ever.
My latest idea is to call work_threads.create_thread to make a new thread to replace the now-hanging one. So my second question is: if this is a viable way of dealing with this, how should I cope with the slowly amassing group of hung threads? How do I remove them? Is it fine to leave them there?
Incidentally, I should mention that there is in fact a version of deviceLibrary.thatFunction() that has a timeout. It doesn't. 
I found this answer but it's C# and Windows specific, and this one which seems relevant. But I'm not so sure about spawning hundreds of extra processes a second (edit: oh right; I could banish all the calls to one or two separate processes. If they communicate well enough and I can share the device between them. Hm...)
Pertinent background information: I'm using MSVC 2013 on Windows 7, but the code has to cross-compile for ARM on Debian with GCC 4.6 also. My level of C++ knowledge is... well... if it seems like I'm missing something obvious, I probably am.
Thanks!
Can we write multithreading programs in C? Unlike Java, multithreading is not supported by the language standard. POSIX Threads (or Pthreads) is a POSIX standard for threads. Implementation of pthread is available with gcc compiler.
If you want to reliably kill something that's out of your control and may hang, use a separate process.
While process isolation was once considered to be very 'heavy-handed', browsers like Chrome today will implement it on a per-tab basis. Each tab gets a process, the GUI has a process, and if the tab rendering dies it doesn't take down the whole browser.
How can Google Chrome isolate tabs into separate processes while looking like a single application?
Threads are simply not designed for letting a codebase defend itself from ill-behaved libraries. Processes are.
So define the services you need, put that all in one program using your flaky libraries, and use interprocess communication from your main app to speak with the bridge. If the bridge times out or has a problem due to the flakiness, kill it and restart it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With