The following code:

#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;

struct Foo {
    Foo() {
        std::unique_lock<std::mutex> lock{m};
        std::cout << "Foo Created in thread " << std::this_thread::get_id() << "\n";
    }
    ~Foo() {
        std::unique_lock<std::mutex> lock{m};
        std::cout << "Foo Deleted in thread " << std::this_thread::get_id() << "\n";
    }
    void proveMyExistance() {
        std::unique_lock<std::mutex> lock{m};
        std::cout << "Foo this = " << this << "\n";
    }
};

int threadFunc() {
    static thread_local Foo some_thread_var;
    // Prove the variable is initialized
    some_thread_var.proveMyExistance();
    // The thread runs for some time
    std::this_thread::sleep_for(std::chrono::milliseconds{100});
    return 1;
}

int main() {
    auto a1 = std::async(std::launch::async, threadFunc);
    auto a2 = std::async(std::launch::async, threadFunc);
    auto a3 = std::async(std::launch::async, threadFunc);
    a1.wait();
    a2.wait();
    a3.wait();
    std::this_thread::sleep_for(std::chrono::milliseconds{1000});
    return 0;
}
Compiled and run with clang on macOS:
clang++ test.cpp -std=c++14 -pthread
./a.out
Output:
Foo Created in thread 0x70000d9f2000
Foo Created in thread 0x70000daf8000
Foo Created in thread 0x70000da75000
Foo this = 0x7fd871d00000
Foo this = 0x7fd871c02af0
Foo this = 0x7fd871e00000
Foo Deleted in thread 0x70000daf8000
Foo Deleted in thread 0x70000da75000
Foo Deleted in thread 0x70000d9f2000
Compiled and run in Visual Studio 2015 Update 3:
Foo Created in thread 7180
Foo this = 00000223B3344120
Foo Created in thread 8712
Foo this = 00000223B3346750
Foo Created in thread 11220
Foo this = 00000223B3347E60
The destructors are not called.
Is this a bug, or some undefined grey area?
P.S.
If the sleep at the end, std::this_thread::sleep_for(std::chrono::milliseconds{1000});, is not long enough, you may sometimes not see all three "Deleted" messages.
When using std::thread instead of std::async, the destructors get called on both platforms, and all three "Deleted" messages are always printed.
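For comparison, here is a minimal sketch of the std::thread variant described above. It replaces the printing with two counters (g_created, g_destroyed, and the function runWithThreads are names added for this sketch) so the result can be checked in code: because each std::thread is joined, and a thread's thread_local objects are destroyed as part of that thread exiting, all destructors should have run by the time join() returns.

```cpp
#include <atomic>
#include <cassert>  // for assert in the usage check
#include <thread>
#include <vector>

// Counters added for this sketch so the result can be checked in code.
std::atomic<int> g_created{0};
std::atomic<int> g_destroyed{0};

struct Foo {
    Foo() { ++g_created; }
    ~Foo() { ++g_destroyed; }
};

void threadFunc() {
    // One instance per thread, constructed the first time control passes
    // through this line and destroyed when the owning thread exits.
    static thread_local Foo some_thread_var;
    (void)some_thread_var;
}

// Starts n std::threads running threadFunc, joins them all, and returns
// how many Foo destructors have run by then.
int runWithThreads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back(threadFunc);
    }
    for (auto& t : threads) {
        t.join();  // returns only after the thread, and its thread_locals, are gone
    }
    return g_destroyed.load();
}
```

Calling runWithThreads(3) should report three constructions and three destructions on any conforming platform.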
Introductory Note: I have now learned a lot more about this and have therefore re-written my answer. Thanks to @super, @M.M and (latterly) @DavidHaim and @NoSenseEtAl for putting me on the right track.
tl;dr: Microsoft's implementation of std::async is non-conformant, but they have their reasons, and what they have done can actually be useful once you understand it properly.
For those who don't want that, it is not too difficult to code up a drop-in replacement for std::async which works the same way on all platforms. I have posted one here.
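The replacement linked above is not reproduced here. As a rough illustration of the idea only, the following sketch (thread_async is a hypothetical name, not the linked code) always runs the callable on a brand-new std::thread. It uses std::packaged_task::make_ready_at_thread_exit, which makes the future ready only after the worker thread's thread_local objects have been destroyed. Assumptions: the callable takes no arguments (wrap arguments in a lambda), and, unlike a future returned by std::async, the returned future's destructor does not block.

```cpp
#include <cassert>  // for assert in the usage check
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: runs f on a new, detached std::thread and returns a
// future for its result. The future becomes ready only when that thread
// exits, i.e. after all of its thread_local objects have been destroyed.
template <class F>
auto thread_async(F f) -> std::future<decltype(f())> {
    using R = decltype(f());
    std::packaged_task<R()> task(std::move(f));
    std::future<R> result = task.get_future();
    std::thread([t = std::move(task)]() mutable {
        t.make_ready_at_thread_exit();  // result stored now, made ready at thread exit
    }).detach();
    return result;
}
```

Usage mirrors std::async with the launch::async policy, for example auto f = thread_async([] { return 42; }); followed by f.get().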
Edit: Wow, how open MS are being these days, I like it, see: https://github.com/MicrosoftDocs/cpp-docs/issues/308
Let's begin at the beginning. cppreference has this to say (emphasis and strikethrough mine):
"The template function async runs the function f asynchronously (potentially optionally in a separate thread which may be part of a thread pool)."
However, the C++ standard says this:
"If launch::async is set in policy, [std::async] calls [the function f] as if in a new thread of execution ..."
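One way to see what "as if in a new thread of execution" implies: every std::launch::async task must see freshly initialized thread_local objects, even if the implementation happens to reuse an OS thread. The sketch below (callsOnThisThread and maxCallsSeen are hypothetical names) counts calls per thread with a thread_local int. On a conforming implementation every task should report 1; going by the observations in this answer, a thread-reusing implementation such as MSVC's could report larger values.

```cpp
#include <cassert>  // for assert in the usage check
#include <future>

// Returns how many times the calling thread has run this function.
int callsOnThisThread() {
    thread_local int calls = 0;  // fresh for every new thread of execution
    return ++calls;
}

// Launches n tasks one after another with launch::async and returns the
// largest per-thread call count observed by any of them.
int maxCallsSeen(int n) {
    int maxSeen = 0;
    for (int i = 0; i < n; ++i) {
        int seen = std::async(std::launch::async, callsOnThisThread).get();
        if (seen > maxSeen) maxSeen = seen;
    }
    return maxSeen;
}
```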
So which is correct? The two statements have very different semantics, as the OP has discovered. Well, of course the standard is correct, and both clang and gcc follow it, so why does the Windows implementation differ? Like so many things, it comes down to history.
The (oldish) link that M.M dredged up has this to say, amongst other things:
"... Microsoft has its implementation of [std::async] in the form of PPL (Parallel Pattern Library) ... [and] I can understand the eagerness of those companies to bend the rules and make these libraries accessible through std::async, especially if they can dramatically improve performance ... Microsoft wanted to change the semantics of std::async when called with launch_policy::async. I think this was pretty much ruled out in the ensuing discussion ..."
(Rationale follows; if you want to know more, read the link, it's well worth it.)
And PPL is based on Windows' built-in support for ThreadPools, so @super was right.
So what does the Windows thread pool do, and what is it good for? Well, it's intended to manage frequently scheduled, short-running tasks efficiently, so point 1 is: don't abuse it. But my simple tests show that if this is your use case, it can offer significant efficiencies. It does, essentially, two things:
... std::async will block until a thread becomes free. On my machine, this number is 768.
So, knowing all that, we can now explain the OP's observations:
A new thread is created for each of the three tasks started by main() (because none of them terminates immediately).
Each of these three threads creates a new thread-local variable Foo some_thread_var.
These three tasks all run to completion, but the threads they are running on remain in existence (sleeping).
The program then sleeps for a short while and then exits, leaving the three thread-local variables undestroyed.
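The mechanism behind these observations can be reproduced with a deliberately tiny pool. The sketch below is a hypothetical single-worker pool (OneThreadPool, countOnThisThread and lastCountSeen are illustrative names; this is not the Win32 ThreadPool API): every task runs on the same recycled thread, so a thread_local is shared across tasks and is only destroyed when the worker thread itself exits. If a pool's threads are still alive when the process ends, those destructors may never run at all.

```cpp
#include <cassert>  // for assert in the usage check
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Toy pool with one long-lived worker thread that runs tasks in order.
class OneThreadPool {
public:
    OneThreadPool() : worker_([this] { run(); }) {}
    ~OneThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker's thread_locals die here, not after each task
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutting down and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist before run() starts
};

// Same per-thread counting pattern as a thread_local in threadFunc.
int countOnThisThread() {
    thread_local int calls = 0;
    return ++calls;
}

// Runs n tasks on the pool and returns the count seen by the last task.
int lastCountSeen(int n) {
    int last = 0;
    {
        OneThreadPool pool;
        for (int i = 0; i < n; ++i) {
            pool.submit([&last] { last = countOnThisThread(); });
        }
    }  // pool destructor drains the queue and joins the worker
    return last;
}
```

With a recycled worker, the third task sees a count of 3, whereas each task started as if on a new thread would see 1.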
I ran a number of tests, and in addition to this I found a few key things:
... std::async might therefore take a while to return (up to 300ms in my tests). In the meantime, it's just hanging around, hoping that its ship will come in. This behaviour is documented, but I call it out here in case it takes you by surprise.
Conclusions:
Microsoft's implementation of std::async is non-conformant, but it is clearly designed with a specific purpose, and that purpose is to make good use of the Win32 ThreadPool API. You can beat them up for blatantly flouting the standard, but it's been this way for a long time and they probably have (important!) customers who rely on it. I will ask them to call this out in their documentation. Not doing that is criminal.
It is not safe to use thread_local variables in std::async tasks on Windows. Just don't do it; it will end in tears.
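If a task needs state that must be cleaned up when it finishes, one workaround (a sketch, not something Microsoft documents) is to drop static thread_local and make the object a plain local of the task function. Its lifetime is then tied to the call, not to whichever, possibly pooled, thread runs it. g_live, taskFunc and runTasks are hypothetical names; if the state really must persist across tasks, pass it in explicitly instead.

```cpp
#include <atomic>
#include <cassert>  // for assert in the usage check
#include <future>

std::atomic<int> g_live{0};  // number of Foo objects currently alive

struct Foo {
    Foo() { ++g_live; }
    ~Foo() { --g_live; }
};

int taskFunc() {
    // Plain local instead of static thread_local: destroyed when this call
    // returns, regardless of how the implementation schedules the task.
    Foo per_task_var;
    return 1;
}

// Runs three tasks like the question's main() and sums their results.
int runTasks() {
    auto a1 = std::async(std::launch::async, taskFunc);
    auto a2 = std::async(std::launch::async, taskFunc);
    auto a3 = std::async(std::launch::async, taskFunc);
    return a1.get() + a2.get() + a3.get();
}
```

Because the local is destroyed before taskFunc returns its value, every Foo is gone by the time the corresponding get() returns.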