As far as my knowledge on resource management goes, allocating something on the heap (operator new) should always be slower than allocating on the stack (automatic storage), because the stack is a LIFO-based structure, thus it requires minimal bookkeeping, and the pointer of the next address to allocate is trivial.
So far, so good. Now look at the following code:
/* ...includes... */
using std::cout;
using std::cin;
using std::endl;
int bar() { return 42; }
int main()
{
    auto s1 = std::chrono::steady_clock::now();
    std::packaged_task<int()> pt1(bar);
    auto e1 = std::chrono::steady_clock::now();
    auto s2 = std::chrono::steady_clock::now();
    auto sh_ptr1 = std::make_shared<std::packaged_task<int()> >(bar);
    auto e2 = std::chrono::steady_clock::now();
    auto first = std::chrono::duration_cast<std::chrono::nanoseconds>(e1-s1);
    auto second = std::chrono::duration_cast<std::chrono::nanoseconds>(e2-s2);
    cout << "Regular: " << first.count() << endl
         << "Make shared: " << second.count() << endl;
    pt1();
    (*sh_ptr1)();
    cout << "As you can see, both are working correctly: " 
         << pt1.get_future().get() << " & " 
         << sh_ptr1->get_future().get() << endl;
    return 0;
}
The results seem to contradict the stuff explained above:
Regular: 6131
Make shared: 843
As you can see, both are working correctly: 42 & 42
Program ended with exit code: 0
In the second measurement, apart from the call of operator new, the constructor of the std::shared_ptr (auto sh_ptr1) has to finish. I can't seem to understand why is this faster then regular allocation.
What is the explanation for this?
Stack memory allocation is considered safer as compared to heap memory allocation because the data stored can only be access by owner thread. Memory allocation and de-allocation is faster as compared to Heap-memory allocation. Stack-memory has less storage space as compared to Heap-memory.
Key Difference Between Stack and Heap Memory Stack accesses local variables only while Heap allows you to access variables globally. Stack variables can't be resized whereas Heap variables can be resized. Stack memory is allocated in a contiguous block whereas Heap memory is allocated in any random order.
Because the heap is a far more complicated data structure than the stack. For many architectures, allocating memory on the stack is just a matter of changing the stack pointer, i.e. it's one instruction.
Stack allocation is much faster since all it really does is move the stack pointer. Using memory pools, you can get comparable performance out of heap allocation, but that comes with a slight added complexity and its own headaches.
The problem is that the first call to the constructor of std::packaged_task is responsible for initializing a load of per-thread state that is then unfairly attributed to pt1.  This is a common problem of benchmarking (particularly microbenchmarking) and is alleviated by warmup; try reading How do I write a correct micro-benchmark in Java?
If I copy your code but run both parts first, the results are the same to within the limits of the resolution of the system clock. This demonstrates another issue of microbenchmarking, that you should run small tests multiple times to allow total time to be measured accurately.
With warmup and running each part 1000 times, I get the following (example):
Regular: 132.986
Make shared: 211.889
The difference (approx 80ns) accords well with the rule of thumb that malloc takes 100ns per call.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With