I am using the Boost ASIO library to implement a Windows UDP client which needs to be capable of high throughput.
I would like to use asynchronous receive calls so that I can eventually implement a receive timeout, i.e. if no datagrams have been received after a certain amount of time, my application will exit.
My problem is that I see 30% higher data throughput using synchronous receives vs. asynchronous receives. I have observed this issue when running the application on multiple Dell R630, R710 Windows 2008 Servers, and even my Lenovo ThinkPad laptop.
What are the main performance differences between the two code segments below?
Is there more overhead in calling ioService.run_one() after each async receive?
I am a new user of the Boost library, so any help would be much appreciated!
Synchronous Receive:
socket_->receive_from(boost::asio::buffer(&vector_[0], datagramSize),  
                      endPoint_);
vs.
Asynchronous Receive (with blocking):
err = boost::asio::error::would_block;
socket_->async_receive_from(
    boost::asio::mutable_buffers_1(&vector_[0], datagramSize),
    endPoint_,
    boost::bind(&HandleRead, _1, _2, &err, &bytesReceived));
do
{
    ioService_.run_one();
}
while (err == boost::asio::error::would_block);
Asynchronous Receive Handler Function:
static void HandleRead
(
    const boost::system::error_code& error, 
    std::size_t bytesRead,
    boost::system::error_code* outError, 
    std::size_t* outBytesRead
)
{
    *outError = error;
    *outBytesRead = bytesRead;
}
It shouldn't come as a surprise that the most important property of the async_ family of API functions is that they run asynchronously.
Running anything asynchronously is not - by itself - going to make it faster. In fact, due to scheduling artefacts it might be slower.
The thing is that asynchrony can allow you to do many more things on a small number of threads (e.g. the main thread).
It sounds a little bit as if your application doesn't require that kind of multiplexing. If your application indeed consumes a single source of packets as fast as possible in a linear fashion, then it makes no sense to ask the io_service to schedule the tasks across the available service threads¹ (you have only one) [...] shared_ptr<>s. If so, these are all sources of more delays (due to reduced locality of reference, more dynamic allocations, etc.).
Don't use asynchronous mode if you don't need it.
Even if you have a limited number of essentially single-threaded, sequentially running tasks, you can probably achieve the most by having a thread for each, an io_service per thread, and avoiding the coordination.
¹ threads running io_service::run or similar