
Why would Task parallelism not speed up uploads / downloads?

I wrote a simple C# console application to measure the speed of downloading 20 Azure blob files (images of about 3 MB each) sequentially and in parallel.

I was under the impression that downloading the files in parallel would be significantly faster, but in my experience it actually takes slightly longer. Here is the code for the parallel downloads:

    List<Task> tasks = new List<Task>();
    foreach (string blobName in blobNames)
    {
        Task t = Task.Run(() =>
        {
            CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName);
            blockBlob.DownloadToFileAsync(blobName, FileMode.Create).Wait();
        });

        tasks.Add(t);
    }

    Task.WaitAll(tasks.ToArray());

Am I approaching this wrong, causing an unnecessary bottleneck or something? Or am I fundamentally misunderstanding the benefits of parallelism?

vargonian asked Nov 16 '25 19:11


2 Answers

IMO, you shouldn't start a new task for each blob download, since downloading is an I/O-bound operation, not a CPU-bound one. Wrapping each download in Task.Run and blocking on it with .Wait() ties up a thread-pool thread per download and adds scheduling overhead without gaining anything, because DownloadToFileAsync already returns a task.

Change your code to:

    List<Task> tasks = new List<Task>();
    foreach (string blobName in blobNames)
    {
        CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName);
        tasks.Add(blockBlob.DownloadToFileAsync(blobName, FileMode.Create));
    }

    Task.WaitAll(tasks.ToArray());

This starts multiple asynchronous I/O requests, and your code continues once all blobs have been downloaded from Azure Blob Storage. Since we're not awaiting each download task separately, all downloads run concurrently.
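The difference between awaiting each download in turn and starting them all before waiting can be seen without Azure at all. The sketch below is a minimal illustration under stated assumptions, not the code from this answer: `FakeDownloadAsync` is a hypothetical stand-in that uses `Task.Delay` to simulate a 200 ms I/O-bound download, the blob names are made up, and `await Task.WhenAll` is swapped in for `Task.WaitAll` so that no thread blocks while waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentDownloadSketch
{
    // Hypothetical stand-in for blockBlob.DownloadToFileAsync: simulates an
    // I/O-bound operation of about 200 ms that holds no thread while pending.
    public static async Task<string> FakeDownloadAsync(string blobName)
    {
        await Task.Delay(200);
        return blobName;
    }

    // Awaits each download before starting the next one.
    public static async Task<TimeSpan> RunSequentiallyAsync(IEnumerable<string> blobNames)
    {
        var stopwatch = Stopwatch.StartNew();
        foreach (string blobName in blobNames)
        {
            await FakeDownloadAsync(blobName);
        }
        return stopwatch.Elapsed;
    }

    // Starts every download first, then waits for all of them together.
    public static async Task<TimeSpan> RunConcurrentlyAsync(IEnumerable<string> blobNames)
    {
        var stopwatch = Stopwatch.StartNew();
        List<Task<string>> tasks = blobNames.Select(FakeDownloadAsync).ToList();
        await Task.WhenAll(tasks);
        return stopwatch.Elapsed;
    }

    public static async Task Main()
    {
        // Made-up blob names, used only for this illustration.
        List<string> blobNames = Enumerable.Range(1, 20).Select(i => $"image{i}.jpg").ToList();

        TimeSpan sequential = await RunSequentiallyAsync(blobNames);
        TimeSpan concurrent = await RunConcurrentlyAsync(blobNames);

        Console.WriteLine($"Sequential: {sequential.TotalMilliseconds:F0} ms");
        Console.WriteLine($"Concurrent: {concurrent.TotalMilliseconds:F0} ms");
    }
}
```

With the simulated delay, the sequential run should take roughly 20 × 200 ms, while the concurrent run should finish in close to the time of a single download. Against a real storage account the gap can be much smaller, because the downloads share the same network link and server limits.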

Frederik Gheysels answered Nov 19 '25 09:11


"I was under the impression that downloading the files in parallel would be significantly faster, but in my experience it actually takes slightly longer."

The performance of I/O operations depends on many factors.

Parallelization only speeds things up if some component along the path (network link, disk, remote server) still has spare capacity.

For example, if any of the following conditions applies, you won't benefit, and overall performance is likely to degrade because of the added overhead:

Specifically for downloading:

  • you are hitting the download/upload capacity of your network (LAN, or internet/WAN)
  • you are hitting your local processing capacity (disk, memory, etc.)
  • you are hitting the remote server's maximum upload capacity (note: this limit can be per IP address, especially with cloud providers)
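The first condition can be checked with back-of-the-envelope arithmetic. The sketch below takes the figures from the question (20 files of about 3 MB each) and an assumed 100 Mbit/s link, which is a hypothetical value that the question does not state, and computes the shortest possible transfer time. `MinimumTransferSeconds` is an illustrative helper, not part of any library.

```csharp
using System;

public static class BandwidthBoundSketch
{
    // Lower bound on transfer time when the network link is the only limit.
    // Uses decimal units: 1 MB = 8 Mbit.
    public static double MinimumTransferSeconds(
        int fileCount, double fileSizeMegabytes, double linkMegabitsPerSecond)
    {
        double totalMegabits = fileCount * fileSizeMegabytes * 8.0;
        return totalMegabits / linkMegabitsPerSecond;
    }

    public static void Main()
    {
        // 20 files of ~3 MB come from the question; 100 Mbit/s is an assumption.
        double seconds = MinimumTransferSeconds(20, 3.0, 100.0);
        Console.WriteLine($"Lower bound: {seconds:F1} s");
    }
}
```

Under these assumptions the total is 480 Mbit, so the transfer cannot finish in less than 4.8 seconds no matter how many downloads run at once. If the sequential run is already close to that bound, the link is saturated and parallel downloads have nothing left to gain.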
Stefan answered Nov 19 '25 07:11