In my Azure role code I download a 400-megabyte file that is split into 10-megabyte chunks and stored in Blob Storage. I use CloudBlob.DownloadToStream() for the download.
I tried two options. One is using a FileStream - I create a "write" FileStream and download the chunks one by one into the same stream without rewinding, so I end up with the original file. The other option is creating a MemoryStream with an initial capacity slightly larger than the original file size (to avoid reallocations) and downloading the chunks into that MemoryStream - this way I end up with a MemoryStream holding the original file data.
Here's some pseudocode:
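// StreamOfChoice stands for either FileStream or MemoryStream
var writeStream = new StreamOfChoice( params );
foreach( var uri in urisToDownload ) {
    // each chunk is appended at the stream's current position
    blobContainer.GetBlobReference( uri ).DownloadToStream( writeStream );
}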
Now the only difference is that it's a FileStream in one case and a MemoryStream in the other; everything else is the same. It takes about 20 seconds with a FileStream and about 30 seconds with a MemoryStream - yes, the FileStream turns out to be faster. According to the \Memory\Available Bytes performance counter, the virtual machine has about 1 gigabyte of memory available just before the MemoryStream is created, so it's not due to paging.
Why would writing to a file be faster than to a MemoryStream?
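For anyone who wants to reproduce the comparison without Blob Storage, here is a minimal sketch of a timing harness. It is an assumption-laden stand-in, not the code from my role: StreamTiming, Measure and WriteChunk are hypothetical names, and the "download" is simulated by writing a zero-filled 10-megabyte buffer, so it only isolates the cost of the target stream and not the network. The stopwatch is started before the stream is created so that allocating the MemoryStream buffer is included in the measurement.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class StreamTiming
{
    // Hypothetical stand-in for CloudBlob.DownloadToStream():
    // writes one already-downloaded chunk into the target stream.
    static void WriteChunk(byte[] chunk, Stream target)
    {
        target.Write(chunk, 0, chunk.Length);
    }

    // Times stream creation plus writing chunkCount chunks of chunkSize bytes.
    public static TimeSpan Measure(Func<Stream> createTarget, int chunkCount, int chunkSize)
    {
        var chunk = new byte[chunkSize];
        var watch = Stopwatch.StartNew();
        using (var target = createTarget())
        {
            for (int i = 0; i < chunkCount; i++)
            {
                WriteChunk(chunk, target);
            }
            target.Flush();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        const int chunkSize = 10 * 1024 * 1024; // 10 MB per chunk, as in the question
        const int chunkCount = 40;              // 40 chunks, about 400 MB in total
        string path = Path.GetTempFileName();
        try
        {
            TimeSpan fileTime = Measure(
                () => new FileStream(path, FileMode.Create, FileAccess.Write),
                chunkCount, chunkSize);

            // Capacity slightly larger than the total size, to avoid reallocations.
            TimeSpan memoryTime = Measure(
                () => new MemoryStream(chunkCount * chunkSize + 1024),
                chunkCount, chunkSize);

            Console.WriteLine("FileStream:   {0}", fileTime);
            Console.WriteLine("MemoryStream: {0}", memoryTime);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The numbers from this harness will not match the 20 and 30 seconds above, since there is no network involved; it is only meant to show whether the stream type alone accounts for a difference on a given machine. In a 32-bit process the 400 MB allocation may fail outright.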
Jon is probably on the ball there. The most likely explanation is,
Regardless of whether memory is quicker or not, you really shouldn't allocate such large blocks of memory. Have a read about the Large Object Heap (LOH) vs. the Small Object Heap (SOH).
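To illustrate the LOH point: in .NET, objects of roughly 85,000 bytes or more are allocated on the Large Object Heap, and a MemoryStream created with a 400 MB capacity is backed by a single byte array of that size. The sketch below is one possible way to avoid such an allocation, assuming the data can be consumed as a stream; LargeAllocationSketch and CopyInSmallBlocks are hypothetical names introduced here, not part of any library.

```csharp
using System;
using System.IO;

static class LargeAllocationSketch
{
    // Copies source to destination through a small reusable buffer, so no
    // single array comes close to the large-object threshold.
    public static long CopyInSmallBlocks(Stream source, Stream destination, int bufferSize = 64 * 1024)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    static void Main()
    {
        var small = new byte[64 * 1024];        // below the threshold: small object heap
        var large = new byte[10 * 1024 * 1024]; // above the threshold: large object heap

        // Large-object-heap arrays are reported as belonging to the oldest generation.
        Console.WriteLine("small array generation: {0}", GC.GetGeneration(small));
        Console.WriteLine("large array generation: {0}", GC.GetGeneration(large));

        // Stand-in source; in the question this would be the downloaded data.
        using (var source = new MemoryStream(new byte[200000]))
        using (var destination = new MemoryStream())
        {
            long copied = CopyInSmallBlocks(source, destination);
            Console.WriteLine("copied {0} bytes", copied);
        }
    }
}
```

Whether this fits depends on what happens to the data afterwards: if the whole file really has to sit in memory at once, a small copy buffer does not remove the need for the large block.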