I have created a simple console app that downloads a single (PDF) file from archive.org using the new ASP.NET Core 2.1 HttpClientFactory.
For the particular URL used in this program I always get a TaskCanceledException. If you run this code, you will probably get the same exception. It works for other URLs on archive.org, though. When I download the file from exactly the same URL using wget (wget https://archive.org/download/1952-03_IF/1952-03_IF.pdf --output-document=IF.pdf), the download succeeds.
However, when I do it with HttpClient I get the exception below.
What could I possibly be doing wrong?
Here is the simple code:
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using System.IO;
using System.Diagnostics;
namespace test2
{
    public class Program
    {
        public static async Task Main(string[] args)
        {
            var serviceCollection = new ServiceCollection();
            serviceCollection.AddHttpClient("archive", c =>
            {
                c.BaseAddress = new Uri("https://archive.org/download/");
                c.DefaultRequestHeaders.Add("Accept", "application/pdf");
            })
            .AddTypedClient<ArchiveClient>();
            var services = serviceCollection.BuildServiceProvider();
            var archive = services.GetRequiredService<ArchiveClient>();
            await archive.Get();
        }
        private class ArchiveClient
        {
            public ArchiveClient(HttpClient httpClient)
            {
                HttpClient = httpClient;
            }
            public HttpClient HttpClient { get; }
            public async Task Get()
            {
                var request = new HttpRequestMessage(HttpMethod.Get, "1952-03_IF/1952-03_IF.pdf");
                var response = await HttpClient.SendAsync(request).ConfigureAwait(false);
                response.EnsureSuccessStatusCode();
                using (Stream contentStream = await response.Content.ReadAsStreamAsync(), 
                    fileStream = new FileStream("Worlds of IF 1952-03.pdf", FileMode.Create, FileAccess.Write, FileShare.None, 8192, true))
                {
                    var totalRead = 0L;
                    var totalReads = 0L;
                    var buffer = new byte[8192];
                    var isMoreToRead = true;
                    do
                    {
                        var read = await contentStream.ReadAsync(buffer, 0, buffer.Length);
                        if (read == 0)
                        {
                            isMoreToRead = false;
                        }
                        else
                        {
                            await fileStream.WriteAsync(buffer, 0, read);
                            totalRead += read;
                            totalReads += 1;
                            if (totalReads % 2000 == 0)
                            {
                                Console.WriteLine(string.Format("bytes downloaded: {0:n0}", totalRead));
                            }
                        }
                    }
                    while (isMoreToRead);
                }
            }
        }
    }
}
This is the full exception I get:
Unhandled Exception: System.Threading.Tasks.TaskCanceledException: The operation was canceled. 
---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. 
---> System.Net.Sockets.SocketException: Operation canceled    
--- End of inner exception stack trace ---    
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)    
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)    
at System.Net.Security.SslStreamInternal.<FillBufferAsync>g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)    
at System.Net.Security.SslStreamInternal.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)    
at System.Net.Http.HttpConnection.FillAsync()    
at System.Net.Http.HttpConnection.CopyToExactLengthAsync(Stream destination, UInt64 length, CancellationToken cancellationToken)    
at System.Net.Http.HttpConnection.ContentLengthReadStream.CompleteCopyToAsync(Task copyTask, CancellationToken cancellationToken)    
--- End of inner exception stack trace ---    
at System.Net.Http.HttpConnection.ContentLengthReadStream.CompleteCopyToAsync(Task copyTask, CancellationToken cancellationToken)    
at System.Net.Http.HttpConnection.HttpConnectionResponseContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken) 
at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at test2.Program.ArchiveClient.Get() in /Users/Foo/Temp/test3/Program.cs:line 42    
at test2.Program.Main(String[] args) in /Users/Foo/Temp/test3/Program.cs:line 27    
at test2.Program.<Main>(String[] args)
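In your case it seems that the size of the file is the problem. By default SendAsync() buffers the entire response body into a MemoryStream before it returns (the LoadIntoBufferAsyncCore and FinishSendAsyncBuffered frames in your stack trace), and the default HttpClient.Timeout of 100 seconds covers that whole buffering step, so a sufficiently large or slow download is canceled before it finishes. Another thing I would try is to pass HttpCompletionOption.ResponseHeadersRead as the second argument to SendAsync(). With that option the call returns as soon as the response headers have been read. The response is no longer buffered in a MemoryStream but is read directly from the socket, which means you can start processing the content before the whole file has been downloaded. For large responses this is typically faster and uses far less memory, and in your case speed might be of the essence.
Just remember to dispose the response message; otherwise the connection will not be released.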
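As a rough illustration of the suggestion above, here is a minimal sketch of how the download could be restructured. The class name StreamingDownload and the helpers CopyResponseAsync and DownloadAsync are hypothetical names introduced only for this example; the buffer size and FileStream arguments are copied from the question. It is one possible shape under those assumptions, not the only way to do it.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, not part of the question's code.
public static class StreamingDownload
{
    // Copies the response body to 'destination' in 8 KB chunks
    // and returns the number of bytes written.
    public static async Task<long> CopyResponseAsync(
        HttpResponseMessage response,
        Stream destination,
        CancellationToken cancellationToken = default)
    {
        response.EnsureSuccessStatusCode();

        long total = 0;
        var buffer = new byte[8192];
        using (Stream content = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
        {
            int read;
            while ((read = await content.ReadAsync(buffer, 0, buffer.Length, cancellationToken)
                                        .ConfigureAwait(false)) > 0)
            {
                await destination.WriteAsync(buffer, 0, read, cancellationToken).ConfigureAwait(false);
                total += read;
            }
        }
        return total;
    }

    // Sends the request with ResponseHeadersRead so SendAsync returns once the
    // headers arrive, then streams the body straight into a file.
    public static async Task<long> DownloadAsync(
        HttpClient client,
        string requestUri,
        string path,
        CancellationToken cancellationToken = default)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, requestUri))
        // Disposing the response releases the underlying connection.
        using (var response = await client.SendAsync(
                   request, HttpCompletionOption.ResponseHeadersRead, cancellationToken)
                   .ConfigureAwait(false))
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                         FileShare.None, 8192, true))
        {
            return await CopyResponseAsync(response, file, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

Inside the question's ArchiveClient.Get(), this could be called as await StreamingDownload.DownloadAsync(HttpClient, "1952-03_IF/1952-03_IF.pdf", "Worlds of IF 1952-03.pdf"). Because the body is read after SendAsync() has returned, HttpClient.Timeout no longer bounds the whole download; if you still want an overall limit, pass a CancellationToken with your own deadline.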