I'm designing a .NET Core Web API that consumes an external API that I do not control. I've found some excellent answers on Stack Overflow that allowed me to throttle my requests to this external API while in the same thread using SemaphoreSlim. I'm wondering how best to extend this throttling to be application-wide instead of just throttling a specific list of Tasks. I've been learning about HttpMessageHandlers, and they seem to be a possible way to intercept all outgoing messages and apply throttling. But I'm concerned about thread-safety and locking issues I may not understand. I'm including my current throttling code in the hope that it helps explain what I'm trying to do, but across multiple threads, and with tasks being continuously added instead of a pre-defined list of tasks.
private static async Task<List<PagedResultResponse>> GetAsyncThrottled(List<int> pages, int throttle, IiMISClient client, string url, int limit)
{
    var allTasks = new List<Task<PagedResultResponse>>();
    // Allow at most `throttle` requests in flight at any one time.
    var throttler = new SemaphoreSlim(initialCount: throttle);
    foreach (var page in pages)
    {
        // Wait for a free slot before starting the next request.
        await throttler.WaitAsync();
        allTasks.Add(
            Task.Run(async () =>
            {
                try
                {
                    return await GetPagedResult(client, url, page);
                }
                finally
                {
                    // Free the slot whether the request succeeded or failed.
                    throttler.Release();
                }
            }));
    }
    // Task.WhenAll returns the results in the same order as the tasks.
    var results = await Task.WhenAll(allTasks);
    return new List<PagedResultResponse>(results);
}
One way to implement API throttling in distributed systems is to use sticky sessions, where all requests from a given user are always serviced by the same server. However, this solution is neither well balanced nor fault tolerant. A second approach to API throttling in distributed systems is to use locks.
The term rate limiting refers to the broader concept of restricting the request traffic to an API endpoint at any point in time. Throttling is one particular way of applying rate limiting to an API endpoint. There are other ways an API endpoint can apply rate limiting; one of them is the use of request queues.
ASP.NET Core apps should be designed to process many requests simultaneously. Asynchronous APIs allow a small pool of threads to handle thousands of concurrent requests by not waiting on blocking calls. Rather than waiting on a long-running synchronous task to complete, the thread can work on another request.
You can implement throttling by adding a @Throttling annotation to the service method of the request that should be throttled. Used without parameters, the annotation defaults to 1 method call per second.
SemaphoreSlim is thread-safe, so there are no thread-safety or locking concerns about using it as a parallelism throttle across multiple threads. HttpMessageHandlers are indeed an outbound middleware mechanism to intercept calls placed through HttpClient, so they are an ideal way to apply parallelism-throttling to HTTP calls using SemaphoreSlim.
So a ThrottlingDelegatingHandler might look like this:
public class ThrottlingDelegatingHandler : DelegatingHandler
{
    private readonly SemaphoreSlim _throttler;
    public ThrottlingDelegatingHandler(SemaphoreSlim throttler)
    {
        _throttler = throttler ?? throw new ArgumentNullException(nameof(throttler));
    }
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request == null) throw new ArgumentNullException(nameof(request));
        await _throttler.WaitAsync(cancellationToken);
        try
        {
            return await base.SendAsync(request, cancellationToken);
        }
        finally
        {
            _throttler.Release();
        }
    }
}
Create and maintain an instance as a singleton:
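int maxParallelism = 10;
// When you construct HttpClient yourself, a DelegatingHandler needs an inner handler to forward requests to.
var throttle = new ThrottlingDelegatingHandler(new SemaphoreSlim(maxParallelism))
{
    InnerHandler = new HttpClientHandler()
};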
Apply that DelegatingHandler to all instances of HttpClient through which you want to parallel-throttle calls:
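// disposeHandler: false keeps the shared handler alive when this particular client is disposed.
HttpClient throttledClient = new HttpClient(throttle, disposeHandler: false);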
That HttpClient does not need to be a singleton: only the throttle instance does.
I've omitted the .NET Core DI code for brevity, but you would register the singleton ThrottlingDelegatingHandler instance with .NET Core's container, obtain that singleton by DI at point-of-use, and use it in HttpClients you construct as shown above.
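As a rough sanity check of the handler above, the sketch below swaps the real network for a stub and records how many calls are in flight at once. `ConcurrencyProbeHandler`, `ThrottleDemo`, and the `example.invalid` URL are hypothetical names introduced only for this illustration; the handler is repeated in compact form so the sketch compiles on its own.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Compact copy of the handler from the answer, repeated so this sketch is self-contained.
public class ThrottlingDelegatingHandler : DelegatingHandler
{
    private readonly SemaphoreSlim _throttler;
    public ThrottlingDelegatingHandler(SemaphoreSlim throttler)
    {
        _throttler = throttler ?? throw new ArgumentNullException(nameof(throttler));
    }
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await _throttler.WaitAsync(cancellationToken);
        try { return await base.SendAsync(request, cancellationToken); }
        finally { _throttler.Release(); }
    }
}

// Hypothetical stub standing in for the network: it never sends anything,
// it only tracks the peak number of simultaneous calls.
public class ConcurrencyProbeHandler : HttpMessageHandler
{
    private int _current;
    private int _peak;
    public int Peak => Volatile.Read(ref _peak);

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        int now = Interlocked.Increment(ref _current);
        int seen;
        // Raise the recorded peak if this call pushed concurrency higher.
        while (now > (seen = Volatile.Read(ref _peak)))
        {
            Interlocked.CompareExchange(ref _peak, now, seen);
        }
        try
        {
            await Task.Delay(50, cancellationToken); // simulated latency
            return new HttpResponseMessage(HttpStatusCode.OK);
        }
        finally
        {
            Interlocked.Decrement(ref _current);
        }
    }
}

public static class ThrottleDemo
{
    // Fires totalCalls requests at once and reports the peak concurrency the stub observed.
    public static async Task<int> MeasurePeakAsync(int maxParallelism, int totalCalls)
    {
        var probe = new ConcurrencyProbeHandler();
        var throttle = new ThrottlingDelegatingHandler(new SemaphoreSlim(maxParallelism))
        {
            InnerHandler = probe
        };
        using (var client = new HttpClient(throttle))
        {
            var calls = Enumerable.Range(0, totalCalls)
                .Select(i => client.GetAsync("http://example.invalid/page/" + i));
            await Task.WhenAll(calls);
        }
        return probe.Peak;
    }
}
```

Calling `await ThrottleDemo.MeasurePeakAsync(3, 20)` should report a peak no higher than 3, however many calls are started together.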
But:
The above still raises the question of how you are going to manage HttpClient lifetimes:
Long-lived (for example, singleton) HttpClients do not pick up DNS updates. Your app will be ignorant of DNS updates unless you kill and restart it (perhaps undesirable).
using (HttpClient client = ) { }, on the other hand, can cause socket exhaustion.
One of the design goals of HttpClientFactory was to manage the lifecycles of HttpClient instances and their delegating handlers, to avoid these problems.
From .NET Core 2.1 onward, you can use HttpClientFactory to wire it all up in ConfigureServices(IServiceCollection services) in the Startup class, like this:
int maxParallelism = 10;
// Share one SemaphoreSlim app-wide. The handler itself is registered as transient,
// because IHttpClientFactory needs a fresh DelegatingHandler for each handler pipeline it builds.
services.AddSingleton(new SemaphoreSlim(maxParallelism));
services.AddTransient<ThrottlingDelegatingHandler>();
services.AddHttpClient("MyThrottledClient")
    .AddHttpMessageHandler<ThrottlingDelegatingHandler>();
("MyThrottledClient" here is a named-client approach just to keep this example short; typed clients avoid string-naming.)
At point-of-use, obtain an IHttpClientFactory by DI, then call
var client = _clientFactory.CreateClient("MyThrottledClient");
to obtain an HttpClient instance pre-configured with a ThrottlingDelegatingHandler that shares the singleton SemaphoreSlim.
All calls through an HttpClient instance obtained in this manner will be throttled (in common, across the app) to the originally configured int maxParallelism.
And HttpClientFactory magically deals with all the HttpClient lifetime issues.
Polly is deeply integrated with IHttpClientFactory, and Polly also provides a Bulkhead policy, which works as a parallelism throttle by an identical SemaphoreSlim mechanism.
So, as an alternative to hand-rolling a ThrottlingDelegatingHandler, you can also just use the Polly Bulkhead policy with IHttpClientFactory out of the box. In your Startup class, simply:
int maxParallelism = 10;
var throttler = Policy.BulkheadAsync<HttpResponseMessage>(maxParallelism, Int32.MaxValue);
services.AddHttpClient("MyThrottledClient")
    .AddPolicyHandler(throttler);
Obtain the pre-configured HttpClient instance from HttpClientFactory as earlier. As before, all calls through such a "MyThrottledClient" HttpClient instance will be parallel-throttled to the configured maxParallelism.
The Polly Bulkhead policy additionally offers the ability to configure how many operations you want to allow simultaneously to 'queue' for an execution slot in the main semaphore. So, for instance:
var throttler = Policy.BulkheadAsync<HttpResponseMessage>(10, 100);
when configured as above into an HttpClient, would allow 10 parallel HTTP calls, and up to 100 HTTP calls to 'queue' for an execution slot. This can offer extra resilience for high-throughput systems by preventing a faulting downstream system from causing an excessive resource bulge of queuing calls upstream.
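To make the queue limit concrete, here is a minimal sketch of what happens once both the execution slots and the queue are full: Polly rejects the extra call with a BulkheadRejectedException rather than letting it wait. It assumes the Polly v7-style API from the Polly NuGet package; `BulkheadDemo` is a hypothetical name, and the limits of 1 and 0 are chosen only to make the rejection easy to trigger.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Bulkhead;

public static class BulkheadDemo
{
    // Returns true if a call made while the bulkhead is full is rejected.
    public static async Task<bool> SecondCallIsRejectedAsync()
    {
        // 1 execution slot and 0 queue slots (illustrative values only).
        var bulkhead = Policy.BulkheadAsync(1, 0);
        var release = new TaskCompletionSource<bool>();

        // Occupy the only slot until `release` is completed.
        Task first = bulkhead.ExecuteAsync(async () => { await release.Task; });

        bool rejected = false;
        try
        {
            // No free slot and no queue space, so this call should be rejected.
            await bulkhead.ExecuteAsync(() => Task.CompletedTask);
        }
        catch (BulkheadRejectedException)
        {
            rejected = true;
        }

        // Let the first call finish so the slot is released.
        release.SetResult(true);
        await first;
        return rejected;
    }
}
```

With the policy attached to an HttpClient through AddPolicyHandler, the same exception would reach the code that made the HTTP call, so callers of a bounded bulkhead may want to catch it and decide whether to retry later or fail fast.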
To use the Polly options with HttpClientFactory, pull in the Microsoft.Extensions.Http.Polly and Polly nuget packages.
References: Polly deep doco on Polly and IHttpClientFactory; Bulkhead policy.
The question uses Task.Run(...) and mentions:
a .net core web api that consumes an external api
and:
with tasks being continuously added instead of a pre-defined list of tasks.
If your .NET Core Web API only consumes the external API once per incoming request that it handles, and you adopt the approaches discussed in the rest of this answer, offloading the downstream external HTTP call to a new Task with Task.Run(...) will be unnecessary and will only create overhead in additional Task instances and thread-switching. .NET Core will already be running the incoming requests on multiple threads on the thread pool.
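If you do still need to fetch several pages in one go, a sketch along these lines shows the same idea without Task.Run: start the asynchronous calls directly and await them together, leaving the throttling to the handler or Bulkhead policy on the HttpClient. `PagedFetch` and the `fetchPage` delegate are hypothetical; in the question's code the delegate would wrap GetPagedResult.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PagedFetch
{
    // Starts one asynchronous call per page and awaits them all.
    // No Task.Run and no local semaphore: concurrency is assumed to be limited
    // by the throttled HttpClient that fetchPage uses internally.
    public static async Task<List<T>> GetAllPagesAsync<T>(IEnumerable<int> pages, Func<int, Task<T>> fetchPage)
    {
        // ToList() starts every call immediately; excess calls wait inside the throttle.
        List<Task<T>> tasks = pages.Select(fetchPage).ToList();

        // Task.WhenAll returns results in the same order as the input tasks.
        T[] results = await Task.WhenAll(tasks);
        return results.ToList();
    }
}
```

In the question's terms, usage might look like `await PagedFetch.GetAllPagesAsync(pages, page => GetPagedResult(client, url, page))`. One thing to keep in mind with this shape is that time a request spends waiting for a slot inside the handler counts against HttpClient.Timeout, so very long queues can surface as timeouts.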