My .NET Core 3.1 app uses Polly 7.1.0 retry and bulkhead policies for HTTP resilience. The retry policy uses HandleTransientHttpError() to catch possible HttpRequestException.
Now HTTP requests fired with MyClient sometimes throw an HttpRequestException. Around half of them are caught and retried by Polly. The other half however ends up in my try-catch-block and I have to retry them manually. This happens before the maximum number of retries is exhausted.
How did I manage to create a race condition preventing Polly from catching all exceptions? And how can I fix this?
I register the policies with the IHttpClientFactory as follows.
public void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient<MyClient>(c =>
        {
            c.BaseAddress = new Uri("https://my.base.url.com/");
            c.Timeout = TimeSpan.FromHours(5); // Generous timeout to accommodate retries
        })
        .AddPolicyHandler(GetHttpResiliencePolicy());
}

private static AsyncPolicyWrap<HttpResponseMessage> GetHttpResiliencePolicy()
{
    var delay = Backoff.DecorrelatedJitterBackoffV2(medianFirstRetryDelay: TimeSpan.FromSeconds(1), retryCount: 5);

    var retryPolicy = HttpPolicyExtensions
        .HandleTransientHttpError() // This should catch HttpRequestException
        .OrResult(msg => msg.StatusCode == HttpStatusCode.NotFound)
        .WaitAndRetryAsync(
            sleepDurations: delay,
            onRetry: (response, delay, retryCount, context) => LogRetry(response, retryCount, context));

    var throttlePolicy = Policy.BulkheadAsync<HttpResponseMessage>(maxParallelization: 50, maxQueuingActions: int.MaxValue);

    return Policy.WrapAsync(retryPolicy, throttlePolicy);
}
The MyClient that is firing the http requests looks as follows.
public async Task<TOut> PostAsync<TOut>(Uri requestUri, string jsonString)
{
    try
    {
        using (var content = new StringContent(jsonString, Encoding.UTF8, "application/json"))
        using (var response = await httpClient.PostAsync(requestUri, content)) // This throws HttpRequestException
        {
            // Handle response
        }
    }
    catch (HttpRequestException ex)
    {
        // This should never be hit, but unfortunately is
    }
}
Here is some additional information, although I'm not sure that it's relevant.
HttpClient is DI-registered transiently, there are 10 instances of it flying around per unit of work.

HttpRequestException
Whenever we are talking about Polly policies, we can distinguish two kinds of exceptions:
- Handled exceptions, which the policy is configured to react to (HttpRequestException).
- Unhandled exceptions, which the policy lets pass through untouched (WebException in our case).

"Around half of them are caught and retried by Polly. The other half however ends up in my try-catch-block"
This can happen if some of your requests run out of retry attempts. In other words, some requests could not succeed within 6 attempts (1 initial attempt plus 5 retries).
This can be easily verified with one of the following two tools:
- onRetry + context
- Fallback + context

onRetry + context
The onRetry delegate is called when the retry policy is triggered, but before the sleep duration elapses. The delegate receives the retryCount. To relate separate log entries of the same request to each other, you need some sort of correlation id. The simplest way to have one can be coded like this:
public static class ContextExtensions
{
    private const string Key = "CorrelationId";

    public static Context SetCorrelation(this Context context, Guid? id = null)
    {
        context[Key] = id ?? Guid.NewGuid();
        return context;
    }

    public static Guid? GetCorrelation(this Context context)
    {
        if (!context.TryGetValue(Key, out var id))
            return null;
        if (id is Guid correlation)
            return correlation;
        return null;
    }
}
Here is a simplified example:
The method to be executed
private async Task<string> Test()
{
    await Task.Delay(1000);
    throw new CustomException("");
}
The policy
var retryPolicy = Policy<string>
    .Handle<CustomException>()
    .WaitAndRetryAsync(5, _ => TimeSpan.FromSeconds(1),
        (result, delay, retryCount, context) =>
        {
            var id = context.GetCorrelation();
            Console.WriteLine($"{id} - #{retryCount} retry.");
        });
The usage
var context = new Context().SetCorrelation();
try
{
    await retryPolicy.ExecuteAsync(async (ctx) => await Test(), context);
}
catch (CustomException)
{
    Console.WriteLine($"{context.GetCorrelation()} - All retry has been failed.");
}
The sample output
3319cf18-5e31-40e0-8faf-1fba0517f80d - #1 retry.
3319cf18-5e31-40e0-8faf-1fba0517f80d - #2 retry.
3319cf18-5e31-40e0-8faf-1fba0517f80d - #3 retry.
3319cf18-5e31-40e0-8faf-1fba0517f80d - #4 retry.
3319cf18-5e31-40e0-8faf-1fba0517f80d - #5 retry.
3319cf18-5e31-40e0-8faf-1fba0517f80d - All retry has been failed.
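With the IHttpClientFactory registration from the question, the policy is executed inside the handler pipeline rather than through ExecuteAsync, so the Context has to travel with the request. The following is a minimal sketch, assuming the Microsoft.Extensions.Http.Polly package (which provides SetPolicyExecutionContext) is referenced. The CorrelatedRequests class and the CreatePost helper are hypothetical names, and the "CorrelationId" key simply mirrors the one used in ContextExtensions above.

```csharp
using System;
using System.Net.Http;
using System.Text;
using Polly;

public static class CorrelatedRequests
{
    // Same key as in the ContextExtensions sample; repeated here so the sketch is self-contained.
    private const string Key = "CorrelationId";

    // Hypothetical helper: builds a JSON POST request that carries a Polly Context.
    public static HttpRequestMessage CreatePost(Uri requestUri, string jsonString, Guid? correlationId = null)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, requestUri)
        {
            Content = new StringContent(jsonString, Encoding.UTF8, "application/json")
        };

        // Store the correlation id on a fresh Context and attach it to the request.
        var context = new Context();
        context[Key] = correlationId ?? Guid.NewGuid();
        request.SetPolicyExecutionContext(context);

        return request;
    }
}
```

Sending such a request with httpClient.SendAsync(request) instead of PostAsync should make the same Context, and therefore the same correlation id, available in the onRetry delegate of the registered policy.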
Fallback
As was said, whenever a policy cannot succeed it re-throws the handled exception. In other words, if a policy fails, it escalates the problem to the next level (the next outer policy).
Here is a simplified example:
The policy
var fallbackPolicy = Policy<string>
    .Handle<CustomException>()
    .FallbackAsync(async (result, ctx, ct) =>
        {
            await Task.FromException<CustomException>(result.Exception);
            return result.Result; //it will never be executed << just to compile
        },
        (result, ctx) =>
        {
            Console.WriteLine($"{ctx.GetCorrelation()} - All retry has been failed.");
            return Task.CompletedTask;
        });
The usage
var context = new Context().SetCorrelation();
try
{
    var strategy = Policy.WrapAsync(fallbackPolicy, retryPolicy);
    await strategy.ExecuteAsync(async (ctx) => await Test(), context);
}
catch (CustomException)
{
    Console.WriteLine($"{context.GetCorrelation()} - All policies failed.");
}
The sample output
169a270e-acf7-45fd-8036-9bd1c034c5d6 - #1 retry.
169a270e-acf7-45fd-8036-9bd1c034c5d6 - #2 retry.
169a270e-acf7-45fd-8036-9bd1c034c5d6 - #3 retry.
169a270e-acf7-45fd-8036-9bd1c034c5d6 - #4 retry.
169a270e-acf7-45fd-8036-9bd1c034c5d6 - #5 retry.
169a270e-acf7-45fd-8036-9bd1c034c5d6 - All retry has been failed.
169a270e-acf7-45fd-8036-9bd1c034c5d6 - All policies failed.
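To check from the calling side whether the retries were really exhausted, the retry number can also be stored on the Context and read back in the catch block. The following is a minimal sketch, assuming Polly 7.x. The RetryCounting class, the "RetryCount" key and the use of InvalidOperationException as the handled exception are illustrative choices, not part of the original setup.

```csharp
using System;
using System.Threading.Tasks;
using Polly;

public static class RetryCounting
{
    private const string RetryCountKey = "RetryCount"; // hypothetical key name

    // Runs an always-failing action and returns how many retries were performed
    // before the exception reached the caller.
    public static async Task<int> CountRetriesAsync()
    {
        var retryPolicy = Policy<string>
            .Handle<InvalidOperationException>()
            .WaitAndRetryAsync(5, _ => TimeSpan.FromMilliseconds(10),
                (result, delay, retryCount, context) =>
                {
                    // Remember the latest retry number on the shared Context.
                    context[RetryCountKey] = retryCount;
                });

        var context = new Context();
        try
        {
            await retryPolicy.ExecuteAsync(
                ctx => Task.FromException<string>(new InvalidOperationException("always fails")),
                context);
        }
        catch (InvalidOperationException)
        {
            // The exception arrives here only after the policy has given up.
        }

        // No entry means onRetry never ran, i.e. the policy did not retry at all.
        return context.TryGetValue(RetryCountKey, out var count) ? (int)count : 0;
    }
}
```

If the value read in the catch block is lower than the configured retry count, the exception did not come from an exhausted retry policy and another cause is worth investigating.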