
Improve performance of Async Post using C# HttpClient

I am trying to find a way to further improve the performance of my console app (which already works correctly).

I have a CSV file which contains a list of addresses (about 100k). I need to query a Web API whose POST response would be the geographical coordinates of such addresses. Then I am going to write a GeoJSON file to the file system with the address data enriched with geographical coordinates (latitude and longitude).

My current solution splits the data into batches of 1000 records and sends async POST requests to the Web API using HttpClient (.NET Core 3.1 console app plus a class library targeting .NET Standard 2.0). GeoJSON is my DTO class.

public class GeoJSON
{
    public string Locality { get; set; }
    public string Street { get; set; }
    public string StreetNumber { get; set; }
    public string ZIP { get; set; }
    public string Latitude { get; set; }
    public string Longitude { get; set; }
}


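public static async Task<List<GeoJSON>> GetAddressesInParallel(List<GeoJSON> geos)
{
    // batchSize (1000) is assumed to be a field of the enclosing class, not shown here.
    // Results are collected locally so repeated calls do not accumulate entries.
    var geoJSONs = new List<GeoJSON>(geos.Count);

    // Number of batches needed to cover the whole list.
    int numberOfBatches = (int)Math.Ceiling((double)geos.Count / batchSize);

    for (int i = 0; i < numberOfBatches; i++)
    {
        // Start every request of the current batch, then wait for all of them.
        var currentBatch = geos.Skip(i * batchSize).Take(batchSize);
        var tasks = currentBatch.Select(geo => SendPOSTAsync(geo));
        geoJSONs.AddRange(await Task.WhenAll(tasks));
    }

    return geoJSONs;
}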

My Async POST method looks like this:

public static async Task<GeoJSON> SendPOSTAsync(GeoJSON geo)
{
    // client (a shared HttpClient) and URL are assumed to be static members declared elsewhere.
    string payload = JsonConvert.SerializeObject(geo);
    using HttpContent content = new StringContent(payload, Encoding.UTF8, "application/json");
    using HttpResponseMessage response = await client.PostAsync(URL, content).ConfigureAwait(false);

    if (response.IsSuccessStatusCode)
    {
        string body = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        var address = JsonConvert.DeserializeObject<GeoJSON>(body);
        geo.Latitude = address.Latitude;
        geo.Longitude = address.Longitude;
    }

    // On a non-success status the input is returned unchanged.
    return geo;
}

The Web API runs on my local machine as a self-hosted x86 application. The whole application finishes in less than 30s, and the most time-consuming part is the async POST step (about 25s). The Web API accepts only one address per POST; otherwise I would have sent multiple addresses in one request.

Any ideas on how to improve performance of the request against the Web API?

asked Mar 21 '26 14:03 by fpsanti


1 Answer

A potential problem with your batching approach is that a single delayed response may delay the completion of a whole batch, because the next batch does not start until every request in the current one has finished. It may not be an actual problem, since the web service you are calling may have very consistent response times, but in any case you could try an alternative approach that controls the concurrency without batching. The example below uses the TPL Dataflow library, which is built into the .NET Core platform and available as a package (System.Threading.Tasks.Dataflow) for .NET Framework:

public static async Task<List<GeoJSON>> GetAddressesInParallel(List<GeoJSON> geos)
{
    var block = new ActionBlock<GeoJSON>(async item =>
    {
        await SendPOSTAsync(item);
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 1000
    });

    foreach (var item in geos)
    {
        await block.SendAsync(item);
    }
    block.Complete();

    await block.Completion;
    return geos;
}

Your SendPOSTAsync method just returns the same GeoJSON that it receives as an argument, so GetAddressesInParallel can also return the same List<GeoJSON> that it receives as an argument.
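For comparison, the same idea (a fixed cap on in-flight requests with no batch boundaries) can also be sketched without Dataflow, using a SemaphoreSlim as a gate. This is a swapped-in technique, not the ActionBlock approach above, and Throttled.ForEachAsync is a hypothetical helper name introduced only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs `action` once per item, with at most `maxConcurrency` calls in flight.
    // As soon as one call finishes, the next waiting item starts; nothing waits
    // for a whole batch to drain.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> action)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        // One task per item is created up front; each waits at the gate for its turn.
        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                await action(item).ConfigureAwait(false);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

With the question's method it would be called roughly as `await Throttled.ForEachAsync(geos, 1000, geo => SendPOSTAsync(geo));`. Note that this sketch allocates a task for every item immediately (about 100k here), which the ActionBlock version avoids.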

The ActionBlock is the simplest of the blocks available in the library. It just executes a sync or async action for every item, and lets you configure the MaxDegreeOfParallelism among other options. You could also try splitting your workflow into multiple blocks, and then linking them together to form a pipeline. For example:

  1. TransformBlock<GeoJSON, (GeoJSON, string)> that serializes the GeoJSON objects to JSON.
  2. TransformBlock<(GeoJSON, string), (GeoJSON, string)> that makes the HTTP requests.
  3. ActionBlock<(GeoJSON, string)> that deserializes the HTTP responses and updates the GeoJSON objects with the received values.

Such an arrangement would allow you to fine-tune the MaxDegreeOfParallelism of each block, and hopefully achieve the optimal performance.
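The three-block arrangement could look roughly like the sketch below. It rests on several assumptions that are not in the question: the HTTP call is abstracted as a hypothetical `postJsonAsync` delegate (request body in, response body out, null on failure), System.Text.Json stands in for Json.NET so the sample has no extra package dependency, and the class name GeoPipeline and the parallelism values are placeholders to tune by measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public class GeoJSON
{
    public string Locality { get; set; }
    public string Street { get; set; }
    public string StreetNumber { get; set; }
    public string ZIP { get; set; }
    public string Latitude { get; set; }
    public string Longitude { get; set; }
}

public static class GeoPipeline
{
    // postJsonAsync is a hypothetical stand-in for the real HttpClient call.
    public static async Task RunAsync(
        IEnumerable<GeoJSON> geos,
        Func<string, Task<string>> postJsonAsync,
        int httpParallelism)
    {
        var cpuOptions = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // 1. Serialize each object to a JSON request body (CPU-bound).
        var serialize = new TransformBlock<GeoJSON, (GeoJSON Geo, string Json)>(
            geo => (geo, JsonSerializer.Serialize(geo)), cpuOptions);

        // 2. Send the request (I/O-bound). EnsureOrdered = false lets fast
        //    responses move on without waiting behind a slow one.
        var post = new TransformBlock<(GeoJSON Geo, string Json), (GeoJSON Geo, string Json)>(
            async pair => (pair.Geo, await postJsonAsync(pair.Json).ConfigureAwait(false)),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = httpParallelism,
                EnsureOrdered = false
            });

        // 3. Deserialize the response and copy the coordinates back (CPU-bound).
        var update = new ActionBlock<(GeoJSON Geo, string Json)>(pair =>
        {
            if (pair.Json == null) return; // failed request: leave the item unchanged
            var result = JsonSerializer.Deserialize<GeoJSON>(pair.Json);
            pair.Geo.Latitude = result.Latitude;
            pair.Geo.Longitude = result.Longitude;
        }, cpuOptions);

        // Completion flows down the pipeline once the first block is completed.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        serialize.LinkTo(post, link);
        post.LinkTo(update, link);

        foreach (var geo in geos)
        {
            await serialize.SendAsync(geo).ConfigureAwait(false);
        }
        serialize.Complete();

        await update.Completion.ConfigureAwait(false);
    }
}
```

Whether this beats the single ActionBlock depends on how much of the 25s is actually spent in serialization versus waiting on the API, so it is worth timing both before committing to the extra complexity.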

answered Mar 24 '26 03:03 by Theodor Zoulias


