I have implemented an EventGrid Trigger to respond to Blob Storage events. The logic, simplified, is below:
public static async Task Run(
    JObject eventGridEvent,
    TraceWriter log,
    ExecutionContext context)
{
    string eventContent = ParseEvent(eventGridEvent);
    HttpClient client = GetProxyClient();
    HttpResponseMessage response = await client.GetAsync("blabla/" + eventContent);
    string responseContent = await response.Content.ReadAsStringAsync();
    log.Info("Here is the response: " + responseContent);
}
The external API does not take long to respond (1 second or less) and my configuration for the host is set to default (so an unbounded number of concurrent calls is allowed).
I am getting a lot of duplicated events in the logs when adding multiple blobs (starting at just 2 blobs) at the same time (a script is quickly uploading the blobs one by one with no wait time in between).
I suspect this might be because I never acknowledge receipt of the events, and I don't know whether I am supposed to do this in my code or whether the EventGrid Trigger does it automatically.
Is the logic for acknowledging the processing of an event supposed to be implemented within an EventGrid Trigger (Http 200 response) or is this handled automatically?
If not, should I still be getting duplicated events? Typically, when uploading a single blob I receive the event for it 3-4 times.
The reason I ask is that when using an HTTP Trigger and returning a 400 response, I also get duplicated events, which makes sense since I am not acknowledging having correctly processed the event. However, when I return a 200 response, I do not receive duplicated events.
Thanks
You don't need to do anything special to indicate success to Event Grid. If your function execution succeeds (does not throw an exception), the trigger will respond with a success status code automatically.
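For instance (a minimal sketch reusing the hypothetical helpers from the question, not a definitive implementation), you can turn an unsuccessful downstream call into a thrown exception, so the invocation is marked as failed and Event Grid retries delivery:

```csharp
public static async Task Run(
    JObject eventGridEvent,
    TraceWriter log,
    ExecutionContext context)
{
    string eventContent = ParseEvent(eventGridEvent);
    HttpClient client = GetProxyClient();
    HttpResponseMessage response = await client.GetAsync("blabla/" + eventContent);

    // Throwing here (for any non-2xx response) marks the invocation as failed,
    // so Event Grid will retry delivery. Completing without an exception
    // is the implicit acknowledgement; no explicit 200 is needed.
    response.EnsureSuccessStatusCode();

    log.Info("Here is the response: " + await response.Content.ReadAsStringAsync());
}
```

Note the return type of Task rather than void: with async void the runtime cannot observe when the invocation completes or whether it threw, so success/failure reporting only works reliably with async Task.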
You may try using an EventGrid Advanced Filter of data.api String ends with FlushWithClose. The reason my Azure Function was executing multiple times upon blob upload was that an EventGrid message was created for every AppendFile action performed during the blob upload.
I found out (by trial and error) that Azure Data Factory uses a series of API calls to write a single blob to Blob Storage.
It ends up looking something like this:
CreateFilePath
LeaseFile
AppendFile
AppendFile
AppendFile (each append puts a chunk of the blob until the blob is complete)
FlushFile (this is the actual indication that the file has finished; hence the Advanced Filter shown above)
LeaseFile
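As an illustration, the Advanced Filter could be applied when creating the event subscription. A sketch using the Azure CLI, where every <...> value is a placeholder for your own resources:

```shell
# Sketch: create an Event Grid subscription that only fires on the final
# flush call, filtering out the intermediate CreateFilePath/AppendFile events.
# All <...> values are placeholders, not real resource names.
az eventgrid event-subscription create \
  --name blob-flush-only \
  --source-resource-id "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storageaccountname>" \
  --endpoint "https://<functionapp>.azurewebsites.net/runtime/webhooks/eventgrid?functionName=<functionname>" \
  --included-event-types Microsoft.Storage.BlobCreated \
  --advanced-filter data.api StringEndsWith FlushWithClose
```

The same filter can also be configured in the Azure Portal under the event subscription's Filters tab.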
Here is a sample query to view this upload flow yourself (set varSampleUri to the Uri of a sample file uploaded to the blob container):

//==================================================//
// Author: Eric
// Created: 2021-05-26 0900
// Query: ADF-to-Blob Storage reference flow
// Purpose:
// To provide a reference flow of ADF-to-Blob Storage
// file uploads
//==================================================//
// Assign variables
//==================================================//
let varStart = ago(10d);
let varEnd = now();
let varStorageAccount = '<storageaccountname>';
let varStatus = 'Success';
let varSampleUri = 'https://<storageaccountname>.dfs.core.windows.net/<containername>/<parentfolder1>%2F<parentfolder2>%2F<samplefilename.extension>';
//==================================================//
// Filter table
//==================================================//
StorageBlobLogs
| where TimeGenerated between (varStart .. varEnd)
and AccountName == varStorageAccount
and StatusText == varStatus
and split(Uri, '?')[0] == varSampleUri
//==================================================//
// Group and parse results
//==================================================//
| summarize
count() by OperationName,
CorrelationId,
TimeGenerated,
UserAgent = tostring(split(UserAgentHeader, ' ')[0]),
RequesterAppId,
AccountName,
ContainerName = tostring(split(tostring(parse_url(url_decode(Uri))['Path']), '/')[1]),
FileName = tostring(split(tostring(parse_url(url_decode(Uri))['Path']), '/')[-1]),
ChunkSize = format_bytes(RequestBodySize, 2, 'MB'),
StatusCode,
StatusText
| order by TimeGenerated asc
It's interesting to upload samples from different sources (Azure Data Factory, Azure Storage Explorer, Python/C# SDK, Azure Portal, etc.) and see the different API methods they use. In fact, you'll likely need to do this to get your logging and alerting dialed in.
It's too bad the methods aren't standardized across tools, as this particular issue is a great pain to discover on your own!
Again, EventGrid Advanced Filters are your friend in this case.