Service Bus triggered Azure Function AutoComplete false, duplicates

As you read here the default setting for AutoComplete is true.  When you create a Service Bus triggered Azure Function you install an extension named Microsoft.Azure.WebJobs.Extensions.ServiceBus as shown in Figure 1.

image

Figure 1, Service Bus Azure Function AutoComplete

The NuGet package is here and the source code here.  The point of this article is explain what happens when you set AutoComplete to false, which is supported in C# only.  What this does is it removes the responsibility of managing the message life cycle from the extension, to you!  Are you ready for that?  Do you know what this means?  For starters, have a look at the ServiceBusListners.cs file.  I linked that specifically to a line which call the receiver.CompleteAsync() method.  That won’t happen if you set AutoComplete to false, you have to do it.  Look around in that code and you will find AbandonAsync() and CompleteAsync().  You might also need to consider when to send a message to the Dead Letter Queue (DLQ).

An issue happens when for some reason you do not call the CompleteAsync()  method.  This can happen if your code throws an unhandled exception or even a handled exception and you don’t call the CompleteAsync() method in you catch{} block.  This also has some to do with timing and where your code throws the exception.  If you code throws an exception before the message is process by you code and you do not call the CompleteAsync() method, then after the Lock Duration threshold you set on the Service Bus is breached, the message gets place back into the queue, with an incremented Delivery Count.  But in that scenario, no problem, because the exception happened before the message was processed by your code.  However, if the message did get processed, and you code threw an exception afterwards, before you called the CompleteAsync() method, then, same like before, after the Lock Duration threshold is reached, the message will get placed back into the queue and you will reprocess it, which probably isn’t desired.

To avoid this scenario, make sure your code handles exceptions by including all code within try{}catch{} blocks and calling the CompleteAsync()  method from within the catch{} block.

Let’s take a closer look.  I wrote a test Service Bus triggered Azure Function and placed the code here and I used my Azure Function Consumer to send some messages to it, you can find that here.

In the code I wrote this catch{} snippet.  I would consider an exception like this to be a transient one, and in that case it would be prudent to take another shot at processing it again after a moment or so.  But, at some point you need to determine that the code just isn’t running and you need to stop so you don’t make the situation worse by making the attempt 1000’s of times, but how often you want to retry is very dependent on your requirements and what you are trying to do.

catch (TimeoutException toex)
{
     //wait a few seconds and retry or circuit breaker
     log.LogInformation($"A transient exception happened: {toex.Message}");
     await messageReceiver.AbandonAsync(lockToken);
}

What I do in that snippet is, instead of wrongly calling the Complete method, because it is not complete, instead I want to abandon it, so it gets placed back into the Service Bus queue which gives it a chance to be reprocessed again later.  I added a few messages to the queue and this is what I saw from the console logging in the Azure portal, Figure 2.

image

Figure 2, Service Bus Azure Function AutoComplete, Abandon

You can see that the MessageId (red) and the Message (orange) remained the same in all 3 invocations.  The DeliveryCount (yellow) increased by 1 for each invocation, and stopped when the transient exception (green) stopped happening.  One point to call out here is that the Function Invocation succeeded, even though the message was not processed successfully, until the final attempt.  If you check in Log Analytics you will also see 3 invocations with the same MessageId, Figure 3.

image

Figure 3, Service Bus Azure Function AutoComplete, Abandon, duplicates

But we know, and you know this was intentional, it was caused by.  To look at the information you logged in your Function using the ILogger, you take the FunctionId from the customDimensions.FunctionId from the requests query and use it in a traces query, Figure 4.

image

Figure 4, Service Bus Azure Function AutoComplete, Abandon, duplicates

The next snippet I tested out was the DeadLetterAsync() method.  I thought about a scenario where a message producer creates a message which is corrupt, invalid or makes the function code not work.  I considered catching this in a FormatException catch{} block.  Notice that I checked the DeliveryCount before sending it directly to the DLQ, I did this just to make sure the message didn’t have a chance of being processed.  I.e. try a few times and if it stills doesn’t work, stop and move on.  NOTE: you will need to build some process or code to monitor and review the messages which get placed into the DLQ, there is no system process or feature which does this for you.

catch (FormatException fex)
{
    if (message.SystemProperties.DeliveryCount > 2)
   {
     log.LogInformation($"Sending message; {message.MessageId} to DLQ");
     await messageReceiver.DeadLetterAsync(lockToken, fex.Message + " sending to DLQ");
   }
  else
   {
     log.LogInformation($"An format exception happened: {fex.Message}, DeliveryCount: …");
     await messageReceiver.AbandonAsync(lockToken);
    }
}

I added some messages, if you take a look at the code I wrote here, I purposefully triggered this exception.  Otherwise, I couldn’t have tested it out so easily.  I think it is kind of a good troubleshooting technique.  Anyway, here is what I experienced.  It did however attempt to reprocess it once the Lock Duration timed out.  I didn’t catch this originally and thought there was a problem and the message was getting hung in the queue.  But that wasn’t the case.

I also realized that a way to simply retry the processing of the message, if you do not want to wait for the Lock Duration to expire, is to abandon it.  When I first tested this, I didn’t have the else block and nothing was happening.  I wasn’t seeing the message being retried once the exception was thrown in my code which caused the Function Invocation to complete. 

Once I got that going, I progressed.  See Figure 5.

image

Figure 5, Service Bus Azure Function AutoComplete, DeadLetterAsync, duplicates

The executed as expected, the function was invoked, it tried to process the same message twice and on the third time it put the message into the Dead Letter Queue (DLQ).  You can manually check the contents of the DLQ via the Azure Portal.  Navigate to the Service Bus Queue and click on the Service Bus Explorer navigation link.  The output resembles that shown in Figure 6.

image

Figure 6, Service Bus Azure Function AutoComplete, DeadLetterAsync, duplicates

Again, you would see duplicate Function Invocations in Log Analytics and in the Logging Azure Portal Console, but the messages wouldn’t have processed because of an exception.  You can also notice that the Message ID in the DLQ matches the one you see in Figure 5.  You can perform that same Log Analytics queries on the requests and traces tables which you see in Figure 3 and 4, for this kind of analysis too.

Finally, I wanted to check to see what happened to the messages which threw an exception, but there was no CompleteAsync() method called.  I basically commented out that call from my catch-all catch{} block.

catch (Exception ex)
{
  log.LogInformation($"An exception happened: {ex.Message}, not sure why…");
  await messageReceiver.CompleteAsync(lockToken);
}

What I experienced was exactly what what is expected.  If you navigate to the Service Bus queue Overview blade in the Azure Portal you will see the Message Lock duration with its default value of 30 seconds.  See Figure 7.

image

Figure 7, Service Bus Azure Function AutoComplete, CompleteAsync, duplicates

When I added a message to the Service Bus Queue which threw an exception, it looked like this in the portal console logging, Figure 8.

image

Figure 8, Service Bus Azure Function AutoComplete, CompleteAsync, duplicates

Notice that the message triggered a Function Invocation which threw an exception (green).  As expected, 30 seconds later, after the Lock Duration threshold was breached, the message lock was released and, because it was not completed it appeared as eligible for processing and it did.  Where in your code and what implications this has on your solution depends on many scenarios.  You might consider adding some logic to check the DeliveryCount which will tell you if this message has had an attempted Function invocation on it.

What happens if the exception is unhandled

If there is an exception on the host/runtime/platform it is possible that none of the Message Receiver methods , complete, abandon or deadletter get called.  Let’s test this out and see what happens, see Figure 9.

image

Figure 9, Service Bus Azure Function

This behavior would be the same regardless of the AutoComplete setting.  What I experienced was that the Function Invocation happened 10 times, which is the Max Delivery Count you see set in Figure 7.  After those 10 attempts, the message was placed into the DLQ, similar to what you saw in Figure 6.

Recommendation

By default, the Run() for a Service Bus triggered Azure Function is synchronous. 

image

In order to call the asynchronous methods you need to change that to an async method, like this.

image

I highly recommend that you do not place .Result or .Wait on the Task instance.  Read more about that here.

Links of interest