A Cancellation Token, an Azure Function and managing disruption

I wrote the post “Batching a Service Bus triggered Azure Function” as a prequel to this one.  That post shows how to configure a batch of messages so that they get processed in a single Function Invocation.  What is a “Function Invocation”, you might ask?  Well…, when you set up an Azure Function, you bind it to some resource which triggers (aka notifies) the Function that it needs to run your code.  Each notification (aka trigger) and the resulting execution of the Function is an Invocation.  What the Function does and what information (aka data) it receives is flexible and up to you, the programmer, to configure and implement.  Figure 1 shows how to configure the Run() method to accept a batch of Service Bus messages and a cancellation token.


Figure 1, batching Service Bus messages, Azure Function

Notice the Message[] parameter in the Run() declaration.  It is an array, which means you are going to receive multiple messages per Function Invocation.  Take note that Microsoft.Azure.ServiceBus is being replaced with Azure.Messaging.ServiceBus.  Notice in Figure 2 that you can also configure an Event Hub triggered Azure Function using a similar pattern.


Figure 2, batching Event Hub messages, Azure Function

Notice the EventData[] parameter in the Run() declaration.  It is an array as well and can receive more than one message per invocation.  Again, take note that Microsoft.Azure.EventHubs is being replaced with Azure.Messaging.EventHubs.  If you are starting a new project, consider taking the necessary steps to target the new assemblies.  Have a look at these two articles, which might help you get onto the most current path.

At this point, you know how to configure an Azure Function so that it can batch process messages: simply make the Run() parameter an array, for example string[] instead of a single string.  Now, let’s focus on cancellation tokens, and in particular on the following.

  • Why use Cancellation Tokens
  • What does a Cancellation Token have to do with Batching
  • How I would handle / implement cancellation tokens into my Service Bus
    Azure Function

Why use Cancellation Tokens

First, make sure your mindset is comfortably targeting the serverless compute model.  Take a second to internalize what this means and what differentiates serverless from PaaS or IaaS.  It means that the compute (CPU/memory) your code runs on changes from server/VM to server/VM whenever the logic that manages the compute resources decides it should.  Your code might need more CPU one minute and less a few moments later.  In short, the platform your application (aka your code) lives and runs on can be very dynamic, and your code needs to be written to live in such an environment.

Imagine that you get a burst of messages.  In most scenarios, that means you need more compute, which, of course, gets allocated.  At some point, you no longer need that compute, which requires shutting down your application on that VM.  The question is, when should that shutdown happen?  How could the platform know that your code is in a state where it is ready to be shut down?  It would mean that somehow the platform (aka Azure Functions) needs to know that all the messages you received in the batch have been processed and your code is not doing anything.  It would be cool to write some code that could infiltrate another process to see what is going on inside of it, check whether any code is running and, if not, notify the platform that it is OK to shut down.  However, that approach comes with lots of issues and complexities, and its feasibility is questionable.  Instead, it makes much more sense for the platform to pass a cancellation token to your Run() method and signal it to inform you that the VM is going to be deallocated, or shut down for some other reason.  You can see in Figure 1 and Figure 2 that a System.Threading.CancellationToken has been added to the parameter list of the Run() method.
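The mechanism itself is ordinary .NET.  Here is a minimal, self-contained sketch of the idea, assuming nothing about the Functions host beyond what the paragraph above describes.  ShutdownSignalDemo, hostSource and tokenPassedToRun are hypothetical names: the CancellationTokenSource stands in for the platform and the CancellationToken stands in for the parameter your Run() method receives.  It is not how the Functions host is implemented internally.

```csharp
using System.Threading;

// Hypothetical model of the division of labour: the platform owns the
// CancellationTokenSource; your Run() method only ever sees the CancellationToken.
public static class ShutdownSignalDemo
{
    public static (bool Before, bool After) Simulate()
    {
        using var hostSource = new CancellationTokenSource();   // stands in for the platform
        CancellationToken tokenPassedToRun = hostSource.Token;  // stands in for Run()'s parameter

        bool before = tokenPassedToRun.IsCancellationRequested; // false: no shutdown requested yet
        hostSource.Cancel();                                     // the "platform" decides the VM is going away
        bool after = tokenPassedToRun.IsCancellationRequested;  // true: a processing loop would now stop

        return (before, after);
    }
}
```

Only the owner of the CancellationTokenSource can request cancellation; code holding just the token can observe it.  That matches the split described above: the platform decides, your code reacts.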

In summary, a cancellation token helps you manage the shutdown of your code when the server/VM it runs on is being shut down.  It gives you, the programmer, a chance to make sure nothing is running that could corrupt data or cause unexpected behavior.

What does a Cancellation Token have to do with Batching

One of the challenging aspects of IT and computing is the number of different names for very similar concepts, each used in a different context.  One such term is cardinality, which is tightly bound to the concept of batching (in this context).  I am sure there are subtle technical differences between cardinality and batching, but in my mind, right now, in this context, they are the same.  I wrote an article here that discussed cardinality and its settings of ‘many’ and ‘one’.  When set to ‘many’, the Run() method needs to be configured to accept an array, for example a Message[], EventData[] or string[], which you can then loop through to process the batched messages.  In contrast, when set to ‘one’, you would expect a single Message, EventData or string, containing only a single message.

So why the link between cancellation tokens and a cardinality of ‘many’?

Think about it this way: first, you, the programmer, must implement code to handle the cancellation request.  This is commonly implemented with a pattern like this one.

foreach (var message in messages)
{
   if (cancellationToken.IsCancellationRequested)
   {
      //Take some precautionary actions, then stop processing the batch
      break;
   }
   else
   {
      //Process the message as usual
   }
}

If you have batched together, for example, 1,000 messages and are looping through them in a single Function Invocation, then it makes sense to check whether cancellation has been requested before each iteration.  If the platform has not initiated a shutdown, your messages simply get processed as normal.  If cancellation has been requested, you need to take precautionary actions to make sure no data is lost: perhaps save your place and log some details documenting what happened.  Keep in mind that when a server/VM is about to shut down, there is a draining process, a lag between the shutdown notification and the actual shutdown.  This is intentional and gives you some time to take the necessary actions to minimize disruption to your business processes.  Also consider that the longer your code takes to process a single message, the greater the risk of being impacted by a shutdown.
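Here is a minimal sketch of that loop in plain C#, without any Azure dependencies.  BatchProcessor, BatchResult and the handle delegate are hypothetical names for illustration.  “Saving your place” is modelled by returning the items that were never started, which you could then log or persist; what you actually do with them depends on your application.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical result type: how many items were handled and which were never started.
public sealed record BatchResult<T>(int ProcessedCount, IReadOnlyList<T> Unprocessed);

public static class BatchProcessor
{
    public static BatchResult<T> Process<T>(
        IReadOnlyList<T> batch,
        Action<T> handle,
        CancellationToken cancellationToken)
    {
        for (int i = 0; i < batch.Count; i++)
        {
            // Check before starting each item, not just once per invocation.
            if (cancellationToken.IsCancellationRequested)
            {
                // Precautionary action: capture everything not yet started ("save your place").
                var remaining = new List<T>();
                for (int j = i; j < batch.Count; j++)
                {
                    remaining.Add(batch[j]);
                }
                Console.WriteLine($"Cancellation requested; stopping before item {i} of {batch.Count}.");
                return new BatchResult<T>(i, remaining);
            }

            handle(batch[i]); // normal processing
        }

        return new BatchResult<T>(batch.Count, Array.Empty<T>());
    }
}
```

In a Function like the one later in this post, handle would be whatever processes a single Message; the sketch deliberately leaves out where the unprocessed items should be recorded.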

In the vast majority of cases, at or above the SLA, you will be successful.  But when you are processing 100,000 messages per second and achieve only 99.99% success, at the end of the day that is a large number of failures.
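To put a number on it, assuming a constant rate of 100,000 messages per second over a full day (86,400 seconds), 99.99% success works out to:

```latex
100{,}000\ \tfrac{\text{messages}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}}
  = 8.64 \times 10^{9}\ \tfrac{\text{messages}}{\text{day}}

(1 - 0.9999) \times 8.64 \times 10^{9} = 864{,}000\ \text{failed messages per day}
```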

The point, regarding batching/cardinality and a cancellation token, is that it makes the most sense to implement the check when running in batches, because you are looping through and processing each message within a single invocation.  Therefore, it makes sense to check, before processing each message, whether the server/VM is about to shut down.  In contrast, where cardinality is one, it doesn’t make as much sense: you are processing a single message per invocation, so there is little need to check for cancellation before processing, since the likelihood of a shutdown arriving at that precise moment is low.  Note that there is a mechanism which stops the flow of messages to a server/VM before it is shut down; this works well enough when cardinality is one, but maybe not so well when cardinality is many.  It depends greatly on the time it takes your code to process a message and on the requirements of your application.
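To make “it depends on the time it takes your code to process a message” concrete, here is a small back-of-the-envelope sketch.  It assumes sequential processing, 1 second per message (the same figure as the Thread.Sleep(1000) in the snippet further down) and the 10-second window mentioned in the next section; both numbers are illustrative assumptions, and ShutdownExposure and its methods are hypothetical names.

```csharp
using System;

// Back-of-the-envelope helper for the trade-off described above.
// The per-message time and grace period are inputs you must measure or look up;
// nothing here queries Azure. timePerMessage must be greater than zero.
public static class ShutdownExposure
{
    // Total time one invocation spends processing its messages sequentially.
    public static TimeSpan InvocationDuration(int messagesPerInvocation, TimeSpan timePerMessage)
        => TimeSpan.FromTicks(timePerMessage.Ticks * messagesPerInvocation);

    // True if the whole invocation could finish inside the grace period.
    public static bool FitsInGracePeriod(int messagesPerInvocation, TimeSpan timePerMessage, TimeSpan gracePeriod)
        => InvocationDuration(messagesPerInvocation, timePerMessage) <= gracePeriod;

    // How many whole messages could still be completed after the shutdown notification.
    public static long MessagesThatFit(TimeSpan timePerMessage, TimeSpan gracePeriod)
        => gracePeriod.Ticks / timePerMessage.Ticks;
}
```

With those assumed figures, a single message finishes comfortably inside the window, while a batch of 1,000 would need about 1,000 seconds and only about 10 more messages could complete after the notification.  That gap is why the per-message check matters more for batches.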

I am not saying you cannot add a cancellation token when cardinality is one…; if your solution requires such precision and perfection, then add it, it might help you sleep better…  This is why I brought batching into this discussion of the cancellation token: it fits best in the context of a cardinality of many, compared to processing single messages.  Let’s look at how I would implement this in an Azure Function.  You could conceivably implement this pattern in any code which processes batches of data, assuming the platform on which you run delegates this to you.

How I would handle / implement cancellation tokens into my Service Bus Azure Function

Remember this: unless you have a business need or requirement to check for a cancellation token when processing a single message, I lean towards implementing it only in Functions that process batches of messages.  I have mentioned this already, but it bears repeating; let me know if you think differently, I check my messages on LinkedIn.

Your method receives a cancellation token, and that token is signalled if the host (aka server/VM) on which it is running is about to shut down.  When this notification comes, you have a short period of time (10 seconds) to perform some kind of precautionary action.  Once those precautionary actions are completed, you break out of the code path and stop.  When your code is placed onto another host (aka server/VM), you can access the ‘notes’ you wrote during your precautionary action, for example, take the required resume actions and carry on, almost as if nothing happened.  If you are only processing a single message, as I already wrote, you would probably have enough time to complete the invocation before the shutdown happens, because of the pause between the shutdown notification and the actual shutdown.  Here is a snippet of the code; I have placed it on my GitHub here.
using System.Threading;
using Microsoft.Azure.ServiceBus;     // Message
using Microsoft.Azure.WebJobs;        // FunctionName, ServiceBusTrigger
using Microsoft.Extensions.Logging;   // ILogger

public static class servicebustopic
{
  // In-process model: [FunctionName] pairs with Message[] and an ILogger parameter
  [FunctionName("servicebustopic")]
  public static void Run([ServiceBusTrigger("batch", Connection = "SB_CONN")]
      Message[] messages, CancellationToken cancellationToken, ILogger log)
  {
    if (messages.Length > 0)
    {
      foreach (var message in messages)
      {
        // Check for a shutdown signal before starting each message
        if (cancellationToken.IsCancellationRequested)
        {
          log.LogInformation("A cancellation token was received.");
          log.LogInformation("Taking precautionary actions.");
          Thread.Sleep(2000); //time lag for taking the actions
          log.LogInformation("Precautionary activities --complete--.");
          break; // stop processing the rest of the batch
        }
        else
        {
          Thread.Sleep(1000); //time lag for actually processing the message
          log.LogInformation($"Message: {message.MessageId} was processed.");
        }
      }
    }
    else
    {
      log.LogInformation("The function was invoked, but there were 0 messages.");
    }
  }
}

You can see that I am looping through all the messages using a foreach loop.  Within that loop, the first thing I do is check whether cancellation has been requested: if yes, I take some precautionary actions and break; if not, I process the message as per business requirements.  You might be asking, what kind of precautionary actions should I take?  Well, that’s a hard question to answer because it depends on lots of factors.  You need to know what your code is doing, what data it is inserting, updating or deleting, and think about the unwanted scenarios that can happen if your code does not get a chance to finish.  You do need to stop your code execution, i.e. the Function Invocation, once you get the notification, because the last thing you want is to be shut down in the middle of executing a business-critical code path.
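One possible precautionary action is writing down which messages were not processed, so that the next invocation, possibly on another host, can pick them up; this is the ‘notes’ idea described earlier.  Here is a minimal sketch under stated assumptions: ShutdownNotes, Save and LoadAndClear are hypothetical names, and a local file is used only to keep the example self-contained.  Because your code may resume on a different server/VM, real notes would need to live in storage that every instance can reach.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical "notes" store: on cancellation, record the IDs of messages that
// were not processed; on the next run, read them back and act on them.
// A local file keeps the sketch self-contained; across hosts you would need
// shared storage that every instance can reach.
public static class ShutdownNotes
{
    // Precautionary action: one message ID per line.
    public static void Save(string path, IEnumerable<string> unprocessedMessageIds)
        => File.WriteAllLines(path, unprocessedMessageIds);

    // Resume action: return the recorded IDs (empty if there are no notes).
    public static IReadOnlyList<string> LoadAndClear(string path)
    {
        if (!File.Exists(path))
        {
            return Array.Empty<string>();
        }

        var ids = File.ReadAllLines(path).Where(line => line.Length > 0).ToList();
        File.Delete(path); // clear the notes so the same IDs are not replayed on a later run
        return ids;
    }
}
```

What you record and how you resume depends on your business logic; message IDs are just one example of a ‘note’.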

That’s it.  Two things in closing.  First, Azure Functions, Microsoft’s serverless product offering, is great; have no doubts about that.  Second, before you choose the serverless hosting model, make sure you know its use case.  Compare serverless with Azure App Service (PaaS) and Azure Virtual Machines (IaaS) and choose the right model for your code.  Azure Functions is very cost-effective, but it exists for a specific use case, which is different from PaaS and IaaS.  Human resources (aka support) cost money too, so what you save by choosing serverless might be lost by trying to run code that is better suited to PaaS.  Once you are sure that your code is best suited for serverless, you then need to decide between the Consumption, Elastic Premium or Dedicated plans.  Make these decisions consciously instead of just hoping everything works out.