The MSBTS_ServiceInstanceSuspendedEvent WMI event is fired by the BizTalk Engine whenever a "message" is suspended. This happens (amongst other reasons) when all send retries fail (for example when a connected system is down) or whenever exceptions occur during the pipeline execution (for example when message validation fails).
For this little article I'd like to focus on the first category: outbound suspended messages caused by systems being temporarily unavailable. Usually admins like to receive notifications whenever one of their systems goes off-line. You can set up an NT service that consumes BizTalk MSBTS_ServiceInstanceSuspendedEvents and notifies the admins, for example by e-mail (to the BizTalk admin and/or the appropriate system admin). They can then fix the problem and bring their failed system back up.
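A minimal sketch of the listening side of such a service, assuming the BizTalk WMI classes are reachable in the root\MicrosoftBizTalkServer namespace on the local machine and that the project references System.Management.dll. It is written as a console module for brevity (a real deployment would host the watcher in a Windows service), and it prints every property of the incoming event instead of assuming particular property names. The e-mail step is only a comment.

```vbnet
Imports System
Imports System.Management

Module SuspendedEventListener

    Sub Main()
        ' Assumption: BizTalk runs on this machine and the caller may read its WMI namespace.
        Dim scope As New ManagementScope("root\MicrosoftBizTalkServer")
        Dim query As New WqlEventQuery("SELECT * FROM MSBTS_ServiceInstanceSuspendedEvent")
        Dim watcher As New ManagementEventWatcher(scope, query)

        AddHandler watcher.EventArrived, AddressOf OnSuspended
        watcher.Start()

        Console.WriteLine("Listening for suspended instances; press Enter to stop.")
        Console.ReadLine()
        watcher.Stop()
    End Sub

    ' Called on a worker thread for every suspended service instance.
    Private Sub OnSuspended(ByVal sender As Object, ByVal e As EventArrivedEventArgs)
        For Each p As PropertyData In e.NewEvent.Properties
            Console.WriteLine("{0} = {1}", p.Name, p.Value)
        Next
        ' A notification (for example an e-mail to the admins) would be sent here.
    End Sub

End Module
```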
Now the problem is that you still have to deal with those suspended messages that sit in the message box as a result of the failure. They simply have to be resubmitted, and this can be done using the HAT tool. This can be quite a repetitive task that's very prone to errors (try resending the wrong message). In many cases an automated solution would be better. Let's see what can be done:
One could react by upping the retry count to its maximum value. OK, this would automate things a lot, but it has one very big disadvantage: you lose your notifications because the message will never suspend :-( Who will bring the system back online? You NEED these notifications, so this isn't a viable solution: you have to lower the retry count back to an acceptable level. Is there another solution? Yes there is: add your own custom WMI events to BizTalk Server!
What if we could add extra custom events that let us receive notifications upon the retry itself, without requiring a message to be suspended? This sounds very complicated but is in fact a very easy task when you use the Microsoft Enterprise Instrumentation Framework (EIF) to create a custom event class.
For those that never heard of it, this is what EIF can do for you! From the EIF README:
'The Microsoft Enterprise Instrumentation Framework (EIF) enables you to instrument .NET applications to provide better manageability in a production environment. EIF is the recommended approach for instrumenting .NET applications. It provides a unified API for instrumentation that uses the existing eventing, logging, and tracing mechanisms built into the Microsoft Windows® operating system, such as Windows Event Log, Windows Trace Log, and Windows Management Instrumentation (WMI). Members of an operations team can use existing monitoring tools to diagnose application health, faults, and other conditions.
An application instrumented with EIF can provide extensive information such as errors, warnings, audits, diagnostic events, and business-specific events.'
The EIF also forms the basis/prerequisite for the Microsoft .NET Logging Application Block. This block represents the 'new' way to do logging/exception management, replacing the older EMAB (Exception Management Application Block).
Let us - by means of example - add the following events to our BizTalk Server 'event source':
- TransportLostEvent: a system goes down for the 1st time = First failure
- TransportRecoveredEvent: a system comes back up again = A retry succeeds
I have used very basic sample classes for these events. Feel free to e-mail me to get the sample - I haven't got uploads working yet :-(
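As a rough idea of what very basic event classes could look like, here is a sketch. These are not the sample classes mentioned above: only the InterchangeId property is taken from this article, the shared base class TransportEventBase is a hypothetical helper, and the EIF base class that real event classes derive from is deliberately left out so the snippet compiles without the framework installed.

```vbnet
Imports System

' Hypothetical shared base; a real EIF event would derive from the
' framework's own base event class instead.
Public MustInherit Class TransportEventBase
    Private _interchangeId As String = String.Empty

    ' Identifies the interchange whose transmission triggered the event.
    Public Property InterchangeId() As String
        Get
            Return _interchangeId
        End Get
        Set(ByVal value As String)
            _interchangeId = value
        End Set
    End Property
End Class

' Raised when a system goes down for the 1st time (first failure).
Public Class TransportLostEvent
    Inherits TransportEventBase
End Class

' Raised when a system comes back up again (a retry succeeds).
Public Class TransportRecoveredEvent
    Inherits TransportEventBase
End Class

Module EventClassDemo
    Sub Main()
        Dim lost As New TransportLostEvent
        lost.InterchangeId = "sample-interchange-id" ' placeholder value
        Console.WriteLine("{0} for interchange {1}", lost.GetType().Name, lost.InterchangeId)
    End Sub
End Module
```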
The context of the message will provide us with our basic building blocks and determine when these events have to be triggered. The 'http://schemas.microsoft.com/BizTalk/2003/system-properties' namespace contains 3 unpromoted properties named ActualRetryCount, RetryCount and RetryInterval.
- RetryInterval is a static value representing the value set on the Send Port.
- RetryCount, on the other hand, is (confusingly) not a static value: it represents the number of retries still available and is lowered by 1 by the Engine for each failed transmission attempt.
- ActualRetryCount, which BTW returns no results when you look for it in the help, is also a dynamic value that is incremented by 1 for each transmission attempt.
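To make the two dynamic counters concrete, the following sketch simply replays the description above with plain integers. It does not touch BizTalk at all: the starting values (3 retries configured, ActualRetryCount starting at 0) and the sequence of outcomes are assumptions chosen for illustration.

```vbnet
Imports System

Module RetryCounterTrace
    Sub Main()
        ' Assumed send port setting: 3 retries available at the start.
        Dim retryCount As Integer = 3
        ' Assumed to be 0 on the very first transmission attempt.
        Dim actualRetryCount As Integer = 0
        ' Illustrative outcomes: two failures followed by a success.
        Dim outcomes() As Boolean = {False, False, True}

        For Each succeeded As Boolean In outcomes
            Console.WriteLine("ActualRetryCount={0} RetryCount={1} succeeded={2}", _
                              actualRetryCount, retryCount, succeeded)
            If succeeded Then Exit For

            ' After a failed attempt: one retry less available, one more attempt made.
            retryCount -= 1
            actualRetryCount += 1
        Next
    End Sub
End Module
```

In this trace the first (failing) attempt sees ActualRetryCount = 0 and the final (successful) attempt sees ActualRetryCount = 2, which are exactly the two situations the custom events need to detect.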
So how can we make this all fit together, and from where can we trigger our events? There are 2 possible scenarios: a custom adapter and a custom pipeline component (for integrating with existing adapters).
Here's a sample piece for a custom adapter:
' Case 1: the message transmission succeeded
oProperty = msg.OriginalMessage.Context.Read("ActualRetryCount", "http://schemas.microsoft.com/BizTalk/2003/system-properties")
If Not oProperty Is Nothing Then
    ' A previously failed send operation has now succeeded (ActualRetryCount > 0 and msg.TransmitSuccesfull is true)
    If System.Convert.ToInt32(oProperty) > 0 Then
        myTransportRecovered = New TransportRecoveredEvent
        obj = msg.OriginalMessage.Context.Read(InterchangeIDProperty.Name.Name, InterchangeIDProperty.Name.Namespace)
        If Not IsNothing(obj) Then myTransportRecovered.InterchangeId = System.Convert.ToString(obj)
        ' Continue to build and finally raise the event
    End If
End If

' Case 2: the message transmission failed
' If ActualRetryCount is 0, the system failed for the 1st time: raise the TransportLostEvent
' (RetryIntervalProperty is only used here to supply the shared system-properties namespace)
oProperty = msg.OriginalMessage.Context.Read("ActualRetryCount", RetryIntervalProperty.Name.Namespace)
If Not oProperty Is Nothing Then
    If System.Convert.ToInt32(oProperty) = 0 Then
        myTransportLost = New TransportLostEvent
        ' and so on...
    End If
End If
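The decision made in the adapter fragment can be isolated into a small function that is easy to test without BizTalk. This is only a sketch: the names TransportEventKind and Classify are hypothetical, and the Object parameter stands in for whatever the context read returns (Nothing when the property is absent).

```vbnet
Imports System

Module TransportEventDecision

    Enum TransportEventKind
        None
        TransportLost
        TransportRecovered
    End Enum

    ' actualRetryCount: value read from the message context, or Nothing.
    ' succeeded: result of the current transmission attempt.
    Function Classify(ByVal actualRetryCount As Object, ByVal succeeded As Boolean) As TransportEventKind
        If actualRetryCount Is Nothing Then Return TransportEventKind.None

        Dim attempts As Integer = Convert.ToInt32(actualRetryCount)

        ' A retry succeeded: the system is back up.
        If succeeded AndAlso attempts > 0 Then Return TransportEventKind.TransportRecovered
        ' The very first attempt failed: the system just went down.
        If Not succeeded AndAlso attempts = 0 Then Return TransportEventKind.TransportLost

        ' Normal success, or a further failure that was already reported.
        Return TransportEventKind.None
    End Function

    Sub Main()
        Console.WriteLine(Classify(0, False).ToString()) ' TransportLost
        Console.WriteLine(Classify(2, True).ToString())  ' TransportRecovered
        Console.WriteLine(Classify(0, True).ToString())  ' None
        Console.WriteLine(Classify(1, False).ToString()) ' None
    End Sub

End Module
```

Keeping the rule in one place means a repeated failure (ActualRetryCount above 0) stays silent, so admins get one notification per outage instead of one per retry.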
A very nice idea would be to develop an "EIF enhanced" version of the adapter base classes that trigger additional events. Anyone interested?
I've also succeeded in making a sample pipeline component that raises the TransportLostEvent from inside a pipeline, creating the opportunity to integrate this functionality with existing adapters. I tested this very rapidly/effectively thanks to the pipeline component wizard by Martijn Hoogendoorn...
The downside for pipelines is that - if I'm correct - we have no notion of the result of the current transmission attempt from inside the pipeline, so we cannot raise the TransportRecoveredEvent. What we do have is the result of the previous attempt and the ActualRetryCount, creating the possibility to trigger the TransportLostEvent upon the first message retry.
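For the pipeline case the check might look like the sketch below. It rests on an assumption that should be verified against a real send port: that ActualRetryCount reads 1 when the pipeline runs for the first retry, since it is incremented once per transmission attempt. IsFirstRetry is a hypothetical helper name, and the Object parameter mimics the value read from the message context.

```vbnet
Imports System

Module PipelineRetryCheck

    ' Returns True only when the pipeline is executing for the first retry,
    ' which implies that the initial transmission attempt has failed.
    ' Assumption: ActualRetryCount is 1 at that point.
    Function IsFirstRetry(ByVal contextValue As Object) As Boolean
        If contextValue Is Nothing Then Return False
        Return Convert.ToInt32(contextValue) = 1
    End Function

    Sub Main()
        Console.WriteLine(IsFirstRetry(Nothing)) ' property absent
        Console.WriteLine(IsFirstRetry(0))       ' initial attempt
        Console.WriteLine(IsFirstRetry(1))       ' first retry: raise TransportLostEvent here
        Console.WriteLine(IsFirstRetry(2))       ' later retry: already reported
    End Sub

End Module
```

Compared with the adapter, the notification arrives one retry interval later, because the pipeline only learns about the failure when the next attempt starts.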
As a final note always remember that WMI events do have a performance impact. So you may want to just send them to the WTE (trace events) sink if you don't need them and performance is an issue. The nice thing about the EIF is that this can be done dynamically and doesn't require a restart of the 'Event Source' or changing the pipeline. See the configuration section in the readme of the EIF for more info.
Here are some stats for the curious (average events per second, by event sink):
- Windows Event Log: 220
- MSMQ: 120
- WMI: 520
- SQL Server Basic Log: 300
- SQL Server Flexible Log: 70
- EMAB with the Windows Event Log: 120