Geeks With Blogs
Devdutt's Blog: My ramblings on just about everything...

This post is an old one from my previous blog.

This week was Load Testing week for our latest set of Interfaces. Incidentally, this was our first set of Interfaces to implement Uniform Sequential Convoys in their Orchestrations. The implementation was more or less the same as in my previous post: the convoys had a timeout mechanism that led the Orchestration to completion if no new messages arrived for a period of time.

So far, so good. But under a realistic amount of load, we observed that almost 1 out of every 5-6 messages failed to go through to the destination. HAT (Health and Activity Tracking) showed the following status:

Completed with discarded messages

Of course, the 1-in-5 figure depends heavily on the actual convoy implementation in question. After some snooping around (mostly on Lee Graber’s blog), I figured out the source of the problem. But no amount of the aforementioned snooping yielded an easy or elegant solution to this little pickle.

What happens is that control can pass to the delay branch of the Listen shape, where we implement the logic to escape the loop. If a message arrives at the MessageBox during this interval, BizTalk still sees our Orchestration as “Running”, and since the message matches our Orchestration’s subscription, BizTalk routes the message to this orchestration instance. But our orchestration is blissfully unaware that it has one or more messages to take care of, and runs to completion. These messages are therefore left in the MessageBox with no one to process them; in other words, they are in a Zombie state.
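To make the race concrete, here is a toy Python model of a single convoy instance. It is not BizTalk code and uses no BizTalk API: ConvoyInstance and its methods deliver(), delay_fired() and complete() are hypothetical stand-ins for the MessageBox routing a message, the Listen shape's delay branch winning, and the instance finishing. The point it illustrates is that routing only depends on the subscription still being active, not on whether the instance will ever execute another Receive.

```python
# Toy model only: ConvoyInstance, deliver(), delay_fired() and complete() are
# hypothetical names, not part of any BizTalk API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConvoyInstance:
    """One running orchestration instance with an active convoy subscription."""

    subscription_active: bool = True
    exiting: bool = False  # set once the Listen shape's delay branch has fired
    processed: List[str] = field(default_factory=list)
    orphaned: List[str] = field(default_factory=list)  # routed, never received

    def deliver(self, msg: str) -> None:
        # Routing only checks that the subscription is still active; it cannot
        # know whether the instance will ever reach another Receive shape.
        if not self.subscription_active:
            raise RuntimeError("no active subscription for this message")
        if self.exiting:
            self.orphaned.append(msg)  # the "zombie" message
        else:
            self.processed.append(msg)

    def delay_fired(self) -> None:
        # Control passes to the delay branch: the loop will be escaped, but the
        # subscription stays active until the instance actually completes.
        self.exiting = True

    def complete(self) -> None:
        # Only now does the instance stop matching new messages.
        self.subscription_active = False


inst = ConvoyInstance()
inst.deliver("msg-1")  # received normally inside the convoy loop
inst.delay_fired()     # timeout: the instance starts running to completion
inst.deliver("msg-2")  # arrives during the race window
inst.complete()
print(inst.processed, inst.orphaned)  # ['msg-1'] ['msg-2']
```

In this run msg-2 ends up in orphaned: it was routed to the instance after the delay fired, so nothing ever received it.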

In my opinion, the first thing to do in this situation (and I am not being facetious here) is to verify that your correlation set is actually specific enough for your business scenario. For example, if we are transmitting patient medical records, we need to correlate only messages for the same patient and not ALL patients. In other words, don’t correlate JUST on MessageType if you can correlate on MessageType AND PatientID.
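A quick way to see what correlation granularity does is to group a stream of messages by the correlation properties. The Python sketch below is only an illustration: group_into_convoys() is a hypothetical helper, and the dictionaries stand in for promoted properties such as MessageType and PatientID. Correlating on MessageType alone funnels every patient into one convoy, whereas adding PatientID gives each patient its own convoy, so a late message can only land on an exiting instance for that same patient.

```python
# Illustration only: group_into_convoys() is a hypothetical helper, and the
# dicts stand in for promoted properties on incoming messages.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Message = Dict[str, str]


def group_into_convoys(
    messages: Iterable[Message], keys: Tuple[str, ...]
) -> Dict[Tuple[str, ...], List[Message]]:
    """Bucket messages by the values of the correlation properties in `keys`."""
    convoys: Dict[Tuple[str, ...], List[Message]] = defaultdict(list)
    for msg in messages:
        convoys[tuple(msg[k] for k in keys)].append(msg)
    return dict(convoys)


messages = [
    {"MessageType": "PatientRecord", "PatientID": "P1"},
    {"MessageType": "PatientRecord", "PatientID": "P2"},
    {"MessageType": "PatientRecord", "PatientID": "P1"},
]
coarse = group_into_convoys(messages, ("MessageType",))
fine = group_into_convoys(messages, ("MessageType", "PatientID"))
print(len(coarse), len(fine))  # 1 2
```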

Secondly, set your delay interval to an appropriate value. Setting it too low means instances time out more often, which creates more of these race windows per unit of time. A suitably longer interval will reduce your chances of Zombies, but it won’t give you a 100% success guarantee.
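To put a rough number on the delay trade-off, here is a back-of-the-envelope model in Python. It assumes, purely for illustration, that messages for one correlation key arrive as a Poisson process with rate `rate`, and that there is a fixed race window (`window`, a hypothetical parameter) between the delay firing and the instance's subscription going away. Under those assumptions the delay fires after a received message with probability e^(-rate·delay), and each timeout strands on average rate·window messages, so the expected number of zombies per processed message is rate·window·e^(-rate·delay). simulate() is a Monte Carlo check of the same model.

```python
# Back-of-the-envelope model, NOT measured BizTalk behaviour. Assumptions:
# Poisson arrivals per correlation key, and a hypothetical fixed race window
# between the delay firing and the instance's subscription disappearing.
import math
import random
from typing import List, Optional


def expected_zombies_per_processed(rate: float, delay: float, window: float) -> float:
    """Expected zombie messages per processed message under the model above.

    rate   -- messages per second for one correlation key
    delay  -- Delay interval in the Listen shape, in seconds
    window -- assumed race window after the delay fires, in seconds
    """
    # After each received message the delay fires only if the next gap exceeds
    # `delay`; for exponential gaps that happens with probability exp(-rate*delay).
    p_timeout = math.exp(-rate * delay)
    # By memorylessness, arrivals in the race window average rate * window.
    return p_timeout * rate * window


def simulate(rate: float, delay: float, window: float, n_messages: int, seed: int = 0) -> float:
    """Monte Carlo check of the same model; returns zombies per processed message."""
    rng = random.Random(seed)
    arrivals: List[float] = []
    t = 0.0
    for _ in range(n_messages):
        t += rng.expovariate(rate)
        arrivals.append(t)

    processed = zombies = 0
    last_received: Optional[float] = None  # last message the live instance received
    for a in arrivals:
        if last_received is not None and a - last_received > delay:
            fired = last_received + delay
            if a < fired + window:
                zombies += 1  # routed to an instance that is already exiting
                continue
            # Otherwise the old instance has completed and `a` activates a new one.
        processed += 1  # received by the live instance, or activates a new one
        last_received = a
    return zombies / processed


for d in (0.5, 1.0, 2.0, 4.0):
    print(d, expected_zombies_per_processed(1.0, d, 0.5), simulate(1.0, d, 0.5, 100_000))
```

Under these assumptions the expected zombie rate falls off exponentially as the delay grows but never reaches zero, which is the same trade-off described above. The window parameter is a modelling assumption, not something BizTalk exposes.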

Thirdly, have a proper plan for dealing with these Zombies according to your business scenario. Retransmit? Discard? What?

Finally, ponder the “Receive Draining” pattern mentioned in Lee Graber’s blog, and if you have a nice, elegant way of implementing it, tell me about it FIRST!
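For what it's worth, here is how I read the draining idea, sketched in Python with queue.Queue standing in for the instance's share of the MessageBox. This is an assumption-laden illustration, not Lee Graber's implementation and not BizTalk code; run_convoy() and the on_delay_fired hook (used only to inject a late message) are hypothetical. After the delay branch fires, the instance keeps receiving, without waiting, until nothing is left, and only then exits. This narrows the race window rather than closing it: a message can still arrive between the last empty check and the instance completing.

```python
# Sketch of the draining idea as I understand it, not BizTalk code. queue.Queue
# stands in for the MessageBox; run_convoy() and on_delay_fired are hypothetical.
import queue
from typing import Callable, Optional


def run_convoy(
    inbox: "queue.Queue[str]",
    idle_timeout: float,
    handle: Callable[[str], None],
    on_delay_fired: Optional[Callable[[], None]] = None,
) -> int:
    """Process a convoy, then drain late arrivals before exiting. Returns count."""
    processed = 0

    # Main convoy loop: a Listen with a receive branch and a delay branch.
    while True:
        try:
            msg = inbox.get(timeout=idle_timeout)
        except queue.Empty:
            break  # delay branch won: time to escape the loop
        handle(msg)
        processed += 1

    if on_delay_fired is not None:
        on_delay_fired()  # test hook: simulate a message landing in the race window

    # Drain: receive whatever was routed to us meanwhile, without waiting.
    while True:
        try:
            msg = inbox.get_nowait()
        except queue.Empty:
            break  # a message could still slip in after this check
        handle(msg)
        processed += 1
    return processed


inbox: "queue.Queue[str]" = queue.Queue()
inbox.put("m1")
seen = []
count = run_convoy(inbox, 0.05, seen.append, on_delay_fired=lambda: inbox.put("late"))
print(count, seen)  # 2 ['m1', 'late']
```

In an orchestration this would presumably be a second loop after the convoy loop, with a zero or very short Delay in its Listen shape; I haven't verified that this is how the pattern is meant to be built.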

Update: For retransmitting Zombies, you might want to automate the retrieval of Zombie messages. “The ZombieWMI console application can help solve zombie related problems. This application detects zombie orchestrations and writes the zombie messages [to] a File Location.” You can get an application that does just that here.

Posted on Friday, August 5, 2005 12:33 PM | BizTalk Server 2004

Comments on this post: On Zombies and Convoys

No comments posted yet.

Copyright © Devdutt Patnaik