Michael Stephenson - keeping your feet on premise while your head's in the cloud

Article Source: http://geekswithblogs.net/michaelstephenson

I have often come across situations where I have been asked to look at a process (usually in BizTalk) which isn't quite running as the customer would like. I have decided to start a series of posts which I will call refactoring tales. These posts will discuss the process implementation and the problems encountered with it, then the approach taken to improve things and what the benefits were.

Background
This particular process had been implemented as a sort of splitter pattern. Each day a file would be received which was then to be split and loaded into two different systems. The split was made by checking each record against a third system which would indicate which system the record should be loaded into.
This example is relatively common and the implementation which was developed was as follows:
1.       The file was received as a complex XML file via the FTP adapter, which had a service window applied to it. A normal file would contain about 5000 records and was approximately 10MB. On occasion there have been files as large as 40MB.
2.       A pipeline component was applied in the decode stage which would remove a pre-processing instruction from the file
3.       The file was delivered to the Messagebox
4.       An orchestration would subscribe to the message and start processing
5.       Some validation logic is applied, using the SQL adapter to check a database to ensure the message is in sequence.
a.       If the message is out of sequence, an event is logged, a manual recovery process is initiated and the orchestration is suspended
6.       The orchestration would then iterate through the file, extracting each record.
7.       In each iteration of the loop the extracted record is checked against an external system, using the Oracle adapter, to see where to send it
8.       If the message is sent to system A then a web service call is made using the WSE 2 adapter to load the message
9.       If the message is to be sent to system B then an in-memory batch message is built up containing all of the records for system B, which will then be sent in one batch
10.   When all of the records have been processed, the records for system B are sent via FTP to that system. The batch message (about half of the original file) is transformed in the send pipeline to a flat file format.
11.   The batch management database (a simple custom database supporting the sequence number validation) is updated to mark this batch complete
12.   Finally an email is sent to the business users to confirm the results of the batch processing (how many went to each system)
The Pain
The following pain was experienced with this process:
Length of time it runs
Each record in the file would take approximately 4-5 seconds to process with a little time before and after the loop. It was common for the file to take in excess of 6 hours to fully process. The result of this was a file received in the morning would often take almost the whole business day to be loaded.
Cannot scale
One of the key limitations of this design is that because the process is sequential it cannot be scaled out. If the file were to suddenly become 10,000 records you would expect the processing time to double. Adding more BizTalk servers to the group would have no impact on this duration.
Does not recover well from errors
During the first use of this service there were some network connectivity issues, which meant that sometimes the 3 retries on the web service port would all fail. The exception was caught and the process stopped until someone looked at the problem; when the issue was resolved and the orchestration resumed, it would continue. As you can imagine, there are two significant impacts on the amount of time the overall process would take.
The first was the response time from a manual operator. This could add anything from a few minutes to an hour that the whole process would be stopped.
The second was that each port error would initiate one of the port retries about 5 minutes later. In the example here about 10% of the calls failed (250+ calls, since not all of the records went to system A) and you can imagine the impact of these extra minutes on the overall processing time. One file took 2.5 days to fully load.
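A rough back-of-the-envelope model shows how the retries came to dominate the elapsed time. The figures below are the ones quoted in the post; the assumption that each failed record burns all three 5-minute retry intervals sequentially is mine, for illustration:

```python
# Rough model of the sequential splitter's elapsed time under failures.
# Record counts and timings are from the post; the retry model is an
# illustrative assumption, not measured behaviour.
RECORDS = 5000
SECONDS_PER_RECORD = 4.5     # 4-5 seconds per record
FAILED_CALLS = 250           # ~10% of the web service calls failed
RETRIES = 3                  # port retry count
RETRY_INTERVAL_MIN = 5       # minutes between retries

baseline_hours = RECORDS * SECONDS_PER_RECORD / 3600
retry_hours = FAILED_CALLS * RETRIES * RETRY_INTERVAL_MIN / 60
total_days = (baseline_hours + retry_hours) / 24

print(f"baseline: {baseline_hours:.2f} h, retries add: {retry_hours:.1f} h")
print(f"total: {total_days:.1f} days")  # same ballpark as the 2.5 days observed
```

Even under this simplified model the retries alone add over 60 hours, which is why a handful of flaky network calls turned a 6-hour run into a multi-day one.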
Persistence Points & SQL Transaction Log Backups
One of the behind-the-scenes problems was that, because the orchestration was holding one large message in memory from the start and building up another through the execution of the process, every persistence point was expensive, with the orchestration state being between 10-20 MB for the standard message size.
There were approximately 2 persistence points per record, so for a standard file (5000 records) that means 10,000 persistence points. If you do the calculations you can see we are persisting somewhere in the region of 100 GB worth of data to the messagebox during the processing of this 10 MB file, just to save the state of the orchestration.
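The arithmetic behind that estimate is simple enough to sketch, using the lower bound of the quoted state size:

```python
# Persisted-data estimate for the original sequential orchestration,
# using the figures quoted above (10-20 MB state; lower bound taken here).
RECORDS = 5000
PERSISTENCE_POINTS_PER_RECORD = 2
STATE_MB = 10

points = RECORDS * PERSISTENCE_POINTS_PER_RECORD      # 10,000 persistence points
persisted_gb = points * STATE_MB / 1024

print(f"{points} persistence points, ~{persisted_gb:.0f} GB persisted")
```

With the 20 MB upper bound the figure doubles, so "in the region of 100 GB" is, if anything, conservative.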
The SQL Transaction log backups for the messagebox were eating drive space like crazy.
 
There were a couple of other smaller things too, but as you can see there is a clear business and operational need to refactor this.
There were also lessons to learn from this situation, particularly around performance testing and how differently a process can behave on a developer machine versus a live environment, but I will not go into these within this post.
 
The Refactoring Challenge
Refactoring is different from re-writing. If we were doing this from scratch there are a number of different things you could do, but one of my favourite things about these kinds of situations is that there are things you can change and things you can't. The challenge is to make the process work much more effectively while minimizing change/cost. Some of the factors which affected how I could refactor this are as follows:
·         No change to external systems
In this case I could not change any of the external message formats or communication mechanisms. There was no opportunity to change the external systems functionality.
 
·         Short timescale
I only had a short amount of time to POC a new idea for this and then it would be implemented by other developers.
 
·         The business users still want their email
The email feature for the users screams out as a BAM opportunity. However in this refactoring there is no opportunity to change this or retrain users on how to use the BAM portal etc.
·         Limitations on technologies to use
This particular solution could potentially have used SSIS to help with the implementation, but in this case it was preferred to keep as much as possible in BizTalk, so that supportability stayed within what the support team had already been trained on.
 
The New Implementation
After weighing up all of the factors and components I had to work with I decided that the best way to refactor this process would be as follows:
1.       The file was received as a complex XML file via the FTP adapter which had a service window applied to it 
2.       A pipeline component was applied in the decode stage which would remove a pre-processing instruction from the file
3.       An XmlDisassembler was added next in the pipeline to promote some metadata about the batch from the message header to the message context
4.       A new custom pipeline component was added to the pipeline to rip apart the batch.
This new pipeline component would use the XPathReader to read the message and extract each record. The database used to manage batches as part of this solution was extended with a table providing temporary storage for the batch records. The pipeline component would save all of the records to the database and then change the message coming out of the pipeline component to a very small trigger message, containing the ID of the batch and the number of records it contains, which would be sent to the messagebox.
5.       A new orchestration would subscribe to this trigger message and start a controller orchestration which would manage the overall process
6.       The controller would iterate through a loop from 1 to the number of records in the batch and create a small simple command message which would be sent to the messagebox with an instruction to process a specific record from this batch
7.       The controller orchestration would then sit polling the database until all of the records have been processed
8.       For each of the command messages which had been sent to the messagebox an instance of an orchestration would be initiated which would process that particular record. If there were 5000 records you would get 5000 instances of the Record Processor orchestration
9.       In the Record Processor orchestration the message it receives will tell the orchestration which batch and record number to process so it can get its data from the temporary database. This orchestration will then call the system which helps decide if the record is sent to system A or system B
10.   If the record is for system A then the web service is called. The orchestration will then update the temporary database to indicate where the record was sent.
11.   When all records have been processed the Controller orchestration will detect this and stop polling the database.
12.   The controller orchestration will extract all of the records to be sent to system B and then pass this to the messagebox so it can be mapped and delivered to system B via a port subscription.
13.   The controller orchestration will query the temporary database to get the values to create the email which advises the business users how the batch was split and send it via the SMTP adapter
14.   Finally the controller orchestration will clean out the temporary storage of the batch and update the sequencing ready for the next batch
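The controller's fan-out and poll-until-complete cycle (steps 5-11 above) can be sketched abstractly. The message shapes, the in-memory `BatchDb`, and the odd/even routing rule below are all hypothetical stand-ins for the real command messages, temporary database, and routing system, and threads stand in for the per-record orchestration instances:

```python
import threading
import time

class BatchDb:
    """Minimal in-memory stand-in for the temporary batch database."""
    def __init__(self):
        self.done = {}
        self.lock = threading.Lock()

    def mark_processed(self, batch_id, record_no, target):
        with self.lock:
            self.done.setdefault(batch_id, {})[record_no] = target

    def processed_count(self, batch_id):
        with self.lock:
            return len(self.done.get(batch_id, {}))

def record_processor(db, msg):
    # Stand-in for the Record Processor orchestration: a hypothetical
    # routing rule sends odd records to system A, even records to system B.
    target = "A" if msg["recordNo"] % 2 else "B"
    db.mark_processed(msg["batchId"], msg["recordNo"], target)

def run_controller(db, batch_id, record_count, poll_seconds=0.01):
    # Fan out: one "command message" (here, one thread) per record.
    workers = [
        threading.Thread(target=record_processor,
                         args=(db, {"batchId": batch_id, "recordNo": n}))
        for n in range(1, record_count + 1)
    ]
    for w in workers:
        w.start()
    # The controller polls the database rather than waiting on replies.
    while db.processed_count(batch_id) < record_count:
        time.sleep(poll_seconds)
    return db.processed_count(batch_id)

db = BatchDb()
print(run_controller(db, "B1", 10))  # completes once all 10 records are marked
```

The key design point this illustrates is that the controller holds no record data at all; it only knows the batch ID and count, so its own persisted state stays tiny regardless of batch size.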
Some additional points to note here are:
·         If the receive pipeline gets an error then the batch stops being saved to the temporary storage. If the instance is resumed then the import begins again from scratch
·         I considered using a normal BizTalk debatching approach but encountered problems once the batch size got above 1000 records (can’t remember the error message now). Based on this I chose to implement a custom debatching solution
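The debatching component itself is BizTalk-specific (an XPathReader over the pipeline stream), but the core idea, streaming through a large XML batch, writing each record to temporary storage, and emitting only a small trigger message, can be sketched in isolation. The element names, trigger shape, and in-memory "store" below are hypothetical, not the production schema:

```python
# Illustrative sketch of streaming debatching: read a large <Batch> document
# record by record, persist each record, and return only a tiny trigger.
import xml.etree.ElementTree as ET
from io import BytesIO

def debatch(stream, store):
    batch_id = None
    count = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == "Batch" and batch_id is None:
            batch_id = elem.get("id")       # batch metadata from the header
        elif event == "end" and elem.tag == "Record":
            count += 1
            store.setdefault(batch_id, {})[count] = ET.tostring(elem)
            elem.clear()                    # discard the record's content
    # The large batch is replaced by a small trigger message.
    return {"batchId": batch_id, "recordCount": count}

store = {}
xml = b'<Batch id="B1"><Record>a</Record><Record>b</Record><Record>c</Record></Batch>'
trigger = debatch(BytesIO(xml), store)
print(trigger)
```

Because the document is read as a stream and each record is released after it is saved, memory use stays roughly flat whether the batch holds 50 records or 50,000, which is exactly the property the large-message handling above was missing.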
 
The Benefits
As a result of the above refactoring the benefits were:
·         Because the records were being processed in parallel I have been able to process a 10MB normal size file in around 15 minutes compared to the previous 6 hours
·         Any errors with one record do not cause a backlog of all of the other records
·         The improved performance means all of the records are loaded before the business users start their working day, rather than with the previous solution, where the business day was almost over before all of the records were processed
·         Because the messages being processed in the orchestrations are much smaller than before (approximately 2 KB), even though there are more persistence points overall (about 4 per record now), the total data persisted to the messagebox to maintain the state of the orchestrations has reduced from somewhere in the region of 100 GB to less than 40 MB
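A quick check of the new persistence figures, under the same assumptions as the earlier estimate:

```python
# Persisted-data estimate for the refactored solution, using the
# figures quoted above (~2 KB state, ~4 persistence points per record).
RECORDS = 5000
PERSISTENCE_POINTS_PER_RECORD = 4
STATE_KB = 2

persisted_mb = RECORDS * PERSISTENCE_POINTS_PER_RECORD * STATE_KB / 1024
print(f"~{persisted_mb:.0f} MB persisted")  # versus ~100 GB before
```

Doubling the number of persistence points per record turned out to be irrelevant once each one cost kilobytes rather than tens of megabytes; roughly a 2500x reduction in persisted data.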
Conclusion
Hopefully this post gives a real-world example of how being pragmatic in a refactoring exercise can help you make significant improvements to a process with minimal cost to the project. This article may be a little unclear on how this has been implemented, so if there is some interest, the way I implemented it is fairly generic and I could potentially do a future post with an example.
Posted on Friday, July 25, 2008 10:36 PM BizTalk


Comments on this post: Refactoring Tales: Long running splitter pattern

Rather than keep polling the database to wait for the operation to finish, I wonder if a scatter-gather pattern could be used to wait for the process result to come back (using a callback port)?
Would that work, or are there any potential issues?
Left by xin zhang on Mar 20, 2009 12:47 AM



Copyright © Michael Stephenson | Powered by: GeeksWithBlogs.net