Messaging-Database Aggregator Pattern

 

It’s been a while since I blogged an article, and a long while since I blogged about the “Sequential Convoy Aggregator” pattern. Back then I was fairly new to BizTalk. I was aware of persistence points and orchestration state, but did not have a clear understanding of how badly they could affect an orchestration design.

 

Since then I have been thinking about this more, and wondering how well my aggregator implementation would stand up in a high load scenario. After running a few tests, I discovered the design pattern does not scale well to aggregate large batches of messages. The problem then was to come up with a design that would…

 

Pattern Implementation Using an Orchestration

The sequential convoy aggregator pattern is implemented by building a sequential convoy orchestration with two receive shapes correlated to receive all the messages that are to be aggregated. The orchestration maintains the state of the aggregation, by creating a .NET XmlDocument object, and adding each message as a node to this document. When the aggregation is complete, the XmlDocument object is assigned to a BizTalk message, which is then sent from the orchestration to a subscribing send port.

 

Sequential Convoy Aggregator Performance

The performance tests use the aggregator pattern I created and blogged about last August. The orchestration was modified slightly to use a message count for the completeness criterion, so it could be tested with different numbers of messages. The tests were then run, starting with 100 messages and then increasing the number of messages being aggregated.

 

After each test, the “Backup BizTalk Server” SQL Server Agent job was manually started to create log backups for the processing of the message aggregation. The results of the test are shown below.

 

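| Msg Count | Log Size /KB | KB/Msg |
|-----------|--------------|--------|
| 100       | 7,038        | 70.38  |
| 200       | 16,446       | 82.23  |
| 300       | 29,999       | 100.00 |
| 400       | 47,703       | 119.26 |
| 600       | 96,976       | 161.63 |
| 800       | 171,584      | 214.48 |
| 1,000     | 262,825      | 262.83 |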

 

 

Msg Count: The number of messages used in the test.

Log Size /KB: The size of the message box database log file produced by running the Backup BizTalk Server SQL Agent job after the test.

KB/Msg: The size of the log file divided by the number of messages.

 

From the results of the tests, it can be seen that the size of the message box database log increases dramatically as the number of messages being aggregated increases. As the log file is a log of the database activity on the message box database, this implies that the aggregator implementation does not scale well as the number of messages increases.
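
One way to make this scaling visible is to look at the marginal cost, that is, how much extra log each additional message produced between consecutive test runs. The short sketch below does that arithmetic on the figures from the table above; the function and variable names are my own and are only for illustration.

```python
# (message count, log size in KB) pairs copied from the results table above
convoy_results = [
    (100, 7038),
    (200, 16446),
    (300, 29999),
    (400, 47703),
    (600, 96976),
    (800, 171584),
    (1000, 262825),
]


def marginal_kb_per_message(results):
    """Extra log KB per extra message between consecutive test runs."""
    return [
        (kb_b - kb_a) / (n_b - n_a)
        for (n_a, kb_a), (n_b, kb_b) in zip(results, results[1:])
    ]


marginal = marginal_kb_per_message(convoy_results)
for (n_a, _), (n_b, _), kb in zip(convoy_results, convoy_results[1:], marginal):
    print(f"{n_a:>5} -> {n_b:>5} messages: {kb:7.2f} KB per extra message")
```

If the pattern scaled linearly, the marginal cost would stay roughly constant. With these figures it rises at every step, which is what you would expect if each new message costs more to process than the one before it.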

 

If there were thousands of messages to aggregate over a 24-hour period for example, the size of the message box database log files would become a significant problem.

 

The Problem of Persistence

In order to provide reliable messaging, the BizTalk Server orchestration engine saves its state to the message box database at specific points during its execution. This serialization of the orchestration state is known as persistence, and each saving of the orchestration state is known as a persistence point. Whilst orchestration persistence is great from the reliability perspective, it can have some detrimental effects on performance if orchestrations are designed without paying a thought to the mechanics of the orchestration engine.

 

In the aggregator pattern, as the orchestration starts to aggregate messages, the size of the XmlDocument object that holds the aggregated messages is small. At each persistence point, this XmlDocument object is serialized to the message box database to allow recovery from a possible failure. As the orchestration aggregates more and more messages, the size of the XmlDocument object increases.

 

This object is persisted to the database every time a message is received (three times for each message aggregation loop in this case), so when the orchestration is holding the state of a few hundred messages, the load on the message box database starts to get heavy. If the pattern is used to aggregate thousands of messages, and the messages themselves are large, the pattern will quickly become unusable.
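
A rough model shows why this gets expensive so quickly. The sketch below assumes every message adds the same amount to the XmlDocument (the `message_kb` value is hypothetical) and that the whole document is written out three times per loop, as described above. It deliberately ignores everything else the engine persists, so it is a model of the growth, not a prediction of the measured log sizes.

```python
def total_persisted_kb(message_count, message_kb, persists_per_message=3):
    """Total KB of aggregation state written across all persistence points.

    Simplified model: the state grows by message_kb per message and the
    whole state is serialized persists_per_message times per loop.
    """
    total = 0.0
    state_kb = 0.0
    for _ in range(message_count):
        state_kb += message_kb          # the XmlDocument grows by one message
        total += persists_per_message * state_kb  # and is persisted in full
    return total


if __name__ == "__main__":
    for count in (100, 1000):
        # message_kb=1.0 is an illustrative assumption, not a measured value
        print(count, total_persisted_kb(count, message_kb=1.0))
```

The total is 3 × message_kb × n(n + 1) / 2, so it grows with the square of the message count: in this model, ten times the messages means close to a hundred times the persisted state.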

 

Building a Scalable Aggregator

The problem affecting the scalability of the aggregator was caused by maintaining the state of the aggregated message as the orchestration looped to receive the other messages in the convoy. If a way can be found to escape the state persistence problem, it should be possible to design a scalable aggregator pattern.

 

Aggregator Design Using Messaging and a Database

One of the possible designs would be to abandon the convoy orchestration, and create a solution that was based purely on messaging. At present, in BizTalk Server 2004, there is no way to aggregate messages in the pipeline of a send port in a similar manner to splitting in a receive pipeline. This pattern therefore uses a SQL Server database to store the messages as they are aggregated, and uses the SQL Server adapter to add the orders to the database and to receive the aggregated list when the aggregation is complete. The completeness criterion is determined in the database, and in this case, a message count is used.

 

Messaging-Database Aggregator Pattern

 

 

In File Drop: The order messages to be aggregated are dropped here.

Order In Receive Port: The receive port receives the order messages and publishes them in the message box database.

Add Order SQL Send Port (Map Order to SQL): Send port with a subscription to the Order In Receive Port, and a map to map the order messages to the Add Order stored procedure. The port is configured to call the stored procedure using the SQL adapter.

Add Order SP: Stored procedure to add the messages to a table in the Order Aggregation Db. It also checks the completeness criterion, which is based on the number of messages in the batch to be aggregated. When this number is reached, the status of the messages is updated to mark them for aggregation.

Order Aggregation Db: Database containing the table for aggregated orders.

Get Order Aggregation SP: Stored procedure to poll the database for batches ready to be aggregated. When a batch is ready, the data is returned, and the status flag for the orders is updated.

Get Order Aggregation SQL Receive Port: Receive port configured to poll the Get Order Aggregation SP to check for aggregation batches.

Aggregated Order Send Port (Map SQL to Order List): Send port with a subscription to the Get Order Aggregation SQL Receive Port. This port also maps the SQL result set schema to the order list schema.

Aggregated Order File Out: File location for the aggregated messages.
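
To make the roles of the two stored procedures more concrete, here is a minimal sketch of their logic. It is not the T-SQL used in the tests: it uses Python with an in-memory SQLite database as a stand-in for SQL Server, and the table name, column names, status values and batch size are all assumptions made for illustration. It also stores each order as a single XML string rather than as one column per schema field.

```python
import sqlite3

BATCH_SIZE = 3  # hypothetical completeness criterion: messages per batch


def create_schema(conn):
    """Create the (assumed) order aggregation table."""
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               order_xml TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )


def add_order(conn, order_xml, batch_size=BATCH_SIZE):
    """Stand-in for the Add Order SP: store one order, then check completeness."""
    with conn:  # insert and completeness check share one transaction
        conn.execute("INSERT INTO orders (order_xml) VALUES (?)", (order_xml,))
        (pending,) = conn.execute(
            "SELECT COUNT(*) FROM orders WHERE status = 'pending'"
        ).fetchone()
        if pending >= batch_size:
            # Batch is complete: mark its messages for aggregation
            conn.execute(
                "UPDATE orders SET status = 'ready' WHERE status = 'pending'"
            )


def get_order_aggregation(conn):
    """Stand-in for the Get Order Aggregation SP: return a ready batch once."""
    with conn:
        rows = conn.execute(
            "SELECT id, order_xml FROM orders WHERE status = 'ready' ORDER BY id"
        ).fetchall()
        if not rows:
            return None  # nothing to aggregate yet; the poll comes back empty
        # Flag the batch so the next poll does not return it again
        conn.execute("UPDATE orders SET status = 'sent' WHERE status = 'ready'")
    return "<OrderList>" + "".join(xml for _, xml in rows) + "</OrderList>"


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    for n in range(1, BATCH_SIZE + 1):
        add_order(db, f"<Order>{n}</Order>")
    print(get_order_aggregation(db))
```

The point of the sketch is that the aggregation state lives in ordinary table rows. Each incoming message costs one insert and one count, regardless of how many messages are already waiting, so nothing has to be re-serialized as the batch grows.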

 

Test Results

The tests were repeated using the messaging-database aggregator pattern, and the following results were obtained. The Messaging log size is the sum of the message box database log file and the order aggregation database log file.

 

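| Msg Count | Convoy Orch Log Size /KB | Convoy Orch KB/Msg | Messaging Log Size /KB | Messaging KB/Msg | % Log Size Reduction |
|-----------|--------------------------|--------------------|------------------------|------------------|----------------------|
| 100       | 7,038                    | 70.38              | 3,064.00               | 30.64            | 56.46%               |
| 200       | 16,446                   | 82.23              | 5,984.00               | 29.92            | 63.61%               |
| 300       | 29,999                   | 100.00             | 8,676.00               | 28.92            | 71.08%               |
| 400       | 47,703                   | 119.26             | 10,977.00              | 27.44            | 76.99%               |
| 600       | 96,976                   | 161.63             | 16,866.00              | 28.11            | 82.61%               |
| 800       | 171,584                  | 214.48             | 21,755.00              | 27.19            | 87.32%               |
| 1,000     | 262,825                  | 262.83             | 26,593.00              | 26.59            | 89.88%               |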

 

 

 

Msg Count: The number of messages used in the test.

Convoy Orch Log Size /KB: The size of the message box database log file produced by the sequential convoy aggregator orchestration.

Convoy Orch KB/Msg: The size of the log file divided by the number of messages.

Messaging Log Size /KB: The combined size of the message box database log file and the order aggregation database log file produced by the messaging-database aggregator.

Messaging KB/Msg: The size of the log files divided by the number of messages.

% Log Size Reduction: The percentage reduction in log file size gained by using the messaging-database pattern.
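
For anyone who wants to check the last column, the reduction is one minus the ratio of the two log sizes, expressed as a percentage. A small sketch using the figures from the table (the names are mine, for illustration only):

```python
# (message count, convoy log KB, messaging log KB) copied from the results table
results = [
    (100, 7038, 3064),
    (200, 16446, 5984),
    (300, 29999, 8676),
    (400, 47703, 10977),
    (600, 96976, 16866),
    (800, 171584, 21755),
    (1000, 262825, 26593),
]


def reduction_percent(convoy_kb, messaging_kb):
    """Percentage by which the messaging log is smaller than the convoy log."""
    return (1 - messaging_kb / convoy_kb) * 100


for count, convoy_kb, messaging_kb in results:
    print(
        f"{count:>5} messages: {reduction_percent(convoy_kb, messaging_kb):.2f}% "
        f"reduction, convoy log is {convoy_kb / messaging_kb:.1f}x larger"
    )
```

The same ratio read the other way round gives the "times larger" factor, which is a little under 10 for the 1,000 message test.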

 

 

 

KB Log File Size for Aggregation / Message Count

The effect of the batch size on the sequential convoy orchestration can clearly be seen in the above graph. As the batch size increases, the messaging-database aggregator offers significantly better performance.

 

 

KB Log File Size per Message / Message Count

As the batch size increases, the messaging-database aggregator becomes slightly more efficient, producing less log file activity per message. With the orchestration-based aggregator, continuous persistence of the growing aggregation state makes the pattern less efficient.

 

Why is the Message Box Database Log File Size so Important?

The log files created when performing a transaction log backup on the message box database are an indication of the load that has been placed on the database since the last transaction log backup. As a BizTalk Server installation is scaled up and out, the load on the message box database is often a bottleneck. Adding another SQL cluster to handle increased message box activity is costly. It’s much better to design a solution that uses the message box in an efficient way if it is anticipated there will be a high load on the system.

 

Conclusions

From the results it can be seen that the messaging-database aggregator provides a reduction of roughly 90% in database log size (89.88% at 1,000 messages) compared with the sequential convoy aggregator in this scenario. If the message count were to increase, this design could perform 20 or even 100 times better than the orchestration-based design. In a production system, this could result in significant cost reductions. Your database administrator would also appreciate the difference in log file sizes used for the backups :-p.

 

I tried to keep the tests as scientific as possible, and had planned to include execution times in the results as well. Unfortunately, Virtual PC is not a great platform for performance testing, and does not seem to give the speed that running on bare metal would, so I have not included those figures. The log file sizes should provide a reasonable indication of performance (and I did remember to add the log file sizes for the order aggregation database as well ;-).

 

I think the option of using a messaging-database aggregator should be considered when building an aggregator in BizTalk. Apart from the performance advantages, it also provides the option to recover better from failure, as the messages are present in a database, and the aggregation can be triggered manually.

 

There are a few improvements that could be made to the design. I’m not happy with creating a database table that contains the fields in the schema; with a hierarchical schema, this would involve multiple tables and references. It would be much better to save the XML content of the message in one column.

 

I’ve heard a few people discussing the option of using a pure messaging-based solution to improve performance on high load systems, and it would be great to see more examples of this.

 

To sum it up…

 

  • Sequential Convoy Aggregators are fine for handling aggregations with a low number of messages.
  • Sequential Convoy Aggregators do not scale well with higher message traffic if the aggregation state is held within the orchestration.
  • Building a messaging-database aggregator provides a design that scales well with large numbers of messages.
  • It is also possible to retain the sequential convoy orchestration design, and use a database to store the aggregation state if the need arises.

 

 

 

 

Posted on Monday, June 6, 2005 5:34 PM


Comments on this post: Messaging-Database Aggregator Pattern

# re: Messaging-Database Aggregator Pattern
Alan - this is precisely the pattern we used in our project to batch messages (batch size of 8k), and it worked very well. One small thing worth mentioning is that in some cases a 'timeout' completeness flag is required, so that if you're batching 1000 records, and have only 999 in the database, they don't get stuck there forever waiting for the final one.
Thanks, yet again, for doing the hard work and adding some concrete numbers to the pattern.
Left by Hugo Rodger-Brown on Jun 07, 2005 10:23 AM

# re: Messaging-Database Aggregator Pattern
I think it should be possible to provide a messaging-only system using a temporary file-based store.

I think you would need to use a flat file schema, and append messages onto this flat file.

It would then be down to setting up BizTalk to receive the flat file at given intervals, and broker this information on to another destination as a batch.
Left by Campbell McNeill on Jun 09, 2005 9:27 AM

# re: Messaging-Database Aggregator Pattern
Hi Campbell,

It would be worth testing this option, using a file and appending. I've never used that option before, and it may be a problem if you have 2 or more servers updating the file at the same time (not sure what would happen here).

There also needs to be a completeness condition, e.g. how do we determine when to send the batch. You could use a rcv port with a service window for this, but you may also get file lock problems.

Definitely worth taking a look at though...
Left by Alan on Jun 09, 2005 12:42 PM

# re: Messaging-Database Aggregator Pattern
From the above two posts, if you are referring to a file send port with the Copy Mode option set to Append, then this works for cases when the final message is small (maybe <= 10 MB). If the resultant file is larger than this, then each append onto the larger file slows down, until ultimately the CPU on the server is at 100%. I would not recommend this method because of this and because of the problems that Alan noted above.
Left by Matt Meleski on Jun 17, 2005 2:20 PM

# re: Messaging-Database Aggregator Pattern
none of the pictures appear on this page or is it just my browser
Left by Benjy on Jun 28, 2005 9:17 PM

# re: Messaging-Database Aggregator Pattern
I am not able to see any of the graphics either!!!

Thanks,
Cosmas
Left by Cosmas on Jun 28, 2005 11:21 PM

# re: Messaging-Database Aggregator Pattern
Can't see the graphics even in the bloggers guide? any other way?
thanx
Left by Ghost on Jul 29, 2005 2:23 AM



Copyright © Alan Smith | Powered by: GeeksWithBlogs.net