XML Transformation: Xslt vs .NET. Part 3.

Part 1.
Part 2.
Part 3.

Max Toro commented on Part 1, and his suggestions changed a lot.

So I have added this part. It shows how a small optimisation can change the test results.

Actually, I have added just two lines of code, that’s it.

Updated Test Results


Updated Conclusions:

  • Xslt transformation is on a par with Object transformation across a broad spectrum of message sizes.
  • If you use Json instead of Xml, Json is almost always faster.
  • The speed differences between the three technologies are not significant, so choose based on your development skills and development speed. Skills and knowledge beat the technology.

XML Transformation: Xslt vs .NET. Part 2

Part 1.
Part 2.
Part 3.

In the first part, I compared the Xslt transformations of XML documents with the Object transformations. It turned out that the Object transformations are faster in most cases. They are also simpler from the developer's point of view.

The next natural question is: “If we use Json instead of XML, how much faster would the transformations be?”

If we have the freedom to choose Object transformations instead of Xslt transformations, we may have even more freedom and can choose Json or another contemporary serialization format instead of XML. If this is the case, we should know the possible improvements, right?

Changes in the Test Project

The test project code is on GitHub.

I have added the NetSerializer Json serializer. You can get it as the NuGet “NetSerializer” package. It is one of the fastest Json serializers in .NET. The Object transformations are now executed in two modes: with XML serialization and with Json serialization.

Test Results


Practical Conclusions:

It is worth using Json instead of XML in Object transformations if we have a choice:

  • Json gives us a speed improvement:
    • for small documents of several KBytes, the improvement could be >100%
    • for documents of several MBytes, the improvement could be >30%
  • The code for the object transformation itself is the same, so there is no additional development effort to use Json.

XML Transformation: Xslt vs .NET. Part 1.

Part 1.
Part 2.
Part 3.

Xml transformation is an important part of system integration. Xml documents are everywhere despite the surge of JSON.

When we need to transform (or map) one Xml document to another, we have several options. Two of them prevail. The first is the Xslt language. The second is the object transformation.

Xslt Transformations

The Xslt language was created exactly for this purpose, to transform one Xml document to another Xml document. I am copying the Abstract of the Xslt standard here: 
“This specification defines the syntax and semantics of XSLT, which is a language for transforming XML documents into other XML documents...”

In reality, to make an Xslt map we may need the XML Schemas for the source and target Xml documents. The XML Schemas are not mandatory, but many Xslt editors use them to create Xslt maps. The BizTalk Server Mapper is one such example.

The Xslt map is itself an Xml document. The Xslt operators and expressions are defined with the help of XPath, another Xml-related standard.

Object Transformations

An Object transformation is a transformation in which Xml is converted to an object graph of a programming language like Java or C#. Then the objects are mapped to other objects, which are converted back to Xml. Those two conversions can be performed by XmlSerializer, which is part of .NET. The mapping is written and executed as generic C# code.

“Object transformation” is not an official term. Usually in development you see the terms “mapping”, “transformation”, “converting”.
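The three steps of an Object transformation (Xml to objects, objects to objects, objects back to Xml) can be sketched in a few lines. This is only an illustration in Python with the standard library, not the C# code of the test project; the element names and the fields of the Person and Employee classes are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical source type; the real Person class lives in the test project
    first_name: str
    last_name: str
    age: int


@dataclass
class Employee:
    # Hypothetical target type
    full_name: str
    age: int


def person_from_xml(xml_text: str) -> Person:
    """Step 1: deserialize the source Xml into an object."""
    root = ET.fromstring(xml_text)
    return Person(
        first_name=root.findtext("FirstName", default=""),
        last_name=root.findtext("LastName", default=""),
        age=int(root.findtext("Age", default="0")),
    )


def person_to_employee(person: Person) -> Employee:
    """Step 2: map objects to objects with plain code."""
    return Employee(
        full_name=f"{person.first_name} {person.last_name}",
        age=person.age,
    )


def employee_to_xml(employee: Employee) -> str:
    """Step 3: serialize the target object back to Xml."""
    root = ET.Element("Employee")
    ET.SubElement(root, "FullName").text = employee.full_name
    ET.SubElement(root, "Age").text = str(employee.age)
    return ET.tostring(root, encoding="unicode")


source = "<Person><FirstName>Ann</FirstName><LastName>Lee</LastName><Age>30</Age></Person>"
result = employee_to_xml(person_to_employee(person_from_xml(source)))
print(result)
```

The mapping in step 2 is ordinary code, which is the whole point: no extra language is involved.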


Theoretically, the more specialized tool should always beat the less specialized one. Xslt was designed for exactly this purpose, so the question is: how much better is it? Transformation speed is the most important feature, so I tested the speed.

Test Project

The test project code is on GitHub.

Tests have two parameters: the number of repetitions and the number of nested objects.

The repetitions should stabilize the test measurements and make them statistically more reliable. The number of nested objects implicitly defines the size of the transformed Xml document.

Test Data

The test Xml document is created by XmlSerializer from the Person class.


There are two transformers:

  • XsltTransformer
  • NetTransformer

The XsltTransformer uses the PersonToEmployee.xsl and PersonToPerson.xsl Xslt stylesheets. If you want to use the third-party mappers to create or change the stylesheets, I have generated Xml schemas for you; they are in the XmlSchema.xsd file.

NetTransformer uses the same XmlSerializer. The transformation code is simple and boring, nothing to say about it.

The transformers do not try to produce the same transformations. Small differences in transformations don’t matter for our case.

Transformation and Enrichment

I have chosen two transformation types:

  • Enrichment, when the source and target Xml documents have the same schema. It is used to change the content of the documents without changing the document structure.
  • Transformation, when the source and target Xml documents have different schemas. It creates a new target document using the data of the source document.

For enrichment we operate with the same document structure, so theoretically enrichment should be simpler and faster.

How We Test

The test document is created and tested for Xslt and Object transformations, and for both the Enrichment and Transformation types. This is one test cycle.

A new test document is created for each test cycle. This eliminates possible optimizations at the OS or memory management level. The data for the test document is generated randomly by the Randomizer class.

The test classes are initialized on the first cycle, so the first test takes much more time. I measure the maximum time because it matters in real-life situations where we need only one transformation.

To measure the average time, the highest 5% and the lowest 5% of the values are removed from the calculation.
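The trimmed average described above could look roughly like this. It is a minimal Python sketch, not the code of the test project; the function name and the sample timings are made up for illustration.

```python
def trimmed_average(times, trim_percent=5):
    """Average after dropping trim_percent of the slowest and of the fastest values."""
    ordered = sorted(times)
    cut = len(ordered) * trim_percent // 100  # how many values to drop on each side
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# 20 hypothetical timings in milliseconds; the first call is slow because of initialization
times_ms = [250.0] + [10.0] * 18 + [2.0]

print(max(times_ms))                 # the Max time, dominated by the first call
print(trimmed_average(times_ms, 0))  # plain average, skewed by the outliers
print(trimmed_average(times_ms))     # 5% trimmed on each side
```

With these numbers the plain average is 21.6 ms, while the trimmed average is 10.0 ms, which is much closer to what a typical call costs.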

I also measure the size of the transformed Xml document. It shows how much spaces and newline symbols matter for the document size.

Note about Result Xml document

The sizes of the result Xml documents for Xslt and Object transformations should not be very different. Yes, you read that right. The result Xml documents do not have to match symbol for symbol, and they can still be recognized as equal. This is because the Xml standard allows several representations of the same data. For example, the namespace prefixes can differ for the same namespace. In one result we can get the “ns0:” prefix and in another the “abc12:” prefix, but both resolve to the same namespace. As a result the Xml documents have different sizes, but both are equal in terms of data values and structure. Because of this, we cannot compare the Xml documents as strings. We could convert the Xml documents to object graphs and compare the resulting object sets. If all objects are equal, the Xml documents are equal. I decided not to compare the results because it is not the goal of the test. I just output the target Xml documents, so they can easily be compared if needed.
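To illustrate the point about prefixes, here is a small Python sketch that compares two documents by structure instead of by text. It is not part of the test project. The namespace URI and the element names are hypothetical, and the comparison is deliberately simplified: it trims whitespace around text and ignores text that follows a closing tag.

```python
import xml.etree.ElementTree as ET


def canonical(element):
    """Reduce an element to (qualified tag, attributes, text, children)."""
    text = (element.text or "").strip()
    return (
        element.tag,  # "{namespace-uri}local-name": the prefix is already resolved
        tuple(sorted(element.attrib.items())),
        text,
        tuple(canonical(child) for child in element),
    )


def same_xml(left: str, right: str) -> bool:
    return canonical(ET.fromstring(left)) == canonical(ET.fromstring(right))


doc_a = '<ns0:Person xmlns:ns0="urn:example:people"><ns0:Name>Ann</ns0:Name></ns0:Person>'
doc_b = '''<abc12:Person xmlns:abc12="urn:example:people">
  <abc12:Name>Ann</abc12:Name>
</abc12:Person>'''

print(len(doc_a), len(doc_b))   # different sizes
print(doc_a == doc_b)           # False: the strings differ
print(same_xml(doc_a, doc_b))   # True: same data and structure
```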

Test Results


The result is surprising. 
The Object transformation won in both Transformation and Enrichment tests.

For this test the size of the Xml document was about 10K. I have changed the size of the tested documents. As the size grows, the difference between Xslt and Object transformations starts to decrease. The average times were almost equal for documents of 1M size.



It is worth mentioning the unstable result times. I write all measured times into the Trace output.


As you can see, the measured times are mostly stable, but sometimes they grow significantly. Possibly this is the result of garbage collection. Please take this into account.

Practical Conclusions:

  • Xslt transformation is not faster than Object transformation, at least for Xml documents smaller than 1M.
  • Object transformation is significantly faster for Xml documents smaller than 100K.

Results are unexpected. Aren’t we the MythBusters :) ?

Is it worth using Xslt if we have a choice?

To use Xslt we have to study a new language, Xslt. We have to study the XPath, XML and maybe XML Schema standards. All this requires time and effort. Testing and debugging Xslt code takes special skills.

The Object transformation is created in a pure, general-purpose programming language. We need almost zero new knowledge to do it.

What about the time spent on creating transformations? There are not many good Xslt mappers on the market. The BizTalk Server Mapper is one of the best. It is very productive in simple cases. But with a little more complexity, the mapper is no longer a booster but a stopper. There are several books with tips and tricks on how to do things with this Mapper. Are there such books for Object mapping? I am not sure there are, because Object mapping is just generic programming, ABSOLUTELY NOTHING SPECIAL.

So, the conclusion is: use Xslt only for a very, very, very good reason. It is not faster at execution; it might be faster, but without a significant margin. It is not faster at development in most cases, especially if we count all the time we spent on studying. Any programmer can develop Object transformations, and only a skilled Xslt developer can develop Xslt transformations.


Feel free to improve the source code on GitHub. Please report any issues and propose any ideas here.


The code is not optimized hence do not trust results! Arghhh...

[Update 2015-03-29]

I've added the tests for very small and very big XML documents. Results fit the above conclusions.

Test:              Document Size:
0 records          0.5K
100,000 records    ~30M


Serializers in .NET

The project code is on GitHub.

Any distributed system requires serialization to transfer data over the wire. The serializers used to be hidden in adapters and proxies, where developers did not deal with the serialization process explicitly. The WCF serialization is an example, where all we need to know is where to place the [Serializable] attributes. Contemporary trends bring serializers to the surface. In Windows .NET development, it might have started when James Newton-King created the Json.Net serializer and even Microsoft officially declared it the recommended serializer for .NET.

There are many kinds of serializers; they produce very compact data very fast. There are serializers for messaging, for data stores, for marshaling objects. 

What is the best serializer in .NET?

No, no, no, this project is not about the best serializer. Here I gather code which shows, in several lines, how to use different .NET serializers. Just copy-paste the code into your project. That is the goal. I want to use a serializer in the simplest way, but it is good to know whether this way would really hit your code performance. That is why I added some measurements, as a byproduct.

Please, do not take these measurements too seriously. I have some numbers, but this project is not the right place to make decisions about serializer performance. I did not spend time on getting the best results. If you have the expertise, please feel free to modify the code to get numbers that are more reliable.

Note: I have not tested the serializers that require IDL for serialization: Thrift, the new Microsoft Bond, Cap'n Proto, FlatBuffers, Simple Binary Encoding. Those sophisticated beasts are not easy to work with; they are needed for something more special than straightforward serialization for messaging. These serializers are on my Todo list. The ProtoBuf implementation for .NET was upgraded to use attributes instead of IDL, kudos to Marc Gravell.


Most of the serializers are installed as NuGet packages. Look in the “packages.config” file to get the name of the package. I have included comments in the code about it.


The test data is created by Randomizer. It fills in the fields of the Person object with randomly generated data. This object is used for one test cycle with all serializers, then it is regenerated for the next cycle.

If you want to test serializers for different object size or for different primitive types, change the Person object.

The measured time is for the combined serialization and deserialization of the same object. When a serializer is called for the first time, it runs the longest. This longest time span is also important and it is measured. It is the Max time. If we need only a single serialization / deserialization, this is the most significant value for us. If we repeat serialization / deserialization many times, the most significant values for us are the Average time and the Min time.
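The measurement loop described above can be sketched like this. It is a Python illustration only: the standard json module stands in for a .NET serializer, and the test object is hypothetical. In this sketch the first-call effect may be small, so only the shape of the measurement is the point, not the numbers.

```python
import json
import time


def measure_round_trips(obj, repetitions):
    """Time serialize + deserialize of the same object, once per repetition."""
    times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        restored = json.loads(json.dumps(obj))
        times.append(time.perf_counter() - start)
        assert restored == obj  # the round trip must preserve the data
    return times


person = {"FirstName": "Ann", "LastName": "Lee", "Age": 30}  # hypothetical test object
times = measure_round_trips(person, 100)
stats = {"max": max(times), "min": min(times), "avg": sum(times) / len(times)}
print(stats)
```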

For the Average time I calculated three values: 

  • For the Average 100%, all measured times are used.
  • For the Average 90%, the 5% slowest and 5% fastest results are ignored.
  • For the Average 80%, the 10% slowest and 10% fastest results are ignored.

If we see a significant difference between the 80% and 90% average times, we probably need to increase the number of tests to get more stable and correct results.

I also provide two result sets for different test repetitions, so we can make sure the tests show stable results.

Some serializers serialize to strings, others only to a byte array. I used the base64 format to convert byte arrays to strings. I know, it is not fair, because we mostly use a byte array after serialization, not a string. UTF-8 could also be a more compact format.
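Why the base64 conversion is “not fair” is easy to show: base64 encodes every 3 bytes as 4 characters, so the string is about a third longer than the byte array. A small Python sketch, with a made-up payload standing in for the output of a binary serializer:

```python
import base64

# 1024 bytes of hypothetical binary serializer output
payload = bytes(range(256)) * 4
as_text = base64.b64encode(payload).decode("ascii")

print(len(payload))   # 1024
print(len(as_text))   # 1368, roughly 4/3 of the byte count
```

So the reported string sizes of the binary serializers are inflated by about 33% compared to the raw byte arrays.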

Test Results

Again, do not take the test results too seriously. I have some numbers, but this project is not the right place to draw conclusions about serializer performance. You had better take this code and run it on your specific data in your specific workflows.

The test results below are for the 100 and 200 repetitions.

The winner is… not ProtoBuf but NetSerializer by Tomi Valkeinen. Jil and MsgPack also show good speed and compact strings.

  • The classic Json.Net serializer is used in two tests, Json.Net (Helper) and Json.Net (Stream). The tests show the difference between using streams and using the serializer helper classes. Helper classes could seriously decrease performance. Therefore, streams are a good way to keep the speed high without too much hassle.
  • The first call to a serializer initializes it, which is why the first call might take a thousand times longer than the next calls.
  • For Microsoft Avro I did not find a fast serialization method, but its serialized string size is good. It has some bug preventing it from passing the serialized type to the class (see the comments in the code). I am really frustrated by Avro: it cannot run fast in my extremely simple code, and it does not fit into my simple serializing interface. I would appreciate it if Avro experts optimized my code on GitHub.
  • Json and binary formats do not differ much in the serialized string size.
  • Many serializers do not work well with the Json DateTime format out of the box. Only NetSerializer and Json.Net take care of the DateTime format.
  • The core .NET serializers from Microsoft (XmlSerializer, BinarySerializer, DataContractSerializer, NetDataContractSerializer) are not bad. They show good speed, but they are not so good for the serialized string size. The JavaScriptSerializer produces compact strings but is not fast. The DataContractJsonSerializer is more compact than the DataContractSerializer.
  • The NetDataContractSerializer, BinarySerializer, and Json.Net show the smallest Max times. That means they are the optimal choice for cases where we need only a single serialization / deserialization cycle.
  • The test prints the test results to the console. It also traces the errors, the serialized strings, and the individual test times, which can be seen in DebugView, for example.

BizTalk Server and Agile. Can they live together?

So far I have seen only the waterfall methodology in BizTalk project development. I have worked for small and big companies, and everywhere I saw only waterfall. Is there something special about BTS projects that Agile is never used with them?

So far we, BizTalk developers, have had all the disappointments of waterfall development: long project stages, huge and unusable documentation, disagreements between users, stakeholders and developers, bloated code, unsatisfactory code quality, scary deployments and modifications. Many such problems are described here by Charles Young.

Recently our team decided to use Agile principles to address these issues. We had our victories and our defeats, but right now we feel better on our journey to the “Agile world”.

The business goal is simple: we desperately need faster development. When we deploy and test new code in hours, not weeks, we make a lot more iterations, and we make, find and fix a lot more errors. Which is just great. Now an error is not a catastrophe, it is a small thing. That means more reliable code. Now our applications are more reliable and we fix errors very fast.

The main reason to use Agile is economics. It is not only faster and more reliable, it is cheaper.

We decided to use Agile together with SOA and the microservice architecture. At first we thought BizTalk was too heavy a tool set to be used with Agile. But it turns out that BizTalk Server has a very special set of attributes that suits the microservice architecture very well out of the box. If you think of orchestrations and ports as microservices, this part of BizTalk fits SOA perfectly.


Three main things keep BizTalk developers from using Agile: artifact dependencies, “niche, unnecessary tools”, and manual deployment.

Here I am talking only about the technical side of the problem. The management side will be touched on later.

Artifact Dependencies

Dependency is the other side of code reuse. And here lies one of the main differences between BizTalk applications and generic applications. The latter are created as a set of dll-s in C#, Java or any other programming language. In many cases we prefer to simplify things with code reuse, which creates some dependency problems, but usually this is not a big problem. What about BizTalk applications? BizTalk Server keeps strict control of the working artifacts. Reliability is king. We cannot just replace one buggy artifact if there is a dependency on it. It hurts redeployment but, remember, reliability is king. Anything else is not so important. So for us, as BizTalk developers, the cost of dependency is really high.

Moreover, dependency is a stopper for a SOA application. Keep services independent, and it is easy to modify them, to add new ones, and to version services.

Niche, Unnecessary Tools

BizTalk Server is a big toolset. It is impossible to hire a full team of expert BizTalk developers. Most of the folks are not very specialized in BizTalk.

So if we keep the technology stack limited, the time to make a new developer productive will be short. Any tool could be replaced by C# code, and the decision was to use a bare minimum of the BizTalk tools: Schemas, Maps, Orchestrations.

Completely prohibited are BRE, BAM, ESB Toolkit.

Custom pipelines, direct binding and Xslt are limited to very special cases.

The differentiator was the question “Is this tool 100% necessary and does it require a special skill set?”.

I am not going to start another holy war. Our decisions are based only on our use cases. In your company, in your zoo, the decisions would be different.


How do we make development iterations fast? One part of the development cycle is deployment. It is not a problem at all if you write your program in Python or Go. It is not such a big problem if you write your app in .NET. But in BizTalk development it is the Problem. So any methods that keep deployment quick are very important.

BTDF was a necessary tool in our case, but the BizTalk PowerShell Provider is also used.

And deployment is always about dependencies. Penalize more dependencies and reward fewer dependencies, that is the idea.

Technology Rules

We started by defining rules for the technology side of the projects. We enforced several SOA and microservice rules:

  • Service size: A service encapsulates a single business function and exposes only a single interface. This rule effectively cuts a BizTalk application down to a single orchestration (or a couple of ports) in most cases.
  • Shared Contracts, API: The services must communicate only through contracts. The only permitted dependencies between services are the contracts. We never share maps, orchestrations, ports, pipelines between applications. We only share schemas and API-s.
  • Versioning: A service upgrade is published as a new service. A published service is never changed; it can only be removed.
  • Tests: Tests are an important part of an application. An application without tests is not approved. We need only user acceptance tests. The minimum test set should cover successful tests and failure tests. Performance tests and unit tests are not mandatory. The special test data is part of the design. We tried to design our applications in such a way that test data can be used in production together with production data.
  • Automation: Automated deployment is an important part of an application. An application without automated deployment is not approved.

As you can see, this list has some specifics for BizTalk projects. BizTalk is oriented to XML, which is why we say an XML Schema when we mean a Contract. In BizTalk projects it is really hard to test endpoints, so we tried to go easy on testing. The deployment of a BizTalk application is a complex and long process, so we enforce deployment automation.

These rules were not easy to push into development. There were many unanswered questions on the way: What size is “small” and what is “big”? What modification is considered a new version? What test coverage is enough?

These rules are not ideal. We had discussions, we added and removed rules, we changed them. We are still in the process.

Team Rules

Those are the “technological” rules. But we also need management, team rules, because Agile is about the team structure and communications, right? Conway’s law cannot be ignored if you think about Agile. Also, we cannot avoid all the hype around the DevOps movement. So we put into practice some additional “team” rules:

  • One service is implemented by one developer.
  • One service is implemented in a one-week sprint.
  • The service developer is a DevOp: he/she is responsible for deployment and all operations of the service in all environments, including production.

The “one week” rule just emerged after a couple of months of experiments. We tried 2-3 days and 2 weeks. One week works because we want to keep all projects in one team. All developers work remotely; we have never met together as a team. And our team is not big. The “one week” rule is a result of those factors.

One of the key issues with Agile and DevOps is that the service knowledge is tightly coupled with a single developer. Enterprises cannot tolerate this because it drastically increases the risk of losing a service if the developer is not available. So we added a new role (the senior developer) and two more “team” rules:

  • A senior developer approves the service design. This senior developer performs the tests and the deployment of the service in production and signs the service off to production.
  • If the service developer or the service senior developer is not available, a new developer should take over this role.

The “Senior Developer” rule is tricky. With this rule a senior developer cannot just sign documents, approve something and voila. No way. This rule effectively forces the senior developer to do a good code review and monitor all development steps. Officially this rule covers two short tasks, but they are short only if this team of two invests a good amount of time in communicating all the details of the service.

Dependency Rule

The true heart of SOA and microservices is in limited dependencies. With BizTalk applications the dependency problem is even more important than in standalone applications, because BizTalk controls dependencies and prohibits many shortcuts permitted in standalone applications. So eliminating dependencies simplifies development and makes it possible to build simple services, the microservices, which is our goal.

But we cannot just remove all dependencies. The service dependencies are necessary evils. We have to share schemas, dll-s, services. So there is a special “dependency” rule:

  • Any changes in the shared resources, any changes that could go outside of the service boundaries, must be approved by the whole service team. All team members should agree on the change. Any team member can veto the change.

For example, we implemented the shared infrastructure for logging. The first approach was to create a shared library (dll) and force everybody to use it. It was vetoed. The second approach was to create a special logging service and expose it as a single endpoint. It was also vetoed, because the code to use this service was too big. The next try was to use log4net or NLog and standardize only the log format. Then there was the next attempt, and more. Now we are discussing InfluxDB, but what matters is that now we know much, much more about what we really need and what we don’t need.

Complete Rule Set

Now we have this rule set:

  • Single Interface
  • Shared Contract
  • Never Change [Published]
  • Test inside Application
  • Automated Deployment
  • DevOp
  • Senior Developer
  • New Developer
  • New Dependency

These rules are not ideal. For example, we are still struggling with the Wrong Requirements problem. How to fix this problem?

Our rule set is too long. Now we are considering merging the Test and Deployment rules.

This rule set works for our team. What is special about our team? Half of the team members are full-time employees, which enables the DevOp and Senior Developer rules. I am not sure these rules would work if all members were contractors. Our team has a big list of projects to develop and a big list of applications to support. If you mostly have applications to support, our approach is possibly not the best fit for you.

I did not describe the communications with our customers and stakeholders, which are an important part of the whole picture.

I did not describe the Agile practices we use (but we definitely use the Kanban board); that is not the point of this topic.

We were happy with our management, which took the risk of changes in the processes and the team. Now the management is happy (hmm… almost happy) with BizTalk development and operations. Now we are a SUPERFAST team! We are not sharks, not yet, but we are not jellyfish, not anymore.

See also:

  1. Microservices by Martin Fowler
  2. Integration and Microservices by Charles Young.

NServiceBus vs. MassTransit

I have got an interesting request to compare and choose the right integration technology for one of my customers.

By the end of the day the choice was narrowed down to NServiceBus and MassTransit only.

The comparison process is simple. We create a sieve and filter the technologies through it. In the end we get several winners, one, or even zero.

The filters are a mix of technology, life cycle and old man questions. The whole filter process looks unscientific and unsystematic, but it is sane (I hope) and simple.

Here are the filters:

  • What systems do we integrate?
  • What are your development resources: the team skills, the team agility, the team size?
  • Java, .NET or both?
  • What is the life horizon of the integration? 1, 2, 5, 10 years?
  • The nonfunctional requirements:
    • reliability
    • sustainability
    • scalability
    • performance: throughput (messages per sec/day); message size; latency
  • etc.

An integration technology could fail on just one filter, and that is enough to filter it out of the competition.
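The sieve itself is trivial to express in code. The Python sketch below is only an illustration of the process: the candidate systems, their properties and the two filters are entirely hypothetical and say nothing about any real product.

```python
# Hypothetical candidates and properties, for illustration only
candidates = {
    "System A": {"platform": ".NET", "supported_years": 5},
    "System B": {"platform": ".NET", "supported_years": 1},
    "System C": {"platform": "Java", "supported_years": 10},
}

# Each filter is a named predicate; failing any single one removes the candidate
filters = [
    ("runs on .NET", lambda c: c["platform"] == ".NET"),
    ("life horizon of 5 years", lambda c: c["supported_years"] >= 5),
]


def sieve(candidates, filters):
    """Return the survivors and, for each rejected candidate, the first failed filter."""
    winners, rejected = [], {}
    for name, props in candidates.items():
        failed = next((label for label, check in filters if not check(props)), None)
        if failed is None:
            winners.append(name)
        else:
            rejected[name] = failed
    return winners, rejected


winners, rejected = sieve(candidates, filters)
print(winners)
print(rejected)
```

Recording the first failed filter for each rejected candidate keeps the decision explainable to the customer.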

So how do NServiceBus and MassTransit compete here? They are very similar in the technology aspects, so a non-technology factor should give us the winner. It is the life horizon factor.

Here is the development activity on the code base of both systems:




Do I need to comment on those graphs? Probably not.

OK, let’s do one more check with Google Trends:


There is a company that supports NServiceBus, Particular Software. MassTransit is supported only by the open source community, which showed a lot of enthusiasm in the early years but has pretty much stopped by now. The key developers of MassTransit have moved on to other projects.

I have not worked with either of these systems and I am by no means choosing the “best” integration system. Some features of one system could be much better than the other's; it doesn't matter now. I am not choosing the better system, I am only picking out the system that fails my specific requirements.

It happens that the customer wants the integration code to work and be constantly improved for at least the next 5 years. From this perspective MassTransit failed.

[Update: 2014-07-16
 BTW one of the core developers of NServiceBus, Udi Dahan, made a comment on this post about the first picture. Seems a good proof of a healthy system.]

Domain Standards and Integration Architecture

The domain standard schemas are the repositories of domain knowledge. The message schemas are schemas for the real data transferring between real systems. Use the domain standard schemas as a reference model for your schemas. Do not use the domain standard schemas as your message schemas.

EDI to XML or EDI to SQL?

    In EDI processing we need to transform data from the EDI format to the data formats of our applications and back.

    App to EDI to App - ERD

    What is the best way to do this?

    The most popular approach uses an intermediate XML format. We use the XML Schema and XSLT standards to transform one XML format to another XML format. Is it the best way?

    Let’s look at the whole data processing chain:

    EDI to XML to SQL - ERD


    1. EDI to XML transformation;

    2. data transformation (with XML Schema and XSLT);

    3. XML to SQL transformation.

        Now I’m going to step back and look at the EDI document structure a little bit.

        Does the EDI structure resemble the XML structure or the SQL structure?

        Definitely it resembles SQL. The EDI segments are like the SQL table records. The EDI elements are like the table columns. (Sometimes the elements are composed of several subelements, but that is unimportant now.)

        Does the EDI structure resemble the XML structure? It does not. The XML relations are expressed by nested structures. For example, in XML the Order Detail records are nested inside an Order record. In EDI the Order Detail and Order segments are not nested; the relations are defined the same way as in SQL, by correlated IDs in the related segments.

        So, EDI structure resembles the SQL structure not the XML structure.

        Also remember our final goal, which is the application data in the SQL format. Can we just bypass the XML format and transform EDI directly to SQL format?

        This is more natural. We throw out two complex transformations, EDI to XML and XML to SQL, and replace them with a single EDI to SQL transformation. Why are the EDI to XML and XML to SQL transformations so complex? Because we have to map the reference relations to the nesting relations or vice versa. It is not simple. There are whole books teaching us tips and tricks for these transformations.

        EDI to XML OR XML to SQL - ERD

        Why is the EDI to SQL transformation simple? Because it is a one-to-one mapping, a segment to a record and an element to a field, from the SQL-like EDI structures to the SQL application structures.

        EDI to SQL - ERD

        There is one problem with this new approach. We have to write code for the EDI to SQL transformation. It is not a hard problem if we use contemporary techniques like Linq or Entity Framework. Those techniques look competitive even against the adapters implemented in specialized integration systems like BizTalk Server or Mule ESB. The code is pretty straightforward if we use a two-step transformation.

        EDI to SQL to SQL - ERD

        1. EDI to SQL transformation;

        2. data transformation (SQL to SQL).

        The first step transforms EDI segments and elements into SQL records and fields that mirror the EDI structure (EDI SQL). The second step maps those SQL records to the SQL records of our application database (App SQL).

            This sample shows us how we could simplify our solutions, i.e. how to do some real architecture.
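The two-step transformation described above can be sketched in a few lines. This is only an illustration under stated assumptions: Python and the in-memory sqlite3 module stand in for the LINQ or Entity Framework code and the real databases, the order text is a made-up, heavily simplified X12-like fragment ("~" ends a segment, "*" separates elements), and all table and column names are hypothetical.

```python
import sqlite3

# Hypothetical, heavily simplified X12-like order: "~" ends a segment, "*" separates elements.
EDI_TEXT = "BEG*00*NE*PO1001~PO1*1*10*EA*2.50*SKU-A~PO1*2*4*EA*9.99*SKU-B~"


def parse_segments(edi_text):
    """Split raw EDI text into (segment_id, [elements]) tuples."""
    segments = []
    for raw in edi_text.split("~"):
        if raw.strip():
            parts = raw.strip().split("*")
            segments.append((parts[0], parts[1:]))
    return segments


def load_edi_sql(conn, segments):
    """Step 1: one segment -> one record, one element -> one field (EDI SQL)."""
    conn.execute("CREATE TABLE edi_beg (purpose TEXT, po_type TEXT, po_number TEXT)")
    conn.execute(
        "CREATE TABLE edi_po1 (po_number TEXT, line_no TEXT, qty TEXT, uom TEXT, price TEXT, sku TEXT)"
    )
    po_number = None
    for seg_id, elements in segments:
        if seg_id == "BEG":
            po_number = elements[2]
            conn.execute("INSERT INTO edi_beg VALUES (?, ?, ?)", elements[:3])
        elif seg_id == "PO1":
            # The correlated ID (po_number) plays the role that nesting plays in XML.
            conn.execute("INSERT INTO edi_po1 VALUES (?, ?, ?, ?, ?, ?)", [po_number] + elements[:5])


def map_to_app_sql(conn):
    """Step 2: plain SQL-to-SQL mapping into the application tables (App SQL)."""
    conn.execute("CREATE TABLE app_order (order_no TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE app_order_line (order_no TEXT, line_no INTEGER, sku TEXT, amount REAL)")
    conn.execute("INSERT INTO app_order SELECT po_number FROM edi_beg")
    conn.execute(
        "INSERT INTO app_order_line "
        "SELECT po_number, CAST(line_no AS INTEGER), sku, CAST(qty AS REAL) * CAST(price AS REAL) "
        "FROM edi_po1"
    )


conn = sqlite3.connect(":memory:")
load_edi_sql(conn, parse_segments(EDI_TEXT))
map_to_app_sql(conn)
order_lines = conn.execute(
    "SELECT order_no, line_no, sku, amount FROM app_order_line ORDER BY line_no"
).fetchall()
print(order_lines)
```

Note that neither step has to rebuild a nested structure: the first step is a mechanical copy, and the second is ordinary SQL.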

          BizTalk: Deployment Hell & BizTalk Deployment Framework

          Is there something special in the BizTalk Server application deployment? Why is it so special?

          BizTalk Deployment Hell

          For .NET applications life is simple. There is an exe file and maybe several additional DLLs. Copy them, and that is pretty much all the deployment we need.

          BizTalk Server requires the DLLs to be placed in the GAC and registered in a special Management database. Why is this required? Mostly because BizTalk automatically hosts applications in a cluster, and for the sake of reliability. A BizTalk application may not be easy to deploy, but it is also not easy to break. For example, if application A depends on application B, BizTalk will prevent us from removing B accidentally.

          This "Why?" question is a big theme and I will not cover it here.

          Another factor is that BizTalk has many pieces that require special treatment in deployment and at run-time.

          Yet another factor is that a BizTalk application integrates independent applications and systems. That means the application has to take care of many configuration parameters, such as endpoint addresses, user names, passwords and so on.

          As a result, BizTalk application deployment is complex, error-prone, slow and unreliable, and it requires a good amount of resources.

          And look, here is a savior: the BizTalk Deployment Framework.

          The BizTalk Deployment Framework - the Champion

          The BizTalk Deployment Framework (BTDF) is an essential tool in the arsenal of BizTalk Server developers and administrators. It solves many problems and speeds up development and deployment enormously.

          It is definitely one of the main faces in the BizTalk Server Hall of Fame.

          BTDF was created by Scott Colestock and Thomas Abraham.

          It is an open source project, despite the fact that BTDF is more powerful, reliable and thorough than most commercial third-party tools for BizTalk Server. I think it is fair to donate several dollars to those incredible guys on CodePlex. Just think about the days and months we save in our projects and in our private lives.

          BTDF is an integration tool, created by guys with a pure integration mindset. It integrates a whole bunch of open-source products into one beautiful BTDF. Below I copy the "Contributors and Acknowledgements" topic from the BTDF Help:

          Thanks also to:

          • Tim Rayburn for contributing a bug fix and unit tests for the ElementTunnel tool
          • Giulio Van for contributing ideas and bug fixes.
          • The hundreds of active users across the world who have promoted the Deployment Framework, reported bugs, offered suggestions and taken the time to help other users! …”

          And how exactly does BTDF save our lives?

          BTDF is installed and tuned up. Now it is time to deploy an application.


          The deployment was successful and I got a long output listing exactly what was done in this deployment.

          Here is the log. I've marked my comments in the deployment log with “+++”. The full redeployment took 2 minutes.

          If I performed the same tasks manually, I would spend at least 3-4 additional minutes fully concentrated on this long list of deployment tasks. The probability of missing a step, and with it the probability of errors, would be quite high. With BTDF I start the deployment and am free to do other tasks.

          So my gain is 3+ minutes on each deployment. The risk of a missed step is practically zero because everything is automated.

          There is one more, psychological, problem with manual deployment. It is complicated and requires full concentration. As a developer I am concentrated on the application logic, and a manual deployment breaks that concentration. Each time after a deployment I have to concentrate on my application logic again. And that is where development performance goes to hell.

          There are other helpful BTDF commands:


          • Restart all host instances, but only those used by this application, and restart IIS if I deploy web services, all with the "Bounce BizTalk" command.
          • Terminate all orchestration and messaging instances that remain after tests.
          • Install the .NET component assemblies into the GAC.
          • Include the modified configuration parameters in a binding file.
          • Import this binding file.
          • Update SSO with the modified configuration parameters.

          All this with one click.

          You do not have to use the slow BizTalk Administrative Console to do those things anymore.

          Check my comments in the BTDF deployment log. Several automated tasks are a real blessing! We do not create the drop folders anymore and do not assign permissions to those folders. BTDF does this automatically.

          We do not care about the undeployment and deployment order; BTDF gets everything right.

          We do not stop and start ports, orchestrations, applications, host instances or IIS. Everything is automated.

          Look at the list of the top level parameters in BTDF:


          Remember, I told you that BizTalk application deployment is a little bit complicated? Each of the application components in this list requires slightly different treatment in deployment. A simple application does not use all of them; a typical application uses maybe 1/3 or 1/2, but you get the idea.

          How to tune up BTDF?

          One thing I love in BTDF is the Help. It is exemplary, ideal, flawless help.

          If you have never tried BTDF, there is a detailed description of all parts and all tasks. Moreover, there is a most unusual part: discussions of the BTDF principles and the BizTalk deployment processes. I got more knowledge about BizTalk Server deployment there than from the official Microsoft BizTalk Server documentation.

          BTDF Help helps whether you are a new user or have used BTDF for several years in a row. The descriptions are clear, sensible and arranged in a clear hierarchy. BTDF Help is one of the best. You are never lost.

          Of course, there is a detailed tutorial in the Help, along with sample applications.

          OK, now we have to start tuning.

          Typical BTDF workflow for setting up a deployment of a BizTalk application:

          1. Creating a BTDF project.
          2. Setting up a deployment project.
          3. Creating an Excel table with configuration parameters.
          4. Setting up a binding file.

          All configuration parameters are managed inside an Excel table:

          Forget about managing a different binding file for each environment. Everything is inside one Excel table. BTDF passes the parameters from this table into the binding file and other configuration stores.

          Excel helps when we compose parameters from several sources. For example, we keep all file port folders under one root. The folder structure below this root is the same in all environments; only the root itself is different. So there is a "RootFolder" parameter and we use it as a part of the full folder paths for all file port folders. For example, we have a "GLD_Samples_BTDFApp_Port1_File_Path" parameter which is defined in a cell with the formula = "RootFolder" + "GLD_Samples_Folder" + "BTDFApp_Folder" + "Order\\*.xml" (of course, in the Excel cell it would be something like = C22 & C34 & C345 & "Order\\*.xml"). If we modify the RootFolder path, all related folder paths are modified automatically.

          OK, we worked hard setting up the application development. Now, surprise, we have to create a new environment. The best part of the BTDF configuration fiesta is here: all configuration parameters for ALL ENVIRONMENTS of an application are in just one table. (If you are tenacious enough you could keep a single table for ALL applications, but you have to ask me how.)

          Settings for different environments: in our Excel table we copy and paste the column of one of the existing environments to a new column for the new environment. Then we modify the values in this new column. And again, this is an Excel table: we define a single RootFolder variable for the new environment and voila, all file port paths for this environment are modified.
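The idea of one settings table with derived values can be sketched outside Excel as well. The sketch below is not how BTDF stores settings; it is a plain Python illustration. The parameter names follow the example above, while the environment names, the folder values and the helper functions are hypothetical.

```python
# Hypothetical settings table: one "column" per environment, as in the spreadsheet.
SETTINGS = {
    "DEV": {"RootFolder": "C:\\BizTalk\\Dev\\"},
    "QA": {"RootFolder": "\\\\qa-server\\BizTalk\\"},
}

# Parts that are identical in every environment (values are made up).
SHARED = {
    "GLD_Samples_Folder": "GLD.Samples\\",
    "BTDFApp_Folder": "BTDFApp\\",
}


def resolve(environment):
    """Return the full parameter set for one environment, including derived values."""
    params = dict(SHARED)
    params.update(SETTINGS[environment])
    # The "formula cell": the port path is composed from the other parameters.
    params["GLD_Samples_BTDFApp_Port1_File_Path"] = (
        params["RootFolder"] + params["GLD_Samples_Folder"] + params["BTDFApp_Folder"] + "Order\\*.xml"
    )
    return params


def add_environment(new_name, copy_from, **overrides):
    """Copy an existing column to a new environment and change only what differs."""
    SETTINGS[new_name] = {**SETTINGS[copy_from], **overrides}


add_environment("STG", copy_from="QA", RootFolder="\\\\stg-server\\BizTalk\\")
print(resolve("STG")["GLD_Samples_BTDFApp_Port1_File_Path"])
```

Adding the STG environment needed one overridden value; every path derived from RootFolder followed automatically.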

          Now we have to pass those configuration parameters to the binding files, right?

          We replace all values in the binding file that differ between environments, like the Host names, NTGroupName, addresses, and transport parameters such as connection strings. We replace these values with variables, like this:

          BindingFileParameters

          Now we save this binding file under the name PortBindingsMaster.xml.
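To make the substitution step concrete, here is a minimal sketch of what filling a master binding file with environment values amounts to. It assumes a ${Name} placeholder syntax and uses Python's string.Template; the binding fragment, the setting names and the render_bindings helper are hypothetical, and BTDF itself does this work with its own tooling.

```python
from string import Template

# Hypothetical fragment of a master binding file; real files are much larger.
MASTER_BINDINGS = """<ReceiveLocation Name="Port1_File">
  <Address>${GLD_Samples_BTDFApp_Port1_File_Path}</Address>
  <ReceiveHandler Name="${ReceiveHostName}" />
</ReceiveLocation>"""


def render_bindings(master_text, settings):
    """Fill every ${Name} placeholder; raises KeyError if a setting is missing."""
    return Template(master_text).substitute(settings)


qa_settings = {
    "GLD_Samples_BTDFApp_Port1_File_Path": "\\\\qa-server\\BizTalk\\Order\\*.xml",
    "ReceiveHostName": "QA_ReceiveHost",
}
qa_bindings = render_bindings(MASTER_BINDINGS, qa_settings)
print(qa_bindings)
```

Failing loudly on a missing setting is a deliberate choice in this sketch: a binding file with an unresolved placeholder is better caught before import than after.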

          That is pretty much everything we need. Now execute the Deploy BizTalk Application command and the application is deployed.

          Deployment into a Production environment is different. Production has a limited set of installed tools, so we have to do additional installation steps. BTDF creates an msi and a command file. This msi includes all the additional pieces we need to install. We don’t have to manually add resources to the BizTalk application in the Administrative Console anymore.

          Conclusion: the BizTalk Deployment Framework is a mandatory tool in BizTalk development. If you are a BizTalk developer, you must use it and you must know it.

          Complex XML schemas. How to simplify?

          [Sample code is here: ]

          The XML Schemas are used for two main tasks: 

          • for processing XML documents (for the XML document validation and for the XML document transformation); 
          • for defining the domain specific standards.

          XML Schemas and Domain Standards

          Let's talk about the domain standards: EDI, RosettaNet, NIEM, ebXML, Global Justice XML Data Model, SWIFT, OpenTravel, Maritime Data Standards, HIPAA, HL7, etc. If we look at those standards, we see that the schemas capture the domain knowledge in a form that can be formally and officially validated. [I discussed those standards in more detail in this article: Domain Standards and Integration Architecture] XML Schemas are very helpful for such tasks.

          Compare standards defined in the form of XML schemas with standards defined in the form of text documents. With a text document it is almost impossible to verify whether the data satisfy the standard. With XML Schemas it is possible to validate the data, and to do it automatically.

          Domain specialists use XML Schemas to define standards in an unambiguous, machine-verifiable form.

          Those schemas tend to be huge and very detailed, and for very good reasons.

          But if we use XML Schemas for the first task, processing XML documents in our programs, we need something different: small schemas. In system integration we need small schemas.

          We don't need the abundance of the HIPAA schemas in most applications. We only need the small portion of the schema that validates or transforms the part of the document significant for this application.

          We load megabyte-sized schemas and perform mapping for these huge schemas; it lasts an eternity and consumes a huge amount of CPU and memory.

          In most integration projects we don’t want to validate data against the standard. We want to transfer data between systems as fast as possible with minimal development effort.

          How do we work with those rich schemas? How do we make our integration fast, both at run-time and in development?

          First we have to decide: does our application require the whole schemas or not?

          If the answer is "No", read on for the solution.

          How to Simplify?

          The solution is to simplify the schema: cut out all unused parts.

          The first step in our simplification is to decide which parts of the original schema we want to transfer further, i.e. to map to another schema. We keep these parts unchanged and simplify all the other, unnecessary parts.

          The second step is to find out whether the target system validates its input data. A good system usually does. Validation includes the data format (is this field an integer or a date, does it match a regular expression?), the data range (is this string too long, is this integer too big?), the encoding (does this value belong to the code table?), etc.

          If the target system performs this validation, it doesn't make sense for us to perform the same validation on the integration layer. We just pass the data to the target system without any validation. Let that system validate the data and decide what to do with errors: send them back to the source system, try to repair them, or something else. Actually, it is not good architecture if an intermediary (our integration system) tries to do such validations and make such decisions. It spreads the business logic between systems, with the target system delegating the data validation logic to the intermediary. The integration system should deal with data validation only when it is needed.

          Example: HIPAA Schema Simplification in the BizTalk Server

          Now let's get more technical. The next example is implemented with BizTalk Server and the HIPAA schemas, but you can use the same principles with other systems and standards.

          The first step in the schema simplification is the structural modification. It is pretty simple. We replace the unused schema parts with <any> tags [http://www.w3.org/TR/xmlschema-0/#any]. If we still want to map such a schema part, but without any details, we can use the Mass Copy functoid.

          The second part of the schema simplification is the type simplification.

          For the HIPAA schemas I use these regex replacements:

          Open your schema in the XML (Text) Editor mode:


          Press Ctrl-Shift-H (Find and Replace in Files) and check the “Use Regular Expressions” option:


          Make two replacements:

          • type="X12_.*"  --> type="xs:string"
          • <xs:restriction base="X12_.*">.*\n.*\n.*\n.*</xs:restriction>  --> <xs:restriction base="xs:string"/>

          Save and close.

          Open the schema again in the Schema Editor, make any small change and undo it. The editor recalculates the type information and pops up the Clean Up Global Data Types window. Check all types and click OK.


          This cleans up all unused Global Data Types.

          Previously we replaced all those types with the “xs:string” type, so they are not used anymore.
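The same two replacements can also be scripted when there are many schema files. Below is a minimal Python sketch, not the Visual Studio procedure itself. The miniature schema is made up, and the patterns are slightly tightened variants of the ones above: [^"]* is used instead of .* so that a match stops at the closing quote, and the restriction body is matched with a non-greedy .*? instead of a fixed number of lines.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature schema; real HIPAA schemas are far larger.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ISA01" type="X12_ID"/>
  <xs:element name="NM103">
    <xs:simpleType>
      <xs:restriction base="X12_AN">
        <xs:minLength value="1"/>
        <xs:maxLength value="35"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def simplify_types(schema_text):
    """Replace X12_* type references and restrictions with plain xs:string."""
    # [^"]* stops at the closing quote, so other attributes on the line are untouched.
    text = re.sub(r'type="X12_[^"]*"', 'type="xs:string"', schema_text)
    # DOTALL lets .*? span the facet lines up to the nearest closing tag.
    text = re.sub(
        r'<xs:restriction base="X12_[^"]*">.*?</xs:restriction>',
        '<xs:restriction base="xs:string"/>',
        text,
        flags=re.DOTALL,
    )
    return text


simplified = simplify_types(SCHEMA)
ET.fromstring(simplified)  # raises ParseError if the result is no longer well-formed XML
print(simplified)
```

The script does not remove the now unused global type definitions; in the procedure above that is what the Clean Up Global Data Types window does.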

          This replacement takes 5 minutes. What is the result?


          The modified schema is half the size of the original.

          image - the DLL size with the original schema.

          image - the DLL size with the modified schema.

          The assembly for the modified schema is also half the size.

          Not a bad result for a 5-minute job.

          How do these simplified schemas change our performance?

          All projects with schemas and maps compile notably faster in Visual Studio. As a developer, I like this improvement.

          How about the run-time performance?

          I have made a simple proof of concept project to check the performance changes.

          Test project

          The project consists of two BizTalk applications and two BizTalk Visual Studio projects. Do not do this in production projects! One Visual Studio solution should hold one and exactly one BizTalk application.

          Each project contains one HIPAA schema, one very simple schema, one “complex” map (HIPAA schema to HIPAA schema), and one simple map (HIPAA schema to the very simple schema).

          The first project works with the original HIPAA schema and the second project with the simplified HIPAA schema.

          Build and deploy one project.

          Each BizTalk application consists of a receive file location and a send file port. The receive location uses the EdiReceive pipeline to convert the text EDI documents into XML documents, so we need to add a reference to the “BizTalk EDI Application”:


          After deployment, import the binding file, which you will find in the project folder. Create the In and Out folders and apply the necessary permissions to them. Change the folder paths in the file locations to your folders.

          There is also a UnitTests project with several unit tests. Change folder paths in the test code.

          Perform tests.

          Then delete the application, deploy the second BizTalk project and perform the tests again.

          Do not deploy both projects side by side.

          Performance results:

          Note: Before each test, run the Backup BizTalk job to clean up the MessageBox a little bit.


          Tests with 1, 10 and 100 messages did not show a visible difference. In my environment the difference became noticeable in the 1000-message and 3K-message batch tests. The table above shows the results for the 3K batch tests.

          The performance gain is about 10%. It is not breathtaking, but it is not bad for 5 minutes of effort.

          Conclusion: the schema type simplification is worth doing if the application expects sustained high loads or high peak loads, and wherever you want to get the best possible performance.

          BizTalk: Complex decoding in data transformations

          Sometimes we need complex decoding in data transformations. It happens especially in big EDI documents such as HIPAA.

          Let’s start with examples.

          In the first example we need to decode a field. We have the source codes and the target codes for this field. The number of codes on both sides is small and the mapping is one-to-one or many-to-one (1-1, M-1). One of the simplest solutions is to create a Decode .NET component. Store the code table as a dictionary and decoding will be fast. We could hard-code the code table if the codes are stable, or cache it after reading it from a database or a configuration file.
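A dictionary-based decoder of this kind is only a few lines. The sketch below uses Python instead of a .NET component; the code table and the decode function are hypothetical and only illustrate the many-to-one lookup.

```python
# Hypothetical code table: several source codes may map to one target code (M-1).
GENDER_CODES = {"M": "MALE", "F": "FEMALE", "U": "UNKNOWN", "X": "UNKNOWN"}


def decode(code_table, source_code, default=None):
    """Look up the target code; fall back to `default` for unknown source codes."""
    return code_table.get(source_code.strip().upper(), default)


print(decode(GENDER_CODES, "f"))
print(decode(GENDER_CODES, "?", default="UNKNOWN"))
```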

          The next example is on the opposite side of complexity. Here we need to decode several fields. The target codes are related to several source codes/values (M-1). It is not value-to-value decoding; it also includes several if-then-else conditions, which opens a can of worms with 1-M and M-M cardinality.
          Moreover, the code tables are big and cannot be held in memory.

          We can implement this decoding with numerous calls to the database to get the target codes, performing these calls inside a map or inside a .NET component. As a result, we call the database many times for each document transformation.

          But there is another way to implement this without flooding the database with calls. I call this method “SQL Decoding”.

          We might remember that SQL operations are set operations that work directly with relational data. SQL Server is very powerful in executing these operations. Set operations are so powerful that we might decode all fields in a single operation. It is possible, but all source values must be in the database at that time. All we have to do is load the whole source document into SQL. Hence our method is:

          1. Load the source message into the SQL database.
          2. Execute the decoding as an SQL operation or a series of operations.
          3. Extract the target message back from SQL.
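The three steps can be sketched as follows. This is an illustration only: the in-memory sqlite3 module stands in for SQL Server, and the tables, columns and codes are made up. The point is step 2, where all rows are decoded by one set operation that depends on two source fields at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: source values from the message and a code table.
    CREATE TABLE claim_line (line_no INTEGER, proc_code TEXT, place TEXT, target_code TEXT);
    CREATE TABLE code_map (proc_code TEXT, place TEXT, target_code TEXT);
    INSERT INTO code_map VALUES ('99213', '11', 'OFFICE_VISIT'),
                                ('99213', '21', 'INPATIENT_VISIT'),
                                ('A0429', '41', 'AMBULANCE');
""")

# Step 1: load the source message values into the database.
source_lines = [(1, "99213", "11"), (2, "A0429", "41"), (3, "99213", "21")]
conn.executemany("INSERT INTO claim_line (line_no, proc_code, place) VALUES (?, ?, ?)", source_lines)

# Step 2: decode every row in one set operation instead of one lookup call per field.
conn.execute("""
    UPDATE claim_line
    SET target_code = COALESCE(
        (SELECT m.target_code FROM code_map m
         WHERE m.proc_code = claim_line.proc_code AND m.place = claim_line.place),
        'UNMAPPED')
""")

# Step 3: extract the decoded values to build the target message.
decoded = conn.execute("SELECT line_no, target_code FROM claim_line ORDER BY line_no").fetchall()
print(decoded)
```

However many lines the document has, the database is called a fixed number of times: one bulk load, one decoding statement and one extraction.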

          We can do all structure transformations in maps (XSLT) and perform only decoding in SQL form. Or we also can do some structure transformations in SQL. It is up to us.

          The Pros of this implementation are:

          • It is fast and highly optimized.
          • It does not flood database with numerous calls.
          • It nicely utilizes the SQL engine for complex decoding logic.

          The Cons are:

          • Steps 1 and 3 may not be simple.

          In real life we usually don’t have a clear separation between these scenarios, and intermediate solutions can be handy. For example, we can load and extract not the whole message but only the part of it related to decoding.

          Personally, I use SQL Decoding in the most complex cases, where mapping takes more than 2 days of development.


          • If you are familiar with LINQ, you can avoid steps 1 and 3 and execute set operations directly on the XML document. I personally prefer to use LINQ. But if the XML document is really big, the SQL approach works better.

          Conclusion: if we need complex decoding/encoding of an XML document, consider using set operations with SQL or LINQ.

          BizTalk Integration Development Architecture

          You can find some architecture information in the BizTalk documentation. You will find several tutorials and a good number of samples. Almost all of them relate to the infrastructure architecture, i.e. how to create highly available systems with clusters, how to scale out BizTalk systems, etc.
          I covered several development aspects of architecture in a series of articles:

          Copying a new build to all environments

          I am doing this task again and again, so maybe this code will be helpful not only for me.

          This is a standard routine. I develop BizTalk Server applications and use the BizTalk Deployment Framework (BTDF) for all my deployments. When an application is ready for testing and, at the end, for production, the build files have to be deployed. Usually a BizTalk installation has several environments, for example Development, QA, Staging and Production. Sometimes fewer, sometimes more. The best practice is to keep all environments isolated from each other, so each environment keeps its deployment packages separately. That means, in my case, the build files should be copied to all environments.

          A good practice is to save the old builds in case of rollback.

          The folder structure for the builds looks like this:


          The Current folder keeps the currently deployed build. The [YYYYMMDD_hh_mm_ss] folders keep the old builds.

          What is interesting in this code?

          The Copy task performs two nested loops: one through the EnvironmentName items and the second through the NewBuildToCopy files.

          The code also generates the folder name in the [YYYYMMDD_hh_mm_ss] format.

          Here is the code:

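          <!-- Copy a new deployment build to all environments and to a Personal share.
            Before this rename a Current folder to the [CurrentDateTime] to save an old build.
            The PropertyGroup/ItemGroup wrappers, the Destination metadata and SourceFiles were lost
            in this listing and are reconstructed here; treat them as assumptions. -->
          <Target Name="AfterInstaller" AfterTargets="Installer">
            <!-- Rename Current shares to the [CurrentDateTime]: -->
            <PropertyGroup>
              <CurrentDateTime>$([System.DateTime]::Now.ToString("yyyyMMdd_HH_mm_ss"))</CurrentDateTime>
            </PropertyGroup>
            <ItemGroup>
              <EnvironmentName Include="QA;STG;PROD"/>
              <CurrentShare Include="$(Shares)%(EnvironmentName.Identity)$(SourceCodeShare)" />
              <CurrentShare Include="$(PersonalShare)" />
            </ItemGroup>
            <Exec Condition="Exists('%(CurrentShare.Identity)\Current')"
                   Command='Rename "%(CurrentShare.Identity)\Current" "$(CurrentDateTime)"'/>
            <ItemGroup>
              <!-- One copy of every build file per share: this is the nested loop. -->
              <NewBuildToCopy Include="$(NewBuild)\**\*.*">
                <Destination>%(CurrentShare.Identity)</Destination>
              </NewBuildToCopy>
            </ItemGroup>
            <!-- Copy the last build to the Current shares: -->
            <Copy Condition="'@(NewBuildToCopy)' != ''"
                  SourceFiles="@(NewBuildToCopy)"
                  DestinationFiles="@(NewBuildToCopy->'%(Destination)\Current\%(RecursiveDir)%(Filename)%(Extension)')" />
          </Target>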

          This target could be a part of the Deployment.btdfproj file (which is a file from the BTDF Deployment project). Also you can add it to the BizTalkDeploymentFramework.targets file.

          BizTalk: Custom API: Promoted Properties

          How do we get the Promoted Properties from .NET code? This sample exposes a very simple API to access all Promoted Properties currently deployed into a BizTalk group.

          The Best Application Server from Microsoft

          -Are you stupid? BizTalk Server is an Integration Server. It has nothing to do with application servers.

          That’s what you are probably thinking now…

          Application Types

          Let’s discuss different application types.

          Sequential Processing Application

          These applications work as single-threaded processes. An application gets the data portions and processes them one after another.

          BatchProcessing

          One example of this kind is a file processing application. It reads a file and processes it, then reads another file, and so on. It starts on a new file only after it has finished with the current one. It works with data in sequence: it processes one data portion, then starts on another.

          The OS works as the application server for such applications. The applications are started by users or when the OS starts. A Windows Service is such an application. The OS provides additional management features such as an automatic restart after failure.


          The Service [Application]

          An example of this kind of application is a web site. The application server here is Internet Information Services (IIS). A service application is constantly waiting for requests from clients. A separate service instance is started for each client request.


          If there are too many requests, the application server places the waiting requests in a queue so that they do not flood the server. For a sequential application only a single application instance works at a time; for a service application many instances work simultaneously, one instance per client request.

          Usually a service does not store the client data in durable storage. If a service instance crashes for any reason, the client just repeats the request and a new service instance is created. Reliability is good enough if a service instance works for a short period of time. But if a service instance works for hours or days, the probability of a crash increases and reliability suffers.

          The Long-Running Service

          In the simplest case a service instance gets a client request, processes it, returns the result, and that’s it. A slightly more complex case is when the service instance itself requests data from an external service. In this case the lifetime of the service instance is unpredictable. We cannot manage the external services and we don't know when they will respond. Our service instance may wait for hours or days, so the application server can move a snapshot of the service instance to durable storage and move it back when the result arrives from the external service. This process is called dehydration-hydration, and it increases the service reliability. It also frees computer resources, so a large number of dehydrated service instances do not clog the computer.
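Dehydration and hydration boil down to saving the instance state and rebuilding the instance from it later. The sketch below is a toy model of the idea, not of any real application server: the DurableStore and OrderInstance classes are hypothetical, and a dictionary stands in for the durable storage.

```python
import json


class DurableStore:
    """Stand-in for durable storage; a real server would use a database."""

    def __init__(self):
        self._rows = {}

    def save(self, instance_id, payload):
        self._rows[instance_id] = payload

    def load(self, instance_id):
        return self._rows.pop(instance_id)


class OrderInstance:
    """Hypothetical long-running service instance with serializable state."""

    def __init__(self, instance_id, state):
        self.instance_id = instance_id
        self.state = state

    def dehydrate(self, store):
        """Persist the state so the instance can be dropped from memory."""
        store.save(self.instance_id, json.dumps(self.state))

    @classmethod
    def hydrate(cls, store, instance_id):
        """Rebuild the instance from storage when the awaited response arrives."""
        return cls(instance_id, json.loads(store.load(instance_id)))


store = DurableStore()
waiting = OrderInstance("inst-42", {"order_no": "PO1001", "step": "awaiting_credit_check"})
waiting.dehydrate(store)
del waiting  # memory is freed while the external service takes its time

resumed = OrderInstance.hydrate(store, "inst-42")
resumed.state["step"] = "credit_check_done"
print(resumed.state)
```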


          Another problem with long-running services is request correlation.

          Everything is simple if a new independent service instance is created for every client request, processes the request, and returns the result to the client. But now the service instance itself calls an external service. Imagine hundreds of service instances working simultaneously, all of them sending requests to an external service. The external service processes these requests and returns the responses. As a result we have hundreds of service instances and hundreds of responses. How do we route a response to the service instance that requested exactly this response? Each response must carry some information that links it to the correct service instance. This linking process is called correlation. For example, the service instance Id is included in the request, then copied into the response and used to route the response back, i.e. to correlate the response with the right service instance.

          The correlation data must be unique, so that each response can be correlated with the corresponding service instance.

          Ideally, the application server provides the correlation mechanism.
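A minimal sketch of such a correlation mechanism follows. It is an in-memory toy under stated assumptions: the Correlator class, the message layout with a correlation_id field and the external_service function are all hypothetical, and a generated UUID plays the role of the unique correlation data.

```python
import uuid


class Correlator:
    """Routes each response to the instance that sent the matching request."""

    def __init__(self):
        self._waiting = {}  # correlation id -> callback of the waiting instance

    def send_request(self, payload, on_response):
        """Tag the outgoing request with a unique id and remember who waits for it."""
        correlation_id = str(uuid.uuid4())
        self._waiting[correlation_id] = on_response
        return {"correlation_id": correlation_id, "payload": payload}

    def receive_response(self, response):
        """Deliver the response to its instance; unknown ids are rejected, not guessed."""
        callback = self._waiting.pop(response["correlation_id"], None)
        if callback is None:
            return False
        callback(response["payload"])
        return True


def external_service(request):
    """Hypothetical external service: copies the id into the response."""
    return {"correlation_id": request["correlation_id"], "payload": request["payload"].upper()}


correlator = Correlator()
results = {}
requests = [
    correlator.send_request(name, lambda reply, name=name: results.__setitem__(name, reply))
    for name in ("alpha", "beta", "gamma")
]
# Responses arrive in a different order than the requests were sent.
for request in reversed(requests):
    correlator.receive_response(external_service(request))
print(results)
```

Each reply reaches the instance that asked for it even though the responses arrive out of order, because routing uses only the id carried in the message.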

          Another problem with long-running services is “dead instance cleanup”. For example, a service instance makes a request to an external service and doesn’t get a response within a predefined period of time. The external service could be down, the network could be overloaded, etc.; there could be many reasons in the distributed world. In another example, the service instance hangs and does not respond in any way. It makes sense to declare such service instances “dead” and remove them from memory.

          One popular method for dead instance cleanup is the “heartbeat”. A special heartbeat service periodically polls all working service instances, and if an instance does not respond, it is pronounced dead and removed from memory.

          Requests to external services can return errors for different reasons. Sometimes the external service returns a legitimate business error and we cannot do anything about it: the request was processed with an error and will produce the same error if we repeat it. But most frequently the error is temporary. For example, the network is overloaded, the database is overloaded, the I/O queue is overcrowded, the external service is busy, etc. In such cases the request can be repeated after a small delay, and after several attempts we get a good response back. An error is raised only if the request has been retried too many times.
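The retry logic described above can be sketched as follows. The exception classes, the call_with_retry helper and the flaky_service function are hypothetical names for this illustration; a real application server would add configurable intervals and logging.

```python
import time


class TemporaryError(Exception):
    """A failure worth retrying, e.g. a timeout or an overloaded service."""


class BusinessError(Exception):
    """A legitimate business error; repeating the request would not help."""


def call_with_retry(operation, max_attempts=3, delay_seconds=0.01, sleep=time.sleep):
    """Retry temporary failures after a delay; give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TemporaryError:
            if attempt == max_attempts:
                raise  # retried too many times: surface the error
            sleep(delay_seconds)
        # BusinessError is deliberately not caught: it propagates immediately.


attempts = []


def flaky_service():
    """Hypothetical external call that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TemporaryError("service busy")
    return "OK"


result = call_with_retry(flaky_service)
print(result, len(attempts))
```

Separating the two error types is the important design choice: only temporary errors are retried, while a business error is reported at once.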

          So we have a list of additional functionality for an application server that works with long-running services:

          • a durable storage for the service instances
          • a correlation mechanism
          • a system for monitoring and cleanup of the dead instances
          • a retry system for the requests to the external systems

          And it so happens that Microsoft has such an application server in its arsenal. It is BizTalk Server.

          The BizTalk database, which is called the Message Box, serves as the reliable data storage. A service instance can dehydrate into this storage and wait there indefinitely. Thousands, even hundreds of thousands, of service instances can wait and the performance of the whole system will not decrease.

          The correlation mechanism is a part of the Message Box. Requests are analyzed and routed to the service instance that is waiting for exactly this request; otherwise a request starts a new service instance.

          There are heartbeat and dead instance cleanup mechanisms.

          There is a retry system which automatically resends requests to the external services.

          And here is the main feature of BizTalk Server, its famous reliability. The BizTalk Server processing power is spread between several computers and the data is stored on SQL clusters. Deployed applications work for years without any attention. The power grid could go down, the servers could crash, the network could go down, the hard drives could fail, but the applications are not affected. BizTalk Server restarts automatically, and hundreds or thousands of interrupted service instances are restored and continue as if nothing had happened.