I was recently asked if I would provide some insight into solving the following problem. It's some time since I have been blogging on BizTalk in depth, and the question gave me an excuse to write something on my favourite integration tool. Advanced BizTalk developers will know all about this subject, but if you are fairly new to BizTalk, this might be of help.
In the given scenario, data stored in flat files is being delivered to the BizTalk message box via a pass-through pipeline. An orchestration subscribes to these messages and creates XML output. Each time a file is processed, the developer wants to 'dump' the contents of the flat file into an element in the output. Other information, including the file name, is also included in the XML.
Here is an example of the flat file input:
Name: John Doe
Address: 111 Yellow Brick Rd., OZ
Telephone: 555-5555
The output message will look something like:
<MessageA>
<fileName>inputfile.txt </fileName>
...other info...
<fileContentDump>
Name: John Doe Address: 111 Yellow Brick Rd., OZ Telephone: 555-5555
</fileContentDump>
</MessageA>
There are a number of ways this requirement could be met. Perhaps the most efficient mechanism would be to avoid using orchestrations and to perform all the work in a pipeline. A custom pipeline component could be created to output the required XML. An alternative might be to use the flat file disassembler. However, this would present some complications. For example, the disassembler has no way of including the FileName property value in its output. It could be used in conjunction with an additional custom pipeline component to modify the XML output, or a custom pipeline component could be built around the flat file parser. It may also be necessary to perform further transformation on the disassembler output to create the required results.
There are many reasons why orchestration might be required in this scenario. Orchestrations provide a relatively simple environment in which to code your logic. By contrast, pipeline development is easy to do badly and often difficult to do well. If we assume that the XML will be created in an orchestration, there are one or two challenges we must overcome. Chief amongst these is how to access the contents of a non-XML message within the orchestration. Seasoned BizTalk developers are used to solving these problems, but it is not always as easy as we might wish. Some aspects of BizTalk are fairly arcane, and it takes a bit of time to get your head around the techniques involved. Here is one (fairly simple) approach you can use. There are more advanced techniques that can be utilised, but I'll leave that for another time.
First, let's investigate some issues concerning subscription. If you are using a pass-through pipeline, your orchestration cannot subscribe to the flat file message by its MessageType property. This property is set by pipeline components such as the XML Disassembler and the Flat File Disassembler. A pass-through pipeline does not contain any pipeline components. Hence the flat file message in our scenario will have no MessageType property for you to subscribe to.
The MessageType property is a simple mechanism used extensively within BizTalk development to tag messages with an attribute indicating the content type of the message for the purposes of subscription. It is a property, and as such, represents a very loose approach to typing. It has a degree of specialised handling in BizTalk. Primarily, this concerns BizTalk orchestration ports subscribing to, or setting, this property as a result of the message types you specify inside the orchestration. There is also a rather restrictive feature in which BizTalk orchestrations attempt to prevent developers from setting this property directly. The orchestration engine spots any attempt to do this and throws an exception. This is one of the more significant restrictions in BizTalk, and one which I have always found questionable, as it prevents a number of useful and desirable design patterns from being implemented. I've blogged previously on ways to circumvent this.
In order to subscribe within an orchestration to a message that does not have a MessageType property, you have to set the MessageType property of the 'Operation Message' in your orchestration's receive port to either System.Xml.XmlDocument or the 'Any' schema. Either of these options prevents the port from including MessageType in its subscription criteria. In this case you should choose the former option, even though the incoming message is not XML! System.Xml.XmlDocument is a .NET class. When this option is used, the orchestration engine represents incoming messages using message part classes designed for content that represents serialised .NET classes. If you use the 'Any' schema, you are explicitly saying to the engine that the content of the message is XML, which is not true. The orchestration engine uses a different class to represent the message part. I realise the semantics of this is crazy. BizTalk is generally a very coherent product, but this is one area where the approach is rather obscure.
Moving on from subscription, I will assume that we have already created an XSD schema for the XML we wish to output from the orchestration. In most cases, it is a good idea to place BizTalk schemas in a different BizTalk assembly to the orchestrations which use those schemas.
In the Receive shape in our orchestration, we need to assign the incoming message to an orchestration message variable of the same type as the message. In our case, this is System.Xml.XmlDocument. BizTalk insists that the message types of our message and 'Operation Message' match. Immediately after the receive shape, we add an Expression shape and add a line of code something like the following:
xDocOut = MyProject.Helper.GetXML(ffInMsg);
ffInMsg is the message we receive in the Receive shape. We define xDocOut as an orchestration variable typed as System.Xml.XmlDocument. This variable will end up holding the XML we want to output. MyProject.Helper.GetXML is a static helper method which we need to create in a custom assembly. Don't forget to strong-name the assembly and place it in the GAC. The code for this helper method will accept our input message variable (ffInMsg) assigned in the Receive shape. It will extract the contents of the message 'body' part. We can assume, here, that the message will only contain this one part. The code also gets the value of the ReceivedFileName property. This property is assigned by the File adapter, which I am assuming is being used to deliver the message to BizTalk.
When creating the helper code, we need to reference the following two assemblies in the custom assembly:
C:\Program Files\Microsoft BizTalk Server 2004\Microsoft.XLANGs.BaseTypes.dll
C:\Program Files\Microsoft BizTalk Server 2004\Microsoft.BizTalk.GlobalPropertySchemas.dll
The code for the static method will look something like this:
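using System.IO;
using System.Text;
using System.Xml;
using Microsoft.XLANGs.BaseTypes;

namespace MyProject
{
    public class Helper
    {
        public static XmlDocument GetXML(XLANGMessage msgIn)
        {
            // Read the first (body) part of the message as a stream and pull its content into a string
            string ffContent;
            using (StreamReader sr = new StreamReader((Stream)msgIn[0].RetrieveAs(typeof(Stream))))
            {
                ffContent = sr.ReadToEnd();
            }
            // Get the name of the received file, as assigned by the File adapter
            string fileName = (string)msgIn.GetPropertyValue(typeof(FILE.ReceivedFileName));
            // Build the output XML
            StringBuilder sbOut = new StringBuilder();
            sbOut.Append("<MessageA><fileName>");
            sbOut.Append(fileName);
            sbOut.Append("</fileName><fileContentDump>");
            sbOut.Append(ffContent);
            sbOut.Append("</fileContentDump></MessageA>");
            XmlDocument xDocOut = new XmlDocument();
            xDocOut.LoadXml(sbOut.ToString());
            return xDocOut;
        }
    }
}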
The code takes a fairly simplistic approach, assuming that the contents of the stream can be read as a string and dumped into the outbound XML without any further processing. If the flat file content can contain '<', '>' or '&', this assumption will not hold true. The code will need to be extended to escape these characters, or alternatively to wrap the content in a CDATA section.
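One way to handle the escaping is to build the output through the XmlDocument DOM rather than by concatenating strings, so that special characters are escaped when the document is serialised. The following is only a minimal sketch of that idea. The class name XmlOutputBuilder and the method BuildOutput are hypothetical names chosen for illustration, and the sketch assumes, as the code above does, that the output elements are not namespace-qualified.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration only; the names are not part of BizTalk.
public class XmlOutputBuilder
{
    // Builds the <MessageA> document through the DOM. Assigning to InnerText
    // stores the value as a text node, so '<', '>' and '&' in the file name or
    // flat file content are escaped when the document is written out.
    public static XmlDocument BuildOutput(string fileName, string ffContent)
    {
        XmlDocument xDocOut = new XmlDocument();

        XmlElement root = xDocOut.CreateElement("MessageA");
        xDocOut.AppendChild(root);

        XmlElement fileNameElement = xDocOut.CreateElement("fileName");
        fileNameElement.InnerText = fileName;
        root.AppendChild(fileNameElement);

        XmlElement dumpElement = xDocOut.CreateElement("fileContentDump");
        dumpElement.InnerText = ffContent;
        root.AppendChild(dumpElement);

        return xDocOut;
    }
}
```

In the helper method, the StringBuilder section could then be replaced by a call such as XmlOutputBuilder.BuildOutput(fileName, ffContent). If you would rather keep the content in a CDATA section, XmlDocument.CreateCDataSection can be used to create the node instead of setting InnerText, but bear in mind that the content must not itself contain the sequence ']]>'.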
In the BizTalk project containing your orchestration, you will need to add a reference to your custom assembly so that the orchestration can call the helper method.
Now place a Construct Message shape immediately after the Expression shape. This will construct the outbound message over the XmlDocument returned by the helper code. Create a second orchestration message and set the Message Type property of this message to your custom XSD schema. Assign the message to the Construct Message shape as the message it will construct. Add a Message Assignment shape to the Construct Message shape and enter an expression similar to the following:
xmlOutMsg = xDocOut;
...where xmlOutMsg is the name of the second orchestration message. xDocOut is the XmlDocument returned by the helper code.
All that remains is to use a Send shape and orchestration send port to return the outbound message to the message box and route it on to its destination end point.
This technique uses the RetrieveAs method of a BizTalk message to obtain the data stream containing the body part of the message. Within orchestrations, you can also use LoadFrom in order to write content out to a message part stream from a given source. These two methods are quite powerful because they both support several different ways of inputting or outputting message part content. For example, RetrieveAs can output part content as a stream, an XmlDocument, an XmlNode, an XmlReader or a .NET class. You pass a Type parameter to specify the type of output you require. The XML-based types can obviously only be used if the content is XML. Only 'Stream' will work in this example. You could alternatively create a custom .NET class and associated 'formatter' (serialisation/de-serialisation) code to handle the flat file content.
Novice BizTalk developers understandably expect that they should be able to extract the content of a message part directly within their XLANG/s expressions within a BizTalk orchestration, and are generally surprised to find that helper code written in a language such as C# or VB.NET is required. Why is this? Well, that's a good question. It's not really for me to speculate why Microsoft does not support this in XLANG/s. However, it's worth noticing that in the longer term, XLANG/s will be phased out. I first saw an early version of XLANG/s in 2000, and at the time it raised a good deal of interest. XLANG/s is a script DSL (Domain-specific language) which is used to drive a C# code generator. At the time it was being used to generate orchestration code hosted within ASP.NET applications. When BizTalk 2004 shipped a few years later, an additional graphical DSL had been layered on top of XLANG/s. This, of course, is the BizTalk orchestration designer, and it provides a very thin layer of abstraction over XLANG/s. Over time, it has become obvious that the real value lies in the graphical DSL, rather than the script DSL. Very few BizTalk developers code XLANG/s directly, except when creating simple expressions or assigning to messages, and frankly there is little need to understand XLANG/s in depth. Microsoft has yet to publish a definitive description of the language to the general public. There is an internal Microsoft document that describes a beta version of the language in some depth, but Microsoft has never released it.
The future of orchestrations can be seen in the Windows Workflow Foundation (WF) which ships as part of .NET Framework 3.0. WF was created by the BizTalk team, and provides many improvements over the current BizTalk orchestration model. We can expect that a future release of BizTalk will use WF technology to replace the existing orchestration features. WF provides a graphical DSL similar to the orchestration designer in BizTalk, but has no equivalent to XLANG/s. Instead, the graphical DSL directly generates C# or VB.NET code, rather than script. This is a cleaner model than the current BizTalk approach. I would recommend, as a general rule, that you treat XLANG/s as a temporary necessity, and try to exploit it as a thin scripting layer for performing a few simple actions in expression and message assignment shapes and for calling into custom code to do the heavy lifting. In this respect, it should be used rather in the same way as many developers used the old ASP as a thin layer for invoking custom COM components.
It may be helpful to point out that, within BizTalk, message content is always associated with message parts rather than the message itself. The MessageType property set in a pipeline is set at the message level, but is typically used to indicate the type of content stored in the 'body' part of the message. Multi-part messages may store different content types in their different parts, and the two orchestration message methods (RetrieveAs and LoadFrom) can be used to manipulate the content of these individual parts.
As I said earlier, there are more advanced alternatives. Using .NET classes with suitable attributes, it is possible to create code which will be used by BizTalk to de-serialise message content into runtime objects. These objects can be manipulated directly within your orchestrations in order to perform transformations and apply business logic. As this article was written to answer a specific query from someone fairly new to BizTalk development, I won't discuss this further on this occasion.