Geeks With Blogs

Charles Young

A few days ago I posted an article on certain behaviours we had observed and investigated in relation to BizTalk receive pipelines.   If anyone has been reading the feedback to the post, you will be aware that since then I’ve managed to make some further headway in this matter.   My previous article simply described the behaviour and offered some speculation.   I now know more about the causes.   This new article is intended to replace the previous article, which you can still read here.

The problems we encountered resolved themselves into two distinct issues.   One is an issue with the XmlDisassembler component, and the other is a ‘feature’ of BizTalk’s handling of inbound maps on receive ports.  This article describes these two issues, their characteristics and causes and how to avoid them or implement workarounds.   It also provides some additional background information on inbound maps and some general guidelines on designing and implementing custom pipeline components.

Inbound Maps
BizTalk allows maps to be assigned to receive ports. You can assign multiple maps to a single receive port. Maps are assigned at port level, rather than at the level of receive locations. One of the great benefits of inbound maps is that you can use them to transform messages before they hit the message box. This makes it easy to ensure that different messages, submitted to different receive locations, are transformed into canonical formats before being routed to service instances by BizTalk’s subscription mechanism.

BizTalk decides, on a per-message basis, which available inbound map, if any, to use.   It does so by inspecting the context of each message, looking for a promoted property called MessageType.   This property is central to a number of BizTalk functions, and specifies the type of the message. The value of the property concatenates the target namespace (if any) of the schema that describes the message type, and the name of the document (i.e., ‘root’) element.   The document element name is included as a URI fragment specifier.
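As a rough illustration of how the MessageType value is put together, here is a minimal sketch in Python (used purely as neutral notation; this is not BizTalk or .NET code). The function name and the example namespace are hypothetical, and the handling of a schema with no target namespace is my assumption rather than something taken from the product documentation.

```python
def message_type(target_namespace: str, root_element: str) -> str:
    """Join a schema's target namespace and document element name with '#'."""
    if target_namespace:
        # The document element name is appended as a URI fragment.
        return f"{target_namespace}#{root_element}"
    # Assumption: with no target namespace, only the element name is used.
    return root_element


# Hypothetical schema: namespace http://example.org/orders, root element Order.
example = message_type("http://example.org/orders", "Order")
```

Under these assumptions, `example` holds the string `http://example.org/orders#Order`.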

The MessageType property is chiefly used by the BizTalk subscription mechanism to route messages.  It is marked as a promoted property within the message context in order to allow the subscription mechanism to operate on it.   The property is generally set by a disassembler component, but could potentially be set by any pipeline component in another pipeline stage.

When one or more inbound maps are assigned to a receive port, BizTalk uses the MessageType property to select a map by comparing the value of the property to the source schema of each map.   When a match is found, the map is executed over the content of the body part of the message, and the results are assigned back to the body part.   If no match is found, no map is executed.   This is not treated as an error, and the un-transformed message is delivered to the message box.
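The selection logic described above can be modelled in a few lines. This is only a sketch of the observable behaviour, written in Python as neutral notation. The function name, the dictionary of maps and the idea of a map as a simple callable are all assumptions made for illustration, not BizTalk's internal design.

```python
def apply_inbound_map(message_type, body, maps):
    """Pick a map by message type and apply it to the body, if one matches.

    `maps` is a hypothetical dictionary from a map's source-schema message
    type to a callable that transforms the body.
    """
    transform = maps.get(message_type)
    if transform is None:
        # No match is not an error: the body goes on untransformed.
        return body
    # The result of the map replaces the body part content.
    return transform(body)
```

A message whose type matches no entry in `maps` is passed through unchanged, which mirrors the fact that an unmatched message is still delivered to the message box.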

Stream Processing
Maps are executed after pipeline processing has been completed.   They operate on the contents of the body part of the message delivered by the pipeline back to BizTalk.  This message may be different to the message initially provided to the pipeline.   For example, it is sometimes necessary for pipeline components to replace an existing stream with a new stream (possibly within a new message and message part), although this should generally be avoided wherever possible.  

Replacing one stream with another is often a sign of lazy pipeline component programming and tends to be inefficient because it generally involves saving away the contents of a stream in a stateful manner in order to subsequently write the content to another stream.   The whole point of employing a streaming approach within pipeline processing is to promote an efficient, stateless model of message content processing.  The chief philosophical difference between the EPM (messaging) and XLANG/s (orchestration) host instance sub services is the issue of statefulness.   The EPM sub service, which manages BizTalk receive and send handlers, is designed to function in a stateless manner.   Orchestrations are stateful.  For example, EPM services do not persist state to the message box.  Orchestrations, by contrast, persist state to the message box every time the process flow reaches a persistence point (denoted by bold borders in the orchestration designer).  Orchestrations support dehydration and recoverability models that are not available in pipelines.

If the contents of a stream need to be amended or changed by a pipeline component, this is generally best done by wrapping the original stream in a new stream object that processes the data on the fly during read operations.   In the most efficient pipeline designs, this can mean that all content processing is done just once when BizTalk reads the stream after the pipeline has completed its work.   At this stage, BizTalk reads the entire stream, and each stream wrapper can perform its work.   This approach is naturally stateless, although some temporary buffering is often required during a read operation.
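To make the wrapping idea concrete, here is a minimal sketch in Python (neutral notation again; a real pipeline component would derive from System.IO.Stream instead). The class name and the trivial upper-casing transformation are hypothetical. The point is only that no work is done until something reads the wrapper, and that the only state held is the chunk currently being read.

```python
import io


class UppercasingStream(io.RawIOBase):
    """Forward-only wrapper that transforms bytes as they are read."""

    def __init__(self, inner):
        super().__init__()
        self._inner = inner  # the original stream, left untouched until read

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        # Pull at most one buffer's worth from the wrapped stream.
        chunk = self._inner.read(len(buffer))
        if not chunk:
            return 0  # end of stream
        data = chunk.upper()  # stand-in for real processing; length-preserving
        buffer[:len(data)] = data
        return len(data)
```

Several such wrappers can be stacked, one per component, and all of them do their work in a single pass when the outermost stream is finally read.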

Designing and implementing efficient pipeline components often requires thought and hard work. Too often, it proves easier to adopt a simpler, less efficient approach. Beware, however: cutting corners in development generally does not pay in the longer term. Interestingly, the two issues we have discovered are both most likely to be encountered if you attempt to short-circuit good design.

Seekable streams
[Update: This issue was fixed in Service Pack 1 for BizTalk 2004] The first of the two problems is that BizTalk 2004 fails to execute inbound maps over seekable streams. It is as simple, and as brutal, as that. If, after a pipeline has completed its work, it returns a message whose body part contains a seekable stream, and if BizTalk matches an inbound map to the MessageType property of the message, the map will always fail with a “The root element is missing” error. If the stream is non-seekable, and positioned at the beginning of the stream, the map will succeed (unless, of course, there is some other problem). If, for example, you create a pipeline component that replaces the body part stream with a new stream using the .NET MemoryStream class, any inbound map will fail (assuming this is the stream passed to BizTalk at the end of the pipeline). The MemoryStream class provides a seekable stream.

BizTalk calls the CanSeek property of the stream object in order to determine if the stream is seekable. This property is inherited from an abstract member of System.IO.Stream, and returns a Boolean value. If the CanSeek property returns true, BizTalk calls the Length property, which should return the byte length of the stream. The Length property is not called on non-seekable streams. In either case, BizTalk then repeatedly calls the Read method until the entire contents of the stream have been read. This happens regardless of the value returned by the Length property, even if the length is reported as 0.

If no inbound map is selected, the message and its contents are delivered to the message box without any problem, regardless of the seekability of the stream. If a map is selected, the transform will only succeed if the stream is non-seekable.

In passing, please note that I did wonder if the inbound map issue might be related to stream encoding.   I therefore experimented extensively with the Length property, for example by encoding the stream contents in Unicode and returning the number of characters in the stream instead of the number of bytes.   This made no difference at all.

Guideline for Custom Pipeline Components
If you are creating custom pipelines using custom pipeline components, take care that the last component in the pipeline returns a body part with a non-seekable stream.   More generally, I would recommend that, as a guideline, any component you create should either return the same stream it received, or wrap that stream in a non-seekable stream wrapper class.   Only return seekable streams in the rare case it is truly necessary to allow some other pipeline component to randomly access the stream contents.   Avoid, therefore, returning ‘raw’ MemoryStream objects.   If you are tempted to return a seekable stream, ask yourself if you might better implement your functionality within an orchestration or by using functoids and/or script in a map, rather than within a pipeline component.   Remember that seekability always involves a stateful approach because the contents of the stream must be written out to some backing store.  This may compromise the efficiency of your pipelines.
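A non-seekable wrapper of the kind recommended here is a very small class. The sketch below uses Python as neutral notation, with a hypothetical class name. In .NET the equivalent would be a System.IO.Stream subclass whose CanSeek property returns false and whose Read method delegates to the wrapped stream.

```python
import io


class ForwardOnlyStream(io.RawIOBase):
    """Pass-through wrapper that hides the seekability of the inner stream."""

    def __init__(self, inner):
        super().__init__()
        self._inner = inner

    def readable(self):
        return True

    def seekable(self):
        # Report the stream as forward-only, whatever the inner stream supports.
        return False

    def readinto(self, buffer):
        chunk = self._inner.read(len(buffer))
        if not chunk:
            return 0  # end of stream
        buffer[:len(chunk)] = chunk
        return len(chunk)


# An in-memory buffer is seekable; the wrapper around it is not.
wrapped = ForwardOnlyStream(io.BytesIO(b"<Order/>"))
```

The wrapper adds no buffering of its own, so the cost of hiding seekability is negligible.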

Streams are provided by message parts.  The IBaseMessagePart interface defines a method called GetOriginalDataStream, together with a property called Data.   In the BizTalk documentation, there is a bold statement that says:

“In custom pipeline components, Data clones an inbound data stream, whereas BodyPart.GetOriginalDataStream returns the original inbound stream.”  

As far as I can tell, this is only true if you implement IBaseMessagePart on your own custom message part class and write your own code to create a clone!   BizTalk provides a message factory object (via the pipeline context) which has a CreateMessagePart() method.  Although the message factory is designed for use in custom pipeline components, the message part created by this method does not clone your stream.  The references of the streams returned from Data and GetOriginalDataStream() are identical (i.e., they are one and the same stream object).

As a general guideline, you should normally use the message factory to create new message objects, and then return these from your pipeline component.   You may also wish to use the unsupported PipelineUtil class to copy property bags (for part properties) and clone message context from an existing message.   However, you may wish to avoid using message parts created by the message factory.   Instead, consider creating custom message part classes that return non-seekable streams using GetOriginalDataStream() and seekable cloned streams using the Data property.

Christof Claessens has written a first-class article on a rather different stream cloning technique for efficiently processing stream data within a pipeline component.   One potential use of his approach is to allow the processing of a seekable cloned stream while still returning a non-seekable stream within the body part.   Note that if the cloned stream is writeable, any changes you make to its content will not be handed on to subsequent pipeline components or to BizTalk.   The cloned stream is processed on a secondary ‘worker’ thread, which avoids blocking the main thread when it reads the stream.   Also note that reading the cloned stream is synchronised to the reading of the original stream.   The worker thread cannot read and process stream content until that content has been read on the main thread.   This approach allows you to avoid seriously compromising the efficient streaming of data through your pipeline while allowing complex processing of stream content in a multithreaded fashion.

XML disassembler issues
The second of the two issues concerns the XML disassembler component.   This component creates and returns messages with non-seekable, read-only streams for the body part content.   It does not provide clones via the Data property of the message part object.

The XML disassembler has several uses.  Perhaps its most basic function is to inspect the document element of inbound XML messages to determine their message type, and to promote the MessageType property within the message context.   The component may also be used to validate XML and to disassemble XML into multiple messages.

The XML disassembler uses a stream wrapper class called XmlDasmStreamWrapper to wrap the body part stream. This, in turn, utilises an instance of XmlDasmReader, which is a specialised XmlReader object. The Read method of the XmlDasmReader class maintains an internal flag to indicate if the reader has detected a new ‘document’ or not. By ‘document’ we mean an XML node whose content is to be returned as a new message by the GetNext method of the XML disassembler. The new document flag is exposed via the internal IsNewDoc property of XmlDasmReader. This property is read/write.

The XmlDasmReader instance is referenced directly by the XML disassembler component, and therefore, because both classes are in the same assembly, the XML disassembler has access to the IsNewDoc property.   The XML disassembler uses this property directly in its GetNext method.   The GetNext method of a disassembler returns the next disassembled message, and is called repeatedly by BizTalk until it returns a null value.   In this way, a disassembler can convert a single inbound message into multiple messages.   The implementation of GetNext (the relevant code is actually in a private method called GetNext2) tests the IsNewDoc property.   If the value is true, the code constructs and returns a new message containing the new document content.   If the value is false, the code tests a status flag to ensure that all processing is complete and then returns null.   This in turn causes BizTalk to stop re-executing the GetNext method.

Unfortunately, the GetNext method fails to use the read/write IsNewDoc property to reset the new document flag to false.   Instead, the flag is reset within the Read method of the XmlDasmReader class.   The code in this method then goes on to check if there is a new document, and changes the flag back to true if necessary.

This constitutes a very common type of logical error (I have made similar mistakes countless times myself).   For GetNext to work correctly, it is entirely reliant on the Read method of XmlDasmReader being called at least once.   This, of course, happens when the Read method of the XmlDasmStreamWrapper is called, typically either by a subsequent pipeline component, or by BizTalk at the end of pipeline processing.   The implicit assumption is that, after calling GetNext, the stream is always read before BizTalk makes a subsequent call to the GetNext method.

This assumption is not valid. Consider a scenario where you use a disassembler to disassemble an XML message into many messages, and then build a custom validation component. Your validator tests, say, some context property of your message, and based on its value, decides to either discard or keep the disassembled message. If the message is discarded, it may be replaced by a different message. Because the validator makes its decision based on contextual data, it has no reason to read the body part stream. Perhaps it returns null, instead of a message, or perhaps it creates a new stream and assigns this to the body part. Because the original stream provided by the XML disassembler is never read, the new document flag is never reset to false. The next time BizTalk calls the GetNext method, the XML disassembler creates and returns a new message, regardless of whether there is a genuine new document. Because the stream has not been read, the new message is given exactly the same body part content as the previous message. This effectively sets up a never-ending loop in which BizTalk hogs the processor cycles by repeatedly calling GetNext and generating a never-ending series of identical messages. Everything else slows to a crawl. Your only hope is to disable the BTSNTSvc service and kill the process. You can then create workaround code in your custom validator component, recompile and deploy, and re-enable and start the BTSNTSvc service.
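The failure mode is easier to see in a toy model. The following Python sketch is not the BizTalk code. The class and function names are hypothetical, and the model keeps only the one feature that matters: the new document flag is cleared inside read() and nowhere else. A limit parameter stands in for the never-ending loop so that the example terminates.

```python
class DocReader:
    """Toy stand-in for the disassembler's reader (not the BizTalk class)."""

    def __init__(self, documents):
        self._pending = list(documents)
        self.current = self._pending.pop(0) if self._pending else None
        self.is_new_doc = self.current is not None

    def read(self):
        """Consume the current document; only here is the flag ever reset."""
        body = self.current
        self.is_new_doc = False
        self.current = self._pending.pop(0) if self._pending else None
        if self.current is not None:
            self.is_new_doc = True  # a further document was detected
        return body


class Disassembler:
    def __init__(self, reader):
        self._reader = reader

    def get_next(self):
        """Return the reader if a new document is flagged, else None."""
        # Mirrors the flaw described above: the flag is tested, never cleared.
        return self._reader if self._reader.is_new_doc else None


def run_pipeline(documents, read_bodies, limit=5):
    """Call get_next until it returns None, or until `limit` messages."""
    reader = DocReader(documents)
    dasm = Disassembler(reader)
    produced = []
    while len(produced) < limit:
        body_source = dasm.get_next()
        if body_source is None:
            break
        produced.append(body_source.current)
        if read_bodies:
            body_source.read()  # a downstream component reads the stream
    return produced
```

When read_bodies is true, two input documents yield two messages and the loop ends. When it is false, the same first document is returned again and again until the artificial limit is reached.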

The workaround is simple.   Before discarding the body part stream of a message provided by the XML disassembler, always read the stream.   Now you can happily discard the stream without going into a never-ending loop.   Another possibility is to create a custom disassembler that inherits, or wraps, the XmlDasmComp class.   You could then write your own code to reset the new document flag.  One problem with this approach is that the instance of XmlDasmReader is held in a private field of the XmlDisassembler.   You would have to use reflection to get at this instance in order to set its IsNewDoc property.
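Reading the stream before discarding it amounts to a simple drain loop. Here is a minimal sketch in Python as neutral notation, with a hypothetical function name and chunk size. In a .NET component the same loop would call the stream's Read method with a small byte buffer until it returns 0.

```python
import io


def drain(stream, chunk_size=4096):
    """Read a stream to its end, discarding the data; return the byte count."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total  # end of stream reached
        total += len(chunk)


# Hypothetical body part content that a validator has decided to discard.
discarded = io.BytesIO(b"<Order/>" * 1000)
bytes_read = drain(discarded)
```

Only one chunk is held in memory at a time, so draining even a large body does not compromise the streaming model.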

Conclusions
It doesn’t pay to take shortcuts in designing and implementing pipeline components.   Implementing good pipeline component design can actually be quite difficult, so consider using orchestrations or maps instead, especially where you must process message content in a non-streaming fashion.   If you do create custom pipelines, design them to be as efficient and stateless as possible, and follow the guidelines outlined above.   In particular, take account of the two issues described in this article.   They are generally easy to avoid, once you know they are there, but can cause hours of headaches if you are unaware of them.  

Posted on Monday, October 4, 2004 5:59 PM


Comments on this post: BizTalk Server 2004: Receieve Pipeline woes v2

# re: BizTalk Server 2004: Receieve Pipeline woes v2
Hi Charles,

I have a functionality like I need to expose my orchestration as web service and add request/response soap headers to the webservice. And based on the value of soap headers I need to do different things.

Now my query is: where is the best place for accessing SOAP headers, transforming them to our own format and routing the message, the orchestration or the pipelines?

Please bear in mind size of incoming xml schema is quite big.

If your answer is orchestration: I am able to successfully read and write soap headers in orchestration and can do everything I want.

If your answer is Pipelines: I am able to read soap headers from the pipelines as well, but I couldn't figure out how to route the message or call the particular orchestration. I can see only 2 options; please advise me which is the best way.

1. Set the value of promoted properties (property schema) in Pipelines, access their values in an orchestration and call a particular orchestration.
2. Set the value of promoted properties (property schema) in Pipelines, send them to the MessageBox database and call a different orchestration based on a filter property set in the receive port.

Is there any better way of doing it?

Please advise me.

Thanks
Anuj
Left by Anuj Mittal on Oct 05, 2004 3:00 PM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
Sorry I have not been keeping up with your blog. :) I usually try to see what you are up to periodically. As to one of your major focuses in here, the issue with maps failing for seekable streams, I checked it out and this is fixed in SP1 which is due shortly. So then you can be happy again. :) I need to read through some of your other stuff. In general, though, building a clean and truly streaming pipeline component is not easy as you have found. In most cases, though, you probably don't need it. It only gets interesting with messages which are not particularly small. Keep up the good stuff.

Lee
Left by Lee on Oct 16, 2004 1:53 AM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
re. Lee's comments, it also gets interesting if you have a high throughput of messages, regardless of their size. I've been working on a production-quality diagnostic pipeline component that will trace message details and write to a file. There was lots to think about in terms of guarding the high-throughput characteristics of BizTalk pipeline streaming while doing file I/O over message content.
Left by Charles Young on Nov 16, 2004 11:15 AM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
I am getting the following error when trying to map xml to a flat file format:

The Messaging Engine failed while executing the outbound map for the message going to the destination URL "\\Benefitsxmlone\D\BeneFix\Mapping Tests\BCBS\BCBS%MessageID%.txt" with the Message Type "http://ns.hr-xml.org/2004-08-02#Enrollment". Details:"The root element is missing."

I am using a pipeline component and setting the document schema property in the flat file assembler component.

The map works correctly in the test environment, but when I deploy the solution, I receive the above error.

I don't even know where to begin to debug this issue. The error message is not very descriptive.
Left by Bob C. on Jan 12, 2005 4:15 PM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
I'm having the same kind of problem with an endless loop in my GetNext method using a flat file disassembler. Is this the same problem with the new document flag?
Thanks,

Evan
Left by Evan Bonnett on Mar 24, 2005 9:38 PM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
I need to split a big XML which contains multiple records into multiple files which contains x number of records. Here x is not 1. (If x =1, i can do this with Documents & Envelopes). Also, I want to have this feature within a Receive pipeline itself. How to accomplish this.

Please help
Left by Gopakumar R on May 27, 2005 3:53 PM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
I have a receive pipeline component that updates a field in the message in the pipeline. However, I am getting an "Object reference not set to an instance of an object" error if the schema is outside the project of my pipeline, even if the schema was added as a reference and deployed to the server. My pipeline works if the schema is in the project. Do you have any idea why this is so and what should be done?

Thanks in advance!
Left by Vincent on Nov 17, 2005 4:42 AM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
Thanks for the great exploration!
I've used a similar approach to split a flat file by records, but wrapped FFDasmComp and XmlDasmComp inside one custom component.
An unseekable stream class resolved the "The root element is missing" error, but I receive an exception from
XmlDasmReader.CreateReader()
when I call XmlDasmComp.Disassemble().
The only change I make is to replace the body part stream from FFDasmComp with a new unseekable one.
What's wrong?
Left by er on Dec 12, 2005 12:38 PM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
plz i want to process multiple XML schema in one pipeline, what's the best way to do that??? Thanks
Left by Arrad Tarik on Jan 04, 2006 2:14 PM

# handling multiple components
Hi Chris:

Your article was great and I enjoyed it thoroughly.

I also have a question for you,

I have a custom receive pipeline. I am using the flat file disassembler component to convert flat file data to XML. I developed a custom component in order to change the structure of the XML that is output by the flat file disassembler component. I have placed the components in the pipeline as follows:

==== decode stage ====

====disassemble stage ====

[FLAT FILE COMPONENT]

[CUSTOM COMPONENT]

==== validate stage ====

[XML VALIDATOR COMPONENT]

I would like to take the output from the flat file component and use it as input to the custom component. I am trying to validate the new XML structure against a schema using the XML validator component. Is it possible to achieve what I need using this approach, or is there a better way of doing it?

Thanks,

Rick
Left by Rick on Sep 22, 2006 1:35 AM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
It looks like you have placed your custom component in the disassemble stage, just after the flat file disassembler component. Is your component a disassembler (does it implement the IDisassemblerComponent interface, and preferably the IProbeMessage interface as well)? If not, it should not be placed in the disassemble stage. Anyway, BizTalk will only ever execute a single component in this stage, which in this case will be the flat file disassembler, I guess. I don't think your component is getting called.
So, it looks like the first thing you need to do is put your custom component in the validate stage, just before the XML Validator component. Don’t worry about the fact your component is not validating. The names of pipeline stages aren’t really that relevant. In fact, if you dig deep into BizTalk, you find that it looks like Microsoft originally intended to allow developers to create as many stages as they wanted, and to name them as they wanted. In the end, though, they have effectively restricted you to just the two ‘templates’ (receive and send).
Personally, I would be tempted to do this another way. Would it be acceptable to validate the XML produced directly by the flat file disassembler, and then perform your transformation? It depends what you are really validating. If so, you could use a map on the receive port to perform the transformation. This gets invoked just after the pipeline. Your way could potentially be more optimal, but it will be harder to maintain.
Left by Charles Young on Sep 22, 2006 5:25 AM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
Hi,
Nice article.
I have a question.
I created a Custom Receive pipeline. I published HTTP adaptor in IIS.
When I post an XML message I get a 202. If I want to return a 400-series status code (as we get in web apps) for an invalid message, how can I do that?
Is there anywhere in a custom receive pipeline where I can decide which code to return after validation is done?

Thank you,
Siva
Left by Siva on Oct 19, 2006 6:39 PM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
Did you use Reflector to decompile the code??
I had to use it to discover internals of Biztalk.

Please blog about it if you use it so that people can know.
Left by hung on Mar 20, 2007 11:15 PM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
hi,
It's really unnecessary to appreciate your article; that has already been done several times and it has helped everyone a lot. Still, let me say I really agree with them.
Well, I guess I face a similar problem while disassembling a flat file. The FFDasm.exe command at the command prompt gives me the proper XML that I require, but while debugging it in my actual project it enters GetNext once, comes out and then hangs. It doesn't even throw an exception, though I have a try/catch for it, and I have to stop debugging and stop the BTSNTSvc service. Ideally it should go into GetNext three times, as I have three Detail-type messages, and at the end it should return all three XML documents I need. Even with FFDasm.exe I get an error on the trailer line, so it is probably checking the trailer against the Detail schema. So one more question: how does FFDasm identify which one is a Detail message and which one is the trailer? I have set the tag identifier in the schema.
Left by parin on Mar 22, 2007 10:26 AM

# Help Needed : Disassembling in BTS 2006
I am receiving a flat file with a disassembler having "Recoverable Interchange Processing = True" in a custom pipeline. What I want is to process only those records within a file that are valid, and to pass the invalid or incomplete records to another orchestration (which receives only those messages that have an error report associated with them, by using filters in the Receive shape).

However, it runs correctly when I use the same logic in some other orchestration just for disassembling the individual records and then dumping both the erroneous records and the correct records to some physical location. The only difference between the live project and the dummy is that in the live one I am again making a combined XML from all the valid records and sending a mail notification for the erroneous ones. When I run it with a file containing errors, I get neither the valid records nor the erroneous records, and the orchestration shows its status as suspended.

Can you suggest how to put parsing logic before disassembling for this kind of scenario, or is there something I am missing? Please suggest something....
Left by Jitender on Dec 10, 2007 11:58 AM

# re: BizTalk Server 2004: Receieve Pipeline woes v2
Hi,
I have a specific requirement in my project. I need to validate the incoming flat file in the pipeline and notify the orchestration about any failure.
Shall I go for creating a custom component, or is there any other way which can help me?

thanks in advance.
Left by Pandurang on May 09, 2008 6:36 AM

Copyright © Charles Young | Powered by: GeeksWithBlogs.net