Domain Standards and Integration Architecture

In integration projects I have worked with several domain standards, such as HL7, HIPAA, Oagis, OpenTravel Alliance, NIEM, GJXDM, and IFX.

In many cases the complexity of the projects was enormous because those standards were used wrongly. Is this complexity inevitable because the standards themselves are complex, or is it caused by flaws in the project architecture and design?

How should we use those standards in integration?

In most cases we see the same crazy approach again and again: the schemas of those standards are used as canonical schemas for the respective messages. Doh...

Is it the right approach?

Domain Reference Models

The domain standards were created as domain reference models. They work as a source of domain knowledge, as dictionaries of the domain language. The schemas of the domain standards work as an encyclopedia, describing anything and everything.

Let’s look into one such schema. 

A Schema Example

As an example I have taken the SalesOrder XML schema of the Oagis standard (the “http://www.openapplications.org/oagis/9” namespace). This SalesOrder schema is composed of more than 10 thousand elements and thousands of attributes.

Look at the ShipToParty parameter of the sales order. Of course, it has an Id of the party. What is interesting is that it gives us not a single Id but six Ids, and each Id is not just a single value but a value with seven attributes. Wow!

The standard makes a good effort to cover all possible scenarios, because the standard schema is an encyclopedia of knowledge, right? More scenarios mean more parameters and more complexity.

Standards are implemented with a sophisticated type design, where types and code tables are used for almost every node of the schemas. The SalesOrder schema mentioned above imports 10 more schemas with type definitions and code tables. This is typical for the domain standards.

Let’s create an example SalesOrder document based on the standard schema. I am using the “Generate Instance” command of a BizTalk project in Visual Studio. It creates an XML file of about 18 MB. Why is it so large? Because the command generates all possible elements and attributes.

The standard schema structure is highly hierarchical, with dozens of nesting levels. The nesting follows the object hierarchy, that is, the relations between objects.

To emphasize: the schemas of the domain standards work as an encyclopedia and store all known usage scenarios.

How to Use Domain Standard Schemas?

Integration projects use messages to transfer data between the integrated systems. In BizTalk Server projects the messages are usually defined by XML schemas. We can use the schemas of a domain standard in two ways:

  • as a reference model for our messages. Let’s call it the “Hand-made schema” design.

  • directly as schemas for the messages. Let’s call it the “Schema from standard” design.

    In the “Hand-made schema” design we work out a schema for the messages manually, from scratch. The standard schema works as a maximum maximorum sample, which of course never occurs in real life. It shows us all possible options and use cases. We create our hand-made schema manually and use the standard schema as a reference model, not as a template. We take only the parts related to our case and throw out everything else.

    The resulting schema is small and simple. The next development steps (such as schema transformations/mappings) are simple, fast and reliable. The run-time message processing is simple, fast and reliable.

    In the “Schema from standard” design we take a standard schema and use it as a schema for our messages without changes. There is no manual work, and it seems to be the right way. Is it?

    The standard schema is huge and complex. Hence the next development steps would be complex, slow and error-prone. The run-time message processing would be slow and error-prone.

    Why is development complex?

    Imagine you are mapping a schema composed of 10 thousand nodes. It takes a good amount of time just to find a single element to map.

    We definitely need only a small number of the schema nodes because the real data model is much simpler than the standard schema. The unused schema nodes will disturb development and operations all the time, until the end of the application life cycle.

    We have to deal with an over-complicated hierarchical structure. The mapping logic is never simple with a domain schema. Say a mapped element is on the 5th level, inside the /Root/Record_1/Record_12/Record_123/Record_1234 record. The schema tells us that Record_12 and Record_1234 are arrays, and that Record_1, Record_12 and Record_1234 are optional. Are you ready to start mapping?
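To make the question concrete, here is a minimal Python sketch of what the mapping has to do for that path. The Record_* names come from the example above; the Value leaf element and the sample document are assumptions made up for illustration, and a real BizTalk map would express the same guards with functoids or XSLT rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the Record_* names come from the text.
SAMPLE = """
<Root>
  <Record_1>
    <Record_12>
      <Record_123>
        <Record_1234><Value>A</Value></Record_1234>
        <Record_1234><Value>B</Value></Record_1234>
      </Record_123>
    </Record_12>
    <Record_12>
      <Record_123/>
    </Record_12>
  </Record_1>
</Root>
"""


def collect_values(root):
    """Return every Value text, tolerating absent optional records."""
    values = []
    record_1 = root.find("Record_1")  # optional: may be missing
    if record_1 is None:
        return values
    for record_12 in record_1.findall("Record_12"):  # optional array
        record_123 = record_12.find("Record_123")
        if record_123 is None:  # defensive check, not required by the text
            continue
        for record_1234 in record_123.findall("Record_1234"):  # optional array
            value = record_1234.findtext("Value")  # hypothetical leaf element
            if value is not None:
                values.append(value)
    return values


values = collect_values(ET.fromstring(SAMPLE))
print(values)
```

Every level adds a guard or a loop, and this is for a single mapped element. A flat hand-made schema would need one lookup.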

    Also, the standard schema elements are defined with many additional small details (facets), such as the element length, encoding and format. In a hand-made schema we would use the default type (string) for almost every element, but not so in the domain standard schemas, where we have to take care of all those details.

    Now we have created the mapping and have to test it. A big source schema means an enormous set of possible test data combinations. It is almost impossible to create good test coverage for a domain schema.
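A rough way to see the size of the problem: if a schema has n independent optional nodes, there are 2^n present/absent combinations before we even start to vary the values. The sketch below enumerates them for three hypothetical optional nodes (the names are invented for illustration) and then only counts them for larger n. Nested optional nodes are not fully independent, so treat the numbers as an upper bound.

```python
from itertools import product


def presence_combinations(optional_nodes):
    """Yield every present/absent assignment for the given optional nodes."""
    for flags in product((False, True), repeat=len(optional_nodes)):
        yield dict(zip(optional_nodes, flags))


# Hypothetical optional nodes, assumed to be independent of each other.
optional_nodes = ["Discount", "Note", "Carrier"]
combos = list(presence_combinations(optional_nodes))
print(len(combos))  # 2 ** 3 = 8

# Counting only: each additional optional node doubles the number of cases.
for n in (10, 20, 30):
    print(n, 2 ** n)
```

With only 20 optional nodes there are already more than a million combinations, and a domain standard schema has thousands of optional nodes.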

    Take the same example with the shipping order. Let’s assume we need a shipping order for a local retail book store. We don’t want to ship food, coal, grain, liquid products, dangerous goods or medicine. We don’t want to ship to international addresses, and we use only the local currency. We don’t want to use FedEx or UPS; we use only a small delivery company. The domain standards cover all those use cases, but do we care about them?

    Conclusion

    The domain standard schemas are repositories of domain knowledge. The message schemas are schemas for transferring data between real systems.

    Use the domain standard schemas as a reference model for your schemas. Do not use the domain standard schemas as your message schemas.

    Domain Standard Schema as a System Interface

    There is one case when we must deal with messages that use domain standard schemas: when an integrated system exposes its interface with such a schema. This does not happen with contemporary systems, but it is usual with archaic EDI-based systems. Here I have only one piece of advice: convert this data to a simple format in the first steps, as soon as possible. Do not use domain standard schemas inside your integration system. In the article "Complex XML schemas. How to simplify?" I have shown how to simplify the work with EDI schemas a little bit.
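As a minimal sketch of that "convert as soon as possible" step, the Python below reads a standard-shaped order at the boundary and hands a small flat object to the rest of the system. The element names, the sample message and the SimpleOrder type are all invented for illustration; they are not taken from the Oagis schema, and a real incoming message would also carry namespaces.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical, heavily reduced stand-in for a standard-shaped message.
INCOMING = """
<SalesOrder>
  <Header>
    <DocumentID><ID schemeName="internal">SO-1001</ID></DocumentID>
    <ShipToParty>
      <PartyIDs><ID schemeName="internal">CUST-42</ID></PartyIDs>
      <Name>Corner Book Store</Name>
    </ShipToParty>
  </Header>
  <Line>
    <Item><ItemID><ID>BOOK-7</ID></ItemID></Item>
    <Quantity>3</Quantity>
  </Line>
</SalesOrder>
"""


@dataclass
class SimpleOrder:
    """Hand-made internal message: only the fields this system needs."""
    order_id: str
    ship_to_id: str
    ship_to_name: str
    lines: list  # list of (item_id, quantity) tuples


def to_simple_order(xml_text):
    """Flatten the deep incoming structure once, at the system boundary."""
    root = ET.fromstring(xml_text)
    lines = [
        (line.findtext("Item/ItemID/ID"), int(line.findtext("Quantity")))
        for line in root.findall("Line")
    ]
    return SimpleOrder(
        order_id=root.findtext("Header/DocumentID/ID"),
        ship_to_id=root.findtext("Header/ShipToParty/PartyIDs/ID"),
        ship_to_name=root.findtext("Header/ShipToParty/Name"),
        lines=lines,
    )


order = to_simple_order(INCOMING)
print(order)
```

Everything after this step works only with the small flat message, so the deep paths and unused nodes of the standard schema stay at the edge of the integration system.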

    See also:

    The Unreasonable Effectiveness of Dynamic Typing for Practical Programs by Robert Smallshire: Here Robert explores the disconnect between the dire outcomes predicted by advocates of static typing versus the near absence of type errors in real world systems built with dynamic languages. There is an interesting similarity between almost religious typing in domain standards and strong typing in several programming languages. 
    Posted on Tuesday, April 29, 2014 3:34 PM

    Feedback

    # re: Domain Standards and Integration Architecture

    left by Kumaraguru at 5/2/2014 5:44 PM
    I agree with you.