Geeks With Blogs
David Douglass .NET on My Mind

Accessing XML Schema Information During Document Validation

The .NET Framework 2.0 includes a change so subtle it might appear insignificant.  A new property, XmlReader.SchemaInfo, gives a validating reader access to the schema information for the current node: its element or attribute declaration and its schema type.  This opens up a world of possibilities for making software more dynamic by acting on the schema; reflection for XML, if you will.
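
As a compact illustration of the property described above, here is a minimal sketch that asks a validating reader for the schema type of an element.  The one-element schema, the class name SchemaInfoSketch and the method name TypeCodeOf are assumptions made up for this example; only the System.Xml calls are real.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class SchemaInfoSketch {
   // Hypothetical one-element schema, kept inline so the sketch is self-contained
   const string Xsd =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
      "<xs:element name='count' type='xs:int'/>" +
      "</xs:schema>";

   // Returns the XSD type code of the first element that has schema type information
   public static string TypeCodeOf(string xml) {
      XmlReaderSettings settings = new XmlReaderSettings();
      settings.ValidationType = ValidationType.Schema;
      settings.Schemas.Add(XmlSchema.Read(new StringReader(Xsd), null));

      using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings)) {
         while (reader.Read()) {
            if (reader.NodeType != XmlNodeType.Element) continue;
            IXmlSchemaInfo info = reader.SchemaInfo;   // may be null when the reader does not validate
            if (info != null && info.SchemaType != null) return info.SchemaType.TypeCode.ToString();
         }
      }
      return null;
   }

   static void Main() {
      Console.WriteLine(TypeCodeOf("<count>42</count>"));
   }
}
```

The null checks are deliberate: a reader created without schema validation is not expected to supply this information, so code that may receive either kind of reader should not assume it is present.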

For example, XML Schema includes an annotation element.  Within an annotation you can specify documentation and appinfo (conceptually like a processing instruction, but attached to a specific declaration or type).  Both can contain markup as complex as you need (just put it in a separate namespace).  Thus, when processing an instance document, you can discover how to handle it by reading the appinfo from the document's schema.  Anything else you might want (datatype, regular expression restrictions, maximum length, etc.) is also available.
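
To make the appinfo idea concrete, the following sketch puts structured markup inside an appinfo and reads it back while validating.  The urn:handlers namespace, the handler element, its assembly and class attributes, and the names AppInfoSketch and HandlersFor are all invented for illustration; they are not part of any standard.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class AppInfoSketch {
   // Hypothetical schema: h:handler and urn:handlers are made up for this example
   const string Xsd =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:h='urn:handlers'>" +
      "<xs:element name='order' type='xs:string'>" +
      "<xs:annotation><xs:appinfo>" +
      "<h:handler assembly='Orders' class='OrderHandler'/>" +
      "</xs:appinfo></xs:annotation>" +
      "</xs:element></xs:schema>";

   // Collects "assembly, class" for every h:handler found on the declarations of the elements read
   public static List<string> HandlersFor(string xml) {
      List<string> result = new List<string>();
      XmlReaderSettings settings = new XmlReaderSettings();
      settings.ValidationType = ValidationType.Schema;
      settings.Schemas.Add(XmlSchema.Read(new StringReader(Xsd), null));

      using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings)) {
         while (reader.Read()) {
            if (reader.NodeType != XmlNodeType.Element) continue;
            XmlSchemaElement decl = reader.SchemaInfo.SchemaElement;
            if (decl == null || decl.Annotation == null) continue;
            foreach (XmlSchemaObject item in decl.Annotation.Items) {
               XmlSchemaAppInfo appInfo = item as XmlSchemaAppInfo;
               if (appInfo == null || appInfo.Markup == null) continue;
               foreach (XmlNode node in appInfo.Markup) {
                  XmlElement handler = node as XmlElement;
                  if (handler == null || handler.NamespaceURI != "urn:handlers") continue;
                  result.Add(handler.GetAttribute("assembly") + ", " + handler.GetAttribute("class"));
               }
            }
         }
      }
      return result;
   }

   static void Main() {
      foreach (string handler in HandlersFor("<order>A1</order>")) Console.WriteLine(handler);
   }
}
```

Filtering on the namespace URI is what lets several independent tools share one appinfo without stepping on each other's markup.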

One word of caution: this is a tough object model to work with.  The hierarchies are deep, and many of the properties are typed as base classes that must be downcast to the actual type to get at the information you need.  The MSDN documentation is also quite skimpy.  I found the best way to understand the model was to examine objects at run time using the Visual Studio debugger.
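
The downcasting mentioned above tends to follow one pattern: test with "as", check for null, then go one level deeper.  The sketch below applies that pattern while walking from a type up through its base types to collect facets.  The schema and the names code5, item, DowncastSketch and FacetsOf are assumptions for illustration, and walking the base type chain is one possible approach rather than a required one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class DowncastSketch {
   // Hypothetical schema: "code5" and "item" are invented names
   const string Xsd =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
      "<xs:simpleType name='code5'><xs:restriction base='xs:string'>" +
      "<xs:maxLength value='5'/></xs:restriction></xs:simpleType>" +
      "<xs:element name='item' type='code5'/>" +
      "</xs:schema>";

   // Walks from the given type up through its base types, downcasting at each level
   public static List<string> FacetsOf(XmlSchemaType type) {
      List<string> result = new List<string>();
      for (XmlSchemaType t = type; t != null; t = t.BaseXmlSchemaType) {
         XmlSchemaSimpleType simple = t as XmlSchemaSimpleType;       // property is typed XmlSchemaType
         if (simple == null) continue;
         XmlSchemaSimpleTypeRestriction restriction =
            simple.Content as XmlSchemaSimpleTypeRestriction;         // typed XmlSchemaSimpleTypeContent
         if (restriction == null) continue;
         foreach (XmlSchemaObject item in restriction.Facets) {       // items are typed XmlSchemaObject
            XmlSchemaFacet facet = item as XmlSchemaFacet;
            if (facet != null) result.Add(facet.GetType().Name + "=" + facet.Value);
         }
      }
      return result;
   }

   // Validates the document and returns the facets of the first element's type
   public static List<string> FacetsOfFirstElement(string xml) {
      XmlReaderSettings settings = new XmlReaderSettings();
      settings.ValidationType = ValidationType.Schema;
      settings.Schemas.Add(XmlSchema.Read(new StringReader(Xsd), null));
      using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings)) {
         while (reader.Read()) {
            if (reader.NodeType == XmlNodeType.Element) return FacetsOf(reader.SchemaInfo.SchemaType);
         }
      }
      return new List<string>();
   }

   static void Main() {
      foreach (string facet in FacetsOfFirstElement("<item>abc</item>")) Console.WriteLine(facet);
   }
}
```

Using "as" followed by a null check, rather than a direct cast, keeps the code from throwing when a node turns out to have a complex type, a list or union type, or no type at all.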

Below is a sample program with a schema and instance document that demonstrates accessing XML schema information during validation.

My thanks to Shawn Curlis and Koce Ivanov of Microsoft for pointing out XmlReader.SchemaInfo to me.

using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.IO;

namespace SchemaDemo {

   /// <summary>
   /// Program to demonstrate getting XML schema information during document validation
   /// </summary>
   class Program {

      /// <summary>
      /// Standard program entry point
      /// </summary>
      /// <param name="args">command line arguments (not used)</param>
      static void Main(string[] args) {
         string innerText = null;
         XmlSchema schema = new XmlSchema();

         //   get the schema and instance document from the project directory
         FileStream schemaStream = new FileStream(@"..\..\sample.xsd", FileMode.Open, FileAccess.Read);
         FileStream docStream = new FileStream(@"..\..\sample.xml", FileMode.Open, FileAccess.Read);

         try {

            //   create a validating reader
            schema = XmlSchema.Read(schemaStream,
               delegate(Object sender, ValidationEventArgs e) {throw new Exception("schema read failed: " + e.Message);});
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas.Add(schema);
            settings.ValidationEventHandler +=
               delegate(Object sender, ValidationEventArgs e) {throw new Exception("document validation failed: " + e.Message);};
            XmlReader reader = XmlReader.Create(docStream, settings);

            //   loop through the document
            while (reader.Read()) {

               //   dump information about any attributes
               if (reader.HasAttributes) {
                  while (reader.MoveToNextAttribute()) {
                     string data = reader.Value;
                     describe(reader, data);
                  }
               }

               //   Capture the element's text.  This is necessary because
               //   the schema information isn't available when the reader is
               //   on the text node; it is available only when the reader is
               //   on the element start or element end tag.  Note that this
               //   simple method only works with data-oriented documents
               //   (no mixed content).
               if (reader.NodeType == XmlNodeType.Text) innerText = reader.Value;

               //   dump information about the element (a self-closing element
               //   such as <empty /> produces no EndElement node, so it is skipped)
               if (reader.NodeType == XmlNodeType.EndElement) {
                  describe(reader, innerText);
                  innerText = null;
               }
            }
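         } catch (Exception e) {
            Console.WriteLine(e.Message);
         } finally {
            //   release the file handles whether or not validation succeeded
            schemaStream.Close();
            docStream.Close();
         }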
         Console.WriteLine();
         Console.WriteLine("Press Enter to Exit...");
         Console.ReadLine();
      }

      /// <summary>
      /// dump information about the current node and the related schema information
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <param name="data">the data of the node; either the value of an
      /// attribute or the text of an element</param>
      private static void describe(XmlReader reader, string data) {
         if (reader.NodeType == XmlNodeType.EndElement || reader.NodeType == XmlNodeType.Attribute) {

            //   dump the type of element and its name
            Console.WriteLine(reader.NodeType.ToString() + ": " + reader.Name);

            //   dump the type of the node as it's known in the XSD type system
            string xmlDataType = getXmlDataType(reader);
            if (xmlDataType != null) Console.WriteLine("\t" + xmlDataType);

            //   dump the type of the node as it's known in the CLR type system
            //   and the node value as formatted by ToString() using the correct
            //   CLR type
            Type clrType = getClrType(reader);
            if (clrType != null) {
               Console.WriteLine("\t" + clrType.FullName);
               //   attributes always have a value; elements may have no text
               if (data != null) {
                  Console.WriteLine("\t" + getTypedData(reader, data, clrType).ToString());
               }
            }

            //   dump the schema appinfo associated with the node
            List<string> appInfo = getAppInfo(reader);
            if (appInfo != null && appInfo.Count > 0) {
               Console.WriteLine("\tAppInfo");
               foreach (string info in appInfo) Console.WriteLine("\t\t" + info);
            }

            //   dump the schema documentation associated with the node
            List<string> documentation = getDocumentation(reader);
            if (documentation != null && documentation.Count > 0) {
               Console.WriteLine("\tDocumentation");
               foreach (string doc in documentation) Console.WriteLine("\t\t" + doc);
            }

            //   dump all the regular expressions that restrict the node, if any
            List<string> patterns = getPattern(reader);
            if (patterns != null && patterns.Count > 0) {
               Console.WriteLine("\tPatterns");
               foreach (string pattern in patterns) Console.WriteLine("\t\t" + pattern);
            }

            //   dump the maximum length of the node, if specified
            List<int> maxLengths = getMaxLength(reader);
            if (maxLengths != null && maxLengths.Count > 0) {
               Console.WriteLine("\tMax Lengths");
               foreach (int max in maxLengths) Console.WriteLine("\t\t" + max.ToString());
            }
            Console.WriteLine();
         }
      }

      /// <summary>
      /// get the type of the node as a CLR datatype
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>CLR type</returns>
      private static Type getClrType(XmlReader reader) {
         if (reader.SchemaInfo.SchemaType == null || reader.SchemaInfo.SchemaType.Datatype == null) return null;
         return reader.SchemaInfo.SchemaType.Datatype.ValueType;
      }

      /// <summary>
      /// get the data of the node as a CLR type
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <param name="data">the value of the node as it appears in the document</param>
      /// <param name="dataType">the CLR type of the node</param>
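      /// <returns>the value converted to the given CLR type, or null when the node has no simple datatype</returns>
      private static object getTypedData(XmlReader reader, object data, Type dataType) {
         if (reader.SchemaInfo.SchemaType == null || reader.SchemaInfo.SchemaType.Datatype == null) return null;
         return reader.SchemaInfo.SchemaType.Datatype.ChangeType(data, dataType);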
      }

      /// <summary>
      /// get the appinfo information associated with the node
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>all the appinfo information</returns>
      private static List<string> getAppInfo(XmlReader reader) {
         XmlSchemaObjectCollection annotations = getAnnotations(reader);
         if (annotations == null) return null;
         List<string> list = new List<string>();
         foreach (XmlSchemaObject annotation in annotations) {
            if (annotation is XmlSchemaAppInfo) {
               foreach (XmlNode appInfo in ((XmlSchemaAppInfo) annotation).Markup) list.Add(appInfo.InnerText);
            }
         }
         return list;
      }

      /// <summary>
      /// get the documentation information associated with the node
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>all the documentation information</returns>
      private static List<string> getDocumentation(XmlReader reader) {
         XmlSchemaObjectCollection annotations = getAnnotations(reader);
         if (annotations == null) return null;
         List<string> list = new List<string>();
         foreach (XmlSchemaObject annotation in annotations) {
            if (annotation is XmlSchemaDocumentation) {
               foreach (XmlNode doc in ((XmlSchemaDocumentation) annotation).Markup) list.Add(doc.InnerText);
            }
         }
         return list;
      }

      /// <summary>
      /// get all the annotations associated with a node
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>collection of annotations</returns>
      private static XmlSchemaObjectCollection getAnnotations(XmlReader reader) {
         XmlSchemaObjectCollection annotations = null;
         switch (reader.NodeType) {
            case XmlNodeType.EndElement:
               if (reader.SchemaInfo.SchemaElement == null) return annotations;
               combine(ref annotations, reader.SchemaInfo.SchemaElement.Annotation);
               if (reader.SchemaInfo.SchemaElement.ElementSchemaType == null) return annotations;
               combine(ref annotations, reader.SchemaInfo.SchemaElement.ElementSchemaType.Annotation);
               break;
            case XmlNodeType.Attribute:
               if (reader.SchemaInfo.SchemaAttribute == null) return annotations;
               combine(ref annotations, reader.SchemaInfo.SchemaAttribute.Annotation);
               if (reader.SchemaInfo.SchemaAttribute.AttributeSchemaType == null) return annotations;
               combine(ref annotations, reader.SchemaInfo.SchemaAttribute.AttributeSchemaType.Annotation);
               break;
            default:
               return null;
         }
         return annotations;
      }

      /// <summary>
      /// combine annotations from both the type definition and the node declaration
      /// </summary>
      /// <param name="collection">returned collection of annotations</param>
      /// <param name="annotations">collection of annotations to combine into first argument</param>
      private static void combine(ref XmlSchemaObjectCollection collection, XmlSchemaAnnotation annotations) {
         if (annotations == null) return;
         if (collection == null) collection = new XmlSchemaObjectCollection();
         foreach (XmlSchemaObject annotation in annotations.Items) collection.Add(annotation);
      }

      /// <summary>
      /// get the simple type restriction associated with a node, if any
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>the restriction, or null if the node's type is not a simple type restriction</returns>
      private static XmlSchemaSimpleTypeRestriction getRestriction(XmlReader reader) {
         XmlSchemaSimpleTypeRestriction restriction;
         XmlSchemaSimpleType simpleType;
         switch (reader.NodeType) {
            case XmlNodeType.EndElement:
               if (reader.SchemaInfo.SchemaElement == null) return null;
               simpleType = reader.SchemaInfo.SchemaElement.ElementSchemaType as XmlSchemaSimpleType;
               if (simpleType == null) return null;
               restriction = simpleType.Content as XmlSchemaSimpleTypeRestriction;
               break;
            case XmlNodeType.Attribute:
               if (reader.SchemaInfo.SchemaAttribute == null) return null;
               simpleType = reader.SchemaInfo.SchemaAttribute.AttributeSchemaType;
               if (simpleType == null) return null;
               restriction = simpleType.Content as XmlSchemaSimpleTypeRestriction;
               break;
            default:
               return null;
         }
         return restriction;
      }

      /// <summary>
      /// get all the regular expression patterns associated with a node, if any
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>list of regular expressions</returns>
      private static List<string> getPattern(XmlReader reader) {
         XmlSchemaSimpleTypeRestriction restriction = getRestriction(reader);
         if (restriction == null) return null;
         List<string> result = new List<string>();
         foreach (XmlSchemaObject facet in restriction.Facets) {
            if (facet is XmlSchemaPatternFacet) result.Add(((XmlSchemaFacet) facet).Value);
         }
         return result;
      }

      /// <summary>
      /// get the maximum length of a string, if specified
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>maximum length</returns>
      private static List<int> getMaxLength(XmlReader reader) {
         XmlSchemaSimpleTypeRestriction restriction = getRestriction(reader);
         if (restriction == null) return null;
         List<int> result = new List<int>();
         foreach (XmlSchemaObject facet in restriction.Facets) {
            if (facet is XmlSchemaMaxLengthFacet) result.Add(int.Parse(((XmlSchemaFacet) facet).Value));
         }
         return result;
      }

      /// <summary>
      /// get XML schema datatype of a node
      /// </summary>
      /// <param name="reader">the XML reader validating the document</param>
      /// <returns>XML schema data type</returns>
      private static string getXmlDataType(XmlReader reader) {
         if (reader.SchemaInfo.SchemaType == null) return null;
         return reader.SchemaInfo.SchemaType.TypeCode.ToString();
      }
   }
}
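
One limitation of the program above is that it reports elements only at their end tag, and a self-closing element such as <empty /> never produces an EndElement node.  A minimal sketch of a check that covers both cases follows; the names EmptyElementSketch, AtElementEnd and CompletedElements are hypothetical, and a non-validating reader is used only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class EmptyElementSketch {
   // True when the reader is at the point where an element is complete:
   // either its end tag or a self-closing start tag
   public static bool AtElementEnd(XmlReader reader) {
      return reader.NodeType == XmlNodeType.EndElement
          || (reader.NodeType == XmlNodeType.Element && reader.IsEmptyElement);
   }

   // Returns element names in the order in which the elements are completed
   public static List<string> CompletedElements(string xml) {
      List<string> names = new List<string>();
      using (XmlReader reader = XmlReader.Create(new StringReader(xml))) {
         while (reader.Read()) {
            if (AtElementEnd(reader)) names.Add(reader.Name);
         }
      }
      return names;
   }

   static void Main() {
      //   prints b, c, a on separate lines
      foreach (string name in CompletedElements("<a><b/><c></c></a>")) Console.WriteLine(name);
   }
}
```

If a check like this were added to the sample program, it would need to run while the reader is positioned on the element itself, for example after calling reader.MoveToElement() once the attribute loop has finished.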

SAMPLE SCHEMA

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="Sample" targetNamespace="urn:test" elementFormDefault="qualified" xmlns="urn:test" xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:simpleType name="stringRegEx">
      <xs:annotation>
         <xs:documentation>only letters A-Z and a-z (type documentation)</xs:documentation>
      </xs:annotation>
      <xs:restriction base="xs:string">
         <xs:pattern value="[A-Za-z]*" />
      </xs:restriction>
   </xs:simpleType>
   <xs:simpleType name="string20">
      <xs:annotation>
         <xs:documentation>at most 20 characters</xs:documentation>
      </xs:annotation>
      <xs:restriction base="xs:string">
         <xs:maxLength value="20" />
      </xs:restriction>
   </xs:simpleType>
   <xs:complexType name="RootType">
      <xs:sequence>
         <xs:element name="empty" type="xs:string" minOccurs="0"  maxOccurs="unbounded"/>
         <xs:element name="StringElement1" nillable="true" type="string20" />
         <xs:element name="StringElement2" default="ABC" type="xs:string" />
         <xs:element name="StringElement3">
            <xs:simpleType>
               <xs:restriction base="xs:string">
                  <xs:pattern value="[A-Z]*" />
               </xs:restriction>
            </xs:simpleType>
         </xs:element>
         <xs:element name="StringWithAttr">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="attr1" type="xs:integer" />
                     <xs:attribute name="attr2" type="xs:duration">
                        <xs:annotation>
                           <xs:appinfo>assemblyName, DotNetClass</xs:appinfo>
                        </xs:annotation>
                     </xs:attribute>
                     <xs:attribute name="attr3" type="stringRegEx">
                        <xs:annotation>
                           <xs:documentation>declaration documentation</xs:documentation>
                        </xs:annotation>
                     </xs:attribute>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
   <xs:element name="Root" type="RootType" />
</xs:schema>

SAMPLE INSTANCE DOCUMENT

<?xml version="1.0" encoding="utf-8" ?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:test">
   <empty />
   <empty></empty>
   <StringElement1>first string</StringElement1>
   <StringElement2>second string</StringElement2>
   <StringElement3>UPPERCASEONLY</StringElement3>
   <StringWithAttr attr1="1" attr2="PT24H" attr3="Oz">third string</StringWithAttr>
</Root>

Posted on Tuesday, May 2, 2006 8:46 AM


Comments on this post: Accessing XML Schema Information During Document Validation

# re: Accessing XML Schema Information During Document Validation
Great article with complete information.
Left by Mahesh on Jun 04, 2008 3:03 PM

# re: Accessing XML Schema Information During Document Validation
Good article.

But it's very hard to copy and paste the code into actual files to try running it, because of the formatting.

But thanks anyway.
Left by Jeppe on Sep 15, 2009 7:10 AM

# re: Accessing XML Schema Information During Document Validation
I was hoping to do something very similar to this, but only in the ValidationEventHandler (where I could pick up error text from the appinfo, or maybe from custom attributes, in my schema). Although I get the reader object as the sender in the event handler, the SchemaInfo is not populated for attributes. It seems to work as expected for elements, but not for attributes.

Can you think of any workarounds?

Thanks

Bruce
Left by Bruce on Oct 13, 2009 1:03 AM

# re: Accessing XML Schema Information During Document Validation
This is an excellent utility - very useful. Well done!
Left by Andy on Nov 10, 2010 5:26 PM



Copyright © David Douglass