Problems Parsing DTDs

Thursday, January 08, 2009 6:44 PM

I recently had to put a DTD parser together. There are a few open source ones out there, but only for Perl, Java and C++, and our target platform is C#, so I had to start from scratch. I initially thought the parser would be pretty straightforward, until I hit upon a little gem in the DTD standard that allows for text substitution.

This puts a whole new complexion on the problem. Because a DTD can define substitutions (using ENTITY declarations), it can take a number of passes to determine a document's true meaning. This makes parser design much more difficult, slows validation and is error-prone, as the following example shows.

1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: %xx;  
7: ]>
8: <test>This sample shows a &tricky; method.</test>

 

The first pass replaces the reference %xx; in line 6 with its definition from line 4, giving %zz; (&#37; expands to %):

 

1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: %zz;  
7: ]>
8: <test>This sample shows a &tricky; method.</test>
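The &#37; in line 4 is a character reference rather than a literal %, so the parser stores %zz; as the value of xx without trying to look up zz, which has not been declared at that point. Here is a minimal Python sketch of that decoding step. The helper name replacement_text is my own, and it only handles numeric character references, not the rest of the entity-value rules.

```python
import re

# Matches decimal (&#37;) and hexadecimal (&#x25;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def replacement_text(literal_value: str) -> str:
    """Decode character references in an entity's literal value (hypothetical helper)."""
    def decode(match):
        hex_digits, dec_digits = match.groups()
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
    return CHAR_REF.sub(decode, literal_value)


# The literal values from lines 4 and 5 of the example.
xx = replacement_text("&#37;zz;")
zz = replacement_text('&#60;!ENTITY tricky "error-prone" >')
print(xx)  # %zz;
print(zz)  # <!ENTITY tricky "error-prone" >
```

The decoded text is what gets substituted when the entity is referenced later, which is why line 6 becomes %zz; and not the raw &#37;zz;.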

 

The second pass replaces %zz; in line 6 with its definition from line 5 (&#60; expands to <):

 

1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: <!ENTITY tricky "error-prone" >  
7: ]>
8: <test>This sample shows a &tricky; method.</test>

 

And finally the &tricky; reference in line 8 is expanded to "error-prone":

 

1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: <!ENTITY tricky "error-prone" >  
7: ]>
8: <test>This sample shows a error-prone method.</test>
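The whole sequence can be simulated with a toy expander. This is a rough Python sketch, not a conforming parser: the regular expressions only understand simple quoted internal entity declarations, the function names and the max_passes guard are my own assumptions, and parameter entities declared by the expanded text itself are not picked up.

```python
import re

CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")
PARAM_DECL = re.compile(r"<!ENTITY\s+%\s+(\w+)\s+(['\"])(.*?)\2\s*>", re.S)
GENERAL_DECL = re.compile(r"<!ENTITY\s+(\w+)\s+(['\"])(.*?)\2\s*>", re.S)
PARAM_REF = re.compile(r"%(\w+);")
GENERAL_REF = re.compile(r"&(\w+);")


def decode_char_refs(text):
    """Turn &#37; or &#x25; style references into the characters they name."""
    def decode(match):
        hex_digits, dec_digits = match.groups()
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
    return CHAR_REF.sub(decode, text)


def expand_subset(subset, max_passes=10):
    """Expand parameter entity references until none remain; return (text, passes)."""
    params = {name: decode_char_refs(value)
              for name, _quote, value in PARAM_DECL.findall(subset)}
    # Drop the declarations so references inside their values are left alone.
    body = PARAM_DECL.sub("", subset)
    passes = 0
    while PARAM_REF.search(body):
        if passes == max_passes:
            raise ValueError("too many expansion passes (recursive entity?)")
        body = PARAM_REF.sub(lambda m: params[m.group(1)], body)
        passes += 1
    return body, passes


def expand_content(content, subset_body):
    """Expand general entity references using declarations found in subset_body."""
    general = {name: decode_char_refs(value)
               for name, _quote, value in GENERAL_DECL.findall(subset_body)}
    return GENERAL_REF.sub(lambda m: general[m.group(1)], content)


SUBSET = """
<!ELEMENT test (#PCDATA) >
<!ENTITY % xx '&#37;zz;'>
<!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
%xx;
"""

body, passes = expand_subset(SUBSET)
result = expand_content("This sample shows a &tricky; method.", body)
print(passes)  # 2
print(result)  # This sample shows a error-prone method.
```

The pass limit is there because a substitution scheme like this loops forever on an entity that refers to itself, which is one more thing a real parser has to detect and reject.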

 
