Thursday, January 08, 2009 6:44 PM
I recently had to put a DTD parser together. There are a few open source ones out there, but only for Perl, Java and C++, and our target platform is C#, so I had to start from scratch. I initially thought the parser would be pretty straightforward, until I hit upon a little gem in the DTD standard that allows for text substitution.
This puts a whole new complexion on the problem. Because a DTD can define substitutions (declared using the ENTITY tag), it can take a number of passes to determine a document's true meaning. This makes parser design much more difficult, slows validation, and is error-prone.
Consider the following example.
1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: %xx;
7: ]>
8: <test>This sample shows a &tricky; method.</test>
The first pass expands the reference %xx; on line 6 into its definition from line 4, giving %zz; (the character reference &#37; expands to %):
1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: %zz;
7: ]>
8: <test>This sample shows a &tricky; method.</test>
The second pass expands %zz; on line 6 into its definition from line 5 (the character reference &#60; expands to <):
1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: <!ENTITY tricky "error-prone" >
7: ]>
8: <test>This sample shows a &tricky; method.</test>
And finally, the &tricky; reference in the document content on line 8 is expanded to "error-prone":
1: <?xml version='1.0'?>
2: <!DOCTYPE test [
3: <!ELEMENT test (#PCDATA) >
4: <!ENTITY % xx '&#37;zz;'>
5: <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6: <!ENTITY tricky "error-prone" >
7: ]>
8: <test>This sample shows a error-prone method.</test>
Useful tools