Optimizing Parse Trees Through MGrammar Productions

In my last post, I presented a simple MGrammar definition for a specification of two-dimensional lines. My specification of a Point entity, which, in my language, looks like this:

(4, 3)

is implemented in MGrammar using the following specifications:

// numbers
token Digit = "0".."9";
token WholeNumber = Digit+;

// points
token XCoordinate = WholeNumber;
token YCoordinate = WholeNumber;
syntax Point = "(" x:XCoordinate "," y:YCoordinate ")" => Point[x,y];

What is the “=>” symbol, and what is all of that after the symbol? It’s actually not needed, and you could get by with this:

syntax Point = "(" XCoordinate "," YCoordinate ")";

Let’s take a look as to why this “=>” symbol can be handy.

Remember that running input through MGrammar produces a graph of the input. My last post showed the entire parse tree for the two lines of input I described in that post. The syntax of Point contains five elements:

  • the open parenthesis
  • an integer
  • a comma
  • an integer
  • the close parenthesis

The parse tree for an input of (4, 3) would, by default, look like this (note that, for brevity, a complete parse tree is not shown):

 

image

 

There is nothing wrong with that, but it is more information than the code processing the input really needs. The only two elements of the point that really matter are the two integers. The punctuation can be stripped out, since it is not adding any value. the “=>” symbol defines a production that defines how the parse tree should actually be generated if the default is not needed.

Let’s take a look once again at the definition of Point:

syntax Point = "(" x:XCoordinate "," y:YCoordinate ")" => Point[x,y];

There are a few things to note here:

  • the XCoordinate token is prefaced by a “local name” of x
  • the YCoordinate token is prefaced by a “local name” of y

Those local names are used in the right hand side of the production, which specifies what should actually appear in the parse tree. In our case, we really only want the two integers, and we specify the comma delimited items that should appear in the parse tree within square brackets. The items are specified using these “local names”. The production in this case directs the parse tree to contain only x and y and everything else can be left out. That will leave us with a parse tree like this:

 

image

 

This gives the .NET code that is actually processing the input that much less to wade through when examining the input. Handy.

Print | posted @ Sunday, March 15, 2009 5:21 PM

Comments on this entry:

No comments posted yet.

Post A Comment
Title:
Name:
Email:
Website:
Comment:
Verification:
 
 
Twitter