Jeff Ferguson

Irritating other people since 1967

I am a Principal Consultant with Magenic (www.magenic.com).

Sunday, March 15, 2009

In my last post, I presented a simple MGrammar definition for a language that specifies two-dimensional lines. In that language, a Point entity is written like this:

(4, 3)

and is defined in MGrammar as follows:

// numbers
token Digit = "0".."9";
token WholeNumber = Digit+;

// points
token XCoordinate = WholeNumber;
token YCoordinate = WholeNumber;
syntax Point = "(" x:XCoordinate "," y:YCoordinate ")" => Point[x,y];

What is the “=>” symbol, and what is all of that after it? It’s not actually required; you could get by with this:

syntax Point = "(" XCoordinate "," YCoordinate ")";

Let’s look at why this “=>” symbol can be handy.

Remember that running input through MGrammar produces a graph of the input. My last post showed the entire parse tree for the two lines of input I described in that post. The syntax of Point contains five elements:

  • the open parenthesis
  • an integer
  • a comma
  • an integer
  • the close parenthesis
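
To make those five elements concrete, here is a small Python sketch that splits a point into them. It is not MGrammar and makes no claim about how the Oslo tools work internally. It mirrors the Digit and WholeNumber tokens with a regular expression and skips whitespace between elements, as the full grammar’s interleave rule does. The name `tokenize_point` is hypothetical, and the function only tokenizes: it does not check that the elements appear in the right order.

```python
import re

# Optional leading whitespace, then either a whole number (Digit+ with
# Digit = "0".."9") or one of the three punctuation characters.
TOKEN_PATTERN = re.compile(r"\s*(?:(?P<number>[0-9]+)|(?P<punct>[(),]))")


def tokenize_point(text):
    """Split a point such as "(4, 3)" into (kind, value) pairs."""
    text = text.rstrip()  # drop trailing whitespace so the loop ends cleanly
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        if match.group("number") is not None:
            tokens.append(("WholeNumber", match.group("number")))
        else:
            tokens.append(("Punct", match.group("punct")))
        pos = match.end()
    return tokens


# Five elements: "(", 4, ",", 3, ")"
print(tokenize_point("(4, 3)"))
```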

The parse tree for an input of (4, 3) would, by default, look like this (note that, for brevity, a complete parse tree is not shown):

[Image: default parse tree for (4, 3)]

There is nothing wrong with that, but it is more information than the code processing the input really needs. The only elements of the point that matter are the two integers; the punctuation can be stripped out, since it adds no value. The “=>” symbol introduces a production that specifies how the parse tree should be built when the default shape isn’t what you need.
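
As a rough illustration, here is the same idea modeled as plain Python data. The (label, children) tuple shape is an assumption made for this sketch and is not the graph representation the Oslo tools actually use. The default Point node carries all five children, while the consuming code only wants the two numbers. The hypothetical helper `strip_punctuation` does by hand, after the fact, what a production lets the grammar do up front:

```python
# A node is modeled here as (label, children); leaves are plain strings.
# This shape is an illustrative assumption, not the Oslo graph API.
default_point = ("Point", ["(", "4", ",", "3", ")"])

PUNCTUATION = {"(", ",", ")"}


def strip_punctuation(node):
    """Return a copy of the node with its punctuation leaves removed."""
    label, children = node
    return (label, [child for child in children if child not in PUNCTUATION])


trimmed = strip_punctuation(default_point)
print(trimmed)  # ('Point', ['4', '3'])

# The trimmed node can be unpacked directly into its two coordinates.
x, y = trimmed[1]
```

Code that consumes the default tree has to know that the numbers sit at positions 1 and 3 among the children. Code that consumes the trimmed node can simply unpack two values.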

Let’s take a look once again at the definition of Point:

syntax Point = "(" x:XCoordinate "," y:YCoordinate ")" => Point[x,y];

There are a few things to note here:

  • the XCoordinate token is prefaced by a “local name” of x
  • the YCoordinate token is prefaced by a “local name” of y

Those local names are used on the right-hand side of the production, which specifies what should actually appear in the parse tree. In our case, we only want the two integers. We list the items that should appear in the parse tree inside square brackets, separated by commas, and refer to each item by its local name. The name in front of the brackets (Point) labels the resulting node. Here, the production directs the parse tree to contain only x and y; everything else is left out. That leaves us with a parse tree like this:

[Image: parse tree for (4, 3) containing only the two integers]

This gives the .NET code that processes the input that much less to wade through. Handy.
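
Python’s named regular-expression groups make a reasonable analogy for MGrammar’s local names. In the sketch below, each number is captured under the name x or y, and only those captures are used to build the result. The result is a (label, children) tuple that stands in for a graph node. The function name `parse_point` and that tuple shape are hypothetical, and this is an analogy, not how MGrammar is implemented.

```python
import re

# (?P<x>...) and (?P<y>...) play the role of the x: and y: local names.
# \s* allows optional whitespace, standing in for an interleave rule.
POINT = re.compile(r"\(\s*(?P<x>[0-9]+)\s*,\s*(?P<y>[0-9]+)\s*\)")


def parse_point(text):
    """Parse "(x, y)" and build the equivalent of Point[x, y]."""
    match = POINT.fullmatch(text.strip())
    if match is None:
        raise SyntaxError(f"not a point: {text!r}")
    # Only the named captures go into the node; the punctuation is matched
    # but never stored, just as the production leaves it out.
    return ("Point", [match.group("x"), match.group("y")])


print(parse_point("(4, 3)"))  # ('Point', ['4', '3'])
```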

Some of my previous posts have described a lexical analysis engine called Sotue. That work grew out of my passion for parsing strings into contextual elements for use in applications such as the language analysis phase of compilers. I have spent some time with the Jan 2009 CTP of Microsoft “Oslo”, and have found its MGrammar language construction technology particularly fascinating. I thought I would spend a few blog entries describing my findings.

I have a calculus book in my library, and flipping through it made me wonder whether I could write a domain-specific grammar for the calculus and analytic geometry problems it presents. Writing one in MGrammar seemed like a good way to learn how MGrammar works, so I set out to build a domain-specific language for calculus and analytic geometry, which I dubbed Calculatix. The first pages of the book contain a review of two-dimensional lines, so I decided to start small and build a language for defining lines.

The design I came up with for my language (which only supports a single statement type at this point) looks like this:

line lineidentifier (XCoordinate1, YCoordinate1) (XCoordinate2, YCoordinate2);

This syntax allows me to write code like this to define lines:

line LineA (4,3) (2,5);
line LineB (5,6) (5,1);

This simple code defines two lines:

  • a line named LineA that runs from point (4, 3) to point (2, 5)
  • a line named LineB that runs from point (5, 6) to point (5, 1)

The MGrammar definition for this language is as follows:

module JeffFerguson.Calculatix.Grammars
{
    language Calculus
    {

        // ignore whitespace
        syntax LF = "\u000A";
        syntax CR = "\u000D";
        syntax Space = "\u0020";
        interleave Whitespace = LF | CR | Space;

        // numbers
        token Digit = "0".."9";
        token WholeNumber = Digit+;

        // identifiers
        token Uppercase = "A".."Z";
        token Lowercase = "a".."z";
        token Alphabetic = Uppercase | Lowercase;
        token Identifier = Alphabetic (Alphabetic | Digit)*;

        // points
        token XCoordinate = WholeNumber;
        token YCoordinate = WholeNumber; 
        syntax Point = "(" x:XCoordinate "," y:YCoordinate ")" => Point[x,y];

        // lines
        token LineToken = "line";
        syntax LineExpression = LineToken i:Identifier p1:Point p2:Point => Line[i, p1, p2];
   
        // syntax
        syntax Statement = e:(LineExpression) ";" => Statement[valuesof(e)];
        syntax Main = s:Statement* => Main[valuesof(s)];
    }
}
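
For experimenting without the Oslo CTP, here is a rough Python approximation of the grammar above. It uses a regex-based tokenizer that skips LF, CR, and space, as the interleave rule does. A small recursive-descent parser then builds nested (label, children) tuples shaped like Point[x, y], Line[i, p1, p2], Statement[...] and Main[...]. The names `tokenize`, `Parser`, and `parse`, the token kinds, and the tuple shape are all hypothetical. This is a sketch under those assumptions, not output from any Oslo tool. In particular, LineToken and Identifier both match the text "line"; the sketch resolves that overlap by trying the keyword first, with a word boundary.

```python
import re

# Token kinds, tried in order. WS covers LF, CR and space, mirroring the
# interleave rule. LINE requires a word boundary so that an identifier such
# as "lineA" is not split into the keyword plus "A".
TOKEN_SPEC = [
    ("WS", r"[ \r\n]+"),
    ("LINE", r"line\b"),
    ("NUMBER", r"[0-9]+"),               # WholeNumber (XCoordinate/YCoordinate)
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),  # Identifier
    ("PUNCT", r"[(),;]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn source text into (kind, value) pairs, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser producing (label, children) tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def expect(self, kind, value=None):
        """Consume the next token if it has the given kind (and value)."""
        wanted = value or kind
        if self.pos >= len(self.tokens):
            raise SyntaxError(f"unexpected end of input, expected {wanted}")
        tok_kind, tok_value = self.tokens[self.pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {wanted}, found {tok_value!r}")
        self.pos += 1
        return tok_value

    def point(self):
        # "(" x:XCoordinate "," y:YCoordinate ")" => Point[x, y]
        self.expect("PUNCT", "(")
        x = self.expect("NUMBER")
        self.expect("PUNCT", ",")
        y = self.expect("NUMBER")
        self.expect("PUNCT", ")")
        return ("Point", [x, y])

    def line(self):
        # LineToken i:Identifier p1:Point p2:Point => Line[i, p1, p2]
        self.expect("LINE")
        i = self.expect("IDENT")
        p1 = self.point()
        p2 = self.point()
        return ("Line", [i, p1, p2])

    def main(self):
        # Statement = LineExpression ";"  and  Main = Statement*
        statements = []
        while self.pos < len(self.tokens):
            line = self.line()
            self.expect("PUNCT", ";")
            statements.append(("Statement", [line]))
        return ("Main", statements)


def parse(text):
    return Parser(tokenize(text)).main()


source = "line LineA (4,3) (2,5);\nline LineB (5,6) (5,1);"
print(parse(source))
```

The ordering in TOKEN_SPEC is a choice made for this sketch. Because Python’s regex alternation takes the first alternative that matches, WS and LINE must come before IDENT.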

Remember our input?

line LineA (4,3) (2,5);
line LineB (5,6) (5,1);

This input, processed against the language defined in the MGrammar definition shown above, produces a graph that, at a conceptual level, looks like this (the unshaded boxes represent higher-level constructs that are broken down, while the shaded boxes represent pieces of data):

[Image: conceptual graph of the two-line input]
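
The same structure can be printed as an indented outline. In the sketch below, the graph is written out by hand on the assumption that it follows the productions in the grammar: each Statement wraps its Line node, and Main holds the statements. Labels stand for the unshaded construct boxes, and bracketed leaves stand for the shaded data boxes. The `GRAPH` constant and the `outline` helper are hypothetical.

```python
# The graph for the two-line input, written out by hand as (label, children)
# tuples; leaves (plain strings) are the pieces of data.
GRAPH = ("Main", [
    ("Statement", [("Line", ["LineA", ("Point", ["4", "3"]), ("Point", ["2", "5"])])]),
    ("Statement", [("Line", ["LineB", ("Point", ["5", "6"]), ("Point", ["5", "1"])])]),
])


def outline(node, depth=0):
    """Return indented lines: labels for constructs, [brackets] for data."""
    indent = "    " * depth
    if isinstance(node, str):
        return [f"{indent}[{node}]"]
    label, children = node
    lines = [f"{indent}{label}"]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(GRAPH)))
```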

I will take a deeper look at this grammar in the next few blog posts, which will provide answers to the following questions:

  • “What are those funny => symbols in the syntax definitions, and what good are they doing?”
  • “So, now that I have this language defined, how do I use it? How do I write code in this language, and what processes it at runtime?”