Some of my previous posts have described a lexical analysis engine called Sotue. That work grew out of my passion for parsing strings into contextual elements for use in applications such as the language analysis phase of a compiler. I have spent some time with the January 2009 CTP of Microsoft “Oslo” and have found its MGrammar language construction technology particularly fascinating, so I thought I would spend a few blog entries describing my findings.
I have a calculus book in my library, and flipping through it made me wonder whether I could write a domain-specific grammar for the calculus and analytic geometry problems it presents. Using MGrammar for this work seemed like a good way to learn how MGrammar works, so I set out to write a domain-specific language for calculus and analytic geometry, which I dubbed Calculatix. The first pages of the book contain a review of two-dimensional lines, so I decided to start small and build a language for defining lines.
The design I came up with for my language (which only supports a single statement type at this point) looks like this:
line lineidentifier (XCoordinate1, YCoordinate1) (XCoordinate2, YCoordinate2);
This syntax allows me to write code like this to define lines:
line LineA (4,3) (2,5);
line LineB (5,6) (5,1);
This simple code defines two lines:
- a line named LineA that runs from point (4, 3) to point (2, 5)
- a line named LineB that runs from point (5, 6) to point (5, 1)
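For comparison, the shape of a single statement can be captured by one regular expression. The Python sketch below is my own illustration, not anything from MGrammar; the names `LINE_STATEMENT` and `parse_line_statement` are hypothetical, and I am assuming (as in the examples above) that identifiers start with a letter and that coordinates are non-negative whole numbers.

```python
import re

# Hypothetical pattern for: line <identifier> (<x1>,<y1>) (<x2>,<y2>);
# Whitespace is allowed between the pieces, as in the examples above.
LINE_STATEMENT = re.compile(
    r"""\s*line\s+
        (?P<name>[A-Za-z][A-Za-z0-9]*)\s*
        \(\s*(?P<x1>[0-9]+)\s*,\s*(?P<y1>[0-9]+)\s*\)\s*
        \(\s*(?P<x2>[0-9]+)\s*,\s*(?P<y2>[0-9]+)\s*\)\s*
        ;\s*""",
    re.VERBOSE,
)


def parse_line_statement(text):
    """Return (name, (x1, y1), (x2, y2)), or raise ValueError if text is not a line statement."""
    m = LINE_STATEMENT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a line statement: {text!r}")
    point1 = (int(m["x1"]), int(m["y1"]))
    point2 = (int(m["x2"]), int(m["y2"]))
    return m["name"], point1, point2


print(parse_line_statement("line LineA (4,3) (2,5);"))
# ('LineA', (4, 3), (2, 5))
```

A lone regular expression like this handles exactly one statement form and returns a flat tuple; it offers no natural place to add new statement types, which is part of the appeal of a grammar tool like MGrammar.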
The MGrammar definition for this language is as follows:
module JeffFerguson.Calculatix.Grammars
{
    language Calculus
    {
        // ignore whitespace
        syntax LF = "\u000A";
        syntax CR = "\u000D";
        syntax Space = "\u0020";
        interleave Whitespace = LF | CR | Space;

        // numbers
        token Digit = "0".."9";
        token WholeNumber = Digit+;

        // identifiers
        token Uppercase = "A".."Z";
        token Lowercase = "a".."z";
        token Alphabetic = Uppercase | Lowercase;
        token Identifier = Alphabetic (Alphabetic | Digit)*;

        // points
        token XCoordinate = WholeNumber;
        token YCoordinate = WholeNumber;
        syntax Point = "(" x:XCoordinate "," y:YCoordinate ")" => Point[x, y];

        // lines
        token LineToken = "line";
        syntax LineExpression = LineToken i:Identifier p1:Point p2:Point => Line[i, p1, p2];

        // syntax
        syntax Statement = e:(LineExpression) ";" => Statement[valuesof(e)];
        syntax Main = s:Statement* => Main[valuesof(s)];
    }
}
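The token rules and the interleave rule split the work the way a conventional lexer does: text matched by the interleave rule is skipped between tokens, while WholeNumber, Identifier, and the "line" keyword become tokens. As a rough illustration only (my own Python sketch, not how the MGrammar runtime works), the lexer below resolves the overlap between LineToken and Identifier by looking identifiers up in a keyword table. The names `Token`, `TOKEN_SPEC`, `KEYWORDS`, and `tokenize` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # "LineToken", "Identifier", "WholeNumber", or a punctuation literal
    text: str


# Names echo the MGrammar rules; the regexes are my own translation of them.
# Python tries the alternatives left to right at each position.
TOKEN_SPEC = [
    ("Whitespace", r"[\n\r ]+"),              # interleave Whitespace = LF | CR | Space
    ("WholeNumber", r"[0-9]+"),               # Digit+
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),  # Alphabetic (Alphabetic | Digit)*
    ("Punct", r"[(),;]"),                     # literals used in the syntax rules
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))
KEYWORDS = {"line": "LineToken"}  # identifiers that are really keywords


def tokenize(source: str) -> List[Token]:
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "Whitespace":
            continue  # interleaved text is dropped between tokens
        if kind == "Identifier":
            kind = KEYWORDS.get(text, "Identifier")
        elif kind == "Punct":
            kind = text  # punctuation tokens are named by their own text
        tokens.append(Token(kind, text))
    return tokens


print([t.kind for t in tokenize("line LineA (4,3) (2,5);")])
# ['LineToken', 'Identifier', '(', 'WholeNumber', ',', 'WholeNumber', ')', '(', 'WholeNumber', ',', 'WholeNumber', ')', ';']
```

The sketch has no separate XCoordinate or YCoordinate token: both would match exactly the same text as WholeNumber, so a lexer on its own cannot tell them apart. Their roles come from where they appear inside the Point rule.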
Remember our input?
line LineA (4,3) (2,5);
line LineB (5,6) (5,1);
When this input is processed against the grammar shown above, it produces a graph that, at a conceptual level, looks like this (the unshaded boxes represent higher-level constructs that are broken down further, while the shaded boxes represent pieces of data):
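The shape of that graph follows from the projections (the => clauses) in the grammar. The Python sketch below is my own hand-written recursive-descent parser, not MGrammar output: it has one method per syntax rule and builds nodes named after the projections. Two simplifications are assumptions on my part. The tokenizer skips any whitespace rather than only LF, CR, and Space, and each Statement node simply wraps its Line node, since I am not trying to reproduce exactly what valuesof does. The names `Node` and `Parser` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union

NUMBER = re.compile(r"[0-9]+")               # WholeNumber = Digit+
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # Identifier = Alphabetic (Alphabetic | Digit)*
# Deliberately simple tokenizer: identifiers, numbers, punctuation, or any other
# single non-space character (which the parser will then reject).
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|[(),;]|\S")


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)

    def __str__(self) -> str:
        # Render as Label[child, child], the shape used by the projections.
        return f"{self.label}[{', '.join(str(c) for c in self.children)}]"


class Parser:
    """Recursive-descent parser with one method per syntax rule."""

    def __init__(self, source: str):
        self.tokens = TOKEN.findall(source)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, ok: Callable[[str], bool], what: str) -> str:
        tok = self.peek()
        if tok is None or not ok(tok):
            raise SyntaxError(f"expected {what}, found {tok!r}")
        self.pos += 1
        return tok

    def literal(self, text: str) -> str:
        return self.expect(lambda t: t == text, repr(text))

    def number(self) -> str:
        return self.expect(lambda t: NUMBER.fullmatch(t) is not None, "a whole number")

    def point(self) -> Node:
        # Point = "(" x:XCoordinate "," y:YCoordinate ")" => Point[x, y]
        self.literal("(")
        x = self.number()
        self.literal(",")
        y = self.number()
        self.literal(")")
        return Node("Point", [x, y])

    def line_expression(self) -> Node:
        # LineExpression = LineToken i:Identifier p1:Point p2:Point => Line[i, p1, p2]
        self.literal("line")
        name = self.expect(lambda t: IDENT.fullmatch(t) is not None, "an identifier")
        return Node("Line", [name, self.point(), self.point()])

    def statement(self) -> Node:
        # Statement = LineExpression ";"  (here the Statement node just wraps the Line)
        line = self.line_expression()
        self.literal(";")
        return Node("Statement", [line])

    def main(self) -> Node:
        # Main = Statement*
        statements = []
        while self.peek() is not None:
            statements.append(self.statement())
        return Node("Main", statements)


source = "line LineA (4,3) (2,5);\nline LineB (5,6) (5,1);"
print(Parser(source).main())
# Main[Statement[Line[LineA, Point[4, 3], Point[2, 5]]], Statement[Line[LineB, Point[5, 6], Point[5, 1]]]]
```

In this rendering, Main, Statement, Line, and Point play the part of the broken-down constructs, and the identifier and coordinate strings are the pieces of data at the leaves.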
I will take a deeper look at this grammar in the next few blog posts, which will provide answers to the following questions:
- “What are those funny => symbols in the syntax definitions, and what are they doing?”
- “So, now that I have this language defined, how do I use it? How do I write code in this language, and what processes it at runtime?”