Suppose that you have a need for a domain specific language that supports mathematical expressions. Specifically, you want to support the following:
- unsigned integers
- signed integers
- addition expressions
- subtraction expressions
Your first attempt at such a grammar using MGrammar might look like this:
module JeffFerguson.BlogPosts
{
language MathOps
{
// ignore whitespace
syntax LF = "\u000A";
syntax CR = "\u000D";
syntax Space = "\u0020";
interleave Whitespace = LF | CR | Space;
// operators
token PlusOperator = "+";
token MinusOperator = "-";
token StatementEnd = ";";
// numerics
token UnaryOperator = PlusOperator | MinusOperator;
token Digit = "0".."9";
token WholeNumber = UnaryOperator? Digit+;
// expressions
syntax AdditionExpression = t1:WholeNumber PlusOperator t2:WholeNumber =>Add[t1,t2];
syntax SubtractionExpression = t1:WholeNumber MinusOperator t2:WholeNumber => Subtract[t1,t2];
syntax MathematicalExpression =
WholeNumber
| AdditionExpression
| SubtractionExpression;
// language syntax
syntax Statement = e:MathematicalExpression StatementEnd => Statement[valuesof(e)];
syntax Main = Statement*;
}
}
There is a subtle problem with this grammar. Suppose that you offer the following input to the grammar:
This input will give you errors:
[file://untitled1/#4,2-4,4] error 5007: Token JeffFerguson.BlogPosts.MathOps.WholeNumber with text "+5" unexpected.
[file://untitled1/#4,2-4,4] message 5011: Assuming an insertion of value ';' to continue parsing. Could also have inserted '-', '+'.
The problem is with the “4+5” statement. The issue here is that the “+” symbol is being interpreted as the unary operator for the “5” symbol, and MGrammar is attempting to see the “4” and the “+5” as separate symbols, which is not what was intended. Rather, the input most likely meant to offer the “4+5” (as in “9”, for those of you not up on your math) as an additive operator.
The issue has to do with the fact that MGrammar’s lexical analysis engine runs before the grammatical parser, just as lex runs before yacc. This means that token operations in an MGrammar definition are interpreted before the syntax definitions. Since the unary operator is defined as part of a token definition, then that binding will take precedence over the addition syntax.
This can be fixed by specifying the unary operator as a syntax definition with a slight change to the grammar:
module JeffFerguson.BlogPosts
{
language MathOps
{
// ignore whitespace
syntax LF = "\u000A";
syntax CR = "\u000D";
syntax Space = "\u0020";
interleave Whitespace = LF | CR | Space;
// operators
token PlusOperator = "+";
token MinusOperator = "-";
token StatementEnd = ";";
// numerics
token UnaryOperator = PlusOperator | MinusOperator;
token Digit = "0".."9";
syntax WholeNumber =
n:Digit+ => UnsignedNumber[n]
| s:UnaryOperator n:Digit+ => SignedNumber[s, valuesof(n)]
;
// expressions
syntax AdditionExpression = t1:WholeNumber PlusOperator t2:WholeNumber =>Add[t1,t2];
syntax SubtractionExpression = t1:WholeNumber MinusOperator t2:WholeNumber => Subtract[t1,t2];
syntax MathematicalExpression =
WholeNumber
| AdditionExpression
| SubtractionExpression;
// language syntax
syntax Statement = e:MathematicalExpression StatementEnd => Statement[valuesof(e)];
syntax Main = Statement*;
}
}
The abstract syntax tree (AST) for the input now looks like this:
Main[
[
Statement[
UnsignedNumber[
[
"1"
]
]
],
Statement[
SignedNumber[
"+",
"2"
]
],
Statement[
SignedNumber[
"-",
"3"
]
],
Statement[
Add[
UnsignedNumber[
[
"4"
]
],
UnsignedNumber[
[
"5"
]
]
]
]
]
]
All better.