Jeff Ferguson

Irritating other people since 1967

  Home  |   Contact  |   Syndication    |   Login
  41 Posts | 0 Stories | 49 Comments | 0 Trackbacks

News

I am a Principal Consultant with Magenic (www.magenic.com).

Twitter












Archives

Suppose that you have a need for a domain specific language that supports mathematical expressions. Specifically, you want to support the following:

  • unsigned integers
  • signed integers
  • addition expressions
  • subtraction expressions

Your first attempt at such a grammar using MGrammar might look like this:

module JeffFerguson.BlogPosts
{
    
    language MathOps
    {
        
        // ignore whitespace
        
        syntax LF = "\u000A";
        syntax CR = "\u000D";
        syntax Space = "\u0020";
        interleave Whitespace = LF | CR | Space;
        
        // operators
        
        token PlusOperator = "+";
        token MinusOperator = "-";
        token StatementEnd = ";";
        
        // numerics
        
        token UnaryOperator = PlusOperator | MinusOperator;
        token Digit = "0".."9";
        token WholeNumber = UnaryOperator? Digit+;
        
        // expressions
        
        syntax AdditionExpression = t1:WholeNumber PlusOperator t2:WholeNumber =>Add[t1,t2];
        syntax SubtractionExpression = t1:WholeNumber MinusOperator t2:WholeNumber => Subtract[t1,t2];
        syntax MathematicalExpression =
            WholeNumber
            | AdditionExpression
            | SubtractionExpression;
                             
        // language syntax
        syntax Statement = e:MathematicalExpression StatementEnd => Statement[valuesof(e)];
        syntax Main = Statement*;
    }
}

There is a subtle problem with this grammar. Suppose that you offer the following input to the grammar:

1;
+2;
-3;
4+5;

This input will give you errors:

[file://untitled1/#4,2-4,4] error 5007: Token JeffFerguson.BlogPosts.MathOps.WholeNumber with text "+5" unexpected. 
[file://untitled1/#4,2-4,4] message 5011: Assuming an insertion of value ';' to continue parsing. Could also have inserted '-', '+'.

The problem is with the “4+5” statement. The issue here is that the “+” symbol is being interpreted as the unary operator for the “5” symbol, and MGrammar is attempting to see the “4” and the “+5” as separate symbols, which is not what was intended. Rather, the input most likely meant to offer the “4+5” (as in “9”, for those of you not up on your math) as an additive operator.

The issue has to do with the fact that MGrammar’s lexical analysis engine runs before the grammatical parser, just as lex runs before yacc. This means that token operations in an MGrammar definition are interpreted before the syntax definitions. Since the unary operator is defined as part of a token definition, then that binding will take precedence over the addition syntax.

This can be fixed by specifying the unary operator as a syntax definition with a slight change to the grammar:

module JeffFerguson.BlogPosts
{
    
    language MathOps
    {
        
        // ignore whitespace
        
        syntax LF = "\u000A";
        syntax CR = "\u000D";
        syntax Space = "\u0020";
        interleave Whitespace = LF | CR | Space;
        
        // operators
        
        token PlusOperator = "+";
        token MinusOperator = "-";
        token StatementEnd = ";";
        
        // numerics
        
        token UnaryOperator = PlusOperator | MinusOperator;
        token Digit = "0".."9";
        syntax WholeNumber =
            n:Digit+ => UnsignedNumber[n]
            | s:UnaryOperator n:Digit+ => SignedNumber[s, valuesof(n)]
            ;
        
        // expressions
        
        syntax AdditionExpression = t1:WholeNumber PlusOperator t2:WholeNumber =>Add[t1,t2];
        syntax SubtractionExpression = t1:WholeNumber MinusOperator t2:WholeNumber => Subtract[t1,t2];
        syntax MathematicalExpression =
            WholeNumber
            | AdditionExpression
            | SubtractionExpression;
                             
        // language syntax
        syntax Statement = e:MathematicalExpression StatementEnd => Statement[valuesof(e)];
        syntax Main = Statement*;
    }
}

The abstract syntax tree (AST) for the input now looks like this:

Main[
  [
    Statement[
      UnsignedNumber[
        [
          "1"
        ]
      ]
    ],
    Statement[
      SignedNumber[
        "+",
        "2"
      ]
    ],
    Statement[
      SignedNumber[
        "-",
        "3"
      ]
    ],
    Statement[
      Add[
        UnsignedNumber[
          [
            "4"
          ]
        ],
        UnsignedNumber[
          [
            "5"
          ]
        ]
      ]
    ]
  ]
]

All better.

posted on Tuesday, March 24, 2009 9:23 AM

Feedback

# re: Interpreting Mathematical Operators in MGrammar 3/24/2009 2:15 PM Sam
Good illustration of balancing token and syntax rules. Token rules could end up consuming too much input, then your syntax rule gets starved.

When you're in that situation, you need to back-off your token rule into simpler parts, and let the syntax rule sort them out.

Thanks Jeff

Post A Comment
Title:
Name:
Email:
Website:
Comment:
Verification: