Blog Home  Home Feed your aggregator (RSS 2.0)  
ANTLR C# Overview - Part 3 - Manuel Abadia's ASP.NET stuff
 
# Monday, January 8, 2007

Click here to see part 1 of this ANTLR Overview
Click here to see part 2 of this ANTLR Overview

To complete the overview of the main ANTLR runtime classes, we’ll study an example.

The following file contains a grammar for evaluating simple expressions:

 

grammar Expr;

 

options {

    language=CSharp;

    output=AST;

    ASTLabelType=CommonTree;

}

 

prog:   ( stat

                {

                    if ($stat.tree != null){

                        Console.WriteLine($stat.tree.ToStringTree());

                    } else {

                        Console.WriteLine();

                    }

                }

        )+ ;

 

stat:   expr NEWLINE        -> expr

    |   ID '=' expr NEWLINE -> ^('=' ID expr)

    |   NEWLINE                ->

    ;

 

expr:   multExpr (('+'^^|'-'^^) multExpr)*

    ;

 

multExpr

    :   atom ('*'^^ atom)*

    ;

 

atom:   INT

    |   ID

    |   '('! expr ')'!

    ;

 

ID  :   ('a'..'z'|'A'..'Z')+ ;

INT :   '0'..'9'+ ;

NEWLINE:'\r'? '\n' ;

WS  :   (' '|'\t')+ { Skip(); } ;

 

 

In this file, there are parse rules and lexer rules. Parser rules are lowercase and lexer rules are uppercase.

 

The tokens that will be recognized by the lexer are defined at the end of the file and are ID (identifier), INT (integer), NEWLINE (carriage return), WS (white space). Other tokens like ‘=’ don’t have a lexer rule but a rule will be generated implicity by ANTLR as we’ll see later.

 

The grammar start symbol is prog, that represents a program. A program is made of one ore more statements. A statement is:
• an expression following by a NEWLINE token.
• or an ID token, followed by an equal token and then followed by a NEWLINE token.
• or a NEWLINE token.

 

An expression is a multExpr that can be followed by the addition or substraction of more multExpr. A multExpr is an atom that can be followed by the multiplication of more atoms. Finally, an atom is an integer, an identifier or an expression enclosed in brackets.

This way of defining expressions is very common. As the grammar supports addition, substraction and multiplication, and the multiplication has higher precedence, the grammar rules are defined so the multiplication will appear later in a top down parse tree.

When we run ANTLR, it generates a Expr.Tokens file with the integer assigned to each token:

 

NEWLINE=4

ID=5

INT=6

WS=7

'='=8

'+'=9

'-'=10

'*'=11

'('=12

')'=13


 

The indexes 0 to 3 are reserved for special tokens (invalid, EOR, DOWN, UP). The tokens explicitly defined in the .g file are the first 4 shown above, but the others were created implicitly by ANTLR.

 

The first part of the Lexer starts defining this tokens:

 

public class ExprLexer : Lexer

{

    public const int T10 = 10;

    public const int T11 = 11;

    public const int T9 = 9;

    public const int INT = 6;

    public const int EOF = -1;

    public const int WS = 7;

    public const int T12 = 12;

    public const int Tokens = 14;

    public const int T8 = 8;

    public const int T13 = 13;

    public const int NEWLINE = 4;

    public const int ID = 5;

 

 

The tokens created implicitly get a name like TXXX, where XXX is the integer used to represent that token.

The methods present in the ExprLexer class are the constructors (that call a helper method called InitializeCyclicDFAs), the mTokens method and the mXXX methods, where XXX is one of the constants defined at the top of the class.

Remember that the mTokens method is called when the NextToken method is called, and what it does is to select one of the mXXX methods to call based on the input (using a DFA for this). The mTokens method of the ExprLexer is:

 

override public void mTokens() // throws RecognitionException

    {

        // Expr.g:1:10: ( T8 | T9 | T10 | T11 | T12 | T13 | ID | INT | NEWLINE | WS )

        int alt5 = 10;

        switch ( input.LA(1) )

        {

        case '=':

            alt5 = 1; break;

        case '+':

            alt5 = 2; break;

        case '-':

            alt5 = 3; break;

        case '*':

            alt5 = 4; break;

        case '(':

            alt5 = 5; break;

        case ')':

            alt5 = 6; break;

        case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I':

        case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':

        case 'S': case 'T': case 'U': case 'V': case 'W': case 'X': case 'Y': case 'Z':

        case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': case 'h': case 'i':

        case 'j': case 'k': case 'l': case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':

        case 's': case 't': case 'u': case 'v': case 'w': case 'x': case 'y': case 'z':

            alt5 = 7;

            break;

        case '0': case '1': case '2': case '3': case '4':

        case '5': case '6': case '7': case '8': case '9':

            alt5 = 8;

            break;

        case '\n': case '\r':

            alt5 = 9;

            break;

        case '\t': case ' ':

            alt5 = 10;

            break;

            default:

                NoViableAltException nvae_d5s0 =

                    new NoViableAltException("1:1: Tokens : ( T8 | T9 | T10 | T11 | T12 | T13 | ID | INT | NEWLINE | WS );", 5, 0, input);

 

                throw nvae_d5s0;

        }

 

        switch (alt5)

        {

            case 1 :

                // Expr.g:1:10: T8

                { mT8(); } break;

            case 2 :

                // Expr.g:1:13: T9

                { mT9(); } break;

            case 3 :

                // Expr.g:1:16: T10

                { mT10(); } break;

            case 4 :

                // Expr.g:1:20: T11

                { mT11(); } break;

            case 5 :

                // Expr.g:1:24: T12

                { mT12(); } break;

            case 6 :

                // Expr.g:1:28: T13

                { mT13(); } break;

            case 7 :

                // Expr.g:1:32: ID

                { mID(); } break;

            case 8 :

                // Expr.g:1:35: INT

                { mINT(); } break;

            case 9 :

                // Expr.g:1:39: NEWLINE

                { mNEWLINE(); } break;

            case 10 :

                // Expr.g:1:47: WS

                { mWS(); } break;

        }

    }

 

 

In this sample the logic is really simple. It sets the alternative depending on the look ahead (first switch) and then calls to the corresponding mXXX method (second switch).

The code for the methods mXXX is also straightforward. For example, the code for the mT8 method (that has to match an equals character) is:

 

public void mT8() // throws RecognitionException [2]

    {

        try {

            ruleNestingLevel++;

            int _type = T8;

            int _start = CharIndex;

            int _line = Line;

            int _charPosition = CharPositionInLine;

            int _channel = Token.DEFAULT_CHANNEL;

            // Expr.g:7:6: ( '=' )

            // Expr.g:7:6: '='

            {

                Match('=');

            }

 

            if ((token == null) && (ruleNestingLevel == 1)) {

                Emit(_type, _line, _charPosition, _channel, _start, CharIndex - 1);

            }

        } finally {

            ruleNestingLevel--;

        }

    }

 


The code starts incrementing the ruleNestingLevel variable (that is used to emit tokens only in non fragment lexer rules) and setting the information about the token being generated using the current token type and channel and the properties of the Lexer class (CharIndex, Line, CharPositionInLine). Then it calls the Lexer method Match that checks if the current character is what was expected and creates the associated token. Finally the nesting level is left unchanged.

The method mWS (that recognizes white spaces and tabs) is:

 

public void mWS() // throws RecognitionException [2]

    {

        try {

            ruleNestingLevel++;

            int _type = WS;

            int _start = CharIndex;

            int _line = Line;

            int _charPosition = CharPositionInLine;

            int _channel = Token.DEFAULT_CHANNEL;

            // Expr.g:39:9: ( ( (' '|'\\t'))+ )

            // Expr.g:39:9: ( (' '|'\\t'))+

            {

                // Expr.g:39:9: ( (' '|'\\t'))+

                int cnt4 = 0;

                do

                {

                    int alt4 = 2;

                    int LA4_0 = input.LA(1);

                    if ( (LA4_0 == '\t' || LA4_0 == ' ') ){

                        alt4 = 1;

                    }

 

 

                    switch (alt4)

                    {

                        case 1 :

                            // Expr.g:39:10: (' '|'\\t')

                            {

                                if ( input.LA(1) == '\t' || input.LA(1) == ' ' ) {

                                    input.Consume();

 

                                } else  {

                                    MismatchedSetException mse =

                                        new MismatchedSetException(null,input);

                                    Recover(mse);    throw mse;

                                }

                            }

                            break;

 

                        default:

                            if ( cnt4 >= 1 ) goto loop4;

                                EarlyExitException eee =

                                    new EarlyExitException(4, input);

                                throw eee;

                    }

                    cnt4++;

                } while (true);

 

                loop4:

                    ;    // Stops C# compiler whinging that label 'loop4' has no statements

 

                Skip();

            }

 

            if ( (token == null) && (ruleNestingLevel == 1) ){

                Emit(_type, _line, _charPosition, _channel, _start, CharIndex-1);

            }

        } finally {

            ruleNestingLevel--;

        }

    }

 

The method has the beginning and the end part like the previous one, but the recognition logic in the middle is a bit more complex. It has a counter initialized to 0 because the rule must find at least one white space or tab. It starts looking for a white space or tab and when it founds one consumes it. When there aren’t more white spaces or tabs, the code exits the do. If no character was consumed (the counter was 0), an execption is thrown. Otherwise, the code that followed the lexer rule is executed.  The Skip methods assigns the SKIP_TOKEN to the token field so no token is emmited later.

 

The other methods in the lexer class are very similar to the ones we’ve studied.

 

Let’s move to the parser. The parser starts with the following code:

 

public class ExprParser : Parser

{

  public static readonly string[] tokenNames = new string[]

    {

        "<invalid>",

        "<EOR>",

        "<DOWN>",

        "<UP>",

        "NEWLINE",

        "ID",

        "INT",

        "WS",

        "'='",

        "'+'",

        "'-'",

        "'*'",

        "'('",

        "')'"

    };

 

    public const int INT = 6;

    public const int EOF = -1;

    public const int WS = 7;

    public const int NEWLINE = 4;

    public const int ID = 5;

 

 

that declares the tokens used in the lexer.

The class continues like this:

 

public ExprParser(ITokenStream input)

            : base(input)

        {

            InitializeCyclicDFAs();

        }

 

    protected ITreeAdaptor adaptor = new CommonTreeAdaptor();

 

    public ITreeAdaptor TreeAdaptor

    {

        get { return this.adaptor; }

        set { this.adaptor = value; }

    }

 

    override public string[] TokenNames

    {

        get { return tokenNames; }

    }

 

    override public string GrammarFileName

    {

        get { return "Expr.g"; }

    }

 

 

So the constructor is declared and an ITreeAdaptor is created (because we set the option to build an AST) and exposed as a property. The other properties expose information about this recognizer.

 

Then we have a nested class for each parser rule and a method for each parser rule that returns the previous nested class. For the prog rule we have:

 

public class prog_return : ParserRuleReturnScope

    {

        internal CommonTree tree;

        override public object Tree

        {

            get { return tree; }

        }

    };

 

    // $ANTLR start prog

    // Expr.g:9:1: prog : ( stat )+ ;

    public prog_return prog() // throws RecognitionException [1]

    {  

        prog_return retval = new prog_return();

        retval.start = input.LT(1);

 

        CommonTree root_0 = null;

 

        stat_return stat1 = null;

 

 

 

        try

        {

            // Expr.g:9:9: ( ( stat )+ )

            // Expr.g:9:9: ( stat )+

            {

                root_0 = (CommonTree)adaptor.GetNilNode();

 

                // Expr.g:9:9: ( stat )+

                int cnt1 = 0;

                do

                {

                    int alt1 = 2;

                    int LA1_0 = input.LA(1);

                    if ( ((LA1_0 >= NEWLINE && LA1_0 <= INT) || LA1_0 == 12) )

                    {

                        alt1 = 1;

                    }

 

 

                    switch (alt1)

                    {

                        case 1 :

                            // Expr.g:9:11: stat

                            {

                                CommonTree root_1 = (CommonTree)adaptor.GetNilNode();

 

                                PushFollow(FOLLOW_stat_in_prog43);

                                stat1 = stat();

                                followingStackPointer_--;

 

                                adaptor.AddChild(root_1, stat1.Tree);

 

                                                    if (((CommonTree)stat1.tree) != null){

                                                        Console.WriteLine(((CommonTree)stat1.tree).ToStringTree());

                                                    } else {

                                                        Console.WriteLine();

                                                    }

 

 

                                adaptor.AddChild(root_0, root_1);

 

                            }

                            break;

 

                        default:

                            if ( cnt1 >= 1 ) goto loop1;

                                EarlyExitException eee =

                                    new EarlyExitException(1, input);

                                throw eee;

                    }

                    cnt1++;

                } while (true);

 

                loop1:

                    ;    // Stops C# compiler whinging that label 'loop1' has no statements

 

 

            }

 

        }

        catch (RecognitionException re)

        {

            ReportError(re);

            Recover(input,re);

        }

        finally

        {

            retval.stop = input.LT(-1);

 

                retval.tree = (CommonTree)adaptor.RulePostProcessing(root_0);

                adaptor.SetTokenBoundaries(retval.Tree, retval.start, retval.stop);

 

        }

        return retval;

    }

 

 

 

As explained in a previous article, the return classes generated when we have set the option to generate an AST inherit from ParserRuleReturnScope and have a Tree property with the AST generated for that rule.

 

The prog method starts creating the return value and setting the initial token for the AST. The method ends setting the last token for the AST and saves the resulting AST.

 

The core of the method starts after the try. The code starts creating a “nil” node. As the rule is prog : ( stat )+, the generated code is a do…while with a counter to check that at least the stat rule has been entered at least one. The basic structure of the generated code is similar to the one we studied in the lexer method mWS. If the token found is one of the first tokens that appear in a stat rule, the code goes to the case 1: of the switch.

 

A new “nil” node is created, the stat rule is invoked, and the AST generated by the stat rule is added as a child to “nil” node created previously. The BaseRecognizer class uses a stack to store the token types that can follow a rule invocation. Before calling a rule the follow set is pushed in this stack (PushFollow) and after the call, the follow set is removed from the stack(followingStackPointer_--). This is used when there is an error in the parsing process so ANTLR can continue execution adding a token or deleting a token.

 

The code that follows is the one that we included in the parser rule:

 

prog:   ( stat

                {

                    if ($stat.tree != null){

                        Console.WriteLine($stat.tree.ToStringTree());

                    } else {

                        Console.WriteLine();

                    }

                }

        )+ ;

 

 

Finally the first tree created at the top of the method (that contained a “nil” node) is combined with the tree obtained from the stat rule.

 

As the others methods are very similar we’re going to study only the generated method for the atom rule:

 

atom:   INT

    |   ID

    |   '('! expr ')'!

    ;

 

    // $ANTLR start atom

    // Expr.g:31:1: atom : ( INT | ID | '('! expr ')'! );

    public atom_return atom() // throws RecognitionException [1]

    {  

        atom_return retval = new atom_return();

        retval.start = input.LT(1);

 

        CommonTree root_0 = null;

 

        IToken INT16 = null;

        IToken ID17 = null;

        IToken char_literal18 = null;

        IToken char_literal20 = null;

        expr_return expr19 = null;

 

 

        CommonTree INT16_tree=null;

        CommonTree ID17_tree=null;

        CommonTree char_literal18_tree=null;

        CommonTree char_literal20_tree=null;

 

        try

        {

            // Expr.g:31:9: ( INT | ID | '('! expr ')'! )

            int alt6 = 3;

            switch ( input.LA(1) )

            {

            case INT:

                alt6 = 1;

                break;

            case ID:

                alt6 = 2;

                break;

            case 12:

                alt6 = 3;

                break;

                default:

                    NoViableAltException nvae_d6s0 =

                        new NoViableAltException("31:1: atom : ( INT | ID | '('! expr ')'! );", 6, 0, input);

 

                    throw nvae_d6s0;

            }

 

            switch (alt6)

            {

                case 1 :

                    // Expr.g:31:9: INT

                    {

                        root_0 = (CommonTree)adaptor.GetNilNode();

 

                        INT16 = (IToken)input.LT(1);

                        Match(input,INT,FOLLOW_INT_in_atom191);

                        INT16_tree = (CommonTree)adaptor.Create(INT16);

                        adaptor.AddChild(root_0, INT16_tree);

 

 

                    }

                    break;

                case 2 :

                    // Expr.g:32:9: ID

                    {

                        root_0 = (CommonTree)adaptor.GetNilNode();

 

                        ID17 = (IToken)input.LT(1);

                        Match(input,ID,FOLLOW_ID_in_atom202);

                        ID17_tree = (CommonTree)adaptor.Create(ID17);

                        adaptor.AddChild(root_0, ID17_tree);

 

 

                    }

                    break;

                case 3 :

                    // Expr.g:33:9: '('! expr ')'!

                    {

                        root_0 = (CommonTree)adaptor.GetNilNode();

 

                        char_literal18 = (IToken)input.LT(1);

                        Match(input,12,FOLLOW_12_in_atom212);

                        PushFollow(FOLLOW_expr_in_atom215);

                        expr19 = expr();

                        followingStackPointer_--;

 

                        adaptor.AddChild(root_0, expr19.Tree);

                        char_literal20 = (IToken)input.LT(1);

                        Match(input,13,FOLLOW_13_in_atom217);

 

                    }

                    break;

 

            }

        }

        catch (RecognitionException re)

        {

            ReportError(re);

            Recover(input,re);

        }

        finally

        {

            retval.stop = input.LT(-1);

 

                retval.tree = (CommonTree)adaptor.RulePostProcessing(root_0);

                adaptor.SetTokenBoundaries(retval.Tree, retval.start, retval.stop);

 

        }

        return retval;

    }

 

 

The start and the end of the method is similar to the prog method, although this one declares a lot of variables that will be used later.

 

The core of the method starts in the first switch. Based on the next token an alternative is choosen. If it isn’t one of the expected tokens (INT, ID or ‘(‘) and error is thrown.

 

The second switch has the code to execute based on the selected alternative. The first two alternatives are atom : INT | ID, and what the code does is to create a “nil” node, match the token with the expected one, and add the token as a child of the “nil” node.

The last alternative, atom: '('! expr ')'!, create a “nil” node, matches the ‘(‘ token, calls to the expr method, adds the AST generated by the expression as a child of the “nil” node, and matches the ‘)’ token. In this alternative the matched tokens haven’t been added to the AST because the ! operator specifies to not add them.

 

The parse tree generated for the input
a=3
a+5

is:

 

example_parseTree.PNG

 

The atom rule is called 3 times (for 3, for a and for 5).


The AST generation for the atom 3 is (before and after the finally keyword):

 

atom_pre1.PNG

 

atom1.PNG

 

The “nil” node was optimized in the RulePostProcessing method.

 

The expr rule is called twice (for 3 and for a+5).

The optimized AST for the a+5 expression is:

 

expr.PNG

 

The final AST generated is:

 

prog.PNG

 


The images of the trees have been created using the StructsViz DebuggerVisualizers.

 

To finish, lets see the code generated for the tree parser. The input to the tree parser based on the example AST is:

 

example_treeNodeStream.PNG

 

The code generated for the tree parser starts like this:

 

public class EvalTreeParser : TreeParser

{

    public static readonly string[] tokenNames = new string[]

    {

        "<invalid>",

        "<EOR>",

        "<DOWN>",

        "<UP>",

        "NEWLINE",

        "ID",

        "INT",

        "WS",

        "'='",

        "'+'",

        "'-'",

        "'*'",

        "'('",

        "')'"

    };

 

    public const int INT = 6;

    public const int EOF = -1;

    public const int WS = 7;

    public const int NEWLINE = 4;

    public const int ID = 5;

 

 

        public EvalTreeParser(ITreeNodeStream input)

            : base(input)

        {

            InitializeCyclicDFAs();

        }

 

 

    override public string[] TokenNames

    {

        get { return tokenNames; }

    }

 

    override public string GrammarFileName

    {

        get { return "Eval.g"; }

    }

 

 

    /** Map variable name to Integer object holding value */

    public Hashtable memory = new Hashtable();

 

 

If you compare this code with the parser we studied before you can see that is identical to it but it doesn’t has the tree adaptor (because the tree parser does not generate an AST). If we compare the code generated for the rules, we’ll find out that it is very similar to the code generated for the methods studied earlier in the parser (but without the code for AST generation). For example, the code for the expr rule is:

 

    // $ANTLR start expr

    // Eval.g:24:1: expr returns [int value] : ( ^( '+' a= expr b= expr ) | ^( '-' a= expr b= expr ) | ^( '*' a= expr b= expr ) | ID | integer= INT );

    public int expr() // throws RecognitionException [1]

    {  

 

        int value = 0;

 

        CommonTree integer = null;

        CommonTree ID4 = null;

        int a = 0;

 

        int b = 0;

 

 

        try

        {

            // Eval.g:25:9: ( ^( '+' a= expr b= expr ) | ^( '-' a= expr b= expr ) | ^( '*' a= expr b= expr ) | ID | integer= INT )

            int alt3 = 5;

            switch ( input.LA(1) )

            {

            case 9:

                alt3 = 1;

                break;

            case 10:

                alt3 = 2;

                break;

            case 11:

                alt3 = 3;

                break;

            case ID:

                alt3 = 4;

                break;

            case INT:

                alt3 = 5;

                break;

                default:

                    NoViableAltException nvae_d3s0 =

                        new NoViableAltException("24:1: expr returns [int value] : ( ^( '+' a= expr b= expr ) | ^( '-' a= expr b= expr ) | ^( '*' a= expr b= expr ) | ID | integer= INT );", 3, 0, input);

 

                    throw nvae_d3s0;

            }

 

            switch (alt3)

            {

                case 1 :

                    // Eval.g:25:9: ^( '+' a= expr b= expr )

                    {

                        Match(input,9,FOLLOW_9_in_expr109);

 

                        Match(input, Token.DOWN, null);

                        PushFollow(FOLLOW_expr_in_expr113);

                        a = expr();

                        followingStackPointer_--;

 

                        PushFollow(FOLLOW_expr_in_expr117);

                        b = expr();

                        followingStackPointer_--;

 

 

                        Match(input, Token.UP, null);

                        value =  a+b;

 

                    }

                    break;

                case 2 :

                    // Eval.g:26:9: ^( '-' a= expr b= expr )

                    {

                        Match(input,10,FOLLOW_10_in_expr132);

 

                        Match(input, Token.DOWN, null);

                        PushFollow(FOLLOW_expr_in_expr136);

                        a = expr();

                        followingStackPointer_--;

 

                        PushFollow(FOLLOW_expr_in_expr140);

                        b = expr();

                        followingStackPointer_--;

 

 

                        Match(input, Token.UP, null);

                        value =  a-b;

 

                    }

                    break;

                case 3 :

                    // Eval.g:27:9: ^( '*' a= expr b= expr )

                    {

                        Match(input,11,FOLLOW_11_in_expr158);

 

                        Match(input, Token.DOWN, null);

                        PushFollow(FOLLOW_expr_in_expr162);

                        a = expr();

                        followingStackPointer_--;

 

                        PushFollow(FOLLOW_expr_in_expr166);

                        b = expr();

                        followingStackPointer_--;

 

 

                        Match(input, Token.UP, null);

                        value =  a*b;

 

                    }

                    break;

                case 4 :

                    // Eval.g:28:9: ID

                    {

                        ID4 = (CommonTree)input.LT(1);

                        Match(input,ID,FOLLOW_ID_in_expr180);

 

                                    object obj = memory[ID4.Text];

                                    if (obj != null){

                                        value =  (int)obj;

                                    } else {

                                        Console.WriteLine("undefined variable " + ID4.Text);

                                    }

 

 

                    }

                    break;

                case 5 :

                    // Eval.g:37:9: integer= INT

                    {

                        integer = (CommonTree)input.LT(1);

                        Match(input,INT,FOLLOW_INT_in_expr203);

                        value =  Int32.Parse(integer.Text); /* fix for 3.0b5 bug (will be fixed in next version so integer.Text works) */

 

                    }

                    break;

 

            }

        }

        catch (RecognitionException re)

        {

            ReportError(re);

            Recover(input,re);

        }

        finally

        {

        }

        return value;

    }

 

 

The method starts declaring the variables used in the alternatives and the variable used to hold the return vale.

The first switch decides what alternative to select based on the next token of the input stream. Recall that the rule expr was:

 

expr returns [int value]

    :   ^('+' a=expr b=expr)  { $value = a+b; }

    |   ^('-' a=expr b=expr)  { $value = a-b; }  

    |   ^('*' a=expr b=expr)  { $value = a*b; }

    |   ID

        {

            object obj = memory[$ID.text];

            if (obj != null){

                $value = (int)obj;

            } else {

                Console.WriteLine("undefined variable " + $ID.text);

            }

        }

    |   integer=INT                   { $value = Int32.Parse($integer.text); /* fix for 3.0b5 bug (will be fixed in next version so $INT.text works) */ }

    ;


The code associated to each alternative is inside the second switch. The first 3 alternatives have the same structure but differ in a token and in the operation to perform, so we’ll focus only on the first one. As what we want to match is a tree that has a ‘+’ token (token number 9) as the root, and two expressions as its childs, the code first has a Match to check that the token ‘+’ is present, then a Match to check that the ficticious DOWN token is present (so it is walking to the childs of the root). Then it calls two expr rules (saving the follow information in the stack as we explained earlier), and saving the return value in the variables a and b. Finally, the ficticious UP token is matched so it verifies that there are no more child nodes.

 

The alternatives 4 and 5 are straightforward as they only match a token and have the code inserted in the rule.

And that’s it. You can see that a tree parser is very very similar to a parser.

 

The code used to connect the lexer, the parser and the tree parser is:

 

// sets the input string

ANTLRStringStream strStream = new ANTLRStringStream(

    @"    a=3

        a+5

    ");

 

// creates the lexer

ExprLexer lexer = new ExprLexer(strStream);

 

// creates the parser with a stream of tokens that will be extracted from the lexer to it

CommonTokenStream tokStream = new CommonTokenStream(lexer);

ExprParser parser = new ExprParser(tokStream);

 

// parse the input string and get the results

ExprParser.prog_return returnValue = parser.prog();

 

// creates the tree parser with a stream of tokens that will be extracted from the AST

CommonTreeNodeStream treeNodeStream = new CommonTreeNodeStream((CommonTree)returnValue.Tree);

EvalTreeParser treeWalker = new EvalTreeParser(treeNodeStream);

 

// walk the AST executing any associated code

treeWalker.prog();

 

// get the AST and wrap it in our tree node class in order to visualize it with StructsViz when debugging

// you can download an evaluation version of StructsViz here: http://www.manuelabadia.com/products

AntlrDebugCommonTree AST = new AntlrDebugCommonTree((CommonTree)returnValue.Tree);

 

 

The parse tree used to evaluate the sample expression in the tree parser grammar is:

 

example_parseTree2.PNG

 

I hope this tutorial about ANTLR has been useful to clarify some gray areas and have a general view of the main classes involved.

 

To learn more about ANTLR refer to the upcoming book:

 

 

Download article source code (16.47 KB)
Monday, January 8, 2007 8:41:33 AM (Romance Standard Time, UTC+01:00)  #    Comments [5]   ANTLR | Microsoft .NET Framework  | 
Friday, February 2, 2007 7:37:36 PM (Romance Standard Time, UTC+01:00)
Hola Manuel, no se si eres Español o no, así que te escribo en Español e Inglés.
Dear Manuel, I don't know if you are Spanish or not, therefore I write you in English and Spanish.

Gracias por tu artículo.
Thanks for your article.

Estoy desarrollando mi proyecto fin de carrera y espero que me ayude.
I'm developing the project of my university course and I wish your article will help me.

SAludos,
Regards,
Rodrigo.
Rodrigo
Saturday, November 3, 2007 10:40:08 AM (Romance Standard Time, UTC+01:00)
Hi Manuel,

Great series of articles that helped me a lot in understanding ANTLR with C#. But I have one question:

What happened to EvalTreeParser? It does not appear in any generated code for my projects. Is there anything I have to specify to make it appear?
Saturday, November 3, 2007 11:05:21 AM (Romance Standard Time, UTC+01:00)
David,

There are two .g files in the example. The Expr.g should generate (using the Antlr tool) the ExprLexer.cs and ExprParser.g files, and Eval.g should generate EvalTreeParser.g.
Tuesday, June 9, 2009 2:19:33 PM (Romance Daylight Time, UTC+02:00)
When I generate the C# code, instead of "EarlyExitException eee", I get "EarlyExitException eeen" (where n is some number such as "1", "8", etc.) But the "throw" statement is always "throw eee", as shown in this article. Consequently I have a compile error that I must fix every time.

This looks like a bug to me. Is it known/reported? Is there a fix or workaround?

Many thanks.
Adrian Waters
Wednesday, June 10, 2009 4:07:28 PM (Romance Daylight Time, UTC+02:00)
Adrian,

this looks like a bug in the templates for C# code. IIRC I reported a similar bug (missing the final number in the generated code) some time ago and that was fixed at the time.

As I have not use ANTLR for several months so I can't tell you if it is reported or not. Probably there is no workaround for it, the generated code is wrong because the template doesn't add the final number to the eee. Report a bug to the C# maintainers and I'm sure they will fix it soon.

Best regards.
All comments require the approval of the site owner before being displayed.
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, strike) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

[Captcha]Enter the code shown (prevents robots):

Live Comment Preview
Copyright © 2019 Manuel Abadia. All rights reserved.
DasBlog 'Portal' theme by Johnny Hughes.