Blog Home  Home Feed your aggregator (RSS 2.0)  
ANTLR C# Overview - Part 2 - Manuel Abadia's ASP.NET stuff
 
# Tuesday, January 2, 2007

Click here for Part 1 of this overview
Click here for Part 3 of this overview

IMHO the more awkward thing in ANTLR is the tree implementation. A tree is implemented as a tree node and all the operations related to trees are moved to another class. Also, there can be “nil” nodes, that are fake nodes used as parent for childs (this is used to create flat lists).

In ANTLR a tree has to implement the ITree interface:

public interface ITree

{

    int CharPositionInLine { get; }

    int ChildCount { get; }

    int Line { get; }

    bool Nil { get; }

    string Text { get; }

    int Type { get; }

 

    void AddChild(ITree t);

    ITree DupNode();

    ITree DupTree();

    ITree GetChild(int i);

    string ToString();

    string ToStringTree();

}

In the ITree interface there are some properties that are also present in the IToken interface (Line, CharPositionInLine, Type and Text) in order to associate a position with the current tree for errors and because it is also needed for parsing.

The Nil property is used to check if this node is a “nil” node.

The property ChildCount and the method GetChild are used to navigate the tree.

The DupNode methods clones the current node and the DupTree method clones the whole tree.

In a normal implmentation of a tree node, the method AddChild doesn’t need more comments but in ANTLR the methods that operate on trees are not orthogonal and usually differ if one of the operands is a “nil” node. In this case, if the tree node passed to the AddChild is a “nil” node, all the children of that “nil” node are added as children of the tree. Otherwise works as expected.

The ITree interface is implemented by the abstract class BaseTree that implements the tree using a list of child nodes, and the node doesn’t have any payload.

The concrete classes that implement BaseTree in ANTLR are:
• CommonTree: that adds a token as the payload.
• ParseTree: that stores strings and tokens as the payload.

To operate on the tree you have to use a class that implements the ITreeAdaptor interface:

public interface ITreeAdaptor

{

    void AddChild(object t, IToken child);

    void AddChild(object t, object child);

    object BecomeRoot(IToken newRoot, object oldRoot);

    object BecomeRoot(object newRoot, object oldRoot);

    object Create(IToken payload);

    object Create(int tokenType, IToken fromToken);

    object Create(int tokenType, string text);

    object Create(int tokenType, IToken fromToken, string text);

    object DupNode(object treeNode);

    object DupTree(object tree);

    object GetChild(object t, int i);

    int GetChildCount(object t);

    object GetNilNode();

    string GetNodeText(object t);

    int GetNodeType(object t);

    int GetTokenStartIndex(object t);

    int GetTokenStopIndex(object t);

    int GetUniqueID(object node);

    object RulePostProcessing(object root);

    void SetNodeText(object t, string text);

    void SetNodeType(object t, int type);

    void SetTokenBoundaries(object t, IToken startToken, IToken stopToken);

}

 

The AddChild and Dup methods do the same as the ITree counterparts.

The Create methods are used to create a tree node from a token.

The method GetNilNode is used to obtain a “nil” node.

The GetChildCound and GetChild methods are used to traverse the tree.

The BecomeRoot method returns a tree where the newRoot is the root of the resulting tree and the oldRoot is a child of the resulting tree. There is special handling for “nil” nodes (see the ANTLR code for details).

The RulePostProcessing method is called after a rule has been processed in order to optimize the resulting tree. After this method, the SetTokenBoundaries is called in order to set the bounds in the input token stream of this tree (including all the children).

The other methods are for getting/setting the node type and text when doing tree parsing.

The abstract class BaseTreeAdaptor implements the ITreeAdaptor interface and adds an abstract method called createToken. The concrete class that is used in ANTLR is the CommonTreeAdaptor class, that inherits from BaseTreeAdaptor. The CommonTreeAdaptor class creates CommonTree and CommonToken objects.

When ANTLR generates a Parser that builds the AST, the generated class has a protected field called adaptor (and the associated property Adaptor) which is used in the rules to create the AST.  Also a return type of each method created for each rule changes to a new generated class (with a name rulename_return) that inherits from the ParserRuleReturnScope:

public class ParserRuleReturnScope : RuleReturnScope

{

    public IToken start;

    public IToken stop;

}

 

public class RuleReturnScope

{

    public virtual object Start { get; }

    public virtual object Stop { get; }

    public virtual Antlr.StringTemplate.StringTemplate Template { get; }

    public virtual object Tree { get; }

}

 

The RuleReturnScope class is used to provide start and stop information (token or tree) as well as the AST or the template when output=AST or when output=template is used.

The generated class that inherits from ParserRuleReturnScope stores the resulting AST and any return value defined in the rule.

You may be wondering why to generate an AST and then create a Tree Parser for the generated AST to do more work if you can add attributes and logic to the rules in the parser to do what you want to do. The reason is that there are a lot of rules in the grammar that are just for precedence or to make the grammar cleaner and also there are some tokens (for example the brackets) that are not needed after parsing. The resulting AST has only the data need to perform our task without garbage. For example, consider a grammar for evaluating expressions with support for relational expressions and bit operators. The expression ((i >= 0) AND (i <= n)) OR ((index & 2) = 1) produces this parse tree:

 

The generated AST from that expression is:

 

So the AST is a lot easier (and clearer) to use.

I have used my StructsViz DebuggerVisualizer pack to generate those tree diagrams in the Visual Studio.NET 2005 debugger.

After studying the tree related classes and the Parser class it is time to start with Tree Parsers. The TreeParser class inherits from BaseRecognizer and will consume tree nodes from an ITree. The input of a tree parser will be a class implementing the ITreeNodeStream interface. This interface is an specialization of the IIntStream interface to operate on tree nodes:

public interface ITreeNodeStream : IIntStream

{

    bool HasUniqueNavigationNodes { set; }

    ITreeAdaptor TreeAdaptor { get; }

    object TreeSource { get; }

 

    object LT(int k);

    string ToString(object start, object stop);

}

 

As you can see, the interface can return any tree node with the LT method, and with the TreeSource and TreeAdaptor properties you can access to the tree providing the data and to the associated adaptor.

The ToString method prints the text of the specified nodes.

The HasUniqueNavigationNodes is used in debugging mode and specifies if the UP and DOWN nodes are reused or created each time they’re needed. The meaning of the UP and DOWN nodes will be explained later.

The CommonTreeNodeStream implements the ITreeNodeStream interface, and also the IEnumerable interface (so the nodes can be accessed sequentially using the IEnumerable related methods and properties, that is, Reset, MoveNext, Current). The class has 3 static tree nodes defined: the UP node, the DOWN node and the EOF node. To traverse the tree sequentially the class uses two stacks, the node stack and the index stack. The node stack is used to track the parent node when we’re walking down the tree. The index stack is used to know which child node we’re currently visiting. The tree is visited using a depth first traversal. When the first child of a node is visited, a ficticious DOWN node is added to the lookahead buffer and when the last child of a node is visited, a ficticious UP node is added to the lookahead buffer. After all nodes have been visited, the ficticious EOF node is always returned. When the nodes are accessed sequentially using the IEnumerable interface, the ficticious nodes are not returned. The ficticious nodes are returned only when the LT method of the ItreeNodeStream interface is called.

For completeness here are all the methods and properties of the CommonTreeNodeStream class:

public class CommonTreeNodeStream : ITreeNodeStream, IIntStream, IEnumerator

{

    public CommonTreeNodeStream(ITree tree);

    public CommonTreeNodeStream(ITreeAdaptor adaptor, ITree tree);

 

    public virtual object Current { get; }

    public virtual void Reset();

    public virtual bool MoveNext();

 

    public virtual void Consume();

    public virtual int Index();

    public virtual int LA(int i);

    public virtual int Mark();

    public virtual void Release(int marker);

    public void Rewind();

    public virtual void Rewind(int marker);

    public virtual void Seek(int index);

    public virtual int Size();

 

    public bool HasUniqueNavigationNodes { get; set; }

    public ITreeAdaptor TreeAdaptor { get; }

    public virtual object TreeSource { get; }

    public virtual object LT(int k);

    public virtual string ToString(object start, object stop);

 

    protected int LookaheadSize { get; }

    protected internal virtual void AddLookahead(ITree node);

    protected internal virtual void AddNavigationNode(int ttype);

    protected internal virtual void fill(int k);

    protected internal virtual ITree handleRootNode();

 

    public override string ToString();

    public virtual string ToNodesOnlyString();

 

    protected internal virtual void ToStringWork(ITree p, ITree stop, StringBuilder buf);

    protected internal virtual ITree VisitChild(int child);

    protected internal virtual void WalkBackToMostRecentNodeWithUnvisitedChildren();

}

 

After the constructors, you can see the method and properties of the IEnumerable interface, followed by the methods and properties of the IIntStream interface, and then the methods of the ITreeNodeStream interface.

The other methods and properties are used to maintain a circular lookahead buffer of tree nodes and to handle the navigation taking the ficticious nodes into account.

Nothing better than a couple of images here. For the following AST:


 

 

The lookahead buffer with the ficticious tokens included looks like:

 

The TreeParser class inherits from BaseRecognizer and gets tree nodes from a class implementing the ITreeNodeStream interface:

public class TreeParser : BaseRecognizer

{

    public TreeParser(ITreeNodeStream input);

 

    public override IIntStream Input { get; }

    public virtual ITreeNodeStream TreeNodeStream { get; set; }

 

    protected internal override void Mismatch(IIntStream input, int ttype, BitSet follow);

    public override IList ToTemplates(IList retvals);

}

 

As you can see, the TreeParser is a very simple class that has properties for the input stream and overrides 2 methods of the BaseRecognizer class.

When ANTLR creates a TreeParser, it generates a class that inherits from the TreeParser class. The generated class has a method for each rule in a similar way as the parsers.

Thanks to the smart idea behind ITreeNodeStream, a Parser and a TreeParser are very similar if you take a look at the generated code. The Parser consumes tokens from an ITokenStream in a linear way (one dimension) and the TreeParser consumes tokens in a hierarchical way (two dimensions), but serialized to one dimension thanks to the ficticious UP and DOWN tree nodes.

To complete this ANTLR overview, I’ll post an example of a lexer, a parser and a tree parser and the generated code in a few days.

Tuesday, January 2, 2007 11:13:38 AM (Romance Standard Time, UTC+01:00)  #    Comments [0]   ANTLR | Microsoft .NET Framework  | 
All comments require the approval of the site owner before being displayed.
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, strike) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

[Captcha]Enter the code shown (prevents robots):

Live Comment Preview
Copyright © 2019 Manuel Abadia. All rights reserved.
DasBlog 'Portal' theme by Johnny Hughes.