Irony - Language Implementation Kit/Grammar/Non Terminals

Rules

Rules tell the parser how to group the tokens fed to it by the scanner into expressions and statements.

The "+" and "|" operators are overloaded so that you can string terminals and non terminals together to define these rules: "+" joins terms into a sequence, and "|" separates alternatives.

From the ExpressionEvaluatorGrammar example found in the Irony project:

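// 2. Non-terminals
// (the terminals number, identifier and stringLit used in the rules are declared earlier in the sample)
var Expr = new NonTerminal("Expr");
var Term = new NonTerminal("Term");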
var BinExpr = new NonTerminal("BinExpr", typeof(BinaryOperationNode));
var ParExpr = new NonTerminal("ParExpr");
var UnExpr = new NonTerminal("UnExpr", typeof(UnaryOperationNode));
var UnOp = new NonTerminal("UnOp");
var BinOp = new NonTerminal("BinOp", "operator");
var PrefixIncDec = new NonTerminal("PrefixIncDec", typeof(IncDecNode));
var PostfixIncDec = new NonTerminal("PostfixIncDec", typeof(IncDecNode));
var IncDecOp = new NonTerminal("IncDecOp");
var AssignmentStmt = new NonTerminal("AssignmentStmt", typeof(AssignmentNode));
var AssignmentOp = new NonTerminal("AssignmentOp", "assignment operator");
var Statement = new NonTerminal("Statement");
var Program = new NonTerminal("Program", typeof(StatementListNode));

// 3. BNF rules
Expr.Rule = Term | UnExpr | BinExpr | PrefixIncDec | PostfixIncDec;
Term.Rule = number | ParExpr | identifier | stringLit;
ParExpr.Rule = "(" + Expr + ")";
UnExpr.Rule = UnOp + Term;
UnOp.Rule = ToTerm("+") | "-"; 
BinExpr.Rule = Expr + BinOp + Expr;
BinOp.Rule = ToTerm("+") | "-" | "*" | "/" | "**";
PrefixIncDec.Rule = IncDecOp + identifier;
PostfixIncDec.Rule = identifier + IncDecOp;
IncDecOp.Rule = ToTerm("++") | "--";
AssignmentStmt.Rule = identifier + AssignmentOp + Expr;
AssignmentOp.Rule = ToTerm("=") | "+=" | "-=" | "*=" | "/=";
Statement.Rule = AssignmentStmt | Expr | Empty;
Program.Rule = MakePlusRule(Program, NewLine, Statement);
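
The rules above compile because C# lets a class define its own "+" and "|" operators, including overloads that take a plain string on one side. The following is a toy model of that mechanism, not Irony's actual classes: ToyTerm and all of its members are hypothetical names invented for this sketch. It also illustrates why the sample writes ToTerm("+") | "-". At least one operand has to be a term object, otherwise C# falls back to its built-in string handling ("a" + "b" is concatenation, and "a" | "b" does not compile).

```csharp
using System;

// Toy stand-in for a grammar term. This is NOT Irony's implementation;
// it only records the rule text so the effect of each operator is visible.
public class ToyTerm
{
    public string Text { get; }
    public ToyTerm(string text) { Text = text; }

    // "+" builds a sequence. The string overloads let a literal sit on either side.
    public static ToyTerm operator +(ToyTerm a, ToyTerm b) => new ToyTerm(a.Text + " " + b.Text);
    public static ToyTerm operator +(ToyTerm a, string b) => a + ToTerm(b);
    public static ToyTerm operator +(string a, ToyTerm b) => ToTerm(a) + b;

    // "|" builds a list of alternatives.
    public static ToyTerm operator |(ToyTerm a, ToyTerm b) => new ToyTerm(a.Text + " | " + b.Text);
    public static ToyTerm operator |(ToyTerm a, string b) => a | ToTerm(b);
    public static ToyTerm operator |(string a, ToyTerm b) => ToTerm(a) | b;

    // Wraps a literal in a term object so that the overloads above can apply.
    public static ToyTerm ToTerm(string literal) => new ToyTerm("\"" + literal + "\"");

    public override string ToString() => Text;
}

public static class ToyDemo
{
    public static void Main()
    {
        var expr = new ToyTerm("Expr");

        // string + ToyTerm selects the user-defined operator, not string concatenation.
        ToyTerm parExpr = "(" + expr + ")";

        // Two bare strings would not work here, so the first literal is wrapped explicitly.
        ToyTerm unOp = ToyTerm.ToTerm("+") | "-";

        Console.WriteLine(parExpr); // prints: "(" Expr ")"
        Console.WriteLine(unOp);    // prints: "+" | "-"
    }
}
```

Because C# gives "+" a higher precedence than "|", a rule such as A + B | C groups as (A + B) | C, which matches the usual BNF reading.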

Kleene Operators

In extended BNF and regular-expression notation, the "?", "+", and "*" characters indicate "zero or one time", "one or more times", and "zero or more times", respectively. Irony does this slightly differently: you use the MakePlusRule and MakeStarRule methods of the base Grammar class for "+" and "*", or you call the Q(), Plus(), and Star() methods directly on a term within the rule.

Many languages start with a non terminal called "program" that consists of one or more "statement" non terminals. You would express that as follows:

Root = program;

program.Rule = MakePlusRule(program, statement);

statement.Rule = ...

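Important note: a rule built with MakePlusRule or MakeStarRule cannot contain anything else. To surround or extend a list, give the list its own non terminal and refer to that non terminal from another rule.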
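
A minimal sketch of that restriction in practice: the list gets a non terminal of its own, and the surrounding brackets live in a separate rule that refers to it. ArgListGrammar, Call and ArgList are hypothetical names chosen for illustration, and the sketch assumes the three-argument MakeStarRule overload (list, delimiter, member), which mirrors the MakePlusRule call in the ExpressionEvaluatorGrammar sample.

```csharp
using Irony.Parsing;

// Hypothetical grammar for illustration: a call with a comma-separated
// argument list, such as "f(a, b, c)" or "f()".
public class ArgListGrammar : Grammar
{
    public ArgListGrammar()
    {
        var identifier = new IdentifierTerminal("identifier");

        var ArgList = new NonTerminal("ArgList");
        var Call = new NonTerminal("Call");

        // The list rule stands alone: zero or more identifiers separated by commas.
        ArgList.Rule = MakeStarRule(ArgList, ToTerm(","), identifier);

        // Everything around the list goes into a separate rule.
        Call.Rule = identifier + "(" + ArgList + ")";

        Root = Call;
    }
}
```

Writing the brackets directly next to the MakeStarRule call in ArgList.Rule is what the note above rules out.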

Hints

PreferShiftHere, ReduceHere, ResolveInCode, ImplyPrecedenceHere
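
A hint is placed inside a rule, at the position where the parser would otherwise face a conflict. As an illustration, the sketch below puts PreferShiftHere() in front of "else" to settle the classic dangling-else shift/reduce conflict in favour of shifting, so that each "else" attaches to the nearest "if". IfElseGrammar and its non terminals are hypothetical, and the sketch assumes an Irony version in which PreferShiftHere() and MarkReservedWords() are methods of the Grammar base class.

```csharp
using Irony.Parsing;

// Hypothetical grammar for illustration: statements are either
// "name;" or an if statement with an optional else branch.
public class IfElseGrammar : Grammar
{
    public IfElseGrammar()
    {
        var identifier = new IdentifierTerminal("identifier");

        var Statement = new NonTerminal("Statement");
        var IfStmt = new NonTerminal("IfStmt");

        Statement.Rule = IfStmt | identifier + ";";

        // ToTerm("if") keeps the leading literals from being concatenated as plain strings.
        // PreferShiftHere() marks the spot of the conflict: when "else" follows a
        // complete inner statement, shift it instead of reducing the inner "if".
        IfStmt.Rule = ToTerm("if") + "(" + identifier + ")" + Statement
                    | ToTerm("if") + "(" + identifier + ")" + Statement
                        + PreferShiftHere() + "else" + Statement;

        // Keep the keywords from being scanned as identifiers.
        MarkReservedWords("if", "else");

        Root = Statement;
    }
}
```

With this grammar, an input such as if (a) if (b) c; else d; is expected to bind the else branch to the inner if.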

Transients

Transients are non terminals that the parser uses to break statements down into finer and finer expressions, but that are otherwise not needed in the parse tree. For example, if you have a non terminal called "expression" that can break down into something more granular like "binaryExpression", then usually all you care about is that it was finally identified as a "binaryExpression". Putting "expression" into the parse tree just creates an extra node that doesn't need to be there.

You can indicate which non terminals are transient using the MarkTransient method from the base Grammar class:

MarkTransient(Term, Expr, Statement, BinOp, UnOp, IncDecOp, AssignmentOp, ParExpr);

You cannot mark as transient any non terminal whose AST node is of type StatementListNode. Therefore, a non terminal like "program" cannot be marked transient, because its definition is a list of statements.
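
The effect of MarkTransient can be observed on a small grammar. The sketch below is a hypothetical grammar, not part of the Irony samples. It assumes that MarkPunctuation is called first to drop the parentheses from the tree, so that ParExpr is left with a single child before it is marked transient.

```csharp
using Irony.Parsing;

// Hypothetical grammar for illustration: a number wrapped in any
// number of parentheses, such as "42" or "((42))".
public class TransientDemoGrammar : Grammar
{
    public TransientDemoGrammar()
    {
        var number = new NumberLiteral("number");

        var Expr = new NonTerminal("Expr");
        var ParExpr = new NonTerminal("ParExpr");
        var Program = new NonTerminal("Program");

        Expr.Rule = number | ParExpr;
        ParExpr.Rule = "(" + Expr + ")";
        Program.Rule = Expr;
        Root = Program;

        // Drop the parentheses from the tree, leaving ParExpr with one child.
        MarkPunctuation("(", ")");

        // Expr and ParExpr only route the parser; keep them out of the tree.
        MarkTransient(Expr, ParExpr);
    }
}
```

Parsing ((42)) with this grammar should yield a Program node whose only child is the number token, with no Expr or ParExpr nodes in between.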