Irony - Language Implementation Kit/Grammar/Terminals

Terminals are the tokens identified by a scanner and passed to the parser. Irony provides a handful of key terminals that are found in almost every programming language (comments, identifiers, string literals, etc.).

Standard terminals

These terminals are already defined in the Grammar base class:

Empty

Marks an element of a non-terminal's rule as optional:

term.Rule = term1 | Empty;

Eof

Identifies the end of the file. Using Eof in grammar rules is optional: the parser automatically adds this symbol as a lookahead to the Root non-terminal.

LineStartTerminal

Used as a line-start indicator

SyntaxError

Used for error tokens

The following terminals are used in indentation-sensitive languages like Python. They are not produced by the scanner; CodeOutlineFilter produces them after scanning and before parsing:

NewLine

Indent

Indicates an indentation

Dedent

Indicates the end of an indentation

Eos

End-of-statement terminal, used in indentation-sensitive languages to signal the end of a statement. It is not always synced with CRLF characters; CodeOutlineFilter produces Eos tokens (as well as Indent and Dedent) based on the line and column information of the incoming content tokens.
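As a rough sketch of how these terminals might fit together, the grammar below registers a CodeOutlineFilter and references Eos, Indent and Dedent in its rules, following the pattern of the MiniPython sample distributed with Irony. The class name, the rule names and the toy "block" keyword are hypothetical, and the CreateTokenFilters signature and the NewLineBeforeEOF flag are assumed to match the Irony version in use.

```csharp
using Irony.Parsing;

// Hypothetical indentation-sensitive toy language:
//   block outer:
//       first
//       second
public class IndentedBlockGrammar : Grammar
{
    public IndentedBlockGrammar()
    {
        var identifier = new IdentifierTerminal("identifier");

        var stmt = new NonTerminal("stmt");
        var blockDef = new NonTerminal("blockDef");
        var block = new NonTerminal("block");
        var stmtList = new NonTerminal("stmtList");

        // A statement is either a simple line closed by Eos or a nested block.
        stmt.Rule = identifier + Eos | blockDef;
        blockDef.Rule = ToTerm("block") + identifier + ":" + Eos + block;
        // Indent and Dedent bracket the statements of a nested block.
        block.Rule = Indent + stmtList + Dedent;
        stmtList.Rule = MakePlusRule(stmtList, stmt);

        Root = stmtList;

        // Assumption: this flag asks Irony to add a line break before EOF,
        // so the last statement is still followed by Eos.
        this.LanguageFlags = LanguageFlags.NewLineBeforeEOF;
    }

    // Registers the filter that turns line/column information into
    // Eos, Indent and Dedent tokens between scanning and parsing.
    public override void CreateTokenFilters(LanguageData language, TokenFilterList filters)
    {
        var outlineFilter = new CodeOutlineFilter(
            language.GrammarData,
            OutlineOptions.ProduceIndents | OutlineOptions.CheckBraces,
            ToTerm(@"\")); // "\" serves as the line-continuation symbol
        filters.Add(outlineFilter);
    }
}
```

The scanner itself never emits Eos, Indent or Dedent, so without the filter a grammar like this one would have no way to receive them.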


CommentTerminal

The comment terminal lets you declare what constitutes a comment in your language. Most languages provide at least a line comment, and many also support block comments.

To set up either type of comment, create a new CommentTerminal and pass it the start symbol and one or more end symbols.

Example Line Comment:

CommentTerminal LINE_COMMENT = new CommentTerminal("LINE_COMMENT", "--", "\n", "\r\n");

Example Block Comment:

CommentTerminal BLOCK_COMMENT = new CommentTerminal("BLOCK_COMMENT", "/*", "*/");

If you want comments to be skipped so that they do not show up in the parse tree, add the comment terminals to the non-grammar terminal list.

NonGrammarTerminals.Add(BLOCK_COMMENT);
NonGrammarTerminals.Add(LINE_COMMENT);
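The fragments above can be put together into a small, self-contained sketch. The grammar class, the "program" rule and the sample input are hypothetical; only the CommentTerminal declarations and the NonGrammarTerminals calls come from the text above.

```csharp
using System;
using Irony.Parsing;

// Hypothetical grammar: a program is a sequence of identifiers; comments are skipped.
public class CommentDemoGrammar : Grammar
{
    public CommentDemoGrammar()
    {
        var lineComment = new CommentTerminal("LINE_COMMENT", "--", "\n", "\r\n");
        var blockComment = new CommentTerminal("BLOCK_COMMENT", "/*", "*/");
        // Registered as non-grammar terminals, so no rule needs to mention them.
        NonGrammarTerminals.Add(lineComment);
        NonGrammarTerminals.Add(blockComment);

        var identifier = new IdentifierTerminal("IDENTIFIER");
        var program = new NonTerminal("program");
        program.Rule = MakePlusRule(program, identifier);
        Root = program;
    }
}

public static class CommentDemo
{
    public static void Main()
    {
        var parser = new Parser(new CommentDemoGrammar());
        ParseTree tree = parser.Parse("alpha -- line comment\n/* block comment */ beta");

        Console.WriteLine("Errors: " + tree.HasErrors());
        if (tree.Root == null) return;

        // Lists the terminal name and text of each child of the root node.
        foreach (ParseTreeNode node in tree.Root.ChildNodes)
            Console.WriteLine(node.Term.Name + ": " + node.Token.Text);
    }
}
```

Because both comment terminals are in the non-grammar list, the rule for program only needs to describe identifiers.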


ConstantTerminal

This terminal lets you declare a set of constants in the input language.

Use it when constant symbols do not look like normal identifiers. In Scheme, for example, #t and #f are the true/false constants, and they do not fit the Scheme identifier pattern.

ConstantTerminal CONSTANT = new ConstantTerminal("CONSTANT");
CONSTANT.Add("#t", true);
CONSTANT.Add("#f", false);
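A minimal sketch of reading the registered values back after parsing, assuming a hypothetical grammar whose input is simply a sequence of constants. The class names and the sample input are illustrative only.

```csharp
using System;
using Irony.Parsing;

// Hypothetical grammar: one or more Scheme-style boolean constants.
public class SchemeBoolGrammar : Grammar
{
    public SchemeBoolGrammar()
    {
        var constant = new ConstantTerminal("CONSTANT");
        constant.Add("#t", true);
        constant.Add("#f", false);

        var constants = new NonTerminal("constants");
        constants.Rule = MakePlusRule(constants, constant);
        Root = constants;
    }
}

public static class ConstantDemo
{
    public static void Main()
    {
        ParseTree tree = new Parser(new SchemeBoolGrammar()).Parse("#t #f #t");
        if (tree.HasErrors())
        {
            Console.WriteLine("parse failed");
            return;
        }
        foreach (ParseTreeNode node in tree.Root.ChildNodes)
        {
            // Token.Value is expected to hold the object passed to Add for this lexeme.
            object value = node.Token.Value;
            Console.WriteLine(node.Token.Text + " => " + value);
        }
    }
}
```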

DateLiteral

DataLiteralBase DATETIME = new DataLiteralBase("DATETIME", TypeCode.DateTime);


IdentifierTerminal

The identifier terminal matches tokens that name things such as variables. By default it recognizes the conventional form: the token starts with a letter or underscore and contains only letters, digits and underscores. It can be configured to recognize other, non-standard forms of identifiers.

IdentifierTerminal IDENTIFIER = new IdentifierTerminal("IDENTIFIER");
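As one possible way to configure a non-standard form, the sketch below allows hyphens inside identifiers (as in my-variable). It assumes the IdentifierTerminal constructor overload that takes extra characters and extra first characters. The grammar class and rule name are hypothetical.

```csharp
using Irony.Parsing;

// Hypothetical grammar accepting kebab-case identifiers such as "my-variable".
public class KebabCaseGrammar : Grammar
{
    public KebabCaseGrammar()
    {
        // Assumed overload: (name, extraChars, extraFirstChars).
        //   extraChars:      non-alphanumeric characters allowed inside an identifier
        //   extraFirstChars: non-letter characters allowed as its first character
        var identifier = new IdentifierTerminal("IDENTIFIER", "_-", "_");

        var names = new NonTerminal("names");
        names.Rule = MakePlusRule(names, identifier);
        Root = names;
    }
}
```

The underscore is listed explicitly in both arguments because this overload is assumed not to add it on its own.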


NumberLiteral

The built-in number literal terminal can identify numerous types of numbers from simple integers (e.g. 1) to decimals (e.g. 1.0) to numbers expressed in scientific notation (e.g. 1.1e2).

NumberLiteral NUMBER = new NumberLiteral("NUMBER");
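A sketch of some further configuration a NumberLiteral might receive, assuming the DefaultIntTypes, DefaultFloatType and AddPrefix members are available in the Irony version in use. The grammar class and rule name are hypothetical.

```csharp
using System;
using Irony.Parsing;

// Hypothetical grammar: one or more numbers separated by whitespace.
public class NumberDemoGrammar : Grammar
{
    public NumberDemoGrammar()
    {
        var number = new NumberLiteral("NUMBER");

        // Integer types to try, in order, when converting an integer literal.
        number.DefaultIntTypes = new TypeCode[] { TypeCode.Int32, TypeCode.Int64 };
        // Type used for literals with a fraction or an exponent.
        number.DefaultFloatType = TypeCode.Double;
        // Treat literals that start with 0x as hexadecimal.
        number.AddPrefix("0x", NumberOptions.Hex);

        var numbers = new NonTerminal("numbers");
        numbers.Rule = MakePlusRule(numbers, number);
        Root = numbers;
    }
}
```

After parsing, the Value property of each number token should hold the converted value rather than the raw text.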


StringLiteral

Use this terminal to identify string literals; just set the start/end character(s).

StringLiteral STRING = new StringLiteral("STRING", "\"", StringOptions.IsTemplate);

A useful feature of the StringLiteral terminal is that it can treat a string as a template and resolve the expressions embedded in it, as in Ruby. Set the IsTemplate option as above, then supply a settings object that tells the terminal how to find those expressions. The expression root (the non-terminal used to parse the embedded expressions) must also be added to the SnippetRoots list.

In this example, a StringTemplateSettings object is configured so that any text enclosed in curly braces ({ and }) is treated as an embedded expression; expression is the non-terminal that acts as the expression root:

StringTemplateSettings stringTemplateSettings = new StringTemplateSettings();
stringTemplateSettings.StartTag = "{";
stringTemplateSettings.EndTag = "}";
stringTemplateSettings.ExpressionRoot = expression;

this.SnippetRoots.Add(expression);

STRING.AstNodeConfig = stringTemplateSettings;
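When template support is not needed, a plain string literal can be declared with other options. The sketch below assumes the AllowsAllEscapes option and the AddStartEnd method to accept both double-quoted and single-quoted strings. The grammar class and rule name are hypothetical.

```csharp
using Irony.Parsing;

// Hypothetical grammar: one or more string literals in either quoting style.
public class QuotedTextGrammar : Grammar
{
    public QuotedTextGrammar()
    {
        // Double-quoted strings with escape sequences enabled.
        var text = new StringLiteral("STRING", "\"", StringOptions.AllowsAllEscapes);
        // Accept single quotes as an alternative delimiter for the same terminal.
        text.AddStartEnd("'", StringOptions.AllowsAllEscapes);

        var strings = new NonTerminal("strings");
        strings.Rule = MakePlusRule(strings, text);
        Root = strings;
    }
}
```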


Keywords

Keyword terminals can be declared in two ways: explicitly, by calling the ToTerm method in a variable declaration, or implicitly, within production rules.

Explicit declaration of the keyword SELECT in SQL and then its use in the SELECT statement production:

KeyTerm SELECT = ToTerm("select");

selectStatement.Rule = SELECT + optionalSelectArgs + FROM + ... + SEMICOLON;

Implicit declaration inside a SELECT statement production in SQL:

selectStatement.Rule = ToTerm("select") + optionalSelectArgs + ToTerm("from") + ... + ToTerm(";");
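Both styles can be mixed in one grammar. The sketch below is a deliberately tiny, hypothetical SELECT grammar, not the grammar the fragments above were taken from. It also shows Empty used for an optional keyword and a plain string operand, which Irony converts to a key term when it is combined with another term.

```csharp
using Irony.Parsing;

// Hypothetical, heavily simplified grammar for statements such as:
//   select distinct a, b from t;
public class MiniSelectGrammar : Grammar
{
    // Passing false to the base constructor requests case-insensitive matching,
    // so "SELECT" and "select" are both accepted.
    public MiniSelectGrammar() : base(false)
    {
        var identifier = new IdentifierTerminal("IDENTIFIER");

        // Explicit keyword declarations.
        KeyTerm SELECT = ToTerm("select");
        KeyTerm FROM = ToTerm("from");
        KeyTerm COMMA = ToTerm(",");

        var selectStatement = new NonTerminal("selectStatement");
        var distinctOpt = new NonTerminal("distinctOpt");
        var columnList = new NonTerminal("columnList");

        // Implicit declaration inside a rule; Empty makes the keyword optional.
        distinctOpt.Rule = ToTerm("distinct") | Empty;
        // One or more identifiers separated by commas.
        columnList.Rule = MakePlusRule(columnList, COMMA, identifier);
        // The trailing ";" is a plain string that becomes a key term.
        selectStatement.Rule = SELECT + distinctOpt + columnList + FROM + identifier + ";";

        Root = selectStatement;
    }
}
```

Explicit declarations tend to pay off when the same keyword appears in several rules or is later passed to methods such as MarkPunctuation or RegisterOperators.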


Operators

Operators are declared as terminals in the same way as keywords. Their associativity and precedence are set with the RegisterOperators method of the Grammar base class.

Example indicating the associativity and precedence of simple binary operators:

RegisterOperators(6, Associativity.Right, POW);
RegisterOperators(5, MULT, DIV);
RegisterOperators(4, PLUS, MINUS);
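For context, here is a sketch of a complete grammar in which such registrations resolve an otherwise ambiguous expression rule. The class name, the rule layout and the choice of "^" as the power symbol are assumptions made for illustration.

```csharp
using Irony.Parsing;

// Hypothetical arithmetic grammar over numbers and five binary operators.
public class ArithmeticGrammar : Grammar
{
    public ArithmeticGrammar()
    {
        var NUMBER = new NumberLiteral("NUMBER");
        KeyTerm PLUS = ToTerm("+"), MINUS = ToTerm("-");
        KeyTerm MULT = ToTerm("*"), DIV = ToTerm("/");
        KeyTerm POW = ToTerm("^"); // hypothetical choice of symbol for POW

        var expr = new NonTerminal("expr");
        // On its own this rule is ambiguous; the operator registrations
        // below tell the parser how to resolve the resulting conflicts.
        expr.Rule = NUMBER
                  | expr + PLUS + expr
                  | expr + MINUS + expr
                  | expr + MULT + expr
                  | expr + DIV + expr
                  | expr + POW + expr;

        // Higher numbers bind more tightly; POW is right-associative.
        RegisterOperators(6, Associativity.Right, POW);
        RegisterOperators(5, MULT, DIV);
        RegisterOperators(4, PLUS, MINUS);

        Root = expr;
    }
}
```

With these registrations, 2 + 3 * 4 is expected to group as 2 + (3 * 4), and 2 ^ 3 ^ 2 as 2 ^ (3 ^ 2).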


Punctuation

You can tell the scanner and parser which terminals serve as punctuation in your language by calling the MarkPunctuation method of the Grammar base class. Typically, these are terminals like parentheses or curly braces. Terminals marked as punctuation are left out of the parse tree.

Example indicating which terminals act as punctuation (LPAREN, RPAREN, LBRACE and RBRACE are assumed to be KeyTerm objects defined beforehand):

MarkPunctuation(LPAREN, RPAREN, LBRACE, RBRACE);
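A small sketch of punctuation marking in a complete grammar, assuming a hypothetical language that consists of a parenthesized list of identifiers. The class names, rule names and sample input are illustrative only.

```csharp
using System;
using Irony.Parsing;

// Hypothetical grammar for input such as "(a b c)".
public class ParenListGrammar : Grammar
{
    public ParenListGrammar()
    {
        var identifier = new IdentifierTerminal("IDENTIFIER");
        KeyTerm LPAREN = ToTerm("("), RPAREN = ToTerm(")");

        var items = new NonTerminal("items");
        var list = new NonTerminal("list");
        items.Rule = MakePlusRule(items, identifier);
        list.Rule = LPAREN + items + RPAREN;

        MarkPunctuation(LPAREN, RPAREN);
        Root = list;
    }
}

public static class PunctuationDemo
{
    public static void Main()
    {
        ParseTree tree = new Parser(new ParenListGrammar()).Parse("(a b c)");
        if (tree.HasErrors())
        {
            Console.WriteLine("parse failed");
            return;
        }
        // Prints the term name of each child of the root, which makes it easy
        // to check whether the parentheses were kept in the tree.
        foreach (ParseTreeNode child in tree.Root.ChildNodes)
            Console.WriteLine(child.Term.Name);
    }
}
```

Removing the MarkPunctuation call and running the demo again is a quick way to see the difference in the resulting tree.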


Custom Terminals

You can create your own terminals if the built-in ones do not fit your needs: extend Irony.Parsing.Terminal and implement the matching logic yourself. If a built-in terminal is close to what your language requires but cannot be adjusted by setting its existing properties, you can extend that terminal instead.
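As an illustration, the sketch below defines a hypothetical terminal that matches a '#' followed by letters or digits. It assumes that, in the Irony version in use, Terminal exposes overridable GetFirsts and TryMatch methods and that ISourceStream provides PreviewChar, PreviewPosition, Text, EOF and CreateToken. Check these names against your copy of the library before relying on them.

```csharp
using System.Collections.Generic;
using Irony.Parsing;

// Hypothetical custom terminal matching tags such as "#todo".
public class HashTagTerminal : Terminal
{
    public HashTagTerminal(string name) : base(name) { }

    // Tells the scanner which leading strings can start this terminal.
    public override IList<string> GetFirsts()
    {
        return new List<string> { "#" };
    }

    public override Token TryMatch(ParsingContext context, ISourceStream source)
    {
        if (source.PreviewChar != '#') return null;

        // Step over '#', then consume letters and digits.
        source.PreviewPosition++;
        int bodyStart = source.PreviewPosition;
        while (!source.EOF() && char.IsLetterOrDigit(source.PreviewChar))
            source.PreviewPosition++;

        // A lone '#' is not a match.
        if (source.PreviewPosition == bodyStart) return null;

        // The text after '#' is stored as the token value.
        string body = source.Text.Substring(bodyStart, source.PreviewPosition - bodyStart);
        return source.CreateToken(this, body);
    }
}
```

An instance would then be used in rules like any built-in terminal, for example var TAG = new HashTagTerminal("TAG");.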