Parrot Virtual Machine/Advanced PGE

From Wikibooks, open books for an open world

Advanced PGE

We've already looked at some of the basics of parser construction using PGE and NQP. In this chapter we take a more in-depth look at features of the grammar engine that we haven't seen yet. Some of these more advanced features, such as inline PIR code, assertions, function calls, and built-in token types, make the life of a compiler designer much easier but are not needed for most basic tasks.

regex, token and proto

A regex is a high-level matching operation that allows backtracking. A token is a low-level matching operation that does not allow backtracking. A proto is like a regex but allows multiple dispatch. Think of a proto declaration as being a prototype or signature that several functions can match.
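The difference between a backtracking and a non-backtracking match can be sketched outside of PGE. The following is a minimal Python analogy, not PGE code: the function name match_star_then_literal and its parameters are hypothetical and exist only for this illustration. It matches zero or more copies of one character followed by a literal, either giving characters back on failure (like a regex) or refusing to (like a token).

```python
def match_star_then_literal(text, star_char, literal, backtrack):
    """Match zero or more `star_char` followed by `literal` at the start of `text`.

    Returns the length of the match, or None if the match fails.
    Hypothetical helper for illustration only; this is not part of PGE.
    """
    # Greedy step: consume as many copies of star_char as possible.
    count = 0
    while count < len(text) and text[count] == star_char:
        count += 1

    while True:
        # Try to match the literal right after the consumed characters.
        if text.startswith(literal, count):
            return count + len(literal)
        # A token-like match never gives characters back.
        if not backtrack or count == 0:
            return None
        # A regex-like match gives one character back and retries.
        count -= 1


# Pattern "a* a" against "aaa": the star first swallows all three characters.
regex_like = match_star_then_literal("aaa", "a", "a", backtrack=True)
token_like = match_star_then_literal("aaa", "a", "a", backtrack=False)
```

Here regex_like is 3, because the backtracking matcher gives one "a" back so the literal can match, while token_like is None, because the non-backtracking matcher keeps everything the star consumed. When no backtracking is needed, as with the input "aab" and the literal "b", both variants succeed.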

Inline PIR Sections

PIR can be embedded directly into both PGE grammar files and NQP files. This fills gaps that NQP, with its limited feature set, cannot handle on its own. It is also sometimes helpful to insert active processing into a grammar in order to direct the parser in a more intelligent way.

In NQP, PIR code can be inlined using the PIR statement, followed by a quoted string of PIR code. This quoted string can use the Perl-like "q< ... >" form of quotation, if you think that looks better.

In PGE, inline PIR can be inserted using double curly brackets "{{ ... }}". Once in PIR mode, you can access the current match object with $Px = find_global "$/" (where $Px is any valid PIR PMC register and x is a number).

Built-In Token Types

PGE predefines default versions of certain rules to help with parsing. You can redefine any of these rules if the default behavior does not suit your language.

Calling Functions

Functions, or subroutines, are an integral part of modern programming practice. As such, support for them is part of the PAST system and is relatively easy to implement. We're going to cover a little necessary background first, and then discuss how to put the pieces together to create a system with usable subroutines.

return Described

In Parrot, some control flow operations, especially returns from subroutines, are implemented as special control exceptions. The reason a return is an exception rather than a basic .return() PIR statement is a little complicated. Many languages allow nested lexical scopes, where variables defined in an "inner" scope cannot be seen, accessed, or modified by statements in the "outer" scope. In most compilers, this behavior is enforced by the compiler directly and is invisible once the code is converted to assembly and machine language. However, PIR is like an assembly language for the Parrot system, and it's not possible to hide things at that level. All local variables are local to the entire subroutine and cannot be restricted to a single part of it. To implement nested scopes, Parrot instead uses nested subroutines. A return statement inside an inner scope therefore has to leave several nested subroutines at once, which a control exception can do and a plain .return() cannot.

Returns and Return Values

Functions can be made to return a value using the "return" PAST::Op type. The return system is based on a control exception. Exceptions, as we've discussed before, move control flow to a specified location called the "exception handler". For a return exception, the handler is the code directly after the original function call. The return values (currently, the return PAST node allows only a single return value) are passed as exception data items and are retrieved by the control exception handler.

All of these details are generally hidden from the programmer, and you can treat a return PAST node exactly like you would expect. You pass a return value, if any, to the return PAST node. The current function ends and its scope is destroyed. Control flow returns to the calling function, and the return value from the function is made available.
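The mechanism described above can be sketched in ordinary Python. This is an analogy under stated assumptions, not Parrot's implementation: the names ReturnException, call_function, and find_first_negative are hypothetical, and a nested Python function stands in for a nested scope that Parrot would compile to a nested subroutine.

```python
class ReturnException(Exception):
    """Control exception carrying a single return value (hypothetical name)."""

    def __init__(self, value):
        super().__init__()
        self.value = value


def call_function(body, *args):
    """Call `body`; the exception handler sits directly after the call."""
    try:
        body(*args)
    except ReturnException as exc:
        # The return value travels as data attached to the exception.
        return exc.value
    # The body ran to completion without an explicit return.
    return None


def find_first_negative(numbers):
    # The loop body is modelled as a nested function, standing in for an
    # inner scope that has been compiled to its own subroutine.
    def loop_body(n):
        if n < 0:
            # "return n" from the inner scope: unwinds through loop_body
            # and find_first_negative in one step.
            raise ReturnException(n)

    for n in numbers:
        loop_body(n)


result = call_function(find_first_negative, [3, 1, -4, 1, -5])
no_result = call_function(find_first_negative, [3, 1, 4])
```

In this sketch result is -4 and no_result is None. The point of the design is that the raise inside loop_body leaves both the inner and the outer function at once, which an ordinary return from loop_body could not do.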

Assertions

Repetition Counting with **

MetaSyntactic Assertions

You can call a function from within a rule using the <FUNC( )> format.

Non-Capturing Assertions

Use the <. > form to match a subrule without capturing its result in the match object.

Indirect Rules

An assertion of the form <$ > names a variable, which can hold a string or some other data. Its contents are converted into a regular expression and then run.

Character Classes

Rules of the form <[ ]> define custom character classes. Rules of the form <-[ ]> define complemented character classes, which match any character not listed.
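For readers more familiar with conventional regular expressions, the same idea can be shown with Python's re module. This is an analogy only: Python writes a character class as [ ] and its complement as [^ ], where the rule syntax above uses <[ ]> and <-[ ]>. The variable names are chosen for this illustration.

```python
import re

# Python analog of a custom character class such as <[aeiou]>.
vowels = re.compile(r"[aeiou]")

# Python analog of the complemented class <-[aeiou]>.
non_vowels = re.compile(r"[^aeiou]")

found_vowels = vowels.findall("parrot")
found_others = non_vowels.findall("parrot")
```

Here found_vowels is ['a', 'o'] and found_others is ['p', 'r', 'r', 't']: every character of the input belongs to exactly one of the two classes.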

Built-in Assertions

  • <?before>, <!before>
  • <?after>, <!after>
  • <?same>, <!same>
  • <.ws>
  • <?at()>, <!at()>
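The before and after assertions in the list above are zero-width lookahead and lookbehind checks. As a rough analogy, and assuming the usual correspondence with conventional regex syntax, Python's re module spells them (?= ), (?! ), (?<= ) and (?<! ). The sample text and variable names below are made up for this sketch, and the other assertions in the list are not covered.

```python
import re

text = "foobar foobaz bar"

# Like <?before bar>: "foo" only when "bar" follows; "bar" is not consumed.
before_pos = [m.start() for m in re.finditer(r"foo(?=bar)", text)]

# Like <!before bar>: "foo" only when "bar" does not follow.
not_before_pos = [m.start() for m in re.finditer(r"foo(?!bar)", text)]

# Like <?after foo>: "bar" only when "foo" comes immediately before it.
after_pos = [m.start() for m in re.finditer(r"(?<=foo)bar", text)]

# Like <!after foo>: "bar" only when "foo" does not come before it.
not_after_pos = [m.start() for m in re.finditer(r"(?<!foo)bar", text)]
```

The resulting start positions are [0], [7], [3] and [14] respectively. In each case the asserted text is checked but is not part of the match itself.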

Partial Matches

You can specify a partial match, a match which attempts to match as much as possible and never fails, with the <* > form.

Recursive Calls

You can recurse back into subrules of the current match rule using the <~~ > rule.

Resources

