Parrot Virtual Machine/Parrot Compiler Tools
Parrot Compiler Tools
The first section of this book covered some of the basics of the Parrot platform and the various features that Parrot provides for use with high-level languages. It is important to notice that Parrot provides more features and capabilities than most individual languages require. This is because Parrot aims to be a platform that supports multiple high-level dynamic programming languages, each of which has a diverse feature set. The most recent versions of some of these languages, such as Perl 6 and Python 3000, have very interesting feature sets planned that cannot be supported well by any other existing interpreter or virtual machine platform.
PIR and Parrot programming so far has been relatively low-level, but the goal of Parrot is to support high-level languages. To facilitate this goal, Parrot provides tools that compiler designers can use to quickly and easily create the advanced language features that next-generation languages like Python 3000 and Perl 6 need. Collectively, these are known as the Parrot Compiler Tools (PCT). We will talk about them in this chapter and in some of the following chapters.
Parsing and Compiling: How Parrot Works
Parrot is designed to be a highly modular system. This means that many components can be interchanged as needed. Some of these changes need to be specified at compile time, but others can be performed at runtime.
A program that is input to Parrot goes through multiple steps. Here is a brief overview of them:
- Parser and Lexer: The first stage of Parrot is the Parser and Lexer (Lexer is short for "Lexical Analyzer"). We will discuss the operations of these components in more detail later in this chapter, and in future chapters. The parser and lexer read input code in PIR or PASM and convert it into a data representation called an abstract syntax tree (AST). An AST represents program instructions in a form that is very easy for a computer to work with.
- Compiler: The compiler unit converts information in the AST into Parrot Bytecode format. Bytecode is a set of instructions in binary machine language. From here, Parrot can execute the bytecode directly, or it can save the bytecode to disk and execute it later.
- Optimizer: The optimizer takes the generated bytecode and attempts to make it smaller, faster, and more efficient. Bytecode that has been properly optimized will typically execute faster than non-optimized bytecode.
- JIT Compiler: Short for "Just In Time", the JIT compiler attempts to convert Parrot bytecode into native machine code. This will typically bring large speed increases, but is highly platform-dependent and does not yet work on any systems.
- Interpreter: Once a program has been converted into bytecode, that bytecode is loaded into the interpreter where it is executed.
This is just a very brief overview of these components; we will discuss them in more detail in later chapters. It is worth noting here, however, that many of these components are modular and can be swapped out if you would like to use a different one. For instance, if you already have a parser written for a particular language, instead of having to rewrite the parser using PCT, you can load your existing parser into Parrot. Of course, you will probably need to make modifications to ensure that your custom parser outputs a proper AST, but that's a small price to pay to avoid having to completely rewrite your language parser from the ground up.
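The stages listed above can be sketched with a toy pipeline. The following is a minimal illustration in Python, not Parrot code: the function names (`parse`, `compile_ast`, `optimize`, `run`) and the tuple-based "bytecode" are hypothetical and exist only for this example. To echo the point about swapping in an existing parser, the parser stage simply borrows Python's own `ast` module instead of implementing one.

```python
import ast

def parse(source):
    """Parser and lexer stage: text -> AST (borrowing Python's own parser)."""
    return ast.parse(source, mode="eval").body

def compile_ast(node):
    """Compiler stage: AST -> list of (opcode, operand) stack instructions."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return [("push", node.value)]
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
        op = "add" if isinstance(node.op, ast.Add) else "mul"
        return compile_ast(node.left) + compile_ast(node.right) + [(op, None)]
    raise SyntaxError("unsupported construct in toy language")

def optimize(code):
    """Optimizer stage: fold 'push a, push b, op' into a single push."""
    out = []
    for instr in code:
        out.append(instr)
        if (len(out) >= 3 and out[-1][0] in ("add", "mul")
                and out[-2][0] == "push" and out[-3][0] == "push"):
            # Pops happen left to right: the operator, then b, then a.
            op, b, a = out.pop()[0], out.pop()[1], out.pop()[1]
            out.append(("push", a + b if op == "add" else a * b))
    return out

def run(code):
    """Interpreter stage: execute the instructions on a value stack."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
    return stack.pop()

bytecode = compile_ast(parse("2 + 3 * 4"))
print(bytecode)                 # unoptimized instruction list
print(optimize(bytecode))       # folded instruction list
print(run(optimize(bytecode)))  # result of executing the program
```

Because each stage only depends on the data format handed to it, any one of them could be replaced without touching the others, which is the kind of modularity described above.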
PCT Design Process
PCT includes a number of tools and design steps necessary to create a compiler for a new programming language. Here is a brief look at some of the steps required to create a new compiler:
- Create a language shell
- Create a Grammar file
- Create a grammar actions file
- Create necessary classes, built-in functions, and PMCs
- Create the driver program
Once you have your compiler, you can use it to run programs written in your high-level language. Here are some steps involved in running your compiler:
- Parse the source program, using your grammar and actions, into a Parrot Abstract Syntax Tree (PAST)
- Compile the PAST into a Parrot Opcode Syntax Tree (POST)
- Compile the POST into Parrot Bytecode (PBC) or PIR
- Run the PBC or PIR on Parrot
This should give you a rough idea of what needs to be done to create a compiler, and how a compiler operates. We'll elaborate on each of these steps in this chapter and the next few chapters in this section.
Creating a Language Shell
A new language shell has a number of components. There are the grammar and action files that we've mentioned, but you also need a driver program to create the HLLCompiler object and start the compilation. Also, if you want to have any built-in functions or classes, you will need to write them. To simplify the whole process, you will want to have a makefile to handle all the build steps for your language.
Luckily, there is a tool available to simplify this process: mk_language_shell.pl. It is a Perl 5 program that creates all the necessary files for a new language compiler and fills those files with some helpful default code. It is located in the tools/dev/ folder under the Parrot root folder. To run this program from your shell, go to the Parrot root folder and type:
tools/dev/mk_language_shell.pl <LANGUAGE_NAME> <PATH>
<LANGUAGE_NAME> is the name of your new language, and
<PATH> is the directory where you want it to be stored. By convention, all language projects are stored in the
languages/ directory. Using this directory makes it easier for other build tools to find your project.
For example, if we wanted to create a new language called "mylanguage", we could write
tools/dev/mk_language_shell.pl mylanguage languages/mylanguage
This will create all the necessary files, including a makefile for your language project. Notice that many of these default files, including the makefile, will need to be edited or modified as time goes on. You may want to, as practice, open the makefile and see how things are being built. If you've never seen a makefile before, this is your opportunity to learn about what they are and how they work.
Grammars and Actions
Grammars, typically files with a ".pg" file extension, are compiled using the Parrot Grammar Engine (PGE). PGE is an implementation of the Perl 6 rules engine for Parrot. PGE uses a Recursive Descent parser, although certain components such as expressions can be parsed using a bottom-up parser for efficiency. If you have read the book on Compiler Construction this should make some sense to you. If not, the details about the parser are not particularly important at this point.
Unfortunately, there is a little bit of terminology that we need to cover before we can go any further into this. People who are familiar with grammars and parsers can skip this section. Everybody else should try to read through it because it's valuable and pertinent information.
A tool called a lexical analyzer reads the input file and converts chunks of text into things called "tokens". Tokens are then arranged into particular patterns called "rules" by the parser. When a rule is successfully applied to a set of input tokens, the rule is said to "match" the input. Think of a token as a word in a sentence. Alone, a single word might not have much meaning. But if you put multiple words together into a sentence, the intended meaning becomes clear. A parser takes a group of tokens together and tries to form them into a "sentence", or a known pattern. If a valid pattern of tokens is found, the parser succeeds.
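As a rough illustration of what a lexical analyzer does, here is a small Python sketch of a conventional regular-expression tokenizer. It shows the general idea only and is not how PGE is implemented; the token names and the `tokenize` function are hypothetical choices made for this example.

```python
import re

# Each token kind is paired with the regular expression that recognizes it.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, text) pairs for each token; whitespace is dropped."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()
        pos = match.end()

print(list(tokenize("total = (price + 2) * 3")))
```

Each pair produced here is one "word" in the sense used above: on its own it says little, but a parser can arrange a sequence of them into a meaningful pattern.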
At each step of the parsing process, the parser receives a token from the lexical analyzer. If the parser has enough tokens to make a valid pattern, it succeeds. If it doesn't have enough information to form a valid pattern, it requests the next token and tries again. Large patterns are divided up into smaller patterns. Tokens are combined together into small patterns, and small patterns are combined together into larger patterns. Eventually, the whole code file is reduced to a single pattern and the parser exits.
At each step the parser may optionally perform an action using information in the token. The parser associates particular actions with different token types: the action performed on an open-parenthesis token is not going to be the same as the action performed on a close-parenthesis token. In the case of PGE, actions are functions, typically written in PIR or NQP, that create a PAST node. PAST nodes are stored into a large tree that represents the input. This tree is the abstract syntax tree (the PAST). When the parser reaches its final match and succeeds, the tree is passed to the next stages of the toolkit for processing and eventual conversion into Parrot bytecode.
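The interplay between rules and actions can be sketched as follows. This is a minimal recursive descent parser in Python for a toy arithmetic grammar, under the assumption that each successful rule match calls an action function to build a tree node. The `Parser` class, the `ACTIONS` table, and the tuple node format are all hypothetical; real PGE grammars and NQP actions look quite different.

```python
import re

def tokenize(text):
    # Minimal tokenizer: integers, + and *, parentheses; anything else is ignored.
    return re.findall(r"\d+|[+*()]", text)

# Actions: one function per rule, each returning a tree node for a match.
ACTIONS = {
    "number": lambda text: ("num", int(text)),
    "binary": lambda op, left, right: ("op", op, left, right),
}

class Parser:
    def __init__(self, tokens, actions):
        self.tokens, self.pos, self.actions = tokens, 0, actions

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):            # expr := term ('+' term)*
        node = self.term()
        while self.peek() == "+":
            self.next()
            node = self.actions["binary"]("+", node, self.term())
        return node

    def term(self):            # term := atom ('*' atom)*
        node = self.atom()
        while self.peek() == "*":
            self.next()
            node = self.actions["binary"]("*", node, self.atom())
        return node

    def atom(self):            # atom := NUMBER | '(' expr ')'
        token = self.next()
        if token == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return self.actions["number"](token)
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    parser = Parser(tokenize(text), ACTIONS)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("2 + 3 * (4 + 1)"))
```

Keeping the actions in a separate table mirrors the split between a grammar file and an actions file: the rules decide what matches, and the actions decide what node a match produces.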
The process of parsing and all the necessary theory involved is very complex, and is beyond the scope of this book. However, if you would like more information, there are links in the section titled Resources at the bottom of this page.
Implementing a new language on Parrot, as we mentioned earlier, is broken into a number of parts:
- Write a grammar file using Perl 6 Grammar rules
- Write a grammar actions file using NQP
- Write a driver program in PIR
- Write built-in functions, classes, and PMCs, using PIR (or C, for the PMCs)
Once you create your language shell, all of these files will be produced for you. All you need to do is fill in your grammar and actions into the necessary files, write the rest of the necessary built-in code, and you should have a working compiler. Once you have modified these files to do what you need them to, there is an additional optional step that you should take:
- Write a series of test modules to verify that your language operates properly.
We will discuss testing and test harnesses later. We will discuss writing parsers and action files in the next few chapters.
The Driver Program
The driver program, which is the main entry point to your compiler, has a number of tasks to perform. The first and most important job for your driver program is to create a compiler object for the high-level language in question, and pass the command-line arguments to that compiler object. The compiler object is an HLLCompiler object, and the HLLCompiler class contains all the necessary methods for parsing command-line arguments and initializing the compiler. For more information about the HLLCompiler class, see the Appendix.
A driver program has a number of tasks. Here they are, in no particular order:
- Specify a :main function, which starts the program
- Create an HLLCompiler object for the given High-Level Language (HLL)
- Specify any additional details to the HLLCompiler object, to change the operation of the compiler prior to the parsing stage.
- Include the necessary libraries of classes and built-in functions that the language needs to operate. For most languages, this will include at least one library loading routine capable of loading additional libraries into Parrot for use with the HLL and programs written in it.
- Declare any global variables that will be used with the parser, or will be used by HLL programs.
In addition to these, there may be other tasks which the language designer might wish to perform inside the main driver program.
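The overall shape of these tasks can be sketched by loose analogy in Python. A real driver is written in PIR and uses the HLLCompiler class; everything below (`ToyCompiler`, `load_library`, `command_line`, the `COMPILERS` and `GLOBALS` tables) is hypothetical and only mirrors the list of responsibilities, not any actual Parrot API. The "compiler" here does nothing more than upper-case its input.

```python
COMPILERS = {}   # registry of compiler objects, keyed by language name
GLOBALS = {}     # global variables visible to the toy language

class ToyCompiler:
    """Stand-in for a compiler object; it only upper-cases its input."""
    def __init__(self, language):
        self.language = language
        self.builtins = {}

    def load_library(self, name, functions):
        # Make a library of built-in functions available to programs.
        self.builtins.update({f"{name}.{fn}": impl for fn, impl in functions.items()})

    def command_line(self, args):
        # args[0] is the program name; the rest is treated as source text.
        source = " ".join(args[1:])
        return self.builtins["core.shout"](source)

def setup():
    compiler = ToyCompiler("mylanguage")                  # create the compiler object
    compiler.load_library("core", {"shout": str.upper})   # include built-in libraries
    GLOBALS["version"] = "0.1"                            # declare global variables
    COMPILERS[compiler.language] = compiler

def main(argv):
    # Main entry point: set everything up, then hand the arguments to the compiler.
    setup()
    return COMPILERS["mylanguage"].command_line(argv)

print(main(["driver", "hello", "world"]))
```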
When you are writing your new language compiler, there are a number of places that you can go to get help. The Parrot repository contains all the current Parrot documentation, in POD format. Perl 5 programmers will be familiar with POD, but other users might not be. POD is a simple documentation format that is treated like multi-line comments in Perl code. Special programs like pod2html can be used to convert POD files into other file types for presentation, such as HTML.
There are many languages in the languages/ directory. If you are trying to implement a particular feature for your language, chances are good that you can find an existing example of how another language has implemented that feature. One excellent tool to use, especially when you are constructing PAST node trees or writing functions in PIR, is the --target= directive to Parrot. This directive lets you specify an output dump format. For instance, if you go to the languages/perl6/ directory, you can type the following:
../../parrot perl6.pbc --target=pir
This command will output the PIR of any Perl 6 instructions that you type in. These options work for Parrot, so all the languages will use them, not just Perl 6. Here are some of the other targets you may want to try:
- pir: Prints out the resulting PIR code
- pasm: Prints out the resulting PASM code
- past: Prints out the PAST node tree that is generated from the code
- parse: Prints out a parse tree of the code
Try all these, and see what kinds of results you get using different languages.
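One way to picture what a target option does is as a pipeline that stops early and hands back whatever the named stage produced. The Python sketch below models that idea only; the stage functions and the `compile_to` helper are hypothetical, the "trees" are flat lists, and the final stage emits placeholder strings rather than real PIR.

```python
STAGES = ["parse", "past", "pir"]   # order of stages in this toy model

def to_parse(source):
    # "Parse tree": keep every token, including punctuation.
    return source.replace("(", " ( ").replace(")", " ) ").split()

def to_past(tokens):
    # "Abstract syntax tree": drop punctuation that carries no meaning.
    return [t for t in tokens if t not in ("(", ")")]

def to_pir(tree):
    # Stand-in for code generation: one pseudo-instruction per node (not real PIR).
    return [f"emit {node}" for node in tree]

TRANSFORMS = {"parse": to_parse, "past": to_past, "pir": to_pir}

def compile_to(source, target="pir"):
    """Run the stages in order and return the output of the stage named by target."""
    if target not in STAGES:
        raise ValueError(f"unknown target {target!r}")
    result = source
    for stage in STAGES:
        result = TRANSFORMS[stage](result)
        if stage == target:
            break
    return result

for target in STAGES:
    print(target, compile_to("say (hello)", target))
```

Dumping the intermediate form at each stage is what makes this option useful for debugging: you can see at which stage the output stops matching what you expected.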
If you have looked for help in the POD documentation and in the existing code examples, it might be time to find a real human to ask. Parrot developers and enthusiasts congregate in the #parrot (irc.perl.org) chatroom. Perl 6 developers and enthusiasts congregate in the #perl6 (freenode) chatroom.
Other resources and methods of contact are available at http://www.parrotcode.org/resources.html