Parrot Virtual Machine/Parrot Internals
Parrot Development Process
The Parrot development project is large and complex, with many facets. Here is an overview of some key points about the Parrot build process. Some of these points have not been discussed before, but we will cover them in this or later chapters:
- The build environment is configured using the Configure.pl program. This program is written, like many of the build tools for Parrot, in Perl 5. Configure.pl determines options on your system including which compiler you are using, which Make program (if any) you are using, what platform-specific libraries are required, etc.
- PMCs are written in a C-like language which is compiled into C code by the PMC Compiler. The PMC Compiler produces C source and header files for all PMCs, and registers the PMCs in the Parrot PMC table.
- Opcodes are also written in a C-like language and compiled into C, just as PMCs are. The syntax of ops files resembles the PMC syntax in some respects but differs in many others. Ops files are converted into C before being compiled into machine code.
- Native Call Interface (NCI) function signatures must be converted into C functions by the NCI compiler prior to compilation.
- Just-In-Time operations must be converted into C code for compilation into native code.
- The parsers for PASM and PIR are written using Lex and Bison (a Yacc-compatible parser generator). These must be converted into C files before they can be compiled.
- The constant string converter converts CONST_STRING declarations into string constants at compile time. This saves time at execution, because those strings do not have to be built at runtime.
- The Makefile automates the build process: it compiles all the PMCs and C files, builds the executables and libraries, and so on.
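The idea behind the constant string converter can be illustrated with a small C sketch. It assumes a hypothetical `ConstString` type and `CONST_STR` macro; Parrot's real `CONST_STRING` macro and `STRING` structure are different. Because `sizeof` applied to a string literal is evaluated by the C compiler, the string's length is fixed at compile time. The constant sits in the binary's static data, with no allocation or `strlen()` call at execution time:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string type for illustration; Parrot's STRING differs. */
typedef struct {
    const char *data;
    size_t      len;   /* length in bytes, excluding the terminating NUL */
} ConstString;

/* Hypothetical macro: sizeof on a string literal is a compile-time
   constant, so the compiler computes the length and no strlen() runs. */
#define CONST_STR(lit) { (lit), sizeof(lit) - 1 }

/* Statically initialized: stored in the binary, nothing built at startup. */
static const ConstString method_name = CONST_STR("get_integer");

/* For contrast: a string built at runtime must be measured on every call. */
static ConstString make_string_at_runtime(const char *s) {
    ConstString str = { s, strlen(s) };
    return str;
}
```

`make_string_at_runtime` is included only for comparison. It produces the same result, but it pays the measuring cost each time it is called.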
In this chapter we give an overview of some of the components of the Parrot Virtual Machine. Later chapters discuss the various Parrot subsystems, including many of the processes described above. The chapters in this section all deal with Parrot hacking and development. If you aren't interested in helping with Parrot development, you can skip these chapters.
Here is the general structure of the Parrot Repository, as far as source code is concerned:
Major Parrot Components
PASM and PIR Parsers
There are two parsers for PIR available. The first is IMCC, which is currently in use but is inefficient. The other is PIRC, which is more efficient but not yet stable. The long-term plan is for PIRC to become the predominant PIR parser by the time Parrot 1.0 is released.
Both IMCC and PIRC are written in the C programming language with parsers written in Lex and Yacc.
PIRC and IMCC act as front-ends to two other Parrot components: the bytecode compiler and the interpreter.
Bytecode Compiler and Optimizer
The bytecode compiler is the portion of Parrot which is responsible for converting input symbols (in the form of PASM or PIR) into Parrot bytecode. This bytecode, once compiled, can be run on Parrot quickly and efficiently.
Another related Parrot component is the bytecode optimizer which is responsible for low-level optimizations of Parrot bytecode.
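One common kind of low-level bytecode optimization is a peephole pass that removes instructions with no effect. The following C sketch illustrates that generic technique; it is not code from Parrot's optimizer. The `Insn` layout and the opcode names are assumptions made for the example. The names are loosely modeled on PASM instructions such as `set I0, I0` and `add I0, I0, 0`:

```c
#include <stddef.h>

/* Hypothetical instruction format for illustration only. */
typedef enum { OP_SET_I_I, OP_ADD_I_I_IC, OP_PRINT_I, OP_END } Op;

typedef struct {
    Op   op;
    int  dst;   /* destination register */
    int  src;   /* source register */
    long k;     /* integer constant operand (used by OP_ADD_I_I_IC) */
} Insn;

/* Returns nonzero if the instruction provably has no effect. */
static int is_noop(const Insn *in) {
    switch (in->op) {
    case OP_SET_I_I:    return in->dst == in->src;                /* set I0, I0    */
    case OP_ADD_I_I_IC: return in->dst == in->src && in->k == 0;  /* add I0, I0, 0 */
    default:            return 0;
    }
}

/* Peephole pass: compact the array in place, dropping no-op instructions.
   Returns the new instruction count. */
static size_t remove_noops(Insn *code, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_noop(&code[i]))
            code[out++] = code[i];
    return out;
}
```

A real optimizer also has to adjust branch targets after deleting instructions, because every following instruction moves to a new offset. This toy format has no branches, so the pass can simply compact the array.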
Interpreter
While the bytecode compiler takes input symbols from PIRC or IMCC and converts them into bytecode for storage and later execution, the interpreter uses these symbols to execute the program directly. This means there is no separate compilation step, so a script can be executed immediately without first being compiled.
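The core of an interpreter is a fetch-and-dispatch loop. The sketch below is a heavily simplified model, not Parrot's runcore. The `Interp` structure, the `word_t` type, and the op functions are hypothetical. The op names follow PASM's convention of suffixing operand types; for example, `add_i_i_i` takes three integer registers. The sketch also loosely mirrors how ops files become C: each op is a C function that does its work and returns the address of the next instruction.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified model of an op-function runcore. */
typedef long word_t;                 /* one bytecode word */

typedef struct {
    long int_regs[8];                /* integer registers I0..I7 */
} Interp;

/* Every op takes the current position and returns the next one;
   returning NULL stops the loop. */
typedef word_t *(*op_func_t)(word_t *pc, Interp *interp);

enum { OP_END, OP_SET_I_IC, OP_ADD_I_I_I };

/* end */
static word_t *op_end(word_t *pc, Interp *interp) {
    (void)pc;
    (void)interp;
    return NULL;
}

/* set Ix, constant   (operands: register number, constant) */
static word_t *op_set_i_ic(word_t *pc, Interp *interp) {
    interp->int_regs[pc[1]] = pc[2];
    return pc + 3;
}

/* add Ix, Iy, Iz     (Ix = Iy + Iz) */
static word_t *op_add_i_i_i(word_t *pc, Interp *interp) {
    interp->int_regs[pc[1]] = interp->int_regs[pc[2]] + interp->int_regs[pc[3]];
    return pc + 4;
}

/* Op table indexed by opcode number; must match the enum order. */
static op_func_t const op_table[] = { op_end, op_set_i_ic, op_add_i_i_i };

/* Fetch the opcode at pc, dispatch to its function, repeat.
   No bounds or validity checks: the bytecode is trusted in this sketch. */
static void run(word_t *pc, Interp *interp) {
    while (pc != NULL)
        pc = op_table[*pc](pc, interp);
}
```

Consider the PASM sequence `set I0, 2`, `set I1, 40`, `add I2, I0, I1`, `end`. In this toy encoding it becomes `{ OP_SET_I_IC, 0, 2, OP_SET_I_IC, 1, 40, OP_ADD_I_I_I, 2, 0, 1, OP_END }`. After `run` returns, `int_regs[2]` holds 42.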
I/O Subsystem
The I/O subsystem controls reading and writing operations to the console, to files, and to the operating system. Much of this functionality is implemented in special PMCs.
Regular Expression Engine
The regular expression engine provides fast regular expressions for Parrot programs. Its functionality is most visibly exposed through PGE (the Parrot Grammar Engine), but it is available elsewhere as well. Perl 6 regular expressions, on which this engine is based, differ significantly from Perl 5 regular expressions and their variants.
Garbage Collector and Memory Management
The memory management subsystem is designed to allocate and organize memory for use with Parrot and programs which run on top of Parrot. The garbage collector detects when allocated memory is no longer being used and returns that memory to the pool for later allocation.
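Rather than reproduce Parrot's actual collector, here is a minimal sketch of the classic mark-and-sweep technique in C. Objects reachable from a set of roots are marked, and every unmarked object is returned to the pool for later allocation. The fixed-size `heap`, the two-reference `Obj` layout, and all function names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 16

/* Hypothetical object model: each object may reference up to two others. */
typedef struct Obj {
    bool        in_use;   /* slot currently allocated */
    bool        marked;   /* reached during the current mark phase */
    struct Obj *refs[2];  /* outgoing references (NULL if unused) */
} Obj;

static Obj heap[HEAP_SIZE];

/* Allocate from the fixed pool; returns NULL when the pool is exhausted. */
static Obj *gc_alloc(void) {
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) {
            heap[i] = (Obj){ .in_use = true };   /* unmarked, no references */
            return &heap[i];
        }
    }
    return NULL;
}

/* Mark phase: flag everything reachable from obj (the marked check
   also stops infinite recursion on reference cycles). */
static void mark(Obj *obj) {
    if (obj == NULL || obj->marked)
        return;
    obj->marked = true;
    for (size_t i = 0; i < 2; i++)
        mark(obj->refs[i]);
}

/* Sweep phase: unreached objects go back to the pool; survivors are
   unmarked for the next cycle. Returns the number of objects freed. */
static size_t sweep(void) {
    size_t freed = 0;
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = false;
            freed++;
        }
        heap[i].marked = false;
    }
    return freed;
}

/* Full collection, given the root set (registers, stacks, and globals in a real VM). */
static size_t gc_collect(Obj **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    return sweep();
}
```

For example, suppose `a` is a root, `a` references `b`, and nothing references `c`. A collection then frees only `c`, and the next `gc_alloc` call reuses its slot.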
Dynamic Library Loader
Parrot is not just an executable program; it is also a linkable library called libparrot. libparrot can be linked into other programs, and a Parrot interpreter object can then be created and called from inside that program. An entire embedding API has been created to allow libparrot to communicate with other programs.
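The embedding pattern can be sketched in C with an opaque handle. The host program sees only an incomplete type and a few functions to create, run, and destroy an interpreter. The names `VM`, `vm_new`, `vm_run`, and `vm_destroy` are hypothetical stand-ins and do not reflect libparrot's actual embedding API. In a real build the two halves would live in separate files, with the library half compiled into the shared library.

```c
#include <stdlib.h>

/* ---- What the host sees (would be the library's public header) ---- */
typedef struct VM VM;    /* incomplete type: the host cannot access fields */
VM  *vm_new(void);
long vm_run(VM *vm, const long *program, size_t len);
void vm_destroy(VM *vm);

/* ---- Library internals (would be compiled into the shared library) ---- */
struct VM {
    long accumulator;    /* stand-in for real interpreter state */
};

VM *vm_new(void) {
    return calloc(1, sizeof(VM));      /* NULL if allocation fails */
}

/* Toy "execution": add each program value to the accumulator. */
long vm_run(VM *vm, const long *program, size_t len) {
    for (size_t i = 0; i < len; i++)
        vm->accumulator += program[i];
    return vm->accumulator;
}

void vm_destroy(VM *vm) {
    free(vm);
}

/* ---- Host program: create an interpreter, run code in it, tear it down ---- */
long host_compute(void) {
    const long program[] = { 40, 2 };
    VM *vm = vm_new();
    if (vm == NULL)
        return -1;
    long result = vm_run(vm, program, sizeof program / sizeof program[0]);
    vm_destroy(vm);
    return result;
}
```

Keeping the structure opaque lets the library change its internal layout without recompiling host programs, as long as the function signatures stay the same.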
Parrot can be extended using dynamic libraries, such as Linux .so files or Windows .dll files. These extensions must interact with Parrot in a safe and controlled way. For this, the Extensions API was written to give extensions a communication channel into the heart of Parrot.
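A common way to give extensions a controlled channel into a host is to pass them a table of function pointers. The extension can then reach the VM only through operations the VM chooses to expose. The following C sketch illustrates that pattern under assumed, hypothetical names (`ExtAPI`, `ext_init`, `vm_register_function`); it is not Parrot's Extensions API. A real loader would first locate the extension's entry point in the `.so` or `.dll`, for example with `dlopen`/`dlsym` on POSIX systems or `LoadLibrary`/`GetProcAddress` on Windows. Here `ext_init` is simply called directly so the example runs as a single file.

```c
#include <stddef.h>
#include <string.h>

typedef long (*ext_fn)(long);

/* Hypothetical extension API: the only VM operations an extension can use. */
typedef struct {
    int  api_version;
    void (*register_function)(void *vm, const char *name, ext_fn fn);
} ExtAPI;

/* ---- VM side ---- */
#define MAX_FUNCS 8

typedef struct {
    const char *names[MAX_FUNCS];
    ext_fn      funcs[MAX_FUNCS];
    size_t      count;
} VMState;

/* Exposed to extensions through ExtAPI; the VM validates before storing. */
static void vm_register_function(void *vm, const char *name, ext_fn fn) {
    VMState *state = vm;
    if (name != NULL && fn != NULL && state->count < MAX_FUNCS) {
        state->names[state->count] = name;
        state->funcs[state->count] = fn;
        state->count++;
    }
}

/* Look up a registered function by name and call it.
   Returns 1 and stores the result in *out on success, 0 if not found. */
static int vm_call(const VMState *state, const char *name, long arg, long *out) {
    for (size_t i = 0; i < state->count; i++) {
        if (strcmp(state->names[i], name) == 0) {
            *out = state->funcs[i](arg);
            return 1;
        }
    }
    return 0;
}

/* ---- Extension side (would be compiled into a .so or .dll) ---- */
static long ext_double(long x) { return 2 * x; }

/* Entry point a loader would look up by name; the extension touches
   the VM only through the api table it receives. */
void ext_init(void *vm, const ExtAPI *api) {
    if (api->api_version == 1)
        api->register_function(vm, "double", ext_double);
}
```

The version field lets an extension refuse to register itself against an API table it does not understand. The `void *` handle keeps the VM's internal layout hidden from extension code.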
The next several chapters look at the individual components of Parrot, discussing the software architecture and operation of each one. As we have already seen, Parrot itself is written in the C programming language. Individual components (such as the opcodes, PMCs, and other features) are written in special domain-specific languages and later translated into C code. Some higher-level functionality, such as PCT (the Parrot Compiler Toolkit), is written in PIR and PASM. The PIR parsers are written using a combination of Lex and Yacc.
Programming for Parrot typically requires a good knowledge of the C programming language and a good understanding of Perl 5. This is because Perl 5 is used to write all the development tools that control the Parrot build process.