Parrot Virtual Machine/Run Core and Opcodes

Run Core

We've discussed run cores earlier, but this chapter examines them in much more depth. We will talk about opcodes and the special opcode compiler that converts them into standard C code. We will also look at the different forms the opcode compiler translates opcodes into, and at the different runcores that execute them.

Opcodes

Opcodes are written in a special syntax that mixes C with a few additional keywords. The opcode compiler, tools/dev/ops2c.pl, converts opcodes into the formats required by the different run cores.

The core opcodes for Parrot are all defined in src/ops/, in files with a *.ops extension. Opcodes are divided into different files, depending on their purpose:

Ops file Purpose
bit.ops bitwise logical operations
cmp.ops comparison operations
core.ops Basic Parrot operations, private internal operations, control flow, concurrency, events and exceptions.
debug.ops ops for debugging Parrot and HLL programs.
experimental.ops ops which are being tested, and which might not be stable. Do not rely on these ops.
io.ops ops to handle input and output to files and the terminal.
math.ops mathematical operations
object.ops ops to deal with object-oriented details
obscure.ops ops for obscure and specialized trigonometric functions
pic.ops private opcodes for the polymorphic inline cache. Do not use these.
pmc.ops Opcodes for dealing with PMCs, creating PMCs. Common operations for dealing with array-like PMCs (push, pop, shift, unshift) and hash-like PMCs
set.ops ops to set and load registers
stm.ops Ops for software transactional memory, the inter-thread communication system for Parrot. In practice, these ops are not used; use the STMRef and STMVar PMCs instead.
string.ops Ops for working with strings
sys.ops Operations to interact with the underlying system
var.ops ops to deal with lexical and global variables

Writing Opcodes

Ops are defined with the op keyword, and work similarly to C source code. Here is an example:

op my_op () {
}

Alternatively, we can use the inline keyword as well:

inline op my_op () {
}

We define the input and output parameters using the keywords in and out, followed by the type of the parameter. If an input parameter is used but not altered, you can define it as inconst. The types can be PMC, STR (strings), NUM (floating-point values) or INT (integers). Here is an example op prototype:

op my_op(out NUM, in STR, in PMC, in INT) {
}

That function takes a string, a PMC, and an int, and returns a num. Notice how the parameters do not have names. Instead, they correspond to numbers:

op my_op(out NUM, in STR, in PMC, in INT)
              ^       ^       ^       ^
              |       |       |       |
             $1      $2      $3      $4

Here's another example, an operation that takes three integer inputs, adds them together, and returns an integer sum:

op sum(out INT, in INT, in INT, in INT) {
   $1 = $2 + $3 + $4;
}

NUMs are converted into ordinary floating-point values, so they can be passed directly to functions that require floats or doubles. Likewise, INTs are basic integer values and can be treated as such. PMCs and STRINGs, however, are complex values: you can't pass a Parrot STRING to a library function that requires a null-terminated C string. The following is bad:

#include <string.h>
op my_str_length(out INT, in STR) {
  $1 = strlen($2);  /* WRONG: $2 is a Parrot STRING, not a null-terminated C string */
}
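
To see why strlen is the wrong tool, here is a minimal C sketch of a counted string. The demo_string struct and both helper functions are hypothetical stand-ins invented for this illustration; they are not Parrot's actual STRING layout or API. The point is only that a string that stores its own length may contain embedded zero bytes, so scanning for a terminator gives the wrong answer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string; NOT Parrot's real STRING structure. */
typedef struct demo_string {
    const char *buf;   /* raw bytes; may contain '\0' anywhere */
    size_t      len;   /* number of bytes actually in use */
} demo_string;

/* Correct length query: read the stored count. */
static size_t demo_string_length(const demo_string *s) {
    return s->len;
}

/* What a naive strlen() call reports: it stops at the first '\0' byte. */
static size_t naive_c_length(const demo_string *s) {
    return strlen(s->buf);
}
```

For a five-byte string whose third byte is zero, the two functions disagree, which is the kind of bug the op above would introduce.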

Advanced Parameters

When we talked about the types of parameters above, we left some details out. Here is a list of direction qualifiers that you can have in your op:

direction meaning example
in The parameter is an input
op my_op(in INT)
out The parameter is an output
op pi(out NUM) {
  $1 = 3.14;
}
inout The parameter is an input and an output:
op increment(inout INT) {
 $1 = $1 + 1;
}
inconst The input parameter is a constant; it is not modified
op double_const(out INT, inconst INT) {
  $1 = $2 + $2;
}

And, in PIR:

$I0 = double_const 5 # numeric literal "5" is a constant
invar The input parameter is a variable, like a PMC
op my_op(invar PMC)

The type of the argument can also be one of several options:

type meaning example
INT integer value 42 or $I0
NUM floating-point value 3.14 or $N3
STR string "Hello" or $S4
PMC PMC variable $P0
KEY Hash key ["name"]
INTKEY Integer index [5]
LABEL location in code to jump to jump_here:

Op Naming and Function Signatures

You can have many ops with the same name, so long as they have different parameters. The following two declarations can coexist:

op my_op (out INT, in INT) {
}
op my_op (out NUM, in INT) {
}

The ops compiler converts these op declarations into C functions similar to the following:

INTVAL op_my_op_i_i(INTVAL param1) {
}
NUMBER op_my_op_n_i(INTVAL param1) {
}

Notice the "_i_i" and "_n_i" suffixes at the end of the function names. This is how Parrot keeps function names unique, so that ops sharing a name do not collide when they are compiled as C. It is also an easy way to read a function signature and see what kinds of operands the op takes.
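
The naming scheme can be sketched in a few lines of C. This is an illustration only, not the code the ops compiler uses: the arg_type enum and the mangle_op_name helper are hypothetical, and the "s" and "p" letters for STR and PMC are assumed by analogy with the "i" and "n" suffixes shown above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical operand type tags, mirroring INT/NUM/STR/PMC. */
typedef enum { ARG_INT, ARG_NUM, ARG_STR, ARG_PMC } arg_type;

/* One-letter suffix per type ('s' and 'p' are assumptions). */
static char arg_suffix(arg_type t) {
    switch (t) {
        case ARG_INT: return 'i';
        case ARG_NUM: return 'n';
        case ARG_STR: return 's';
        case ARG_PMC: return 'p';
    }
    return '?';
}

/* Write "op_<name>_<t1>_<t2>..." into out.
 * Returns 0 on success, -1 if out is too small. */
static int mangle_op_name(char *out, size_t out_size, const char *name,
                          const arg_type *args, size_t nargs) {
    int n = snprintf(out, out_size, "op_%s", name);
    if (n < 0 || (size_t)n >= out_size) return -1;
    size_t used = (size_t)n;
    for (size_t i = 0; i < nargs; i++) {
        if (used + 3 > out_size) return -1;  /* room for '_', letter, '\0' */
        out[used++] = '_';
        out[used++] = arg_suffix(args[i]);
        out[used] = '\0';
    }
    return 0;
}
```

Calling mangle_op_name with the name "my_op" and the types (INT, INT) produces "op_my_op_i_i"; with (NUM, INT) it produces "op_my_op_n_i", matching the two declarations above.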

Control Flow

An opcode can determine where control flow moves after it has finished executing. For most opcodes, the default behavior is to move to the next instruction in memory. However, there are many other ways to alter control flow, some of them unusual. Several keywords can be used to obtain the address of an operation. We can then jump to that instruction directly, or we can store the address and jump to it later.

Keyword Meaning
NEXT() Jump to the next opcode in memory
ADDRESS(a) Jump to the opcode given by a. a is of type opcode_t*.
OFFSET(a) Jump to the opcode given by offset a from the current offset. a is typically of type in LABEL.
POP() Get the address given at the top of the control stack. This feature is being deprecated and eventually Parrot will be stackless internally.
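
Conceptually, the first three keywords amount to pointer arithmetic on the bytecode stream. The following C sketch shows one way to picture them. The function names, the op_size parameter, and the opcode_t typedef are assumptions made for this illustration; the real keywords are expanded by the opcode compiler, and the exact expansion depends on the runcore.

```c
#include <stddef.h>

typedef long opcode_t;   /* assumption: one bytecode cell per machine word */

/* NEXT(): step over the current op. op_size counts the opcode cell
 * plus all of its operand cells. */
static opcode_t *demo_next(opcode_t *cur, size_t op_size) {
    return cur + op_size;
}

/* ADDRESS(a): continue at an absolute location in the bytecode. */
static opcode_t *demo_address(opcode_t *target) {
    return target;
}

/* OFFSET(a): continue at a location relative to the current op.
 * A negative offset jumps backwards, as a loop would. */
static opcode_t *demo_offset(opcode_t *cur, long offset) {
    return cur + offset;
}
```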

The Opcode Compiler

The opcode compiler is located at dev/build/ops2c.pl, although most of its functionality lives in the libraries it includes, such as Parrot::OpsFile, Parrot::Ops2c::* and Parrot::OpsTrans::*.

We'll look at the different runcores in the section below. For now, it is enough to know that different runcores require the opcodes to be compiled into different formats for execution. The job of the opcode compiler is therefore relatively complex: it must read in the opcode description files and output syntactically correct C code in several different output formats.

Dynops: Dynamic Opcode Libraries

The ops we've been talking about so far are all standard built-in ops. These aren't the only ops available, however: Parrot also allows dynamic op libraries to be loaded at runtime.

Dynops are dynamically loadable op libraries. They are written almost exactly like regular built-in ops, but they are compiled separately into a library and loaded into Parrot at runtime using the .loadlib directive.

Run Cores

Runcores are the things that decode and execute the stream of opcodes in a PBC file. In the simplest case, a runcore is a loop that takes each bytecode value, gathers the parameter data from the PBC stream, and passes control to the opcode routine for execution.
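
As a rough picture of that simplest case, here is a toy function-based runcore in C. Everything in it is hypothetical and invented for this sketch: the three-op instruction set, the demo_interp structure, and the convention that an op returns NULL to stop. It is not Parrot's actual runcore code.

```c
#include <stddef.h>

typedef long opcode_t;

/* Hypothetical interpreter state: a few integer registers. */
typedef struct demo_interp {
    long int_reg[4];
} demo_interp;

/* Each op gets the current position and the interpreter, and
 * returns the position of the next op (NULL means stop). */
typedef opcode_t *(*op_func)(opcode_t *cur, demo_interp *interp);

enum { OP_END = 0, OP_SET = 1, OP_ADD = 2 };

static opcode_t *op_end(opcode_t *cur, demo_interp *interp) {
    (void)cur; (void)interp;
    return NULL;
}

/* set: int_reg[$1] = constant $2 */
static opcode_t *op_set(opcode_t *cur, demo_interp *interp) {
    interp->int_reg[cur[1]] = cur[2];
    return cur + 3;
}

/* add: int_reg[$1] = int_reg[$2] + int_reg[$3] */
static opcode_t *op_add(opcode_t *cur, demo_interp *interp) {
    interp->int_reg[cur[1]] = interp->int_reg[cur[2]] + interp->int_reg[cur[3]];
    return cur + 4;
}

/* Opcode number -> function, indexed by the value in the bytecode. */
static const op_func op_table[] = { op_end, op_set, op_add };

/* The runcore itself: look up the op, call it, repeat. */
static void run_function_core(opcode_t *pc, demo_interp *interp) {
    while (pc != NULL) {
        pc = op_table[*pc](pc, interp);
    }
}
```

Each op reads its own operands from the cells that follow its opcode number, which is why every op also knows how far to advance.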

There are several different runcores. Some are very practical and simple, some use special tricks and compiler features to optimize for speed. Some runcores perform useful ancillary tasks such as debugging and profiling. Some runcores serve no useful purpose except to satisfy some basic academic interest.

Basic Cores

Slow Core
In the slow core, each opcode is compiled into a separate function. Each opcode function takes two arguments: a pointer to the current opcode, and the Parrot interpreter structure. All arguments to the opcodes are parsed and stored in the interpreter structure for retrieval. This core is, as its name implies, very slow. However, it's conceptually very simple and it's very stable. For this reason, the slow core is used as the base for some of the specialty cores we'll discuss later.
Fast Core
The fast core is exactly like the slow core, except it doesn't do the bounds checking and explicit context updating that the slow core does.
Switched Core
The switched core uses one very large C switch statement to handle opcode dispatching, instead of using individual functions. The benefit is that no function needs to be called for each opcode, which reduces the number of machine code instructions needed to dispatch an opcode.
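
The switch-based approach can be sketched as follows. As before, the three-op instruction set and the function name are hypothetical, chosen only to keep the example short; the real switched core is generated by the opcode compiler from the .ops files.

```c
typedef long opcode_t;

enum { OP_END = 0, OP_SET = 1, OP_ADD = 2 };

/* Runs until OP_END. regs is the integer register file.
 * All op bodies live inside one function, so dispatch is a
 * switch jump rather than a function call. */
static void run_switch_core(const opcode_t *pc, long *regs) {
    for (;;) {
        switch (*pc) {
            case OP_SET:                 /* regs[$1] = constant $2 */
                regs[pc[1]] = pc[2];
                pc += 3;
                break;
            case OP_ADD:                 /* regs[$1] = regs[$2] + regs[$3] */
                regs[pc[1]] = regs[pc[2]] + regs[pc[3]];
                pc += 4;
                break;
            case OP_END:
            default:                     /* unknown opcode: stop */
                return;
        }
    }
}
```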

Native Code Cores

JIT Core
Exec Core

Advanced Cores

The two cores that we're going to discuss next rely on a specialty feature of some compilers called computed goto. In standard C, labels can only be used as targets of goto and are not treated like first-class data items. However, compilers that support computed goto (GCC, for example) allow the address of a label to be taken like a pointer, stored in a variable, and jumped to indirectly.

 void * my_label = &&THE_LABEL;
 goto *my_label;

The computed goto cores compile all the opcodes into a single large function, and each opcode corresponds to a label in the function. These labels are all stored in a large array:

 void *opcode_labels[] = {
   &&opcode1,
   &&opcode2,
   &&opcode3,
   ...
 };

Each opcode value can then be used as an index into this array as follows:

 goto *opcode_labels[current_opcode];

Computed Goto Core
The computed goto core uses the mechanism described above to dispatch the various opcodes. After each opcode is executed, the next opcode in the incoming bytecode stream is looked up in the table and dispatched from there.
Predereferenced Computed Goto Core
In the predereferenced computed goto core, the bytecode stream is preprocessed to convert opcode numbers into their respective labels. This means the labels don't need to be looked up each time: the opcode value can be jumped to directly, as if it were a label. Keep in mind that the dispatch mechanism must be used after every opcode, and in large programs there could be millions of opcodes. Even small savings in the number of machine code instructions between opcodes can make big differences in speed.
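
The preprocessing step can be illustrated with a small C sketch. It relies on the label-address extension described above, so it needs a compiler such as GCC or Clang and is not standard C. The cell_t type, the three-op instruction set, and the function name are assumptions made for this example, not Parrot's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* A cell wide enough to hold either an operand or a label address. */
typedef intptr_t cell_t;

enum { OP_END = 0, OP_SET = 1, OP_ADD = 2 };

/* Cells used by each op: the opcode itself plus its operands. */
static const size_t op_size[] = { 1, 3, 4 };

/* Rewrites code in place, so each program may be run only once. */
static void run_prederef_cgoto(cell_t *code, size_t ncells, long *regs) {
    void *labels[] = { &&do_end, &&do_set, &&do_add };
    cell_t *pc = code;
    size_t i = 0;

    /* Predereference pass: replace every opcode number with the
     * address of its label, skipping over operand cells. */
    while (i < ncells) {
        cell_t op = code[i];
        code[i] = (cell_t)labels[op];
        i += op_size[op];
    }

    /* Dispatch: the cell already holds the address, no table lookup. */
    goto *(void *)*pc;

do_set:                                  /* regs[$1] = constant $2 */
    regs[pc[1]] = pc[2];
    pc += 3;
    goto *(void *)*pc;

do_add:                                  /* regs[$1] = regs[$2] + regs[$3] */
    regs[pc[1]] = regs[pc[2]] + regs[pc[3]];
    pc += 4;
    goto *(void *)*pc;

do_end:
    return;
}
```

Compared with the plain computed goto core, each dispatch here saves the array lookup: the jump target is read straight out of the bytecode stream.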

Specialty Cores

GC Debug Core
Debugger Core
Profiling Core
Tracing Core

