ROSE Compiler Framework/Abstract Syntax Tree

From Wikibooks, open books for an open world

The main intermediate representation of ROSE is its abstract syntax tree (AST). To use a programming language, you have to get familiar with the language syntax, semantics, etc. To use ROSE, you have to get familiar with its internal representation of an input code.

The best way to get to know the AST is to visualize it for the simplest possible code samples.

Visualization of AST


Three things are needed to visualize a ROSE AST:

  • Sample input code: you provide it
  • A dot graph generator to generate a dot file from the AST: ROSE provides dot graph generators
  • A visualization tool to open the dot graph: ZGRViewer and Graphviz are used by ROSE developers

If you don't want to install ROSE, ZGRViewer, and Graphviz from scratch, you can use the ROSE virtual machine image, which has everything installed and configured so you can visualize your sample code right away.

Sample input code

Prepare the simplest possible input code, without including any headers, so that the resulting AST is small enough to digest.
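As a minimal sketch, an input along the following lines keeps the graph small. The function names add and sum_to are made up for illustration; the code includes no headers and is valid as both C and C++.

```cpp
// Hypothetical header-free sample input: two small functions and one loop
// are enough to produce declarations, statements, and expressions in the AST.
int add(int a, int b)
{
  return a + b;
}

int sum_to(int n)
{
  int total = 0;
  for (int i = 1; i <= n; i++)
    total = add(total, i);
  return total;
}
```

Because nothing is included, the AST should contain only the nodes for these two functions plus whatever the frontend adds on its own.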

Dot Graph Generator

ROSE provides two tools under ROSE_INSTALLATION_TREE/bin that generate a dot graph of the detailed AST of the input code:

  • dotGenerator: simple AST graph generator showing essential nodes and edges
  • dotGeneratorWholeASTGraph: whole AST graph showing more details. It provides filter options to show/hide certain AST information.

Command line:

# it is best to avoid including any header in your input code, so the tree stays small enough to visualize
dotGeneratorWholeASTGraph yourcode.c
dotGeneratorWholeASTGraph --help
   -rose:help                     show this help message
   -rose:dotgraph:asmFileFormatFilter           [0|1]  Disable or enable asmFileFormat filter
   -rose:dotgraph:asmTypeFilter                 [0|1]  Disable or enable asmType filter
   -rose:dotgraph:binaryExecutableFormatFilter  [0|1]  Disable or enable binaryExecutableFormat filter
   -rose:dotgraph:commentAndDirectiveFilter     [0|1]  Disable or enable commentAndDirective filter
   -rose:dotgraph:ctorInitializerListFilter     [0|1]  Disable or enable ctorInitializerList filter
   -rose:dotgraph:defaultFilter                 [0|1]  Disable or enable default filter
   -rose:dotgraph:defaultColorFilter            [0|1]  Disable or enable defaultColor filter
   -rose:dotgraph:edgeFilter                    [0|1]  Disable or enable edge filter
   -rose:dotgraph:expressionFilter              [0|1]  Disable or enable expression filter
   -rose:dotgraph:fileInfoFilter                [0|1]  Disable or enable fileInfo filter
   -rose:dotgraph:frontendCompatibilityFilter   [0|1]  Disable or enable frontendCompatibility filter
   -rose:dotgraph:symbolFilter                  [0|1]  Disable or enable symbol filter
   -rose:dotgraph:emptySymbolTableFilter        [0|1]  Disable or enable emptySymbolTable filter
   -rose:dotgraph:typeFilter                    [0|1]  Disable or enable type filter
   -rose:dotgraph:variableDeclarationFilter     [0|1]  Disable or enable variableDeclaration filter
   -rose:dotgraph:variableDefinitionFilter      [0|1]  Disable or enable variableDefinitionFilter filter
   -rose:dotgraph:noFilter                      [0|1]  Disable or enable no filtering
Current filter flags' values are: 
         m_asmFileFormat = 0 
         m_asmType = 0 
         m_binaryExecutableFormat = 0 
         m_commentAndDirective = 1 
         m_ctorInitializer = 0 
         m_default = 1 
         m_defaultColor = 1 
         m_edge = 1 
         m_emptySymbolTable = 0 
         m_expression = 0 
         m_fileInfo = 1 
         m_frontendCompatibility = 0 
         m_symbol = 0 
         m_type = 0 
         m_variableDeclaration = 0 
         m_variableDefinition = 0 
         m_noFilter = 0 

Dot Graph Visualization

To visualize the generated dot graph, you have to install Graphviz and ZGRViewer.

Note that ZGRViewer must be configured with the correct paths to the commands it uses. You can set them from its configuration/settings menu, or edit the text configuration file (.zgrviewer) directly.

One example configuration is shown below (cat .zgrviewer):

<?xml version="1.0" encoding="UTF-8"?>
<zgrv:config xmlns:zgrv="">
        <zgrv:tmpDir value="true">/tmp</zgrv:tmpDir>
    <zgrv:webBrowser autoDetect="true" options="" path=""/>
    <zgrv:proxy enable="false" host="" port="80"/>
    <zgrv:preferences antialiasing="false" cmdL_options=""
        highlightColor="-65536" magFactor="2.0" saveWindowLayout="false"
        sdZoom="false" sdZoomFactor="2" silent="true"/>

You also have to set the correct path in the ZGRViewer launch script:



# If you want to be able to run ZGRViewer from any directory,
# set ZGRV_HOME to the absolute path of ZGRViewer's main directory
# e.g. ZGRV_HOME=/usr/local/zgrviewer


java -jar $ZGRV_HOME/target/zgrviewer-0.8.1.jar "$@"

Example session

A complete example

# make sure the environment variables (PATH, LD_LIBRARY_PATH) for the installed ROSE are set correctly
which dotGeneratorWholeASTGraph

# run the dot graph generator
dotGeneratorWholeASTGraph -c ttt.c

#see it

Example output

We put some example source files and their AST dump files into:

Sanity Check

We provide a set of sanity checks for the AST and use them to make sure the AST is consistent. ROSE developers are strongly encouraged to run a sanity check after each AST transformation. This is a stricter standard than merely unparsing the AST to code that compiles: it is common for an AST to unparse correctly but then fail the sanity check.

The recommended sanity check is

  • AstTests::runAllTests(project); from src/midend/astDiagnostics. Internally, it calls the following checks:
    • TestAstForProperlyMangledNames
    • TestAstCompilerGeneratedNodes
    • AstTextAttributesHandling
    • AstCycleTest
    • TestAstTemplateProperties
    • TestAstForProperlySetDefiningAndNondefiningDeclarations
    • TestAstSymbolTables
    • TestAstAccessToDeclarations
    • TestExpressionTypes
    • TestMangledNames::test()
    • TestParentPointersInMemoryPool::test()
    • TestChildPointersInMemoryPool::test()
    • TestMappingOfDeclarationsInMemoryPoolToSymbols::test()
    • TestLValueExpressions
    • TestMultiFileConsistancy::test() //2009
    • TestAstAccessToDeclarations::test(*i); // named type test

There are some other functions floating around, but they should be merged into AstTests::runAllTests(project):

  • FixSgProject(*project); //in Qing's AST interface
  • Utility::sanityCheck(SgProject* )
  • Utility::consistencyCheck(SgProject*) // SgFile*
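To illustrate why an AST can unparse correctly and still fail a sanity check, here is a toy model that does not use ROSE at all. ToyStmt, unparse, and parentPointersConsistent are hypothetical names; the check is only in the spirit of a parent-pointer test such as TestParentPointersInMemoryPool and makes no claim about how ROSE implements it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an AST node; this is NOT ROSE's SgNode.
struct ToyStmt {
  std::string text;
  ToyStmt* parent = nullptr;
  std::vector<ToyStmt*> children;
};

// "Unparsing" only follows child pointers, so a wrong parent pointer
// never shows up in the generated text.
std::string unparse(const ToyStmt* n) {
  std::string out = n->text;
  for (const ToyStmt* c : n->children) out += unparse(c);
  return out;
}

// Sanity check: every child must point back to the node that owns it.
bool parentPointersConsistent(const ToyStmt* n) {
  for (const ToyStmt* c : n->children)
    if (c->parent != n || !parentPointersConsistent(c)) return false;
  return true;
}
```

A transformation that appends a statement to children but forgets to set its parent would still produce the expected text from unparse, while parentPointersConsistent would report the inconsistency.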

Text Output of an AST

Call SgNode::unparseToString(). You can call it on any SgLocatedNode within the AST to dump the text form of that part of the AST.

AST Iterator

The iterator class follows the STL iterator pattern. It is implemented as a pre-order traversal and maintains its own stack. It performs exactly the same traversal as the traversal classes in ROSE, since it uses the same underlying information:

#include "RoseAst.h"
#include <iostream>

SgNode* node = ....; // any subtree

RoseAst ast(node);

// visit every node of the subtree in pre-order
for (RoseAst::iterator i = ast.begin(); i != ast.end(); ++i) {
   std::cout << "We are here:" << (*i)->class_name() << std::endl;
}

Some more features:

  • By default the iterator does not traverse null pointers (you won't see them). If you also want to see and traverse the null pointers, use the begin function with: ast.begin().withNullValues()
  • It can also exclude subtrees during the traversal. Simply call on the *iterator*:
    • i.skipChildrenOnForward(); ++i; // skips the children of the current node and goes to the next node that follows in the traversal after all those children
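The mechanics described above (pre-order, an explicit stack, optional null entries, skipping a subtree) can be sketched on a toy tree. This is not the RoseAst implementation: ToyNode, ToyPreorder, and visitOrder are hypothetical names, and only the method name skipChildrenOnForward is borrowed from the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an AST node; this is NOT ROSE's SgNode.
struct ToyNode {
  std::string name;
  std::vector<ToyNode*> children;  // entries may be null
};

// Pre-order iterator that keeps its own stack instead of recursing.
class ToyPreorder {
public:
  explicit ToyPreorder(ToyNode* root, bool withNulls = false)
      : withNulls_(withNulls) {
    if (root) stack_.push_back(root);
  }
  bool done() const { return stack_.empty(); }
  // May return nullptr only when withNulls was requested.
  ToyNode* current() const { return stack_.back(); }
  // Ask the next advance() to step over the current node's subtree.
  void skipChildrenOnForward() { skip_ = true; }
  void advance() {
    ToyNode* n = stack_.back();
    stack_.pop_back();
    if (n && !skip_) {
      // Push children in reverse so the leftmost child is visited first.
      for (auto it = n->children.rbegin(); it != n->children.rend(); ++it)
        if (*it || withNulls_) stack_.push_back(*it);
    }
    skip_ = false;
  }
private:
  std::vector<ToyNode*> stack_;
  bool withNulls_;
  bool skip_ = false;
};

// Collect node names in visiting order; the children of skipUnder (if any)
// are excluded from the traversal.
std::vector<std::string> visitOrder(ToyNode* root,
                                    const ToyNode* skipUnder = nullptr) {
  std::vector<std::string> out;
  for (ToyPreorder it(root); !it.done(); it.advance()) {
    out.push_back(it.current()->name);
    if (it.current() == skipUnder) it.skipChildrenOnForward();
  }
  return out;
}
```

In this sketch the skip request is a flag consumed by the next advance(), which mirrors the "call skipChildrenOnForward(), then ++i" usage described in the list above.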

Relevant sourcefiles

Content of AST


Some useful member functions

  • get_base_type(): member function on some IR nodes derived from SgType. It returns the immediate (non-recursively stripped) type beneath a typedef, reference, pointer, array, modifier, etc.
  • findBaseType(): recursively strips away all of the following and returns the hidden type beneath the layers of typedefs, pointers, references, modifiers, array representation, etc.
        typedefs, SgTypedefType
        references, SgReferenceType
        pointers, SgPointerType
        arrays, SgArrayType
        modifiers, SgModifierType
  • SgType * stripTypedefsAndModifiers () const
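The difference between stripping one layer and stripping all layers can be sketched with a toy type chain. Nothing below is ROSE code: ToyType, Kind, and the three helper functions are hypothetical, and the behavior assumed for stripTypedefsAndModifiers (removing only typedef and modifier layers) is inferred from its name.

```cpp
#include <cassert>
#include <string>

// Toy model of a chain of type wrappers; NOT ROSE's SgType hierarchy.
enum class Kind { Typedef, Reference, Pointer, Array, Modifier, Named };

struct ToyType {
  Kind kind;
  std::string name;
  const ToyType* base;  // the wrapped type, or nullptr for Kind::Named
};

// One layer only, analogous to get_base_type().
const ToyType* immediateBase(const ToyType* t) { return t->base; }

// Strip every wrapper layer, analogous to findBaseType().
const ToyType* innermostBase(const ToyType* t) {
  while (t->base != nullptr) t = t->base;
  return t;
}

// Strip only leading typedef and modifier layers (assumed semantics).
const ToyType* stripTypedefsAndModifiersToy(const ToyType* t) {
  while (t->kind == Kind::Typedef || t->kind == Kind::Modifier) t = t->base;
  return t;
}
```

For a declaration such as typedef const int* cip_t, modeled as Typedef, Pointer, Modifier, Named, the first helper yields the pointer layer, the second yields int, and the third stops at the pointer layer because a pointer is neither a typedef nor a modifier.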

File location information

All AST nodes with file location information derive from SgLocatedNode, which has start and end Sg_File_Info to indicate begin and end location information.

You can obtain and print the pair of locations by calling:

locatedNode->get_startOfConstruct()->display() ; 

locatedNode->get_endOfConstruct()->display() ;

// get beginning info only
locatedNode->get_file_info()->display() ;

The output for display() may look like

Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = false 
     isCompilerGenerated                   = true (no position information) 
     isOutputInCodeGeneration              = false 
     isShared                              = false 
     isFrontendSpecific                    = true (part of ROSE support for gnu compatability) 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     file_id  = 2 
     filename = /home/liao6/daily-test-rose/upcwork/install/include/gcc_HEADERS/rose_edg_required_macros_and_functions.h 
     line     = 167  column   = 1 

.... // transformation generated, will be outputted by the unparser
upcr_pshared_ptr_t gsj;
Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = true (part of a transformation) 
     isCompilerGenerated                   = false 
     isOutputInCodeGeneration              = true (output in code generator) 
     isShared                              = false 
     isFrontendSpecific                    = false 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     file_id  = -3 
     filename = transformation 
     line     = 0  column   = 0 

As you can see, AST nodes may be generated by ROSE's frontends or by a translator. A transformation-generated located node may not have line or column numbers.

You can get the file name, line, and column numbers:

SgLocatedNode* node = .... ;

// start of the construct
Sg_File_Info* info_start = node->get_startOfConstruct();
size_t a_start = (size_t)info_start->get_line();
int col_start = info_start->get_col();

std::string filename = node->get_file_info()->get_filename();

// end of the construct; fall back to the start line if no end info is available
Sg_File_Info* info_end = node->get_endOfConstruct();
size_t a_end = (info_end == NULL) ? a_start : (size_t)info_end->get_line();

Preprocessing Information

In addition to nodes and edges, a ROSE AST may carry attributes that hold preprocessing information such as #include or #if .. #else. They are attached before, after, or within a nearby AST node (only one that has source location information).

An example translator will traverse the input code's AST and dump information which may include preprocessing information.

For example

exampleTranslators/defaultTranslator/preprocessingInfoDumper -c main.cxx
Found an IR node with preprocessing Info attached:
(memory address: 0x2b7e1852c7d0 Sage type: SgFunctionDeclaration) in file
/export/tmp.liao6/workspace/userSupport/main.cxx (line 3 column 1)
-------------PreprocessingInfo #0 ----------- :
classification = CpreprocessorIncludeDeclaration:
  String format = #include "all_headers.h"

relative position is = before

Source: (Chapter 29 - Handling Comments, Preprocessor Directives, And Adding Arbitrary Text to Generated Code)

AST matching

ROSE Compiler Framework/AST Matching

AST Construction

The SageBuilder and SageInterface namespaces provide functions to create and manipulate ASTs. See the Doxygen docs.