ROSE Compiler Framework/Print version

From Wikibooks, open books for an open world
< ROSE Compiler Framework
Jump to: navigation, search

Contents

About the Book[edit]

FYI: http://wiki.rosecompiler.org redirects here.

Introduction[edit]

The goal of this book is to have a community documentation providing extensive and up-to-date instructional information about how to use the open-source ROSE compiler framework, developed at Lawrence Livermore National Laboratory .

While the ROSE project website (http://www.rosecompiler.org) already has a variety of official documentations, having a wikibook for ROSE allows anybody to contribute to gathering instructional information about this software.

Again, please note that this wikibook is not the official documentation of ROSE. It is the community efforts contributed by anyone just like you.

How To Contribute[edit]

  • Welcomed Contributions:
    • Fix typos and grammar of existing pages for quality, clarity, and readability.
    • Add new pages about ROSE-specific tutorials, how-tos, FAQ, and workflow
    • Start discussions on the Discussion tab of an existing page about new suggestions of how things can be done better
  • Unwelcome Contributions:
    • Rewriting existing documentation, link to it if necessary

References[edit]

Conventions[edit]

  • Technical names, identifiers, etc. should be enclosed in teletype, <tt></pre>

The FooBar class can be found in the foobar.cpp file.

  • Source code should use a highlighted code block:
<source lang="<language>">
<Code goes here...>
</ source>

(Enclosing code in a <pre></pre> block is also okay, but the highlighted code block is preferred.)

  • Headings:
    • Capitalize all words e.g. "Graph Processing"
  • Subheadings:
    • First word capitalized only

Tracking Wiki Changes[edit]

Learn how to "Track Changes": http://en.wikibooks.org/wiki/Help:Tracking_changes

Enable Email Notifications for Changes to this book[edit]

If you want to be notified of changes to this book, WikiBooks provides email notifications for changes to Wiki pages that you explicitly choose to watch.

To use this feature:

1. Create an account with WikiBooks: http://en.wikibooks.org/w/index.php?title=Special:UserLogin&returnto=Main+Page&type=signup

2. Login to WikiBooks and set your preferences (top right corner of the web page) for both email notifications and your watch list:

  • Email notification settings
    • Preferences-> User profile-> E-mail notifications -> E-mail me when a page on my watchlist is changed (check this on)
  • Define your watchlist
    • Preferences->Watchlist -> Advanced options -> you can select the options you want, such as "Add pages I edit to my watchlist" and "Add pages I create to my watchlist"
    • you can also individually watch and unwatch any wiki page: by click on the star on the page's tab list (after View history)

Caveat: we don't know if wikibooks supports users to watch one entire book. So far, you have to do this one page after another by editing them at some points.

Wikibook Writing Tips[edit]

1. What exactly is "BookCat" for? It is a category tag automatically added by wiki robot scripts.

2. Should "BookCat" be at the end of the document? Any position in the page should be fine. Having it at top may be better so it won't be accidentally deleted when we add new things at the bottom.

ROSE's Documentations[edit]

ROSE uses a range of materials to document the project.

Static content of ROSE web: http://rosecompiler.org/index.html

  • ROSE manual: the design, algorithm, and implementation details. Written in LaTeX, the content of the manual can come from published papers. It may contain intense academic citations and math formula.
  • ROSE tutorial: short code examples for tools built on top of ROSE, step-by-step instructions for doing things. Tightly integrated into ROSE's build system so each example is always verified to work with ROSE.
  • Doxygen web reference: class/namespace references of source code

Wiki

  • This wikibook: non-official, community documentation. Editable by anyone, aimed to supplement official documents and to collect tutorials, FAQ and quick pointers to important topics.

Obtaining ROSE[edit]

Git Repositories[edit]

ROSE's source files are managed by git, a distributed revision control and source code management system. There are several ways to download the source tree:

Internal Repos[edit]

  • Private Git Repositories Within LLNL
    • Private git repository hosted within Lawrence Livermore National Laboratory: the internal file path is /usr/casc/overture/ROSE/git/ROSE.git: central repository of ROSE, in most cases this is automatically updated by Jenkins only after incoming commits pass all regression tests
    • Private git repository hosted by github.llnl.gov: used for daily pushes and code review, accessible in LC. LC does not permit SSH, so you'll have to use HTTPS to clone repos there.

Note: You may encounter SSL certificate problems. If you do, simply disable SSL verification in cURL using either export GIT_SSL_NO_VERIFY=false or configuring git:

 $ git config --global http.sslVerify false


HTTP 401 is unauthorized access. If you're not prompted for your username/password in the Shell, then you need to use X-forwarding so the authentication windows will popup:

 $ ssh -X <LC machine>

Otherwise, create a ~/.netrc file with:


machine github.llnl.gov
  login <username>
  password <password>

EDG 4.x internal branch[edit]

We are working on a branch using more recent version of EDG frontends to have better C++ support.

If you work from a workstation from your office. You should generate your public-private key pair and send Justin your public key for authentication. After that, try

 git clone rose-dev@rosecompiler1:rose/edg4x/rose.git

If you work from an LC machine, try to use your OUN name via https (SSH is restricted) and make sure you have git 1.7.10 or later.

 git clone https://OUN@rose-github.llnl.gov/rose-compiler/edg4x-rose

You may have to modify the URLs to submodules if you clone ROSE from LC machines.

aztec1@too1:git clone https://rose-github.llnl.gov/rose-compiler/edg4x-rose.git
Initialized empty Git repository in /tmp/edg4x-rose/.git/
remote: Counting objects: 184706, done.
remote: Compressing objects: 100% (46828/46828), done.
remote: Total 184706 (delta 133529), reused 184562 (delta 133385)
Receiving objects: 100% (184706/184706), 242.37 MiB | 25.92 MiB/s, done.
Resolving deltas: 100% (133529/133529), done.

aztec1@too1:cd edg4x-rose/

aztec1@too1:vi .gitmodules

aztec1@too1:cat .gitmodules
[submodule "src/frontend/CxxFrontend/EDG"]
        path = src/frontend/CxxFrontend/EDG
        url = https://rose-github.llnl.gov/rose-compiler/edg.git
[submodule "projects/vulnerabilitySeeding"]
        path = projects/vulnerabilitySeeding
        url = https://rose-github.llnl.gov/rose-compiler/vulnerabilitySeeding.git

aztec1@too1:git submodule update --init
Submodule 'projects/vulnerabilitySeeding'
(https://rose-github.llnl.gov/rose-compiler/vulnerabilitySeeding.git)
registered for path 'projects/vulnerabilitySeeding'
Submodule 'src/frontend/CxxFrontend/EDG'
(https://rose-github.llnl.gov/rose-compiler/edg.git) registered for path
'src/frontend/CxxFrontend/EDG'
Initialized empty Git repository in
/tmp/edg4x-rose/projects/vulnerabilitySeeding/.git/
remote: Counting objects: 5992, done.
remote: Compressing objects: 100% (1121/1121), done.
remote: Total 5992 (delta 3178), reused 5992 (delta 3178)
Receiving objects: 100% (5992/5992), 958.13 KiB, done.
Resolving deltas: 100% (3178/3178), done.
Submodule path 'projects/vulnerabilitySeeding': checked out
'847d1d884c8b5098b4b2895259058f91f98a525e'
Initialized empty Git repository in
/tmp/edg4x-rose/src/frontend/CxxFrontend/EDG/.git/
remote: Counting objects: 3351, done.
remote: Compressing objects: 100% (1113/1113), done.
remote: Total 3351 (delta 2574), reused 2939 (delta 2167)
Receiving objects: 100% (3351/3351), 23.01 MiB | 10.51 MiB/s, done.
Resolving deltas: 100% (2574/2574), done.
Submodule path 'src/frontend/CxxFrontend/EDG': checked out
'8950d363a12d1071f600c1b51c9a6d2b41b0dae1'

aztec1@too1:echo $?
0

Public repos[edit]

EDG 4x Public Branch[edit]

The ROSE team has been working on a new EDG4x branch of ROSE from 2012-2013. This branch adds a great deal of robustness, especially for the C/C++ language support. This work is currently in a separate repository, publicly accessible in Github:

   https://github.com/rose-compiler/edg4x-rose

A note about this has also been added to the official ROSE website:

   http://rosecompiler.org/?page_id=14

We hope to have this new branch merged into the mainline soon, so stay tuned!

Here are some rough instructions to get you started with building and installing ROSE on your Linux machine:

1. Install the Boost C++ libraries. You can use my quick-and-dirty script: install-boost.sh.

2. Configure your environment

3. Run the $ROSE/build script to perform the Autotools bootstrap

4. Run the $ROSE/configure script with "--with-boost=/path/to/boost", "--prefix=/path/for/installation", and "--disable-java --disable-fortran" since Fortran and Java support is still working in progress. Optionally, you should configure --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib64/ if you want to play with OpenMP support of ROSE and link with GCC 4.4's OpenMP runtime library.

5. Run "make –j24 install-core" to build and install ROSE

Please note that we only support GCC 4.2.4 and GCC 4.4.5 currently on the ROSE-EDG4 repository. Full documentation is available on our website, http://rosecompiler.org, though instructions specific to ROSE-EDG4 are generally lacking.

Current Limitations[edit]

Currently, the EDG4x branch of ROSE is only tested with GCC 4.2.4 and 4.4.5 (and EDG binaries are only generated and released for these compiler versions).

  • Fortran and Java support is still working in progress, you should configure with "--disable-java --disable-fortran" since

Autoconf Configuration[edit]

The EDG4.7 frontend is enabled by default, you can just configure ROSE by using the following options

   $ $ROSE/configure --with-boost=/path/to/boost/installation --prefix=/path/for/rose/installation --disable-java --disable-fortran

Example Environment[edit]

   # Example setup.sh file to source relevant environment settings for building ROSE
   #
   # Update the paths to fit your environment…
   export BOOST_ROOT=/home/too1/local/opt/boost/1_45_0
   export BOOST_HOME="${BOOST_ROOT}"
   export JAVA_HOME=/home/too1/local/workspace/rose/edg3/keep_going/opt/java/jdk/1.7.0_15
   export LIBTOOL_HOME=/usr/apps/libtool/2.2.4
   export AUTOMAKE_HOME=/usr/apps/automake/1.9.6
   export LD_LIBRARY_PATH=":${JAVA_HOME}/lib:${JAVA_HOME}/jre/lib/amd64/server:${BOOST_ROOT}/lib:"
   export PATH="${JAVA_HOME}/bin:${AUTOMAKE_HOME}/bin:${LIBTOOL_HOME}/bin:$PATH"
   export PATH="/usr/apps/git/latest/bin:$PATH"
   # GCC 4.4.1
   source /nfs/apps/mpfr/2.4.1/setup.sh
   source /nfs/apps/gmp/4.3.1/setup.sh
   source /nfs/apps/gcc/4.2.4/setup.sh

Virtual Machine Image[edit]

It can take quite some time to install ROSE for the first time. A virtual machine image is provided with a Ubuntu 10.04 OS with ROSE already installed.

You can download it and play it using VMware Player

Downloading The Virtual Machine Image[edit]

More information is at ROSE Virtual Machine Image

git 1.7.10 or later for github.com[edit]

github requires git 1.7.10 or later to avoid https cloning errors, as mentioned at https://help.github.com/articles/https-cloning-errors

Ubuntu 10.04's package repository contains git 1.7.0.4. So building later version of git is needed. But you still need an older version of git to get the latest version of git.

 apt-get install git-core

Now you can clone the latest git

 git clone https://github.com/git/git.git

Install all prerequisite packages needed to build git from source files(assuming you already installed GNU tool chain with GCC compiler, make, etc.)

 sudo apt-get install gettext zlib1g-dev asciidoc libcurl4-openssl-dev
 $ cd git  # enter the cloned git directory
 $ make configure ;# as yourself
 $ ./configure --prefix=/usr ;# as yourself
 $ make all doc ;# as yourself
 # make install install-doc install-html;# as root

EDG source code[edit]

If you have an EDG license, we can provide you with ROSE's EDG source code. The original, official EDG source code does NOT work with ROSE since we have modified EDG to better serve our purposes.

Note: We provide you with a snapshot of our Git revision controlled ROSE-EDG source code repository. This way, you can more easily contribute your EDG modifications back into ROSE.

1. Send your EDG (research) license to two ROSE staff members, just in case one is on vacation or on travel.

2. Provide ROSE staff with a drop-off location for the EDG source code (ssh or ftp server, etc.)

3. Once you receive the EDG source code, you have two options:

As a submodule[edit]

a. Use ROSE-EDG as a submodule (assuming you have ROSE's Git source tree):

This is the recommended way to use the EDG git repo we provide. So the assumption is that you use a local git clone of ROSE($ROSE).

Edit submodule path in $ROSE/.gitmodules to point to your ROSE-EDG repository:

[submodule "src/frontend/CxxFrontend/EDG"]
       path = src/frontend/CxxFrontend/EDG
-       url = ../ROSE-EDG.git
+       url = <path/to/your/ROSE-EDG.git>
-[submodule "projects/vulnerabilitySeeding"]
-       path = projects/vulnerabilitySeeding
-       url = ../vulnerabilitySeeding.git

Run git-submodule commands:

$ cd $ROSE
$ git submodule init
$ git submodule update

The commands above will check out a version of the EDG submodule and save it into ROSE/src/frontend/CxxFrontend/EDG

As a Drop-in[edit]

b. As a Drop-in

Move ROSE-EDG tarball into its correct location within the ROSE source tree: $ROSE/src/frontend/EDG

  $ tar xzvf ROSE-EDG-b12158aa2.tgz
  $ ls
  EDG  ROSE-EDG-b12158aa2.tgz
  $ mv EDG $ROSE/src/frontend/EDG

Warning: This method may not work because EDG is a submodule of ROSE and therefore, requires a version synchronization between the two. For example, the latest version of ROSE may not use the latest version of ROSE's EDG.

The remaining steps[edit]

4. In ROSE, run the $ROSE/build script from the top-level of the ROSE source tree, i.e. $ROSE. This script bootstraps Autotools, including the Makefile.ams in the EDG source tree.

5. Configure and build ROSE: Normally, during this process ROSE would attempt to download an EDG binary tarball for you, but since you have the source code, this step will be skipped.


EDG tarball[edit]

Process[edit]

If you don't have access to the EDG source code, you will be able to automatically download a packaged EDG binary tarball during the ROSE build process. The download is triggered during make in $ROSE_BUILD/src/frontend/CxxFrontend.

The EDG binary version is a computed binary compatibility signature relative to your version of ROSE. You can check this version by running the $ROSE/scripts/bincompat-sig, for example:

$ ./scripts/bincompat-sig 
7b1930fafc929de85182ee1a14c86758

You may encounter this error:

$ ./scripts/bincompat-sig 
Unable to find a remote tracking a canonical repository.  Please add a
canonical repository as a remote and ensure it is up to date.  Currently
configured remotes are:

   origin => https://github.com/rose-compiler/rose

Potential canonical repositories include:

   anything ending with "rose.git" (case insensitive)

If you do, simply add ".git" to the end of your origin's URL path. In our example, this translates to:

https://github.com/rose-compiler/rose.git

List of binaries[edit]

View the list of available EDG binaries here: http://www.rosecompiler.org/edg_binaries/edg_binaries.txt.

EDG binaries are generated for these platforms (Last updated on 12/22/2012):

Platform EDG 3.3 EDG 4.0
amd64-linux GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1 GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1
i686-linux GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1 GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1
32bit-macos-10.5 GCC 4.0.4
64bit-macos-10.6 GCC 4.2.4
64bit-x86_64-macos-10.6 GCC 4.2.4
34bit-debian GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1 GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1

Installation[edit]

ROSE is released as an open source software package. Users are expected to compile and install the software.

Platform Requirement[edit]

ROSE is portable to Linux and Mac OS X on IA-32 and x86-64 platforms. In particular, ROSE developers often use the following development environments:

  • Red Hat Enterprise Linux 5.6 or its open source equivalent Centos 5.6
  • Ubuntu 10.04.4 LTS. Higher versions of Ubuntu are NOT supported due to the GCC versions supported by ROSE.
  • Mac OS X 10.5 and 10.6

Software Requirement[edit]

Here is a list for prerequisite software packages for installing ROSE

  • GCC 4.0.x to 4.4.x , the range of supported GCC versions is checked by support-rose.m4 during configuration
    • gcc
    • g++
    • gfortran (optional for Fortran support)
  • boost library: version 1.36 to 1.47. Again the range of supported Boost versions is checked by support-rose.m4 during configuration
  • Sun Java JDK
  • GNU autoconf >=2.6 and automake >= 1.9.5, GNU m4 >=1.4.5
  • libtool: >=1.5.6, defined in rose/build
  • bison (byacc),
  • flex
  • glibc-devel
  • git
  • ZGRViewer, a GraphViz/DOT Viewer: essential to view dot graphs of ROSE AST
    • install Graphviz first - Graph Visualization Software

Optional packages for additional features or advanced users

  • libxml2-devel
  • sqlite
  • texlive-full, need for building LaTeX docs

LLNL internal environment[edit]

For LLNL internal developers, we already have all software installed. They can be accessed by source

  • Redhat 5: /nfs/casc/overture/ROSE/opt/rhel5/x86_64/rose_environment.sh
  • Redhat 6: /nfs/casc/overture/ROSE/opt/rhel6/x86_64/rose_environment.sh


Some typical software paths are:

__GIT_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/git"
__BOOST_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/boost"
__GCC_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/gcc"
__GHC_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/ghc"
__JAVA_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/java/jdk"
__PYTHON_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/python"
__LIBTOOL_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/libtool"
__AUTOCONF_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/autoconf"
__AUTOMAKE_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/automake"
__LLVM_CLANG_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/llvm/clang"
__CMAKE_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/cmake"
__CCACHE_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/ccache"
__GHOSTSCRIPT_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/ghostscript"
__GRAPHVIZ_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/graphviz"
__XERCES_HOME="/nfs/casc/overture/ROSE/opt/rhel5/x86_64/xerces"

Default versions are

#-------------------------------------------------------------------------------
# Set Defaults
#-------------------------------------------------------------------------------
BOOST_VERSION_DEFAULT=1_45_0
GCC_VERSION_DEFAULT=4.4.5
GHC_VERSION_DEFAULT=6.10.4
GIT_VERSION_DEFAULT=latest
JAVA_VERSION_DEFAULT=latest
PYTHON_VERSION_DEFAULT=2.7.5
LIBTOOL_VERSION_DEFAULT=2.4
AUTOCONF_VERSION_DEFAULT=2.69
AUTOMAKE_VERSION_DEFAULT=1.14
LLVM_CLANG_VERSION_DEFAULT=3.0
CMAKE_VERSION_DEFAULT=latest
CCACHE_VERSION_DEFAULT=latest
GHOSTSCRIPT_VERSION_DEFAULT=latest
GRAPHVIZ_VERSION_DEFAULT=latest
XERCES_VERSION_DEFAULT=latest

Installing boost[edit]

The installation of Boost may need some special attention.

Download a supported boost version from http://sourceforge.net/projects/boost/files/boost/

For version 1.36 to 1.38

./configure --prefix=/home/usera/opt/boost-1.35.0
 make
 make install

Ignore the warning like : Unicode/ICU support for Boost.Regex?... not found.

For version 1.39 and 1.47: create the boost installation directory first

In boost source tree

  • ./bootstrap.sh --prefix=your_boost_install_path
  • ./bjam -j4 install --prefix=your_boost_install_path --libdir=your_boost_install_path/lib


Remember to export LD_LIBRARY_PATH for the installed boost library, for example

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/leo/opt/boost_1.45.0_inst/lib
export PATH LD_LIBRARY_PATH

Installing Java JDK[edit]

Download Java SE JDK from http://www.oracle.com/technetwork/java/javase/downloads/index.html

For example, you can download jdk-7u5-linux-i586.tar.gz for your Linux 32-bit system.

After untar it to your installation path, remember to set environment variables for Java JDK

# jdk path should be search first before other paths
PATH=/home/leo/opt/jdk1.7.0_05/bin:$PATH

# lib path for libjvm.so
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/leo/opt/jdk1.7.0_05/jre/lib/i386/server

# Don't forget to export both variables!!
export PATH LD_LIBRARY_PATH

./build[edit]

In general, it is better to rebuild the configure file in the top level source directory of ROSE. Just type:

 rose_sourcetree>./build

configure[edit]

The next step is to run configure in a separated build tree. ROSE will complain if you try to build it within its source directory.

There are many configuration options. You can see the full list of options by typing ../sourcetree/configure --help . But only --prefix and --with-boost are required as the minimum options.

 mkdir buildrose
 cd buildrose
 ../rose_sourcetree/configure --prefix=/home/user/opt/rose_tux284 --with-boost=/home/user/opt/boost-1.36.0/

ROSE's configure turns on debugging option by default. The generated object files should already have debugging information.

Additional useful configure options

  • Specify where a gcc's OpenMP runtime library libgomp.a is located. Only GCC 4.4's gomp lib should be used to have OpenMP 3.0 support
    • --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib/

EDG 4.x Branch[edit]

We are upgrading ROSE to use a later version of EDG frontend version 4.7.


Special configuration options are needed to turn on this new frontend, as follows:


../sourcetree/configure --prefix=/export/tmp.liao6/workspace/edg-4-7-rose/buildtree/install --with-boost=/export/tmp.liao6/opt/gcc4.4.1/boost_1_40_0-inst/ --with-ROSE_LONG_MAKE_CHECK_RULE=yes --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib64/

By default, ROSE is configured with GCC's -O2 and -g options by default so the translators shipped with ROSE should already have some debugging information available. But some variables may be optimized away. To preserve the max debugging information, you may have to reconfigure/recompile rose to turn off GCC optimizations.

  • --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 // in the configuration option list
  • --without-CXX_OPTIMIZE --without-C_OPTIMIZE // alternative method
  • --with-C_OPTIMIZE=no --with-CXX_OPTIMIZE=no // third way

make[edit]

In ROSE's build tree, type

 cd buildrose
 make -j4

will build the entire ROSE, including librose.so, tutorials, projects, tests, and so on. -j4 means to use four processes to perform the build. You can have bigger numbers if your machine supports more concurrent processes. Still, the entire process will take hours to finish.

For most users, building librose.so should be enough for most of their work. In this case, just type

 make -C src/ -j4


tun off silent build[edit]

By default, the Automake build system will use silent build to reduce screen output. Details about compilation/linking command lines and options are hidden. In case you want to see the full command lines, you can pass an option to make, like "make V=1".

More background information about this subject is available at https://www.flameeyes.eu/autotools-mythbuster/automake/silent.html.

make check[edit]

Optionally, you can type make check to make sure the compiled rose pass all its shipped tests. This takes hours again to go through all make check rules within projects, tutorial, and tests directories.

To save time, you can just run partial tests under a selected directory, like the buildrose/tests

 make -C tests/ check -j4

make install[edit]

After "make", it is recommended to run "make install" so rose's library (librose.so), headers (rose.h) and some prebuilt rose-based tools can be installed under the specified installation path using --prefix.

set environment variables[edit]

After the installation, you should set up some standard environment variables so you can use rose. For bash, the following is an example:

ROSE_INS=/home/userx/opt/rose_installation_tree
PATH=$PATH:$ROSE_INS/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROSE_INS/lib
# Don't forget to export variables !!!
export PATH LD_LIBRARY_PATH

try out a rose translator[edit]

There are quite some pre-built rose translators installed under $ROSE_INS/bin.

You can try identityTranslator, which just parses input code, generates AST, and unparses it back to original code:

  identityTranslator -c helloWorld.c

It should generate an output file named rose_helloWorld.c, which should just look like your input code.

Trouble shooting[edit]

We list common issues associated with ROSE's installation.

EDG binary[edit]

If you do not have the EDG frontend source code, ROSE's build system will automatically attempt to download an appropriate EDG binary using wget during the build process (i.e. make -C src/frontend/CxxFrontend).

The EDG binaries are platform-specific and have historically been a cause of issues, i.e. Autoconf detecting wrong host/build/platform types. One possible remedy to these problems is to use the Autoconf Build and Host Options:

1. Check what build system Autoconf thinks you have:

$ ./config/config.guess 
x86_64-unknown-linux-gnu

2. Use the appropriate Autoconf options during configuration of ROSE:

$ $ROSE/configure [--build|--host|--target|...]

See Using the Target Type.

A real user's solution:

Hi Justin,

Checking the config.guess file in source tree, I search the apple darwin for detail information in  --build option, 
then  I found that  UNAME-PROCESSOR and UNAME_RELEASE are needed in --build

First, I type uname  -m (for finding UNAME_PROCESSOR in config.guess)
 result :  x86_64 
Second, I type uname -r (for finding UNAME_RELEASE)
 result : 10.8.0 (darwin kernel version)

Third, I type command to configure again, but I added  --build option, then autoconf can directly find the detail platform type
 
/Users/ma23/ROSE/configure --with-CXX_DEBUG=-ggdb3 --with-CXX_WARNINGS=-Wall --with-boost=/Users/ma23/Desktop/ROSE/boost/BOOST_INSTALL 
--with-gfortran=/Users/ma23/Desktop/macports/bin/gfortran-mp-4.4 --with-alternate_backend_fortran_compiler=gfortran-mp-4.4 
GFORTRAN_PATH=/Users/ma23/Desktop/macports/bin/gfortran-mp-4.4 --build=x86_64-apple-darwin10 

At last, make :)

Thanks:)
Regards,
Hongyi Ma

Virtual machine image[edit]

The goal of this page is to document

  • How users can download the virtual machine image (or virtual appliance) and use ROSE out of box.
  • how the virtual machine image for a fully installed ROSE is created.

How to use the virtual machine image[edit]

Obtain the Virtual Machine Image[edit]

Download the virtual machine image created by using VMware Player:

Warning: LLNL users may not be able to download it due to limitations to max downloaded file size within LLNL. It may also be against LLNL's security policy to run a virtual machine without authorization. So this image should not be used inside LLNL.

Content of the VM Image[edit]

Copy&paste from README within the virtual machine

This is a virtual machine image for the ROSE source-to-source compiler framework.

sourcetree, cloned from github.com/rose-compiler/rose on July 21, 2012

  • /home/demo/rose

buildtree

  • /home/demo/buildrose

installation tree (--prefix path)

  • /home/demo/opt/rose-inst

A script to set environment variables to use the installed ROSE tools

  • /home/demo/set.rose.env

A test translator

  • /home/demo/myTranslator

Some dot graphs of a simplest function. Type "run.sh file.dot" will view a dot file

  • /home/demo/dotGraphs

Install VMware Player[edit]

You have to install VMware Player to your machine to use the virtual machine image.

Goto http://www.vmware.com/go/downloadplayer/

Select the right bundle for your platform. For example: VMware-Player-4.0.4-744019.i386.txt

After downloading (assuming you are using Ubuntu 10.04)

  • chmod a+x VMware-Player-4.0.4-744019.i386.txt
  • sudo ./VMware-Player-4.0.4-744019.i386.txt
  • follow the GUI to finish the installation

To start VMPlayer, goto Menu->Applications-> System Tools -> VMware Player

Open/Play the virtual machine[edit]

After downloading and untar the tar.gz package to a directory, use VMware player to open the configuration file of the directory.

How was the virtual machine made[edit]

Host Machine[edit]

We used Ubuntu 10.04 LTS as a host machine to create the virtual machine image.

uname -a
Linux 8core-ubuntu 2.6.32-41-generic-pae #91-Ubuntu SMP Wed Jun 13 12:00:09 UTC 2012 i686 GNU/Linux

cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.4 LTS"

Configurations[edit]

VMware player has been installed onto the host machine, as described above.

Basic configuration for the virtual machine

Hardware

  • Memory : 2 GB
  • Processors: 2
  • Hard Disk size: 15 GB: We would like to keep it small while having enough space for users.
    • 5GB is used for Ubuntu system files and
    • 10GB for the demonstration user's home directory
  • Network Adapter: NAT: share the host's IP address

OS

  • OS: Ubuntu 10.04 LTS
  • Demonstration user account (sudo user in Ubuntu):
    • account: demo
    • password: password
  • screen size: 1280x960 (4:3)

Download Ubuntu 10.04 LTS http://releases.ubuntu.com/lucid/ We currently use the i386 desktop ISO as the start point

Within the virtual machine[edit]

We installed Software Prerequisites

  • sudo apt-get install gcc g++ gfortran
  • sudo apt-get install autoconf automake libtool
  • sudo apt-get install git-core bison flex texlive-full graphviz python-all-dev

We then installed ROSE

How to create a new virtual image[edit]

Here are some general guidelines for creating a new virtual machine. Following these exact steps are not required, although they are recommended to ensure a consistent user experience with the ROSE VM's.

Please make sure you document the whole process in its entirety.

These steps must be performed within the VM (guest OS):

1. Install the prerequisite software using the platform's software package manager. Only as a last resort should you manually install software. Use the platform's default software versions if possible. (Use bash as the default login shell.)

2. Setup the ROSE workspace

  $ export ROSE_HOME=${HOME}/development/projects/rose
  $ export ROSE_SOURCE=${HOME}/development/projects/rose/src
  $ export ROSE_INSTALL=${HOME}/development/opt/rose
 
  $ mkdir -p "$ROSE_HOME"
  $ mkdir -p "$ROSE_INSTALL"

3. Clone the ROSE repository as src

  $ cd "$ROSE_HOME"
  $ git clone https://github.com/rose-compiler/rose "$ROSE_SOURCE"

4. Configure, build, and install ROSE

  $ cd "$ROSE_SOURCE"
 
  # Run ROSE's Autotools bootstrap script
  $ "${ROSE_SOURCE}/build"
 
  # Configure ROSE using the minimal useful configuration
  $ "${ROSE_SOURCE}/configure" --prefix="${ROSE_INSTALL}" --with-boost=</local/opt>
 
  $ make
  $ make install

5. Verify ROSE installation works

6. Create simple demo translator(s) in $ROSE_HOME/demo

ROSE tools[edit]

Overview[edit]

ROSE is a compiler framework to build customized compiler-based tools. A set of example tools are provided as part of the ROSE release to demonstrate the use of ROSE. Some of them are also useful for daily work of ROSE developers.

We list and briefly explain some tools built using ROSE. They are installed under ROSE_INSTALLATION_TREE/bin .


Any ROSE tool(translator) works like a compiler. You have to provide all necessary compilation flags to make it work properly. You have to specify some include paths and/or some macro to be defined.

  • One way to use rose tools is to replace your default compiler (gcc for example) with the rose tool command (identifyTransaltor, outline, etc) in your Makefile. So the tool will be called with all correct flags when you type "make".
  • Alternatively, if you are only interested in processing a single file. You can manually watch the full compilation command line (e.g. gcc .... -c ) used to compile that single file during a normal "make". Then replace your compiler (gcc) with the rose tool you are interested in (e.g. outline) in the command line.

Prerequisites[edit]

You have to install ROSE first, by typing configure, make, make install, etc.

You also have to set the environment variables properly before you can call ROSE tools from command line.

For example: if the installation path (or --prefix path in configure) is /home/opt/rose/install, you can have the following script to set the environment variables using bash:

ROSE_INS=/home/opt/rose/install
export ROSE_INS 

PATH=$ROSE_INS/bin:$PATH
export PATH

LD_LIBRARY_PATH=$ROSE_INS/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

All ROSE tools are installed under the path specified by the --prefix option.

identityTranslator[edit]

Source: http://www.rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf (chapter 2)

This is the simplest tool built using ROSE. It takes input source files , builds AST, and then unparses the AST back to compilable source code. It tries its best to preserve everything from the input file.

Uses[edit]

Typical use cases

  • without any options, test if ROSE can compile your code: replace the compiler used by your Makefile with identityTranslator
  • turn on some built-in analysis, translation or optimization phases, such as -rose:openmp:lowering to support OpenMP
    • type "identityTranslator --help" to see all options
  • debug a ROSE-based translator: the first step is often to use identityTranslator to rule out if it is a compilation problem using ROSE
  • use the source of the identityTranslator as a start point to add custom analysis and transformation. The code in the identityTranslator is indeed the minimum code required for almost all kinds of ROSE-based tools.

Source code[edit]

identityTranslator.c

#include "rose.h"
int main(int argc, char *argv[]){
	// Build the AST used by ROSE
	SgProject *project = frontend(argc, argv);
 
	// Run internal consistency tests on AST
	AstTests::runAllTests(project);
 
	// Insert your own manipulation of the AST here...
 
	// Generate source code from AST and call the vendor's compiler
	return backend(project);
}

Limitations[edit]

Due to limitations of the frontends used by ROSE and some internal processing, identityTranslator cannot generate 100% identical output compared to the input file.

Some notable changes it may introduce include:

  • "int a, b, c;" are transformed to three SgVariableDeclaration statements,
  • macros are expanded.
  • extra brackets are added around constants of typedef types (e.g. c=Typedef_Example(12); is translated in the output to c = Typedef_Example((12));)
  • Converting NULL to 0.

AST dot graph generators[edit]

Tools to generate AST graph in dot format. There are two versions

  • dotGenerator: simple AST graph generator showing essential nodes and edges
  • dotGeneratorWholeASTGraph: whole AST graph showing more details. It provides filter options to show/hide certain AST information.

command line:

 dotGeneratorWholeASTGraph  yourcode.c  // it is best to avoid include any header into your sample code to have a small enough tree to visualize!
dotGeneratorWholeASTGraph --help
   -rose:help                     show this help message
   -rose:dotgraph:asmFileFormatFilter           [0|1]  Disable or enable asmFileFormat filter
   -rose:dotgraph:asmTypeFilter                 [0|1]  Disable or enable asmType filter
   -rose:dotgraph:binaryExecutableFormatFilter  [0|1]  Disable or enable binaryExecutableFormat filter
   -rose:dotgraph:commentAndDirectiveFilter     [0|1]  Disable or enable commentAndDirective filter
   -rose:dotgraph:ctorInitializerListFilter     [0|1]  Disable or enable ctorInitializerList filter
   -rose:dotgraph:defaultFilter                 [0|1]  Disable or enable default filter
   -rose:dotgraph:defaultColorFilter            [0|1]  Disable or enable defaultColor filter
   -rose:dotgraph:edgeFilter                    [0|1]  Disable or enable edge filter
   -rose:dotgraph:expressionFilter              [0|1]  Disable or enable expression filter
   -rose:dotgraph:fileInfoFilter                [0|1]  Disable or enable fileInfo filter
   -rose:dotgraph:frontendCompatibilityFilter   [0|1]  Disable or enable frontendCompatibility filter
   -rose:dotgraph:symbolFilter                  [0|1]  Disable or enable symbol filter
   -rose:dotgraph:emptySymbolTableFilter        [0|1]  Disable or enable emptySymbolTable filter
   -rose:dotgraph:typeFilter                    [0|1]  Disable or enable type filter
   -rose:dotgraph:variableDeclarationFilter     [0|1]  Disable or enable variableDeclaration filter
   -rose:dotgraph:variableDefinitionFilter      [0|1]  Disable or enable variableDefinitionFilter filter
   -rose:dotgraph:noFilter                      [0|1]  Disable or enable no filtering
Current filter flags' values are: 
         m_asmFileFormat = 0 
         m_asmType = 0 
         m_binaryExecutableFormat = 0 
         m_commentAndDirective = 1 
         m_ctorInitializer = 0 
         m_default = 1 
         m_defaultColor = 1 
         m_edge = 1 
         m_emptySymbolTable = 0 
         m_expression = 0 
         m_fileInfo = 1 
         m_frontendCompatibility = 0 
         m_symbol = 0 
         m_type = 0 
         m_variableDeclaration = 0 
         m_variableDefinition = 0 
         m_noFilter = 0 

More information about how to use the tools can be found at How_to_visualize_AST

AST Outliner[edit]

Basic concept: outlining is the process of replacing a block of consecutive statements with a function call to a new function containing those statements. Conceptually, outlining the inverse of inlining.

ROSE provide a builtin translator called AST outliner, which can outline a specified portion of code and generate a function from it.

Official documentation for the AST outliner is located in Chapter 37 Using the AST Outliner with the ROSE Tutorial. pdf.

Supplemental information can be found here at ROSE Compiler Framework/outliner

Inliner[edit]

The ROSE Inliner inlines functions at function callsites.

Official documentation about the Inliner is Chapter 36 Calling the Inliner of ROSE tutorial: http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf

Test directory with an example translator and test input files

By looking into Makefile.am, the example translator's source code will generate an executable named "inlineEverything" in your buildtree.

This is the tool you can try to inline your sample code.

The same Makefile.am's make check rules contain sample command lines to use the tool.

call graph generator[edit]

Command line:

 buildCallGraph -c yourprogram.cpp

Control flow graph generator[edit]

Command line:

 virtualCFG -c yourprogram.c

autoPar[edit]

This is a tool which can automatically insert OpenMP pragmas into input serial C/C++ codes

This tool is an implementation of automatic parallelization using OpenMP. It is used to explore semantics-aware automatic parallelization, as described in one of our paper* .

  • The source files are currently located in rose/projects/autoParallelization.
  • A standalone executable program (named autoPar ) is generated and installed to the installation tree of ROSE (under ROSE INS/bin).
  • you can test the tool in rose_build_tree/projects/autoParallelization by typing "make check"
  • There is a section in ROSE manual pdf: 12.7 Automatic Parallelization

[*] Chunhua Liao, Daniel J. Quinlan, Jeremiah J. Willcock and Thomas Panas, Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore, In Proceedings of the 5th international Workshop on OpenMP: Evolving OpenMP in An Age of Extreme Parallelism (Dresden, Germany, June 03 - 05, 2009).

More info is at ROSE Compiler Framework/autoPar

TODO[edit]

refactor the tool translators[edit]

Refactor the tools into a dedicated rose/tools directory. So they will always be built and available by default, with minimum dependency on other things, like which languages are turned on or off (when applicable of course)

Our current idea is we should separate translators used as examples or tutorials AND translators used for creating end-user tools.

  • For tutorial translators, they should NOT be installed as tools by default. Their purpose is to be included in Manual or Tutorial pdf files to illustrate something to developers by examples. Examples should be concise and to the point.
  • On the other hand, translators used to build end-user tools should have much higher standard to accept command options for different, even advanced features. These translators can be very sophisticated since they don't have the page limitation as tutorial examples do.

Supported Programming Languages[edit]

Overview[edit]

ROSE supports a wide range of main stream programming languages, with different degrees of maturity. The list of supported languages includes:

  • C and C++: based on the EDG C++ frontend version 3.3.
    • An ongoing effort is to upgrade the EDG frontend to its recent 4.4 version.
    • Another ongoing effort is to use clang as an alternative, open-source C/C++ frontend
  • Fortran 77/95/2003: based on the Open Fortran Parser
  • OpenMP 3.0: based on ROSE's own parsing and translation support for both C/C++ and Fortran OpenMP programs.
  • UPC 1.1: this is also based on the EDG 3.3 frontend

OpenMP[edit]

See more at OpenMP Support

ROSE supports OpenMP 3.0 for C/C++ (and limited Fortran support).

Configuration: please always try to use --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib64/ when configuring ROSE. So the generated ROSE translators can automatically link with libgomp.a to generate executables for you. This will also allow the execution tests of the omp Lowering be executed to catch errors. Without this option, only the compile level tests will run.

Experimental OpenMP Acclerator Model Implementation


Testing

  • There are about 70 builtin execution tests (many have self-verification) in ROSE.

Some benchmarks are used to test OpenMP support in ROSE in Jenkins (our regression test server)

  • a22b-NPB-2.3-C-parallel: all 8 benchmarks pass
  • a21-SPEC-OMP-64bit-parallel: 3 benchmarks pass.
  • LULESH OpenMP version: download

For builtin test:

You have to configure the path to GOMP if you want to them to be automatically executed when "make check" is typed. e.g. ../sourcetree ... --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib64/

UPC[edit]

UPC 1.1.1: this is based on the EDG 3.3 frontend

  • The supported version is limited by the EDG 3.3 frontend, which only supports UPC 1.1.1 ( UPC VERSION string is defined as

200310L). ROSE uses EDG 3.3 currently and it originally only supported UPC 1.0. We merged the UPC 1.1.1 support from EDG 3.10 into our EDG 3.3 frontend. We have also added the required work to support UPC 1.2.

Documentation:

Tests: make check rule under

  • rose/tests/CompileTests/UPC_tests

An example UPC-to-C translator: roseupcc

  • Not full featured. Only intended to serve as a start point for anybody who is interested/funded to implement UPC in ROSE
  • roseupcc is located in ROSE/projects/UpcTranslation
  • Documented by 13.5 An Example UPC-to-C Translator Using ROSE of the ROSE manual

MPI[edit]

MPI is mostly a library-based programming paradigm. In many cases, you can simply compile MPI applications normally using a ROSE-based translator, as long as you pass the include path of mpi.h to the command line.


However, there are some additional support for MPI in ROSE:

1) If ROSE is configured with --with-mpi=/mpi/install/location, then the configure adds the following variables to be used in Makefile.am

ROSE_WITH_MPI_CFLAGS='-I/usr/sci/scratch/sriram/local/include'
ROSE_WITH_MPI_CLDFLAGS='-L/usr/sci/scratch/sriram/local/lib  -lmpich -lopa  -lmpl  -lrt  -lpthread'
ROSE_WITH_MPI_CXXFLAGS='-I/usr/sci/scratch/sriram/local/include'
ROSE_WITH_MPI_CXXLDFLAGS='-L/usr/sci/scratch/sriram/local/lib  -lmpichcxx -lmpich  -lopa  -lmpl  -lrt  -lpthread'
ROSE_WITH_MPI_C_FALSE='#'
ROSE_WITH_MPI_C_TRUE=''

These flags are useful to compile ROSE as a MPI program. One can also use the CFLAGS to pass the location of mpi.h. These flags will be set for any version of MPI as long as valid install location is passed to --with-mpi.

2)There are a few MPI-specific analysis projects (MPI_Tools, extractMPISkeleton). Some of the dataflow analysis we are working on are i) Slicing for MPI communication ii) MPI communication invariant analysis ( prove that communication is constant in loops). We currently plan on using these analyses with other standard dataflow analyses to

  • Parallel control-flow graph analysis -- build communication graph of the MPI program with no input dependent control-flow or communication
  • Transformation to cDAG -- cDAG is runtime communication optimizer tool for MPI by Torsten Hoefler. We intend to replace MPI calls with cDAG calls to optimize for communication.

CUDA[edit]

ROSE has an experimental connection to EDG 4.0, which helps us support CUDA.

To enable parsing CUDA codes, please use the following configuration options:

 --enable-edg-version=4.0 --enable-cuda --enable-edg-cuda

Chapter 16 of ROSE User Manual has more details about this.


More details from Tristan on Sept. 24, 2012

  • "--enable-cuda" option enable CUDA IR in ROSE (IR, preinclude, ...)
  • "--enable-edg-cuda" option only apply to EDG: it activates the EDG support in EDG 4.x (actually I need to patch EDG 4.4)
  • When "--enable-edg-cuda" is present, we also need to have "--enable-edg-version=4.x" (x = {0, 3})
  • "--enable-cuda" is relevant for --enable-only-cuda as the "-edg-" options are targeting only EDG (which is usually distributed as binary).

OpenCL[edit]

There is a section discussing this support in Chapter 16 CUDA and OpenCL of the ROSE manual pdf:

Parser Building Blocks[edit]

Quick information:

FailSafe Assertion Language[edit]

FailSafe Assertion Language An experimental source code annotation language to support resilient computing.

Abstract Syntax Tree (Intermediate Representation)[edit]

The main intermediate representation of ROSE is its abstract syntax tree (AST). To use a programming language, you have to get familiar with the language syntax, semantics, etc. To use ROSE, you have to get familiar with its internal representation of an input code.

The best way to know AST is to visualize it using simplest code samples.

Visualization of AST[edit]

Overview[edit]

Three things are needed to visualize ROSE AST:

  • Sample input code: you provide it
  • a dot graph generator to generate a dot file from AST: ROSE provides dot graph generators
  • a visualization tool to open the dot graph: ZGRViewer and Graphviz are used by ROSE developers

If you don't want to install ROSE+ZGRview + Graphvis from scratch, you can directly use ROSE virtual machine image, which has everything you need installed and configured so you can just visualize your sample code.

Sample input code[edit]

Please prepare simplest input code without including any headers so you can get a small enough AST to digest.

Dot Graph Generator[edit]

We provide ROSE_INSTALLATION_TREE/bin/dotGeneratorWholeASTGraph (complex graph) and dotGenerator (a simpler version) to generate a dot graph of the detailed AST of input code.

Tools to generate AST graph in dot format. There are two versions

  • dotGenerator: simple AST graph generator showing essential nodes and edges
  • dotGeneratorWholeASTGraph: whole AST graph showing more details. It provides filter options to show/hide certain AST information.

command line:

 dotGeneratorWholeASTGraph  yourcode.c  // it is best to avoid include any header into your sample code to have a small enough tree to visualize!
dotGeneratorWholeASTGraph --help
   -rose:help                     show this help message
   -rose:dotgraph:asmFileFormatFilter           [0|1]  Disable or enable asmFileFormat filter
   -rose:dotgraph:asmTypeFilter                 [0|1]  Disable or enable asmType filter
   -rose:dotgraph:binaryExecutableFormatFilter  [0|1]  Disable or enable binaryExecutableFormat filter
   -rose:dotgraph:commentAndDirectiveFilter     [0|1]  Disable or enable commentAndDirective filter
   -rose:dotgraph:ctorInitializerListFilter     [0|1]  Disable or enable ctorInitializerList filter
   -rose:dotgraph:defaultFilter                 [0|1]  Disable or enable default filter
   -rose:dotgraph:defaultColorFilter            [0|1]  Disable or enable defaultColor filter
   -rose:dotgraph:edgeFilter                    [0|1]  Disable or enable edge filter
   -rose:dotgraph:expressionFilter              [0|1]  Disable or enable expression filter
   -rose:dotgraph:fileInfoFilter                [0|1]  Disable or enable fileInfo filter
   -rose:dotgraph:frontendCompatibilityFilter   [0|1]  Disable or enable frontendCompatibility filter
   -rose:dotgraph:symbolFilter                  [0|1]  Disable or enable symbol filter
   -rose:dotgraph:emptySymbolTableFilter        [0|1]  Disable or enable emptySymbolTable filter
   -rose:dotgraph:typeFilter                    [0|1]  Disable or enable type filter
   -rose:dotgraph:variableDeclarationFilter     [0|1]  Disable or enable variableDeclaration filter
   -rose:dotgraph:variableDefinitionFilter      [0|1]  Disable or enable variableDefinitionFilter filter
   -rose:dotgraph:noFilter                      [0|1]  Disable or enable no filtering
Current filter flags' values are: 
         m_asmFileFormat = 0 
         m_asmType = 0 
         m_binaryExecutableFormat = 0 
         m_commentAndDirective = 1 
         m_ctorInitializer = 0 
         m_default = 1 
         m_defaultColor = 1 
         m_edge = 1 
         m_emptySymbolTable = 0 
         m_expression = 0 
         m_fileInfo = 1 
         m_frontendCompatibility = 0 
         m_symbol = 0 
         m_type = 0 
         m_variableDeclaration = 0 
         m_variableDefinition = 0 
         m_noFilter = 0 


Dot Graph Visualization[edit]

To visualize the generated dot graph, you have to install

Please note that you have to configure ZGRViewer to have correct paths to some commands it uses. You can do it from its configuration/setting menu item. Or directly modify the text configuration file (.zgrviewer).

One example configuration is shown below (cat .zgrviewer)

<?xml version="1.0" encoding="UTF-8"?>
<zgrv:config xmlns:zgrv="http://zvtm.sourceforge.net/zgrviewer">
    <zgrv:directories>
        <zgrv:tmpDir value="true">/tmp</zgrv:tmpDir>
        <zgrv:graphDir>/home/liao6/svnrepos</zgrv:graphDir>
        <zgrv:dot>/home/liao6/opt/graphviz-2.18/bin/dot</zgrv:dot>
        <zgrv:neato>/home/liao6/opt/graphviz-2.18/bin/neato</zgrv:neato>
        <zgrv:circo>/home/liao6/opt/graphviz-2.18/bin/circo</zgrv:circo>
        <zgrv:twopi>/home/liao6/opt/graphviz-2.18/bin/twopi</zgrv:twopi>
        <zgrv:graphvizFontDir>/home/liao6/opt/graphviz-2.18/bin</zgrv:graphvizFontDir>
    </zgrv:directories>
    <zgrv:webBrowser autoDetect="true" options="" path=""/>
    <zgrv:proxy enable="false" host="" port="80"/>
    <zgrv:preferences antialiasing="false" cmdL_options=""
        highlightColor="-65536" magFactor="2.0" saveWindowLayout="false"
        sdZoom="false" sdZoomFactor="2" silent="true"/>
    <zgrv:plugins/>
    <zgrv:commandLines/>
</zgrv:config>

You have to configure the run.sh script to have correct path also

cat run.sh

#!/bin/sh

# If you want to be able to run ZGRViewer from any directory,
# set ZGRV_HOME to the absolute path of ZGRViewer's main directory
# e.g. ZGRV_HOME=/usr/local/zgrviewer

ZGRV_HOME=/home/liao6/opt/zgrviewer-0.8.1

java -jar $ZGRV_HOME/target/zgrviewer-0.8.1.jar "$@"

Example session[edit]

A complete example

# make sure the environment variables(PATH, LD_LIBRARY_PATH) for the installed rose are correctly set
which dotGeneratorWholeASTGraph
~/workspace/masterClean/build64/install/bin/dotGeneratorWholeASTGraph

# run the dot graph generator
dotGeneratorWholeASTGraph -c ttt.c

#see it
which run.sh
~/64home/opt/zgrviewer-0.8.2/run.sh

run.sh ttt.c_WholeAST.dot

Sanity Check[edit]

We provide a set of sanity check for AST. We use them to make sure the AST is consistent. It is also highly recommended that ROSE developers add a sanity check after their AST transformation is done. This has a higher standard than just correctly unparsed code to compilable code. It is common for an AST to unparse correctly but then fail on the sanity check.

The recommend sanity check is

  • AstTests::runAllTests(project); from src/midend/astDiagnostics. Internally, it calls the following checks:
    • TestAstForProperlyMangledNames
    • TestAstCompilerGeneratedNodes
    • AstTextAttributesHandling
    • AstCycleTest
    • TestAstTemplateProperties
    • TestAstForProperlySetDefiningAndNondefiningDeclarations
    • TestAstSymbolTables
    • TestAstAccessToDeclarations
    • TestExpressionTypes
    • TestMangledNames::test()
    • TestParentPointersInMemoryPool::test()
    • TestChildPointersInMemoryPool::test()
    • TestMappingOfDeclarationsInMemoryPoolToSymbols::test()
    • TestLValueExpressions
    • TestMultiFileConsistancy::test() //2009
    • TestAstAccessToDeclarations::test(*i); // named type test


There are some other functions floating around. But they should be merged into AstTests::runAllTests(project)

  • FixSgProject(*project); //in Qing's AST interface
  • Utility::sanityCheck(SgProject* )
  • Utility::consistencyCheck(SgProject*) // SgFile*

Text Output of an AST[edit]

Just call: SgNode::unparseToString(). You can call it from any SgLocatedNode within the AST to dump partial AST's text format.

Preprocessing Information[edit]

In addition to nodes and edges, ROSE AST may have attributes in addition to nodes and edges that are attached for preprocessing information like #include or #if .. #else. They are attached before, after, or within a nearby AST node (only the one with source location information.)

An example translator will traverse the input code's AST and dump information which may include preprocessing information. For example

exampleTranslators/defaultTranslator/preprocessingInfoDumper -c main.cxx
-----------------------------------------------
Found an IR node with preprocessing Info attached:
(memory address: 0x2b7e1852c7d0 Sage type: SgFunctionDeclaration) in file
/export/tmp.liao6/workspace/userSupport/main.cxx (line 3 column 1)
-------------PreprocessingInfo #0 ----------- :
classification = CpreprocessorIncludeDeclaration:
  String format = #include "all_headers.h"

relative position is = before

Source: http://www.rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf (Chapter 29 - Handling Comments, Preprocessor Directives, And Adding Arbitrary Text to Generated Code)

AST Construction[edit]

SageBuilder and SageInterface namespaces provide functions to create ASTs and manipulate them.

Program Translation[edit]

With its high level intermediate representation, ROSE is suitable for building source-to-source translators. This is achieved by re-structuring the AST of the input source code, then unparsing the transformed AST to the output source code.

Documentation[edit]

Official tutorial: Chapter 32 AST Construction of [ROSE Tutorial http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf]

Many beginners' questions should be readily answered after reading this chapter.

Expected behavior of a ROSE Translator[edit]

A translator built using ROSE is designed to act like a compiler (gcc, g++,gfortran ,etc depending on the input file types).

So users of the translator only need to change the build system for the input files to use the translator instead of the original compiler.

SageBuilder and SageInterface[edit]

The official guide for restructuring/constructing AST highly recommends using helper functions from SageBuilder and SageInterface namespaces to create AST pieces and moving them around. These helper functions try to be stable across low-level changes and be smart enough to transparently set many edges and maintain symbol tables.

Users who want to have lower level control may want to directly invoke the member functions of AST nodes and symbol tables to explicitly manipulate edges and symbols in the AST. But this process is very tedious and error-prone.

It is possible that some builder functions are not yet provided, especially for C++ constructs like template declaration etc. We are actively working on this. In the meantime, you can directly use new operators and other member functions as a workaround.

Steps for writing translators[edit]

Generic steps:

  • prepare a simplest source file (a.c) as an example input of your translator
    • avoid including any system headers so you can visualize the whole AST
    • use ROSE_INSTALLATION_TREE/bin/dotGeneratorWholeASTGraph to generate a whole AST for a.c
  • prepare another simplest source file (b.c) as an example output of your translator
    • again, avoid including any system headers
    • use ROSE_INSTALLATION_TREE/bin/dotGeneratorWholeASTGraph to generate a whole AST for b.c
  • compare the two dot graphs side by side
  • use SageInterface or SageBuilder functions to restruct the source AST graph to be the AST graph you want to generate
    • if there is no SageBuilder function to create what you want. You may have to use new operator to create the nodes and take care of edges, symbols yourself.

More details, see How to create a translator

Order to traverse AST[edit]

Naive pre-order traversal is not suitable for building a translator since the translator may change the nodes the traversal is expected to visit later on. Conceptually, this is essentially a similar problem to C++ iterator invalidation.

To safely transform AST, it is recommended to use a reverse iterator of the statement list generated by a preorder traversal. This is different from a list generated from a post order traversal.

For example, assuming we have a subtree of : parent <child 1, child 2>,

  • Pre order traversal will generate a list: parent, child 1, child2
  • Post order traversal will generate a list: child 1, child2, parent.
  • Reverse iterator of the pre order will give you : child2, child 1, and parent. Transforming using this order is the safest based on our experiences.

Example translators[edit]

There are many test translators under https://github.com/rose-compiler/rose/tree/master/tests/roseTests/astInterfaceTests

  • buildForStmt.C
  • buildFunctionCalls.C
  • buildPragmaDeclaration.C
  • and so on

Other examples:

  • Split one complex statement into multiple simpler statements: ROSE/projects/backstroke/ExtractFunctionArguments.C

Trouble shooting[edit]

Assertion failed: (expr->get_startOfConstruct() != NULL)[edit]

Assertion failed: (expr->get_startOfConstruct() != NULL), function unparseExpression, file ../../../ROSE/src/backend/unparser/languageIndependenceSupport/unparseLanguageIndependentConstructs.C, line 812.

void visitorTraversal::visit(SgNode* sgn){
    
        SageBuilder::pushScopeStack(body);
        SgAssignOp* sao = isSgAssignOp(sgn);
        if(!sao)
            return;
    
        SgVarRefExp* svr = SageBuilder::buildVarRefExp("mami");
        SgIntVal* siv =  SageBuilder::buildIntVal(33);
    
        SgAssignOp* newsao = new SgAssignOp(svr, siv, NULL);
        SageInterface::replaceWithPattern(sao, newsao);
        SageBuilder::popScopeStack();     
    }

The cause is: SgAssignOp* newsao = new SgAssignOp(svr, siv, NULL);

expr->get_startOfConstruct() != NULL says there is no start file position. There is an existing SageBuilder function to build Assign Op and take care of lots of details, including file info objects. Otherwise you have to maintain these details by yourself if you use raw new operators.

Program Analysis[edit]

ROSE have implemented the following compiler analysis

  • call graph analysis
  • control flow graph
  • data flow analysis: including liveness analysis, def-use analysis, etc.
  • dependence analysis
  • side effect analysis

control flow graph[edit]

ROSE provides several variants of control flow graphs

Virtual Control Flow Graph[edit]

The virtual control flow graph (vcfg) is dynamically generated on the fly when needed. So there is no mismatch between the ROSE AST and its corresponding control flow graph. The downside is that the same vcfg will be re-generated each time it is needed. This can be a potentially a performance bottleneck.

Facts

  • Documentation: virtual CFG is documented in Chapter 19 Virtual CFG of ROSE tutorial pdf
  • Source Files:
    • src/frontend/SageIII/virtualCFG/virtualCFG.h
    • src/frontend/SageIII/virtualCFG/virtualCFG.C //not only give definitions of virtualCFG.h, but also extend AST node support in VirtualCFG
    • src/ROSETTA/Grammar/Statement.code // prototypes of member functions for SgStatement nodes, etc.
    • src/ROSETTA/Grammar/Expression.code // prototypes of member functions for SgExpression nodes, etc.
    • src/ROSETTA/Grammar/Support.code // prototypes of member functions for SgInitialized(LocatedNode) nodes, etc.
    • src/ROSETTA/Grammar/Common.code // prototypes of member functions for other nodes, etc.
    • src/frontend/SageIII/virtualCFG/memberFunctions.C // implementation of virtual CFG related member functions for each AST node
      • This file will help the generation of buildTree/src/frontend/SageIII/Cxx_Grammar.h
  • Test directory: tests/CompileTests/virtualCFG_tests
  • A dot graph generator: generate a dot graph for either the raw or interesting virtual CFG.
    • Source: tests/CompileTests/virtualCFG_tests/generateVirtualCFG.C
    • Installed under rose_ins/bin

How to extend VirtualCFG to support OpenMP

  • how to add CFGNode for SgOmpClause in
  • 1.Identify the class name in ROSETTA in frontend

For example , if SgOmpPrivateClause or SgOmpSharedClause are not support in VirtualCFG, it is necessary to check whether buildTree/src/frontend/SageIII/Cxx_Grammar.h has function prototypes for adding CFGEdge of SgOmpClause, like SgOmpClause::cfgInEdge() SgOmpClause::cfgOutEdge() If there is no prototypes, then that means you CFGNode does not belong to SgExpression, SgStatement and SgExpression. SgOmpClause can be added in src/ROSETTA/Grammar/Support.code,

  • 2. add the function definitions in src/frontend/SageIII/virtualCFG/memberFunctions.C to give the definitions of adding CFGNode and CFGEdge
  step1: construct SgOmpClause::cfgndexForEnd()
            this index is based on the AST graph of your source code, the index is explicit in AST node
   real example:

SgOmpClauseBodyStatement::cfgIndexForEnd() const {

  int size = this->get_clauses().size(); // the number of clauses in #pragma omp parallel
  return (size + 1); // clauses + body

}

  step2: construct cfgInEdge() for this CFGNode
           please refer to AST, since AST can show all node information,
           real example:
           std::vector<CFGEdge> SgOmpClauseBodyStatement::cfgInEdges(unsigned int idx) {
 std::vector<CFGEdge> result;
 addIncomingFortranGotos(this, idx, result);

if( idx == 0 )

  {
   makeEdge( getNodeJustBeforeInContainer( this ), CFGNode( this, idx ), result );
   }
  else
  {
     if( idx == ( this->get_clauses().size() + 1 ) )
      {
       makeEdge( this->get_body()->cfgForEnd(), CFGNode( this, idx ) , result ); //connect variables clauses first, then parallel body
      }
     else
      {
          if( idx < ( this->get_clauses().size() + 1 ) )
           {
              makeEdge( this->get_clauses()[idx -1]->cfgForEnd(), CFGNode( this, idx ), result );//connect variables clauses first, then parallel body
           }
          else
           {
            ROSE_ASSERT( !" Bad index for SgOmpClauseBodyStatement" );
           }
      }
  }

return result; }

 step3: construct cfgOutEdge for CFGNode
 For example:
   std::vector<CFGEdge> SgOmpClauseBodyStatement::cfgOutEdges(unsigned int idx) {//! edited by Hongyi for edges between SgOmpClauseBodyStatement  and SgOmpClause
  std::vector<CFGEdge> result;

addIncomingFortranGotos( this, idx, result ); if( idx == (this->get_clauses().size() + 1 ) )

 {
    makeEdge( CFGNode( this ,idx), getNodeJustAfterInContainer( this ), result );
 }
else
 {
   if( idx == this->get_clauses().size()  )
      {
        makeEdge( CFGNode( this, idx ), this->get_body()->cfgForBeginning(), result ); // connect variable clauses first, parallel body last
      }
     else
     {
       if( idx < this->get_clauses().size() ) // connect variables clauses first, parallel body last
         {
           makeEdge( CFGNode( this, idx ), this->get_clauses()[idx]->cfgForBeginning(), result );
         }
        else
         {
           ROSE_ASSERT( !"Bad index for SgOmpClauseBodyStatement" );
         }
      }
 }

return result; }

  • 3.How to check the result

First check AST graph /Users/ma23/Desktop/Screen shot 2012-08-24 at 11.51.33 AM.png In this example, you will find that there are three subtree from SgOmpParallelStatement One is get_body, the other two are SgOmpPrivateClasue and SgOmpSharedClauserespectively. So the index is 3. // the order to visit CFGNode is to visit clauses first, then parallel body

Add caption here

Static Control Flow Graph[edit]

Due to the performance concern of virtual control flow graph, we developed another static version which persistently exists in memory like a regular graph.

Facts:

  • Documentation: 19.7 Static CFG of ROSE tutorial pdf
  • Test Directory: rose/tests/CompileTests/staticCFG_tests

Static and Interprocedural CFGs[edit]

Facts:

  • Documentation: 19.8 Static, Interprocedural CFGs of ROSE tutorial pdf
  • Test Directory: rose/tests/CompileTests/staticCFG_tests

Virtual Function Analysis[edit]

Facts

  • Original contributor: Faizur from UTSA, done in Summer 2011
  • Code: at src/midend/programAnalysis/VirtualFunctionAnalysis.
  • Implemented with the techniques used in the following paper: "Interprocedural Pointer Alias Analysis - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.2382". The paper boils down the virtual function resolution to pointer aliasing problem. The paper employs flow sensitive inter procedural data flow analysis to solve aliasing problem, using compact representation graphs to represent the alias relations.
  • Some test files in the roseTests folder of the ROSE repository and he told me that the implementation supports function pointers as well as code which is written across different files (header files etc).
  • Documentation: Chapter 24 Dataflow Analysis based Virtual Function Analysis, of ROSE tutorial pdf

Def-use analysis[edit]

If you want a def-use analysis, try this http://www.rosecompiler.org/ROSE_HTML_Reference/classVariableRenaming.html

VariableRenaming v(project);
v.run();
v.getReachingDefsAtNode(...);

Pointer Analysis[edit]

https://mailman.nersc.gov/pipermail/rose-public/2010-September/000390.html

On 9/1/10 11:49 AM, Fredrik Kjolstad wrote:
> Hi all,
>
> I am trying to use Rose as the analysis backend for a refactoring 
> engine and for one of the refactorings I am implementing I need 
> whole-program pointer analysis.  Rose has an implementation of 
> steensgard's algorithm and I have some questions regarding how to use 
> this.
>
> I looked at the file steensgaardTest2.C to figure out how to invoke 
> this analysis and I am a bit perplexed:
>
> 1. The file SteensgaardPtrAnal.h that is included by the test is not 
> present in the include directory of my installed version of Rose. 
>  Does this mean that the Steensgaard implementation is not a part of 
> the shipped compiler, or does it mean that I have to retrieve an 
> instance of it through some factory method whose static return type is 
> PtrAnal?
I believe it is in the shipped compiler. And you're using the correct 
file to figure out how to use it. It should be in the installed include 
directory --- if it is not, it's probably something that needs to be 
fixed. But you can copy the include file from 
ROSE/src/midend/programAnalysis/pointerAnal/ as a temporary fix
> 2. How do I initialize the alias analysis for a given SgProject?  Is 
> this done through the overloaded ()?
The steensgaardTest2.C file shows how to set up everything to invoke the 
analysis. Right now you need to go over each function definition and 
invoke the analysis explicitly, as illustrated by the main function in 
the file.
> 3. Say I want to query whether two pointer variables alias and I have 
> SGNodes to their declarations.  How do I get the AstNodePtr needed to 
> invoke the may_alias(AstInterface&, const AstNodePtr&, const 
> AstNodePtr&) function?  Or maybe I should rather invoke the version of 
> may_alias that takes two strings (varnames)?
To convert a SgNode* x to AstNodePtr, wrap it inside  an AstNodePtrImpl 
object, i.e., do AstNodePtrImpl(x), as illustrated inside the () 
operator of TestPtrAnal in steensgaardTest2.C.
> 4. How do I query whether two parameters alias?
The PtrAnal class has  the following interface method
     may_alias(AstInterface& fa, const AstNodePtr& r1, const AstNodePtr& 
r2);
It is implemented in SteensgaardPtrAnal class, which inherit PtrAnal 
class. To build AstInterface and AstNodePtr,
you simply need to wrap SgNode* with some wrapper classes, illustrated 
by steensgaardTest2.C

-Qing Yi

void func(void) {
int* pointer;
int* aliasPointer;

pointer = malloc(sizeof(int));
aliasPointer = pointer;
*aliasPointer = 42;

printf("%d\n", *pointer);
}

The SteensgaardPtrAnal::output function returns:
c:(sizeof(int )) LOC1=>LOC2
c:42 LOC3=>LOC4
v:func LOC5=>LOC6 (inparams: ) ->(outparams:  LOC7)
v:func-0 LOC8=>LOC7
v:func-2-1 LOC9=>LOC10
v:func-2-3 LOC11=>LOC12 (pending  LOC10 LOC13=>LOC14 =>LOC4 )
v:func-2-4 LOC15=>LOC16 =>LOC17
v:func-2-5 LOC18=>LOC14 =>LOC4
v:func-2-aliasPointer LOC19=>LOC14 =>LOC4
v:func-2-pointer LOC20=>LOC13 =>LOC14 =>LOC4
v:malloc LOC21=>LOC22 (inparams:  LOC2) ->(outparams:  LOC12)
v:printf LOC23=>LOC24 (inparams:  LOC16=>LOC17  LOC14=>LOC4 ) ->(outparams:
 LOC25)

SSA[edit]

ROSE has implemented an SSA form. Some discussions on the mailing list: link.

Rice branch has an implementation of array SSA. We are waiting for their commits to be pushed into Jenkins. --Liao (discusscontribs) 18:17, 19 June 2012 (UTC)

Side Effect Analysis[edit]

Quick Facts

  • The algorithm is based on the paper: K. D. Cooper and K. Kennedy. 1988. Interprocedural side-effect analysis in linear time. In Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation (PLDI '88), R. L. Wexelblat (Ed.). ACM, New York, NY, USA, 57-66.
  • Source Code: src/midend/programAnalysis/sideEffectAnalysis
  • Tests: tests/roseTests/programAnalysisTests/sideEffectAnalysisTests
  • SageInterface interface functions: http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageInterface.html
    • such as bool SageInterface::collectReadWriteVariables (SgStatement *stmt, std::set< SgInitializedName * > &readVars, std::set< SgInitializedName * > &writeVars)

Generic Dataflow Framework[edit]

As the ROSE project goes on, we have collected quite some versions of dataflow analysis. It is painful to maintain and use them as they

  • Duplicate the iterative fixed-point algorithm
  • Scatter in different directories and
  • Use different representations for results.

An ongoing effort is to consolidate all dataflow analysis work within a single framework.

Quick facts

  • Original author: Greg Bronevetsky
  • Code reviewer: Chunhua Liao
  • Documentation:
  • Source codes: files under ./src/midend/programAnalysis/genericDataflow
  • Tests: tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests
  • Currently implemented analysis
    • Dominator analysis: dominatorAnalysis.h dominatorAnalysis.C
    • Livedead variable analysis, or liveness analysis: liveDeadVarAnalysis.h liveDeadVarAnalysis.C
    • Constant propagation: constantPropagation.h constantPropagation.C: TODO need to move the files into src/ from /tests

See more at Generic Dataflow Framework

Dependence analysis[edit]

TODO: it turns out the interface work is not merged into our master branch. So the following instructions do not apply!

The interface for dependence graph could be found in DependencyGraph.h. The underlying representation is n DepGraph.h. BGL is required to access the graph.

Here are 6 examples attached with this email. In deptest.C, there are also some macros to enable more accurate analysis.

If USE_IVS is defined, the induction variable substitution will be performed. if USE_FUNCTION is defined, the dependency could take a user-specified function side-effect interface. Otherwise, if non of them are defined, it will perform a normal dependence analysis and build the graph.

Generic Dataflow Framework[edit]

Introduction[edit]

As the ROSE project goes on, we have collected quite some versions of dataflow analysis. It is painful to maintain and use them as they

  • duplicate the iterative fixed-point algorithm,
  • scatter in different directories,
  • use different representations for results, and
  • has different level of maturity and robustness.

An ongoing effort is to consolidate all dataflow analysis work within a single framework.

Quick facts

  • original author: Greg Bronevetsky
  • code gatekeeper: Chunhua Liao
  • Documentation:
    • Chapter 18 Generic Dataflow Analysis Framework, of the ROSE tutorial pdf, git commit
    • This wikibook page
  • source codes: files under ./src/midend/programAnalysis/genericDataflow
  • tests: tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests

Implemented analysis[edit]

List

  • Constant Propagation
  • dominator analysis: dominatorAnalysis.h dominatorAnalysis.C
  • livedead variable analysis, or liveness analysis: liveDeadVarAnalysis.h liveDeadVarAnalysis.C
  • Pointer Analysis

Function, nodeState and FunctionState[edit]

Function and nodeState are two required parameters to run data flow analysis:

They are stored together inside FunctionState //functionState.h

functionState.h

genericDataflow/cfgUtils/CallGraphTraverse.h

function[edit]

An abstraction of functions, internally connected to SgFunctionDeclaration *decl

declared in ./src/midend/programAnalysis/genericDataflow/cfgUtils/CallGraphTraverse.h

constructors:

  • Function::Function(string name) based on function name
  • Function::Function(SgFunctionDeclaration* sample) // core constructor
  • Function::Function(SgFunctionDefinition* sample)

CGFunction* cgFunc; // call graph function

Function func(cgFunc);

NodeFact[edit]

any information related to a CFG node.

  • It has no dataflow 's IN/OUT concept
  • not meant to evolve during the dataflow analysis
class NodeFact: public printable
{
        public:

                // returns a copy of this node fact
        virtual NodeFact* copy() const=0;
        
};

NodeState[edit]

Store information about multiple analyses and their corresponding lattices, for a given node (CFG node ??)

./src/midend/programAnalysis/genericDataflow/state/nodeState.h

It also provide static functions to

  • initialize NodeState for all DataflowNode
  • to retrieve NodeState for a given DataflowNode
class NodeState
{
     // internal types: map between analysis and set of lattices

     typedef std::map<Analysis*, std::vector<Lattice*> > LatticeMap;
     typedef std::map<Analysis*, std::vector<NodeFact*> > NodeFactMap;
     typedef std::map<Analysis*, bool > BoolMap;

        // the dataflow information Above the node, for each analysis that 
        // may be interested in the current node
        LatticeMap dfInfoAbove;  // IN set in a dataflow
        
        // the Analysis information Below the node, for each analysis that 
        // may be interested in the current node
        LatticeMap dfInfoBelow;  // OUT set in a dataflow, 

        // the facts that are true at this node, for each analysis that 
        // may be interested in the current node
        NodeFactMap facts;
        
        // Contains all the Analyses that have initialized their state at this node. It is a map because
        // TBB doesn't provide a concurrent set.
        BoolMap initializedAnalyses;

// static interfaces 

        // returns the NodeState object associated with the given dataflow node.
        // index is used when multiple NodeState objects are associated with a given node
        // (ex: SgFunctionCallExp has 3 NodeStates: entry, function body, exit)
        static NodeState* getNodeState(const DataflowNode& n, int index=0);


// most useful interface: retrieve the lattices (could be only one) associated with a given analysis

      // returns the map containing all the lattices from above the node that are owned by the given analysis
        // (read-only access)
        const std::vector<Lattice*>& getLatticeAbove(const Analysis* analysis) const;

        // returns the map containing all the lattices from below the node that are owned by the given analysis
        // (read-only access)
        const std::vector<Lattice*>& getLatticeBelow(const Analysis* analysis) const;

}

FunctionState[edit]

./src/midend/programAnalysis/genericDataflow/state/functionState.h

A pair of Function and NodeState.

  • it provides static functions to initialize all FunctionState And retrieve FunctionState

class FunctionState
{
        friend class CollectFunctions;
        public:
        Function func;
        NodeState state;
        // The lattices that describe the value of the function's return variables
        NodeState retState;

        private:
        static std::set<FunctionState*> allDefinedFuncs;        
        static std::set<FunctionState*> allFuncs;
        static bool allFuncsComputed;
                
    public:
        FunctionState(Function &func): 
                func(func),
                state(/*func.get_declaration()->cfgForBeginning()*/)
        {}
  // We should use this interface --------------

  // 1. returns a set of all the functions whose bodies are in the project
        static std::set<FunctionState*>& getAllDefinedFuncs();

  // 2. returns the FunctionState associated with the given function
        // func may be any declared function
        static FunctionState* getFuncState(const Function& func);
 ...
} 


FunctionState* fs = new FunctionState(func); // empty From FuntionState to NodeState


/*************************************
 *** UnstructuredPassInterAnalysis ***
 *************************************/
void UnstructuredPassInterAnalysis::runAnalysis()
{
        set<FunctionState*> allFuncs = FunctionState::getAllDefinedFuncs(); // call a static function to get all function state s

        // Go through functions one by one, call an intra-procedural analysis on each of them
        // iterate over all functions with bodies
        for(set<FunctionState*>::iterator it=allFuncs.begin(); it!=allFuncs.end(); it++)
        {
                FunctionState* fState = *it;
                intraAnalysis->runAnalysis(fState->func, &(fState->state));
        }
}

// runs the intra-procedural analysis on the given function, returns true if 
// the function's NodeState gets modified as a result and false otherwise
// state - the function's NodeState
bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state)
{
        DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition(),filter);
        DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition(), filter);
        
        if(analysisDebugLevel>=2)
                Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis() function "<<func.get_name().getString()<<"()\n";
        
        // iterate over all the nodes in this function
        for(VirtualCFG::iterator it(funcCFGStart); it!=VirtualCFG::dataflow::end(); it++)
        {
                DataflowNode n = *it;
                // The number of NodeStates associated with the given dataflow node
                //int numStates=NodeState::numNodeStates(n);
                // The actual NodeStates associated with the given dataflow node
                const vector<NodeState*> nodeStates = NodeState::getNodeStates(n);
                
                // Visit each CFG node
                for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); itS++)
                        visit(func, n, *(*itS));
        }
        return false;
}

example: retrieve the liveness analysis's IN lattice

void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set<varID>& vars, string indent)

  • LiveVarsLattice* liveLAbove = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeAbove(ldva).begin()));

Lattices[edit]

Caveat: lattice vs. lattice value

  • A lattice by definition is a set of values. However, an instance of lattice type in Generic dataflow framework is used to represent an individual value within a lattice also. Sorry for this confusing. We welcome suggestions to fix this.

Basics[edit]

See more at ROSE Compiler Framework/Lattice

Store the data flow analysis information attached to CFG nodes.

Fundamental operations:

  • what to store: lattice value set, bottom, up , and anything in between
  • initialization: LiveDeadVarsAnalysis::genInitState()
  • creation: transfer function
  • meet operation: a member function of the lattice

Example

  • liveness analysis: the live variable set at the entry point of a CFG node:
  • constant propagation: lattice values from no information (bottom) -> unkown --> constant --> too much information (conflicting constant values, top),
// blindly add all of that_arg's values into current lattice's value set
void LiveVarsLattice::incorporateVars(Lattice* that_arg)  

// retrieve a subset lattice information for a given expr. This lattice may contain more information than those about a given expr.
Lattice* LiveVarsLattice::project(SgExpression* expr) 

// add lattice (exprState)information about expr into current lattice's value set: default implementation just calls meetUpdate(exprState)
bool LiveVarsLattice::unProject(SgExpression* expr, Lattice* exprState)  

below/above vs IN/OUT[edit]

The concept is based on the original CFG flow direction

  • above: the incoming edge direction
  • below: the outcoming edge direction


IN and OUT depends on the direction of the problem, forward vs. backward

  • forward direction: IN == above lattice, OUT = below lattice
  • backward direction: IN == below lattice, OUT = above lattice

Common Utility Lattices[edit]

the framework provides some pre-defined lattices ready for use.

lattice.h/latticeFull.h

  • BoolAndLattice

LiveVarsLattice[edit]

class LiveVarsLattice : public FiniteLattice
{
        public:
        std::set<varID> liveVars;  // bottom is all live variables,  top is the empty set, meet brings down the lattice -> union of variables. 
    ...
 };


// Meet operation: simplest set union of two lattices: 

// computes the meet of this and that and saves the result in this
// returns true if this causes this to change and false otherwise
bool LiveVarsLattice::meetUpdate(Lattice* that_arg)
{
        bool modified = false;
        LiveVarsLattice* that = dynamic_cast<LiveVarsLattice*>(that_arg);
        
        // Add all variables from that to this
        for(set<varID>::iterator var=that->liveVars.begin(); var!=that->liveVars.end(); var++) {
                // If this lattice doesn't yet record *var as being live 
                if(liveVars.find(*var) == liveVars.end()) { // this if () statement gives a chance to set the modified flag. 
                                                           // otherwise, liveVars.insert() can be directly called. 
                        modified = true;
                        liveVars.insert(*var);
                }
        }
        
        return modified;        
}

Transfer Function[edit]

basics: Data_flow_analysis#flow.2Ftransfer_function

  • IN = sum of OUT (predecessors)
  • OUT = GEN + (IN - KILL)

The impact of program constructs on the current lattices (how to change the current lattices).

  • lattices: stores IN and OUT information
  • additional data members are necessary to store GEN and KILL set inside the transfer function.


class hierarchy:

class IntraDFTransferVisitor : public ROSE_VisitorPatternDefaultBase
{ 
protected:
  // Common arguments to the underlying transfer function
  const Function &func;  // which function are we talking about
  const DataflowNode &dfNode;  // wrapper of CFGNode
  NodeState &nodeState;   // lattice element state, context information?
  const std::vector<Lattice*> &dfInfo;  // data flow information

public:

  IntraDFTransferVisitor(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d)
    : func(f), dfNode(n), nodeState(s), dfInfo(d)
  { }

  virtual bool finish() = 0;
  virtual ~IntraDFTransferVisitor() { }

 };



class LiveDeadVarsTransfer : public IntraDFTransferVisitor
{

};


class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice>
{}

Constant Propagation[edit]

template <class LatticeType>
class VariableStateTransfer : public IntraDFTransferVisitor
{
  ...
};

class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice> {};

void
ConstantPropagationAnalysisTransfer::visit(SgIntVal *sgn)
   {
     ROSE_ASSERT(sgn != NULL);
     ConstantPropagationLattice* resLat = getLattice(sgn);
     ROSE_ASSERT(resLat != NULL);
     resLat->setValue(sgn->get_value());
     resLat->setLevel(ConstantPropagationLattice::constantValue);
   }

LiveDead Variable[edit]

Functions to convert program point to Generator and KILL set. For liveness analysis

  • Kill (s) = {variables being defined in s}: //
  • Gen (s) = {variables being used in s}

OUT = IN -KILL + GEN

  • OUT is initialized to be IN set,
  • transfer function will apply -KILL + GEN

class LiveDeadVarsTransfer : public IntraDFTransferVisitor

{
  LiveVarsLattice* liveLat;  // the result of this analysis

  bool modified;
  // Expressions that are assigned by the current operation
  std::set<SgExpression*> assignedExprs;  // KILL () set
  // Variables that are assigned by the current operation
  std::set<varID> assignedVars;
  // Variables that are used/read by the current operation
  std::set<varID> usedVars;   // GEN () set

public:
  LiveDeadVarsTransfer(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d, funcSideEffectUses *fseu_)
    : IntraDFTransferVisitor(f, n, s, d), indent("    "), liveLat(dynamic_cast<LiveVarsLattice*>(*(dfInfo.begin()))), modified(false), fseu(fseu_)
  {
        if(liveDeadAnalysisDebugLevel>=1) Dbg::dbg << indent << "liveLat="<<liveLat->str(indent + "    ")<<std::endl;
        // Make sure that all the lattice is initialized
        liveLat->initialize();
  }

  bool finish();
 //  operationg on different AST nodes
  void visit(SgExpression *);
  void visit(SgInitializedName *);
  void visit(SgReturnStmt *);
  void visit(SgExprStatement *);
  void visit(SgCaseOptionStmt *);
  void visit(SgIfStmt *);
  void visit(SgForStatement *);
  void visit(SgWhileStmt *);
  void visit(SgDoWhileStmt *);
}


// Helper transfer function, focusing on handling expressions.
// live dead variable analysis: LDVA, 
//  expression transfer: transfer functions for expressions
/// Visits live expressions - helper to LiveDeadVarsTransfer
class LDVAExpressionTransfer : public ROSE_VisitorPatternDefaultBase
{
  LiveDeadVarsTransfer &ldva;

public:

  // Plain assignment: lhs = rhs,  set GEN (read/used) and KILL (written/assigned) sets
  void visit(SgAssignOp *sgn) {
    ldva.assignedExprs.insert(sgn->get_lhs_operand());
                                
    // If the lhs of the assignment is a complex expression (i.e. it refers to a variable that may be live) OR
    // if is a known expression that is known to may-be-live
    // THIS CODE ONLY APPLIES TO RHSs THAT ARE SIDE-EFFECT-FREE AND WE DON'T HAVE AN ANALYSIS FOR THAT YET
    /*if(!isVarExpr(sgn->get_lhs_operand()) || 
      (isVarExpr(sgn->get_lhs_operand()) && 
      liveLat->isLiveVar(SgExpr2Var(sgn->get_lhs_operand()))))
      { */
    ldva.used(sgn->get_rhs_operand());
  }
...
}

Call Stack[edit]

(gdb) bt
#0  LDVAExpressionTransfer::visit (this=0x7fffffffcea0, sgn=0xa20320)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:228
#1  0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069
#2  0x00002aaaadc61c04 in LiveDeadVarsTransfer::visit (this=0xaf9e00, sgn=0xa20320)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:384
#3  0x00002aaaadbbaef0 in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgBinaryOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:316006
#4  0x00002aaaadbba04a in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgAssignOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:315931
#5  0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069
#6  0x00002aaaadbcca0a in IntraUniDirectionalDataflow::runAnalysis (this=0x7fffffffd9f0, func=..., fState=0xafbd18, analyzeDueToCallers=true, calleesUpdated=...)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.C:282
#7  0x00002aaaadbbf444 in IntraProceduralDataflow::runAnalysis (this=0x7fffffffda00, func=..., state=0xafbd18)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.h:74
#8  0x00002aaaadbb0966 in UnstructuredPassInterDataflow::runAnalysis (this=0x7fffffffda50)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/analysis.C:467
#9  0x000000000040381a in main (argc=2, argv=0x7fffffffdba8)
    at ../../../../../sourcetree/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/liveDeadVarAnalysisTest.C:101

Control flow graph and call graph[edit]

The generic dataflow framework works on virtual control flow graph in ROSE

Filtered Virtual CFG[edit]

The raw virtual CFG may not be desirable for all kinds of analyses since it can have too many administrative nodes which are not relevant to a problem.

So the framework provides a filter parameter to the Analysis class. A default filter will be used unless you specify your own filter.

// Example filter funtion deciding if a CFGnNode should show up or not
bool gfilter (CFGNode cfgn)
{
  SgNode *node = cfgn.getNode();

  switch (node->variantT())
  {
    //Keep the last index for initialized names. This way the def of the variable doesn't propagate to its assign initializer.
    case V_SgInitializedName:
      return (cfgn == node->cfgForEnd());

    // For function calls, we only keep the last node. The function is actually called after all its parameters  are evaluated.
    case V_SgFunctionCallExp:
      return (cfgn == node->cfgForEnd());

   //For basic blocks and other "container" nodes, keep the node that appears before the contents are executed
    case V_SgBasicBlock:
    case V_SgExprStatement:
    case V_SgCommaOpExp:
      return (cfgn == node->cfgForBeginning());

   // Must have a default case: return interesting CFGNode by default in this example
    default:
      return cfgn.isInteresting();
  }
}

// Code using the filter function
int
main( int argc, char * argv[] )
{
  SgProject* project = frontend(argc,argv);
  initAnalysis(project);
  LiveDeadVarsAnalysis ldva(project);
  ldva.filter = gfilter; // set the filter to be your own one

  UnstructuredPassInterDataflow ciipd_ldva(&ldva);
  ciipd_ldva.runAnalysis();
  ....
}

Analysis Driver[edit]

Key function:

bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated)  // analysis/dataflow.C

Basic tasks: run the analysis by

  • initialize data flow state: lattices and other information
  • walk the CFG : find descendants from a current node
  • call transfer function

Class Hierarchy[edit]

  • Analysis -> IntraProceduralAnalysis -> IntraProceduralDataflow -> IntraUnitDataflow --> IntraUniDirectionalDataflow (INTERESTING level)-> IntraBWDataflow -> LiveDeadVarsAnalysis
class Analysis {}; // an empty abstract class for any analysis

class IntraProceduralAnalysis : virtual public Analysis  //analysis/analysis.h ,  any intra procedural analysis, data flow or not
{
  protected: 
   InterProceduralAnalysis* interAnalysis;
  public: 
    void setInterAnalysis(InterProceduralAnalysis* interAnalysis) // connection to inter procedural analysis
    virtual bool runAnalysis(const Function& func, NodeState* state)=0;  // run this per function, NodeState stores lattices for each CFG node, etc.
    virtual ~IntraProceduralAnalysis();
}


//No re-entry. analysis will be executed  once??,   data flow , intra-procedural analysis
// now lattices are interested
class IntraProceduralDataflow : virtual public IntraProceduralAnalysis  //analysis/dataflow.h
{
// initialize lattice etc for a given dataflow node within a function
  virtual void genInitState (const Function& func, const DataflowNode& n, const NodeState& state,  
       std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts); 

  virtual bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated)=0;  // the analysis on a function could be triggered by the state changes of function's callers, or callees.

 std::set<Function> visited; // make sure a function is initialized once when visited multiple times

}


class IntraUnitDataflow : virtual public IntraProceduralDataflow
{
  // transfer function: operate on lattices associated with a dataflow node, considering its current state
 virtual bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo)=0;

};


// Uni directional dataflow: either forward or backward, but not both directions!
class IntraUniDirectionalDataflow : public IntraUnitDataflow {
public:
 bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated);

protected:   
  bool propagateStateToNextNode (
             const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex,
             const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode);

  std::vector<DataflowNode> gatherDescendants(std::vector<DataflowEdge> edges,
                                                    DataflowNode (DataflowEdge::*edgeFn)() const);

        virtual NodeState*initializeFunctionNodeState(const Function &func, NodeState *fState) = 0;
        virtual VirtualCFG::dataflow*
          getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0;

        virtual vector<Lattice*> getLatticeAnte(NodeState *state) = 0;
        virtual vector<Lattice*> getLatticePost(NodeState *state) = 0;

        // If we're currently at a function call, use the associated inter-procedural
        // analysis to determine the effect of this function call on the dataflow state.
        virtual void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state) = 0;

        virtual vector<DataflowNode> getDescendants(const DataflowNode &n) = 0;
        virtual DataflowNode getUltimate(const Function &func) = 0; // ultimate what?   final CFG node?
}; 

class IntraBWDataflow  : public IntraUniDirectionalDataflow {//BW: Backward
   public:
        
        IntraBWDataflow()
        {}

        NodeState* initializeFunctionNodeState(const Function &func, NodeState *fState);

        VirtualCFG::dataflow*
          getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState);

        virtual vector<Lattice*> getLatticeAnte(NodeState *state);
        virtual vector<Lattice*> getLatticePost(NodeState *state);

        void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state);

        vector<DataflowNode> getDescendants(const DataflowNode &n); // next CFG nodes, depending on the direction
        { return gatherDescendants(n.inEdges(),  &DataflowEdge::source);  } 

        DataflowNode getUltimate(const Function &func); // the last CFG should be the start CFG of the function for a backward dataflow problem
       {   return cfgUtils::getFuncStartCFG(func.get_definition());   }


};

foward intra-procedural data flow analysis: e.g. reaching definition ()

  • class IntraFWDataflow  : public IntraUniDirectionalDataflow

Initialization: InitDataflowState[edit]

Used to initialized the lattices/facts for CFG nodes. It is an analysis by itself. unstructured pass

// super class: provides the driver of initialization: visit each CFG node

class UnstructuredPassIntraAnalysis : virtual public IntraProceduralAnalysis
{
public: 
          // call the initialization function on each CFG node
          bool runAnalysis(const Function& func, NodeState* state);
         // to be implemented by InitDataflowState
        virtual void visit(const Function& func, const DataflowNode& n, NodeState& state)=0;

}

bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state)
{
        DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition());
        DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition());
        
        if(analysisDebugLevel>=2)
                Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis() function "<<func.get_name().getString()<<"()\n";
        
        // iterate over all the nodes in this function
        for(VirtualCFG::iterator it(funcCFGStart); it!=VirtualCFG::dataflow::end(); it++)
        {
                DataflowNode n = *it;
                // The number of NodeStates associated with the given dataflow node
                //int numStates=NodeState::numNodeStates(n);
                // The actual NodeStates associated with the given dataflow node
                const vector<NodeState*> nodeStates = NodeState::getNodeStates(n);
                
                // Visit each CFG node
                for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); itS++)
                        visit(func, n, *(*itS));
        }
        return false;
}
//-------------------- derived class provide link to a concrete analysis, and visit() implementation
class InitDataflowState : public UnstructuredPassIntraAnalysis
{
        IntraProceduralDataflow* dfAnalysis;  // link to the dataflow analysis to be initialized
       
        public:
        InitDataflowState(IntraProceduralDataflow* dfAnalysis/*, std::vector<Lattice*> &initState*/)
        {
                this->dfAnalysis = dfAnalysis;
        }
     
        void visit(const Function& func, const DataflowNode& n, NodeState& state);
};


void InitDataflowState::visit (const Function& func, const DataflowNode& n, NodeState& state)
{
   ...
   dfAnalysis->genInitState(func, n, state, initLats, initFacts);
   state.setLattices((Analysis*)dfAnalysis, initLats);
   state.setFacts((Analysis*)dfAnalysis, initFacts);
   ....
}

worklist[edit]

list of CFG nodes, accessed through an iterator interface

auto_ptr<VirtualCFG::dataflow> workList(getInitialWorklist(func, firstVisit, analyzeDueToCallers, calleesUpdated, fState));


class iterator //Declared in cfgUtils/VirtualCFGIterator.h  
{
public:             
    std::list<DataflowNode> remainingNodes;
    std::set<DataflowNode> visited;
    bool initialized;
       protected:
        // returns true if the given DataflowNode is in the remainingNodes list and false otherwise
        bool isRemaining(DataflowNode n);
                
        // advances this iterator in the given direction. Forwards if fwDir=true and backwards if fwDir=false.
        // if pushAllChildren=true, all of the current node's unvisited children (predecessors or successors, 
        //    depending on fwDir) are pushed onto remainingNodes
        void advance(bool fwDir, bool pushAllChildren);
        
        public:
        virtual void operator ++ (int);
        
        bool eq(const iterator& other_it) const;
        
        bool operator==(const iterator& other_it) const;
        
        bool operator!=(const iterator& it) const;
...
  
};


void iterator::advance(bool fwDir, bool pushAllChildren)
{
        ROSE_ASSERT(initialized);
        /*printf("   iterator::advance(%d) remainingNodes.size()=%d\n", fwDir, remainingNodes.size());
        cout<<"        visited=\n";
        for(set<DataflowNode>::iterator it=visited.begin(); it!=visited.end(); it++)
                cout << "            <"<<it->getNode()->class_name()<<" | "<<it->getNode()<<" | "<<it->getNode()->unparseToString()<<">\n";*/
        if(remainingNodes.size()>0)
        {
                // pop the next CFG node from the front of the list
                DataflowNode cur = remainingNodes.front();
                remainingNodes.pop_front();
                
                if(pushAllChildren)
                {
                        // find its followers (either successors or predecessors, depending on value of fwDir), push back 
                        // those that have not yet been visited
                        vector<DataflowEdge> nextE;
                        if(fwDir)
                                nextE = cur.outEdges();
                        else
                                nextE = cur.inEdges();
                        for(vector<DataflowEdge>::iterator it=nextE.begin(); it!=nextE.end(); it++)
                        {
                                DataflowNode nextN((*it).target()/* need to put something here because DataflowNodes don't have a default constructor*/);
                                if(fwDir) nextN = (*it).target();
                                else nextN = (*it).source();
                                        
                                /*cout << "      iterator::advance "<<(fwDir?"descendant":"predecessor")<<": "<<
                                                   "<"<<nextN.getNode()->class_name()<<" | "<<nextN.getNode()<<" | "<<nextN.getNode()->unparseToString()<<">, "<<
                                                   "visited="<<(visited.find(nextN) != visited.end())<<
                                                   " remaining="<<isRemaining(nextN)<<"\n";*/
                                
                                // if we haven't yet visited this node and don't yet have it on the remainingNodes list
                                if(visited.find(nextN) == visited.end() &&
                                        !isRemaining(nextN))
                                {
                                        //printf("   pushing back node <%s: 0x%x: %s> visited=%d\n", nextN.getNode()->class_name().c_str(), nextN.getNode(), nextN.getNode()->unparseToString().c_str(), visited.find(nextN)!=visited.end());
                                        remainingNodes.push_back(nextN);
                                }
                        }
                }
                
                // if we still have any nodes left remaining
                if(remainingNodes.size()>0)
                {
                        // take the next node from the front of the list and mark it as visited
                        //visited[remainingNodes.front()] = true;
                        visited.insert(remainingNodes.front());
                }
        }
}



class dataflow :  public virtual iterator {};

class back_dataflow:  public virtual dataflow {};

void back_dataflow::operator ++ (int)
{
        advance(false, true);  // backward,  add all children
}


class IntraUniDirectionalDataflow : public IntraUnitDataflow
{  ...
   virtual VirtualCFG::dataflow*
          getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0;
} 

Implemented in derived classes:

  • VirtualCFG::dataflow* IntraFWDataflow::getInitialWorklist ()
  • VirtualCFG::dataflow* IntraBWDataflow::getInitialWorklist()

apply transfer function[edit]

b is a basic block in CFG

  • {\rm IN}[b] = \bigcup_{p \in pred[b]} {\rm OUT}[p] // information goes into b is the union/join of information comes out of all predecessor nodes of b
  • {OUT}[b] = {\rm GEN}[b] \cup ({\rm IN}[b] - {\rm KILL}[b]) // information goes out out S is the information generated by b minus information killed by b. This is the transfer function operating on b!!
bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated)
{

      // Iterate over the nodes in this function that are downstream from the nodes added above
        for(; it != itEnd; it++)
        {
                DataflowNode n = *it;
                SgNode* sgn = n.getNode();

  ...
               for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); )
                {
                        state = *itS;

                        const vector<Lattice*> dfInfoAnte = getLatticeAnte(state);  // IN set
                        const vector<Lattice*> dfInfoPost = getLatticePost(state);   // OUT set

                            // OUT = IN first     // transfer within  the node: from IN to OUT,                                            
                        // Overwrite the Lattices below this node with the lattices above this node.
                        // The transfer function will then operate on these Lattices to produce the
                        // correct state below this node.
                        
                        vector<Lattice*>::const_iterator itA, itP;
                        int j=0;
                        for(itA  = dfInfoAnte.begin(), itP  = dfInfoPost.begin();
                            itA != dfInfoAnte.end() && itP != dfInfoPost.end(); 
                            itA++, itP++, j++)
                        {
                                if(analysisDebugLevel>=1){  // 
                                        Dbg::dbg << "    Meet Before: Lattice "<<j<<": \n        "<<(*itA)->str("            ")<<endl;
                                        Dbg::dbg << "    Meet After: Lattice "<<j<<": \n        "<<(*itP)->str("            ")<<endl;
                                }
                                (*itP)->copy(*itA);
                                /*if(analysisDebugLevel>=1){
                                        Dbg::dbg << "    Copied Meet Below: Lattice "<<j<<": \n        "<<(*itB)->str("            ")<<endl;
                                }*/
                        }
                        
                        // =================== TRANSFER FUNCTION ===================
                        //  (IN - KILL ) + GEN
                        if (isSgFunctionCallExp(sgn))
                          transferFunctionCall(func, n, state);

                        boost::shared_ptr<IntraDFTransferVisitor> transferVisitor = getTransferVisitor(func, n, *state, dfInfoPost);
                        sgn->accept(*transferVisitor);
                        modified = transferVisitor->finish() || modified;

                        // =================== TRANSFER FUNCTION ===================
       ...//
    }

} 

propagate state to next (meetUpdate)[edit]

This is prove to be essential to propagate information along the path. Cannot commenting it out!!

??? not sure about the difference between this step and the step before (Meet Before () / Meet After)

meetUpdate() is called here also

// Propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's 
//     NodeState (nextNodeState).
// Returns true if the next node's meet state is modified and false otherwise.
bool IntraUniDirectionalDataflow::propagateStateToNextNode(
                      const vector<Lattice*>& curNodeState, DataflowNode curNode, int curNodeIndex,
                      const vector<Lattice*>& nextNodeState, DataflowNode nextNode)
{
        bool modified = false;
        vector<Lattice*>::const_iterator itC, itN;
        if(analysisDebugLevel>=1){
                Dbg::dbg << "\n        Propagating to Next Node: "<<nextNode.getNode()<<"["<<nextNode.getNode()->class_name()<<" | "<<Dbg::escape(nextNode.getNode()->unparseToString())<<"]"<<endl;
                int j;
                for(j=0, itC = curNodeState.begin(); itC != curNodeState.end(); itC++, j++)
                        Dbg::dbg << "        Cur node: Lattice "<<j<<": \n            "<<(*itC)->str("            ")<<endl;
                for(j=0, itN = nextNodeState.begin(); itN != nextNodeState.end(); itN++, j++)
                        Dbg::dbg << "        Next node: Lattice "<<j<<": \n            "<<(*itN)->str("            ")<<endl;
        }

        // Update forward info above nextNode from the forward info below curNode.
        
        // Compute the meet of the dataflow information along the curNode->nextNode edge with the 
        // next node's current state one Lattice at a time and save the result above the next node.
        for(itC = curNodeState.begin(), itN = nextNodeState.begin();
            itC != curNodeState.end() && itN != nextNodeState.end(); 
            itC++, itN++)
        {
                // Finite Lattices can use the regular meet operator, while infinite Lattices
                // must also perform widening to ensure convergence.
                if((*itN)->finiteLattice())
                        modified = (*itN)->meetUpdate(*itC) || modified;
                else
                {
                        //InfiniteLattice* meetResult = (InfiniteLattice*)itN->second->meet(itC->second);
                        InfiniteLattice* meetResult = dynamic_cast<InfiniteLattice*>((*itN)->copy());
                        Dbg::dbg << "        *itN: " << dynamic_cast<InfiniteLattice*>(*itN)->str("            ") << endl;
                        Dbg::dbg << "        *itC: " << dynamic_cast<InfiniteLattice*>(*itC)->str("            ") << endl;
                        meetResult->meetUpdate(*itC);
                        Dbg::dbg << "        meetResult: " << meetResult->str("            ") << endl;
                
                        // Widen the resulting meet
                        modified =  dynamic_cast<InfiniteLattice*>(*itN)->widenUpdate(meetResult);
                        delete meetResult;
                }
        }
        
        if(analysisDebugLevel>=1) {
                if(modified)
                {
                        Dbg::dbg << "        Next node's in-data modified. Adding..."<<endl;
                        int j=0;
                        for(itN = nextNodeState.begin(); itN != nextNodeState.end(); itN++, j++)
                        {
                                Dbg::dbg << "        Propagated: Lattice "<<j<<": \n            "<<(*itN)->str("            ")<<endl;
                        }
                }
                else
                        Dbg::dbg << "        No modification on this node"<<endl;
        }

        return modified;
}

stop condition[edit]

class IntraUniDirectionalDataflow : public IntraUnitDataflow
{
public:
         protected:
        // propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's  NodeState (nextNodeState)
           // return true if any state is modified.
        bool propagateStateToNextNode(
             const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex,
             const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode);

}

live dead variable[edit]

Backward Intra-Procedural Dataflow Analysis: e.g. liveness analysis ( use --> backward --> defined)

  • class IntraBWDataflow  : public IntraUniDirectionalDataflow
class LiveDeadVarsAnalysis : public IntraBWDataflow {

  protected:
        funcSideEffectUses* fseu;
       
  public:
        LiveDeadVarsAnalysis(SgProject *project, funcSideEffectUses* fseu=NULL);
        
 // Generates the initial lattice state for the given dataflow node, in the given function, with the given NodeState
 void genInitState(const Function& func, const DataflowNode& n, const NodeState& state,
                          std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts);

        
  boost::shared_ptr<IntraDFTransferVisitor> getTransferVisitor(const Function& func, const DataflowNode& n,
                                                               NodeState& state, const std::vector<Lattice*>& dfInfo)
  { return boost::shared_ptr<IntraDFTransferVisitor>(new LiveDeadVarsTransfer(func, n, state, dfInfo, fseu)); }

  bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo) { assert(0); return false; }


}; 

Inter-procedural analysis[edit]

Key: transfer function that is applied to call sites to perform the appropriate state transfers across function boundaries.

transfer function[edit]


void IntraFWDataflow::transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state)
{
  vector<Lattice*> dfInfoBelow = state->getLatticeBelow(this);

  vector<Lattice*>* retState = NULL;
  dynamic_cast<InterProceduralDataflow*>(interAnalysis)->
    transfer(func, n, *state, dfInfoBelow, &retState, true);

  if(retState && !(retState->size()==0 || (retState->size() == dfInfoBelow.size()))) {
    Dbg::dbg << "#retState="<<retState->size()<<endl;
    for(vector<Lattice*>::iterator ml=retState->begin(); ml!=retState->end(); ml++)
      Dbg::dbg << "        "<<(*ml)->str("            ")<<endl;
    Dbg::dbg << "#dfInfoBelow="<<dfInfoBelow.size()<<endl;
    for(vector<Lattice*>::const_iterator l=dfInfoBelow.begin(); l!=dfInfoBelow.end(); l++)
      Dbg::dbg << "        "<<(*l)->str("            ")<<endl;
  }

  // Incorporate information about the function's return value into the caller's dataflow state
  // as the information of the SgFunctionCallExp
  ROSE_ASSERT(retState==NULL || retState->size()==0 || (retState->size() == dfInfoBelow.size()));
  if(retState) {
    vector<Lattice*>::iterator lRet;
    vector<Lattice*>::const_iterator lDF;
    for(lRet=retState->begin(), lDF=dfInfoBelow.begin(); 
        lRet!=retState->end(); lRet++, lDF++) {
      Dbg::dbg << "    lDF Before="<<(*lDF)->str("        ")<<endl;
      Dbg::dbg << "    lRet Before="<<(*lRet)->str("        ")<<endl;
      (*lDF)->unProject(isSgFunctionCallExp(n.getNode()), *lRet);
      Dbg::dbg << "    lDF After="<<(*lDF)->str("        ")<<endl;
    }
  }
}

InterProceduralDataflow[edit]

InterProceduralDataflow::InterProceduralDataflow(IntraProceduralDataflow* intraDataflowAnalysis) :
        InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis)


 // !!! NOTE: cfgForEnd() AND cfgForBeginning() PRODUCE THE SAME SgFunctionDefinition SgNode BUT THE DIFFERENT INDEXES
                        // !!!       (0 FOR BEGINNING AND 3 FOR END). AS SUCH, IT DOESN'T MATTER WHICH ONE WE CHOOSE. HOWEVER, IT DOES MATTER
                        // !!!       WHETHER WE CALL genInitState TO GENERATE THE STATE BELOW THE NODE (START OF THE FUNCTION) OR ABOVE IT 
                        // !!!       (END OF THE FUNCTION). THE CAPABILITY TO DIFFERENTIATE THE TWO CASES NEEDS TO BE ADDED TO genInitState
                        // !!!       AND WHEN IT IS, WE'LL NEED TO CALL IT INDEPENDENTLY FOR cfgForEnd() AND cfgForBeginning() AND ALSO TO MAKE
                        // !!!       TO SET THE LATTICES ABOVE THE ANALYSIS 


TODO: begin and end func definition issue is mentioned inside of this

simplest form:unstructured[edit]

Simplest form: No transfer action at call sites at all

class UnstructuredPassInterDataflow : virtual public InterProceduralDataflow
{
        public:
        
        UnstructuredPassInterDataflow(IntraProceduralDataflow* intraDataflowAnalysis) 
                             : InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis), InterProceduralDataflow(intraDataflowAnalysis)
        {}
                
        // the transfer function that is applied to SgFunctionCallExp nodes to perform the appropriate state transfers
        // fw - =true if this is a forward analysis and =false if this is a backward analysis
        // n - the dataflow node that is being processed
        // state - the NodeState object that describes the dataflow state immediately before (if fw=true) or immediately after 
        //         (if fw=false) the SgFunctionCallExp node, as established by earlier analysis passes
        // dfInfo - the Lattices that this transfer function operates on. The function propagates them 
        //          to the calling function and overwrites them with the dataflow result of calling this function.
        // retState - Pointer reference to a Lattice* vector that will be assigned to point to the lattices of
        //          the function call's return value. The callee may not modify these lattices.
        // Returns true if any of the input lattices changed as a result of the transfer function and
        //    false otherwise.  
        bool transfer(const Function& func, const DataflowNode& n, NodeState& state, 
                      const std::vector<Lattice*>& dfInfo, std::vector<Lattice*>** retState, bool fw)
        { 
                return false;
        }
        
        void runAnalysis();
};

// simply call intra-procedural analysis on each function one by one.
void UnstructuredPassInterDataflow::runAnalysis()
{
        set<FunctionState*> allFuncs = FunctionState::getAllDefinedFuncs();
        
        // iterate over all functions with bodies
        for(set<FunctionState*>::iterator it=allFuncs.begin(); it!=allFuncs.end(); it++)
        {
                const Function& func = (*it)->func;
                FunctionState* fState = FunctionState::getDefinedFuncState(func);
                
                // Call the current intra-procedural dataflow as if it were a generic analysi
                intraAnalysis->runAnalysis(func, &(fState->state));
        }
}

ContextInsensitiveInterProceduralDataflow[edit]

TODO

How to use one analysis[edit]

Call directly[edit]

Direct call: Runs the intra-procedural analysis on the given function and returns true if the function's NodeState gets modified as a result and false otherwise state - the function's NodeState

  • bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated);
  • direct call with a simpler parameter list : not feasible, all intra procedural analysis has to have an inter procedural analysis set interally!
bool IntraProceduralDataflow::runAnalysis(const Function& func, NodeState* state)
{
   // Each function is analyzed as if it were called directly by the language's runtime, ignoring 
  // the application's actual call graph
    bool analyzeDueToCallers = true; 
                
    // We ignore the application's call graph, so it doesn't matter whether this function calls other functions
   std::set<Function> calleesUpdated;
                
      return runAnalysis(func, state, analyzeDueToCallers, calleesUpdated);
}

Through inter-procedural analysis[edit]

Invoke a simple intra-procedural analysis through the unstructured pass inter-procedural data flow class

int main()
{
  SgProject* project = frontend(argc,argv);
  initAnalysis(project);

  // prepare debugging support
  Dbg::init("Live dead variable analysis Test", ".", "index.html");
  liveDeadAnalysisDebugLevel = 1;
  analysisDebugLevel = 1;

  // basis analysis
  LiveDeadVarsAnalysis ldva(project);
     // wrap it inside the unstructured inter-procedural data flow
   UnstructuredPassInterDataflow ciipd_ldva(&ldva);
   ciipd_ldva.runAnalysis();
  
   .....

}

Retrieve lattices[edit]

Sample code:

// Initialize vars to hold all the variables and expressions that are live at DataflowNode n
//void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const DataflowNode& n, const NodeState& state, set<varID>& vars, string indent)
void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set<varID>& vars, string indent)
{
        LiveVarsLattice* liveLAbove = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeAbove(ldva).begin()));
        LiveVarsLattice* liveLBelow = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeBelow(ldva).begin()));

        // The set of live vars AT this node is the union of vars that are live above it and below it
        for(set<varID>::iterator var=liveLAbove->liveVars.begin(); var!=liveLAbove->liveVars.end(); var++)
                vars.insert(*var);
        for(set<varID>::iterator var=liveLBelow->liveVars.begin(); var!=liveLBelow->liveVars.end(); var++)
                vars.insert(*var);
}

Testing[edit]

It is essential to have a way to test the analysis results are correct.

We currently use a primitive way to test the correctness of analysis: comparing pragma and lattice string output

Two examples translators testing analysis correctness(comparing pragma and lattice string output):


An example test input file for liveness analysis's correctness

int bar(int flag)
{

   int a =1,b,c;
#pragma rose [LiveVarsLattice: liveVars=[flag, a, b]]
   if (flag == 0) // flag is only read here, not written!
     c = a;
   else
     c = b;
   return c;
}

How to debug[edit]

Trace the analysis[edit]

Turn it on

     liveDeadAnalysisDebugLevel = 1;
     analysisDebugLevel = 1;


// find code with  
 if(analysisDebugLevel>=1) ...

check the web page dump using a browser

 firefox index.html

How to read the trace file: start from the beginning: information is ordered based on the CFG nodes visited. The order could be forward or backward order. Check if the order is correct first, then for each node visited

 ==================================  
  Copying incoming Lattice 0: 
        [LiveVarsLattice: liveVars=[b]]
  To outgoing Lattice 0: 
        [LiveVarsLattice: liveVars=[]]
 ==================================  
  Transferring the outgoing  Lattice ... 
    liveLat=[LiveVarsLattice: liveVars=[b]]
    Dead Expression
        usedVars=<>
        assignedVars=<>
        assignedExprs=<>
        #usedVars=0 #assignedExprs=0
    Transferred: outgoing Lattice 0: 
        [LiveVarsLattice: liveVars=[b]]
    transferred, modified=0
 ==================================  
 Propagating/Merging the outgoing  Lattice to all descendant nodes ... 
    Descendants (1):
    ~~~~~~~~~~~~
    Descendant: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;]

        Propagating to Next Node: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;]
        Cur node: Lattice 0: 
            [LiveVarsLattice: liveVars=[b]]
        Next node: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]]
        Next node's in-data modified. Adding...
        Propagated: Lattice 0: 
            [LiveVarsLattice: liveVars=[a, b]]
    propagated/merged, modified=1
    ^^^^^^^^^^^^^^^^^^ 

A real example: if (flag)  c = a; else c = b;  // liveness analysis, a, b are live in two branches, they are propagated backward to if-stmt

   ------------------
    Descendants (1):  // from c =a back to if-stmt (next node)
    ~~~~~~~~~~~~
    Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]

        Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]
        Cur node: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]]   // current node's lattice
        Next node: Lattice 0: 
            [LiveVarsLattice: liveVars=[]]   // next node's lattice before propagation
        Next node's in-data modified. Adding...
        Propagated: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]]  // propagate a into if-stmt's lattice
    propagated, modified=1
    ^^^^^^^^^^^^^^^^^^ 

    ------------------
    Descendants (1):  // from c = b --> if-stmt 
    ~~~~~~~~~~~~
    Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]

        Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]
        Cur node: Lattice 0: 
            [LiveVarsLattice: liveVars=[b]]
        Next node: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]] 
        Next node's in-data modified. Adding...
        Propagated: Lattice 0: 
            [LiveVarsLattice: liveVars=[a, b]]  // now both a and b are propagated/ merged
    propagated, modified=1
    ^^^^^^^^^^^^^^^^^^ 

Dump cfg dot graph with lattices[edit]

A class analysisStatesToDot is provided generate a CFG dot graph with lattices information.

//AnalysisDebuggingUtils.C

  class analysisStatesToDOT : public UnstructuredPassIntraAnalysis
  {
    private:
      //    LiveDeadVarsAnalysis* lda; // reference to the source analysis
      Analysis* lda; // reference to the source analysis
      void printEdge(const DataflowEdge& e); // print data flow edge
      void printNode(const DataflowNode& n, std::string state_string); // print date flow node
      void visit(const Function& func, const DataflowNode& n, NodeState& state); // visitor function
    public:
      std::ostream* ostr; 
      analysisStatesToDOT (Analysis* l):  lda(l){ };
  };

namespace Dbg
{ 
//....
 void dotGraphGenerator (::Analysis *a) 
  {
    ::analysisStatesToDOT eas(a);
    IntraAnalysisResultsToDotFiles upia_eas(eas);
    upia_eas.runAnalysis();
  }

} // namespace Dbg

Example use[edit]

// Liao, 12/6/2011
#include "rose.h"

#include <list>
#include <sstream>
#include <iostream>
#include <fstream>
#include <string>
#include <map>

using namespace std;

// TODO group them into one header
#include "genericDataflowCommon.h"
#include "VirtualCFGIterator.h"
#include "cfgUtils.h"
#include "CallGraphTraverse.h"
#include "analysisCommon.h"
#include "analysis.h"
#include "dataflow.h"
#include "latticeFull.h"
#include "printAnalysisStates.h"
#include "liveDeadVarAnalysis.h"

int numFails = 0, numPass = 0;

//-----------------------------------------------------------
int
main( int argc, char * argv[] )
   {

     SgProject* project = frontend(argc,argv);

     initAnalysis(project);

   // generating  index.html for tracing the analysis
     Dbg::init("Live dead variable analysis Test", ".", "index.html");
     liveDeadAnalysisDebugLevel = 1;
     analysisDebugLevel = 1;

     LiveDeadVarsAnalysis ldva(project);
     UnstructuredPassInterDataflow ciipd_ldva(&ldva);
     ciipd_ldva.runAnalysis();
   // Output the dot graph  *********************
    Dbg::dotGraphGenerator (&ldva);
      return 0;
   }

TODO[edit]

  • Hard to use the generated lattices since many temporary expression objects are generated in lattices. But often users do not care about them (constant propagation, pointer analysis)
    • to see the problem: go to [build64/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests]
    • run make check
    • see the dot graph dump of an analysis : run.sh test_ptr4.C_main_0x2b41e651c038_cfg.dot

Program Optimizations[edit]

ROSE provides the following program optimizations and tranformations:

  • loop transformation, including loop fusion, fisson, unrolling, blocking, loop interchange, etc.
  • inlining
  • outlining
  • constant folding
  • partial redundancy elimination

ROSE Projects[edit]

This page serves as a quite guide about what the major directories under rose/projects are:

Parsing

  • pragmaParsing: An example translator using the parsing building blocks provided by ROSE to parse pragmas

Translations:

  • autoTuning: a project to use ROSE's parameterized translators to facilitate empirical tuning (or autotuning)
  • DataFaultTolerance: a project to use source-to-source translation to make application resilient to memory faults
  • extractMPISkeleton: extract MPI communication skeletons
  • Fortran_to_C : A Fortran to C language translator

Static Analysis

  • compass: a static analysis tool to find errors in applications

Dynamic Analysis

  • RTED: runtime error detection using compiler instrumentation of library calls.

Binary Analysis:

  • BinaryCloneDetection: detect similarities between binary executables.
  • CloneDetection:


Optimizations of high-level abstractions

  • arrayOptimization: optimizations based on array abstractions
  • autoParallelization: A translator which can automatically insert OpenMP directives into serial code, based on dependence analysis and optionally semantics of abstractions.

Parallel Programming Models:

  • mint: a directive based programming model for GPUs
  • OpenMP_Translator: the first version of OpenMP implementation using ROSE. Not recommended for production use, kept just as an example.
  • UpcTranslation: a preliminarily example project to demonstrate how ROSE can be used to created a UPC compiler

minitermite[edit]

Problem: A student added some new IR nodes into ROSE. She is having trouble to pass make for minitermite

Solution: projects/minitermite/HOWTO_ADD_NEW_SGNODE_VARIANTS

Developer's Guide[edit]

We briefly describe the workflow of ROSE developers.

Basic skills for ROSE developers[edit]

Have experience with these, or be ready to learn them

  • Shell programming: Bash (default shell)
  • Unix commands: grep, find, ssh, etc.
  • C++ programming
  • GDB/Valgrind
  • Git
  • Build systems
    • Make
    • CMake The windows port uses this, we are looking into using it elsewhere
    • autoconf/automake
    • libtool
  • LaTeX: Documentation Standard
  • Compiler Techniques: Intuition is invaluable

Helpful Resources[edit]

Valued Contributions[edit]

The ROSE project values the following contributions:

Development:

  • Code: implementing new compiler features, improving existing work, passing code review and Jenkins. Only commits which were merged into the central master branch count as contributions.
    • Expanding Language Support
    • Analyses (AST analyses are a good place to start)
    • Optimizations
    • Build System
  • Bug Fixes: passing code review and Jenkins (in the future, Klocwork, Coverity, etc. analysis tools)
    • Users bugs found in the SciDAC outreach center's bug tracker
    • Internal bugs: Usually developer bugs, they can be found on github.com or redmine
  • Documentation:
    • How ROSE Works
    • Tutorial, Manual, FAQ, etc. (Updating these is most helpful)
    • Project Documentation (projects can be found in the projects directory)
    • Design/Architecture/API Documentation,
    • Workflow Documentation
  • System administration: Maintain and improve workflow components (mostly internal developer work, but suggestions are useful)
    • Website: rosecompiler.org
    • Git repository
    • Project management: Redmine
    • Code review: Github enterprise
    • Jenkins: Continuous integration, improving testings

Research:

  • Publications: technical reports, papers, presentations, posters
  • Slides from presentations (Upload slides to relevant Redmine project's @Files Tab@. (@.pptx@ format is required)

Proposal:

  • Collaborative proposals

Feedback: Remember that any problem you find is probably not unique to you

  • General struggles (administratively or implementation-wise)
  • General improvement/enhancement ideas for both the software and the people

Milestones for a ROSE developers[edit]

Having been working with some interns with us, we roughly identify the following milestones for a ROSE developer:

  • Development environment: pick a platform of your choice (Linux or Mac OS), and get familiar with that specific platform (shell, editors, environment variable setting, etc.)
    • Physical location: locations MATTER! Sit closer to people you should interact often. Make your desk/office accessible to others. Physically isolated office/desk may have very negative impact on your productivity.
  • Installing ROSE: being able to smoothly configure, compile, and install ROSE
  • Build system: being able to add a project (first skeleton) into ROSE by modifying Makefile.am, etc.
  • Contribution following ROSE Coding Standard and passing code review
    • Documentation: sufficient documentation about what you work is about
    • Software Engineering:
      • Style guidelines: Doxygen comments, naming conventions, where to put things, etc.
      • Interface: Does the code has a clean and simple interface to be used by users?
      • Algorithm design: documented by source comments how things are expected to work
      • Coding implementation: correctly implement the designed algorithm
    • Tests: Each contribution must have the accompanying tests to make sure it works as expected
  • Continuous integration: push commits to be code reviewed and tested by Jenkins every two or three weeks for your incremental development results.
    • Add a new test job if none of the existing ones tests your project
  • Confirm your commits are merged into the ROSE project's central master branch: github.com provides graphs for individual impact

Termination checklist[edit]

We often have interns/collaborators/subcontractors finishing up their official duties with us. Here a brief checklist before their termination

  • Complete the student program checklist (we have no idea what you need to do :-)
  • Complete the performance evaluation form provided by us: mostly provide objective facts to demonstrate contributions since subjective impressions can be very off.
  • Complete a short feedback form provided by us, where you can discuss anything related to developing ROSE or working with the ROSE team. Your candid feedback is essential to the future of our collaborative program.
  • Schedule a one-to-one meeting with at least one staff member two weeks before the official end dates to do status check and plan the exit
  • Turn in all documentations (LaTeX, word, powerpoint, etc) not in git repo by uploading them to the redmine project File tab
  • Stop developing any new features at least one week before the end date so we can focus on making sure all source code contributions can pass Jenkins
  • If you plan to continue collaborating with us, ask about getting internal access (e.g. VPN), or setup some other method for collaboration.

Code Review[edit]

See the Code Review section for details.

Working From An Internal LLNL Computer[edit]

Toolchain[edit]

There are many tools pre-installed on the /nfs/apps mount point:

$ ls /nfs/apps
apr        bin       etc   grace     java     mpc      neon     pygobject  sqlite      toolworks.old
asciidoc   binutils  flex  graphviz  libtool  mpfr     openssh  python     src         totalview
asymptote  blender   gcc   hdf5      m4       mpich    perl     qt         subversion  upc
autoconf   doc++     git   insure++  maple    mpich2   pgi      rdesktop   swig        visit
automake   doxygen   gmp   intel     matlab   mplayer  psi      ruby       texinfo     xemacs


The root of most of these tools contains a setup.sh file which you can source. This will correctly setup your library path ($LD_LIBRARY_PATH) and program path ($PATH):


GCC

$ source /nfs/apps/gcc/4.5.0/setup.sh

This GCC setup.sh file should also source MPFR and GMP, but if not, please do it manually:

$ source /nfs/apps/mpfr/3.0.0/setup.sh
$ source /nfs/apps/gmp/4.3.2/setup.sh

If you fail to properly source these dependencies, you may encounter this error:

/nfs/apps/gcc/4.3.2/libexec/gcc/x86_64-unknown-linux-gnu/4.3.2/f951: error while loading shared libraries: libmpfr.so.1: cannot open shared object file: No such file or directory

Workflow[edit]

Motivation and Goals[edit]

The goal of the ROSE workflow is to have a streamlined, simplified, and automated process to allow users and developers to:

  • Improve the quality of ROSE source code and documentation
  • improve our productivity allowing us to produce high quality work using less time and other resources than would otherwise be required

Development Guide[edit]

Developing a big, sophisticated project entails many challenges. To mitigate some of these challenges, we have adopted several best practices: incremental development, code review, and continuous integration.

  • Iterative and Incremental software development for early results, controllable risks, and better engagement of stakeholders
  • Code review for consistency, maintainability, usability, and quality
  • Continuous Integration for automated testing, easy release, and scalable collaboration

Incremental Development[edit]

Developing new functionality in small steps, where the resulting code at each step is a useful improvement over the previous state. Contrast to developing an entire feature fully elaborated, with no points along the way at which it's externally usable.

Each ROSE developer is expected to push his/her work at least once every three weeks.

Major benefits of doing things incrementally

  • You can have intermediate results along the path. So your sponsors will sleep better.
  • You will get feedback early and frequently about if you are heading to the right direction.
  • Your work will be tested and merged often into the master branch, avoiding the risks of merge conflicts.

See more tips about How to incrementally work on a project

Code Review[edit]

See Code Review in ROSE.

Continuous Integration[edit]

Incorporating changes from work in progress into a shared mainline as frequently as possible, in order to identify incompatible changes and introduced bugs as early as possible. The integrated changes need not be particular increments of functionality as far as the rest of the system is concerned.

In other words, incremental development is about making one's work valuable as early as possible, and potentially about getting a better sense of what direction it should take, while continuous integration is about reducing the risks that result from codebase divergence as multiple people do development in parallel.

The question of whether to conditionalize new code is an interesting one. By doing so, one narrows the scope of continuous integration to just checking for surface incompatibilities in merging the changed code. Without actually running the new code against the existing tests, the early detection of introduced bugs is lost. In exchange, multiple people working in the same part of the codebase become less likely to step on each other's toes, because the relevant code changes are distributed more rapidly.

See more at Continuous Integration

High Level Workflow[edit]

Requirement Analysis[edit]

External:

Internal: Need LC accounts to access

Design[edit]

  • Wikibook: community-based design documents and provoke discussion
  • Powerpoint slides: more formal communication about what is the design
  • Confluence: https://lc.llnl.gov/confluence/

Implementation[edit]

  • Redmine (http://hudson-rose-30:3000/): create projects based on milestones and user input, create and track tasks
    • Project-Specific Tasks
    • Private Issue Tracking
    • Private Documentation
      • Using redmine's wiki
  • Github:
    • Internal (http://github.llnl.gov/): for code review only,
    • External (https://github.com/rose-compiler/rose): public hosting code, pubic issue tracking for general ROSE bugs and features.
    • "Rosebot" to automate Github workflow: preliminary testing, policies (git-hooks), automatically add reviewers, etc.

Testing[edit]

Documentation[edit]

Publicity[edit]

Proposing Workflow Changes[edit]

Major workflow improvements and changes should be thoroughly tested and reviewed by staff members before deployment since they may have profound impact on the project

How to propose a workflow change

  • Submit a ticket on github.com's rose-public/rose issue tracker. In the ticket, provide the following information:
    • What is it: Explain what change is proposed
    • Why the changes: the long-term benefits for our productivity and quality of work
    • The cost of the changes: learning curve, maintainability, purchase cost

Reviewing Workflow Change Proposals[edit]

Review criteria[edit]

  • Optimize
    • Optimize our workflow to allow us to do more quality and use less time and other resources.
    • Address what is slowing us down or distracting us.
    • Simplify daily life. Compare how we can eliminate or automate using the proposed workflow improvements.
      • It is counterproductive to improve workflow by adding more hoops/steps/clicks into daily work.
  • Improve:
    • Allows the improvement of the quality of work incrementally:
    • Accepting incremental improvements is more realistic than asking for perfection in the first try.
    • Workflow should allow quick new contributions and fast revision of existing contributions
  • Automate:
    • Additions to the workflow should be automated as much as possible.
  • Preserve:
    • It must preserve existing work:
      • No creation of anything from scratch
    • Does it interact well with existing workflow
    • Is there a way to convert existing code/documents into the new form
  • Simplicity:
    • The more software tools we depend on, the harder to use and maintain our workflow. Similarly, the more formats/standards we enforce, the harder for developers to do their daily work
    • Adopting new required software components and new required technical formats/standards in our workflow should be very carefully reviewed for the associated long-term benefits and costs. Long-term means the range of 5 to 10 years and is not tied to a temporary thing we use now.
  • Preference of major contributors: Whoever contributes the most should has a little bit more weight to say
  • Documentation: We require major changes to be documented and reviewed before deployment. Writing down things can help us clarify details and solicit wider comments (instead of limited to face-to-face meeting)

Coding Standard[edit]

What to Expect and What to Avoid[edit]

This page documents the current recommended practice of how we should write code within the ROSE project. It also serves as a guideline for our code review process.

New code should follow the conventions described in this document from the very beginning.

Updates to existing code that follows a different coding style should only be performed if you are the maintainer of the code.

The order of sections in coding standard follows a top-down approach: big things first, then drill down to fine-grain details.

Six Principles[edit]

We use coding standard to reflect the principal things we value for all contributions to ROSE

  • Documentation: What are the commits about? Is this reflected in commit messages, README, source comments, or LaTex files within the same commits?
  • Style: Is the coding style consistent with the required and recommended formats? Is the code clean and pleasant and easy to read?
  • Interface: Does the code has a clean and simple interface to be used by users?
  • Algorithm: Does the code has sufficient comments about what algorithm is used? Is the algorithm correct and efficient (space and time complexity)?
  • Implementation: Does the implementation correctly implement the documented algorithms?
  • Testing: Does the code has the accompanying test translator and input to ensure the contributions do what they are supposed to do?
    • Is Jenkins being configured to trigger these tests? Local tests on developer's workstation do not count.

Avoid Coding Standard War[edit]

We directly quote text from http://www.parashift.com/c++-faq/coding-std-wars.html, as follows:

"Nearly every software engineer has, at some point, been exploited by someone who used coding standards as a power play. Dogmatism over minutia is the purvue of the intellectually weak. Don't be like them. These are those who can't contribute in any meaningful way, who can't actually improve the value of the software product, so instead of exposing their incompetence through silence, they blather with zeal about nits. They can't add value in the substance of the software, so they argue over form. Just because "they" do that doesn't mean coding standards are bad, however.

Another emotional reaction against coding standards is caused by coding standards set by individuals with obsolete skills. For example, someone might set today's standards based on what programming was like N decades ago when the standards setter was writing code. Such impositions generate an attitude of mistrust for coding standards. As above, if you have been forced to endure an unfortunate experience like this, don't let it sour you to the whole point and value of coding standards. It doesn't take a very large organization to find there is value in having consistency, since different programmers can edit the same code without constantly reorganizing each others' code in a tug-of-war over the "best" coding standard."

Must, Should and Can[edit]

The terms must, should and can have special meaning.

  • A must requirement must be followed,
  • A should is a strong recommendation,
  • A can is a general guideline.

Got New Ideas, Suggestions[edit]

This is not a place to write down the new ideas/concepts/suggestions to be used in the future. If you have suggestions, put into the discussion tab link of this page.

We do welcome suggestions for improvements and changes so we can do things faster and better.

Git Convention[edit]

Name and Email[edit]

Before you commit your local changes, you MUST ensure that you have correctly configured your author and email information (on all of your machines). Having a recognizable and consistent name and email will make it easier for us to evaluate the contributions that you've made to our project.

Guidelines:

  • Name: You MUST use your official name you commonly use for work/business, not nickname or alias which cannot be easily recognized by co-workers, managers, or sponsors.
  • Email: You MUST use your email commonly used for work. It can be either your company email or your personal email (gmail) if you DO commonly use that personal email for business purpose.

To check if your author and email are configured correctly:

  $ git config user.name
  <your name>
 
  $ git config user.email
  <your email>

Alternatively, you can just type the following to list all your current git configuration variables and values, including name and email information.

  $ git config -l


To set your name and email:

  $ git config --global user.name "<Your Name>"
  $ git config --global user.email "<your@email.com>"

Commit messages[edit]

It is important to have concise and accurate commit messages to help code reviewers do their work.

Example commit message, excerpt from link

(Binary Analysis) SMT solver statistics; documentation

* Replaced the SMT class-wide number-of-calls statistic with a
  more flexible and extensible design that also tracks the amount
  of I/O between ROSE and the SMT solver.  The new method tracks
  statistics on a per-solver basis as well as a class-wide basis, and
  allows the statistics to be reset at artibrary points by the user.

* More documentation for the new memory cell, memory state, and X86
  register state classes.
  • (Required) Summary: the first line of the commit message is a one line summary (<50 words) of the commit. Start the summary with a topic, enclosed in parentheses, to indicate the project, feature, bugfix, etc. that this commit represents.
  • (Optional) Use a bullet-list (using an asterisk, *) for each item to elaborate on the commit

Also see http://spheredev.org/wiki/Git_for_the_lazy#Writing_good_commit_messages.

Design Document[edit]

Overview[edit]

"The software design document is a written contract between you, your team, your project manager and your client. When you document your assumptions, decisions and risks, it gives the team members and stakeholders an opportunity to agree or to ask for clarifications and modifications. Once the software design document is approved by the appropriate parties, it becomes a baseline for limiting changes in the scope of the project." - How to Write a Software Design Document | eHow.com

We are still in the process of defining the requirements for design documents, but preliminarily, here are the initial rules for writing a design document for a ROSE module (an analysis, transformation, optimization, etc.).

(We thank Professor Vivek Sarkar at Rice University for his insightful comments for some of the initial design document requirements.)

Guideline[edit]

  • All new ROSE analyses, transformations, and optimizations must have an accompanying design document, to be peer-reviewed, before the actual implementation begins.
  • Be specific enough that someone with ROSE skills who is not the original designer could (in principle) implement the design just by looking at the document.
  • It's to be expected that different developers will make different low-level choices about data structures, etc

Requirement vs. Design Document[edit]

If the requirements document is the "why" of the software, then the technical design document is the "how to". For simplicity, we put both requirements and design into a single document for now. We allow a separated requirement analysis document if necessary.

The purpose of writing the technical design document is to guide developers in implementing (and fulfilling) the requirements of the software--it's the software's blueprint.

Format[edit]

Documents must be:

  • Written in LaTex for re-usability in publications and proposals.
  • Stored under version control to support collaborative writing.

Your document should, at a minimum, include these formal sections:

  • Title page
  • Author information: who participates in the major writing
  • Reviewer information: who reviews and approves the document
  • Table of contents
  • Page numbering format
  • Section numbers
  • Revision history

Content[edit]

Major Sections

  • Overview
    • Explain the motivation and goal of the module: what does this module do, the goal, the problem to address, etc.
  • Requirement analysis: what is required for this module
    • Define the interface: namespace, function names, parameters, return values. How others can call this module and obtain the returned results
    • Performance requirement: time and space complexity
    • Scope of input/test codes: what types of languages to be supported, the constructs of a language to be supported, the benchmarks to be used
  • Design considerations
    • Assumptions
    • Constraints
    • Tradeoffs and limitations: why this algorithm, what are the priorities, etc.
    • Non-standard elements: Definitions of any non-standard symbols, shapes, acronyms, and unique terms in the document
    • Game plan: How each requirement will be achieved
  • Internal software workflow
    • Diagrams: logical structure and logical processing steps: MUST have a UML diagram or power point diagram
    • Pseudo code: MUST have pseudo code to describe key data structures and high-level algorithm steps
    • Example: Must illustrate the designed algorithm by using at least one sample input code to go through the important intermediate results of the algorithm.
    • Error, alarm and warning messages, optional
  • Performance: MUST have complexity analysis. Estimate the time and space complexity of this module so users can know what to expect
  • Reliability (Optional)
  • Related work: cite relevant work in textbooks and papers

Development guidelines[edit]

  • Coding guidelines: standards and conventions.
  • Standard languages and tools
  • Definitions of variables and a description of where they are used

References[edit]

TODO[edit]

  • a sample design document

Testing[edit]

Rules

  • All contributions MUST have the accompanying test translator and input files to demonstrate the contributions work as expected.
  • All tests MUST be triggered by the "make check" rule
  • All test should have self-verification to make sure the correct results are generated
  • All tests MUST be activated by at least one of the integration tests of Jenkins (the test jobs used to check if something can be merged into our central repository's master branch)
    • This will ensure that no future commits can break your contributions.

Programming Languages[edit]

Core Languages[edit]

Only C++ is allowed. Any other programming language is an exception on a case-by-case basis.

Question: But Programming language XYZ is much better than C++ and I am really good at XYZ!!!

Answer: We can allow XYZ only if

  • You can teach at least one of old dogs (staff members) of our team the new tricks to efficiently use XYZ
  • You will be around in our team in the next 5 to 10 years to maintain all the code written in XYZ if none of the old dogs have time/interest to switch to XYZ
  • You can prove that XYZ can interact well with the existing C++ codes in ROSE

Scripting Languages[edit]

Only two scripting languages are allowed

  • bash shell scripting
  • perl

Again, this is just a preference of the staff members and what we have now. Allowing uncontrolled number of scripting languages in a single project will make the project impossible to maintain and hard to learn.

Naming Conventions[edit]

The order of sub-sections reflects a top-down approach for how things are added during the development cycle: from directory --> file --> namespace --> etc.

General[edit]

  • Language: all names should be written in English since it is the preferred language for development, internationally
  • fileName; // NOT: filNavn

Abbreviations and Acronyms[edit]

Avoid ambiguous abbreviations: obtain good balance between user-clarity and -productivity.

Abbreviations and acronyms should NOT be uppercase when used as name

  • exportHtmlSource(); // NOT: exportHTMLSource();
  • openDvdPlayer(); // NOT: openDVDPlayer();

Likewise, commonly-lowercase abbreviations and acronyms should NOT start with a lower-case letter when used in a CamelCase name:

  • SgAsmX86Instruction // NOT: SgAsmx86Instruction
  • myIpod // NOT: myiPod

File/Directory[edit]

Case:

  • camelCase like fileName.hpp: This is consistent with existing names used in ROSE

File Extension:

  • Header files: .h or .hpp
  • Source files: .cpp or .cxx
    • .C should be avoided to work with file systems which do not distinguish between lower or upper case.

Namespaces[edit]

  • A namespace should represent a logical unit, usually encapsulated in a single header file within a specific directory.
  • CamelCase for namespaces, such as SageInterface, SageBuilder, etc.
    • avoid lower case names, bad names: sage_interface
  • use singular for nouns within namespace names, avoid plural
  • use full words, avoid abbreviations
  • use at least two words to reduce name collision

Reason: the name convention of namespace is meant to be compatible with existing code and consistent with function names within namespaces.

  • CamelCase namespace can nice be used with doSomething() like: NameSpace::doSomething()
  • lower case namespace names may look inconsistent, such as name_space_1::doSomething()
  • many existing namespaces in ROSE already follow CamelCase, as shown at link

[Note] Leo: I believe this should be more discussed with ROSE Compiler Framework/ROSE API.

Types[edit]

MUST be in mixed case starting with an uppercase letter, as in SavingsAccount

Variables[edit]

  • Length: variables with a large scope should have long names, variables with a small scope can have short names
  • Temporary variables used for temporary storage (e.g. loop indices) are best kept short. A programmer reading such variables should be able to assume that its value is not used outside of a few lines of code. Common scratch variables for integers are i, j, k, m, n. Optionally, you can use ii, jj, kk, mm, and nn, which are easier to highlight when looking for indexing bugs.
  • Case: camelCase--mixed case starting with lowercase letter, as in functionDecl
    • Variables are purposely to start with lowercase letter as compared to upper case letter for Types. So it is clear by looking at the first letter to know if a name is a variable or a type.

Booleans[edit]

Negated boolean variable names must be avoided. The problem arises when such a name is used in conjunction with the logical negation operator as this results in a double negative. It is not immediately apparent what !isNotFound means.

bool isError; // NOT: isNoError
bool isFound; // NOT: isNotFound

Collections[edit]

Plural form should be used on names representing a collection of objects. This enhances readability since the name gives the user an immediate clue as to the type of the variable and the operations that can be performed on its elements.

For example,

vector<Point> points;
int values[];

Constants[edit]

Named constants (including enumeration values): MUST be all uppercase using underscore to separate words.

For example:

int MAX_ITERATIONS, COLOR_RED;
double PI;

In general, the use of such constants should be minimized. In many cases implementing the value as a method is a better choice:

int getMaxIterations() // NOT: MAX_ITERATIONS = 25
{
    return 25;
}

Generic[edit]

Generic variables should have the same name as their type. This reduces complexity by reducing the number of terms and names used. Also makes it easy to deduce the type given a variable name only. If for some reason this convention doesn't seem to fit it is a strong indication that the type name is badly chosen.

void setTopic(Topic* topic) // NOT: void setTopic(Topic* value)
                            // NOT: void setTopic(Topic* aTopic)
                            // NOT: void setTopic(Topic* t) 
 
void connect(Database* database) // NOT: void connect(Database* db)
                                 // NOT: void connect (Database* oracleDB)

Non-generic variables have a role. These variables can often be named by combining role and type:

Point  startingPoint, centerPoint;
Name   loginName;

Globals[edit]

Must always be fully qualified, using the scope-resolution operator ::.

For example, ::mainWindow.open() and ::applicationContext.getName()

In general, the use of global variables should be avoided. Instead,

  • Place variable into a namespace
  • Use singleton objects

Private class variables[edit]

Private class variables should have underscore suffix. Apart from its name and its type, the scope of a variable is its most important feature. Indicating class scope by using underscore makes it easy to distinguish class variables from local scratch variables.

For example,

class SomeClass {
  private:
    int length_;
}

An issue is whether the underscore should be added as a prefix or as a suffix. Both practices are commonly used, but the latter is recommended because it seem to best preserve the readability of the name. A side effect of the underscore naming convention is that it nicely resolves the problem of finding reasonable variable names for setter methods and constructors:

  void setDepth (int depth)
  {
    depth_ = depth;
  }

Methods and Functions[edit]

Names representing methods or functions: MUST be verbs and written in mixed case starting with lower case to indicate what they return and procedures (void methods) after what they do.

  • e.g. getName(), computeTotalWidth(), isEmpty()

A method name should avoid duplicated object name.

  • e.g. line.getLength(); // NOT: line.getLineLength();

The latter seems natural in the class declaration, but proves superfluous in use, as shown in the example.

The terms get and set must be used where an attribute is accessed directly.

  • e.g: employee.getName(); employee.setName(name); matrix.getElement(2, 4); matrix.setElement(2, 4, value);

The term compute can be used in methods where something is computed.

  • e.g: valueSet->computeAverage(); matrix->computeInverse()

Give the reader the immediate clue that this is a potentially time-consuming operation, and if used repeatedly, he might consider caching the result. Consistent use of the term enhances readability.

The term find can be used in methods where something is looked up.

  • e.g.: vertex.findNearestVertex(); matrix.findMinElement();

Give the reader the immediate clue that this is a simple look up method with a minimum of computations involved. Consistent use of the term enhances readability.

The term initialize can be used where an object or a concept is established.

  • e.g: printer.initializeFontSet();

The american initialize should be preferred over the English initialise. Abbreviation init should be avoided.

The prefix is should be used for boolean variables and methods.

  • e.g: isSet, isVisible, isFinished, isFound, isOpen

There are a few alternatives to the is prefix that fit better in some situations. These are the has, can and should prefixes:

  • bool hasLicense();
  • bool canEvaluate();
  • bool shouldSort();

Parameters should be separated by a single space character, with no leading or trailing spaces in the parameters list:

  • YES: void foo(int x, int y)
  • NO: void foo ( int x,int y )

Directories[edit]

Naming Convention[edit]

List of common names

  • src: to put source files, headers
  • include: to put headers if you have many headers and don't want to put them all into ./src
  • tests: put test inputs
  • docs: detailed documentation not covered by README

Please use camelCase for your directory name.

  • you should avoid leading Capitalization

Examples of preferred names

  • roseExtensions
  • roseSupport
  • roseAPI

What to avoid

  • rose_api
  • rose_support

Layout[edit]

TODO: big picture about where to put things within the ROSE git repository.


For each project directory under ./projects, it is our convention to have subdirectories for different files

  • README: must have this
  • ./src: for all your source files
  • ./include: for all your headers if you don't want to put them all into ./src
  • ./tests: for your test input files
  • ./doc: for your more extensive documentation if README is not enough

Files[edit]

A single file should contain one logical unit, or feature. Keep it modular!

Naming Conventions[edit]

A file name should be specific and descriptive about what it contains.

You should use camelCase (lowercase character in the beginning)

  • good example: fileName.h

What should be avoided

  • start with capitalization,
  • bad example using underscore: file_name.h

Bad file name

  • functions.h
  • file_name.h

References

Line Length[edit]

  • File content should be kept within 80 columns.

80 columns is a common dimension for editors, terminal emulators, printers and debuggers, and files that are shared between several people should keep within these constraints. It improves readability when unintentional line breaks are avoided when passing a file between programmers. If you write a tutorial with more than 80 columns it is likely to not fit on the page. This effectively makes the tutorial useless without having to go into the code base itself.

Indentation[edit]

Avoid tabs for your code indentation, except in cases where tabs (\t) are required, e.g. Makefiles.

2 or 4 spaces is recommended for code indentation.

for (i = 0; i < nElements; i++) 
  a[i] = 0;

Indentation of 1 is too small to emphasize the logical layout of the code. Indentation larger than 4 makes deeply nested code difficult to read and increases the chance that the lines must be split.

Characters[edit]

  • Special characters like TAB and page break must be avoided.

These characters are bound to cause problem for editors, printers, terminal emulators or debuggers when used in a multi-programmer, multi-platform environment.

We already have a built-in perl script to enforce this policy.

Header Files[edit]

File name:

  • must be camelCase: such as fileName.h or fileName.hpp
  • avoid file_name.h

Suffix

  • For C header files: Use .h
  • For C++ header files: Use .h or .hpp

Must have

  • protected preprocesssing directives to prevent the header from being included more than once, example
#ifndef _HEADER_FILE_X_H_
#define _HEADER_FILE_X_H_

#endif //_HEADER_FILE_X_H_
  • try to put your variables, functions, classes within a descriptive namespace.
  • Include statements must be located at the top of a file only.
    • Avoid unwanted compilation side effects by "hidden" include statements deep into a source file.

What to avoid

  • global variables, functions, or classes ; // they will pollute the global scope
  • using namespace std;
    • this will pollute the global scope for each .cpp file which includes this header. using namespace should only be used by .cpp files. More explanations are at link and link2
  • function definitions


References:

Source Files[edit]

Again, file names should follow the name convention

  • camelCase file name: e.g. sageInterface.cpp
  • Avoid capitalization, spaces, special characters

Preferred suffix

  • Use .c for C source files
  • Use .cpp or .cxx for C++ source files

What to avoid

  • capitalized .C for source files. This will cause some issue when porting ROSE to case-insensitive file systems.

References

README[edit]

All major directories within ROSE git repository should have a README file

  • projects/projectXYZ MUST have a README file.

File name should be README

what to avoid

  • README.txt
  • readme

Required Content[edit]

For all major directories in ROSE, there should be a README explaining

  • What is in this directory
  • What does this directory accomplish
  • Who added it and when

Each project directory must have a README to explain:

  • What this project is about
    • Name of the project
    • Motivation: Why do we have this project
    • Goal: What do we want to achieve
  • Design/Implementation: So next person can quickly catch up and contribute to this project
    • How do we design/implement it.
    • What is the major algorithm
  • Brief instructions about how to use the project
    • Installation
    • Testing
    • Or point out where to find the complete documentation
  • Status
    • What works
    • What doesn't work
  • Known limitations
  • References and citations: for the underlying algorithms
  • Authors and Dates

Format[edit]

Format of README

  • text format with clear sections and bullets
  • optionally, you can use styles defined by w:Markdown

Examples[edit]

An example README can be found at

Source Code Documentation[edit]

The source code of ROSE is documented using the Doxygen documentation system.

General Guidelines[edit]

  • English only
  • Use valid Doxygen syntax (see "Examples" below)
  • Make the code readable for a person who reads your code for the first time:
    • Document key concepts, algorithms, and functionalities
    • Cover your project, file, class/namespace, functions, and variables.
    • State your input and output clearly, specifically the meaning of the input or output
      • Users are more likely to use your code if they don't have to think about what the output means or what the input should be
    • Clever is often synonymous with obfuscated, avoid this form of cleverness in coding.

TODO, not ready yet

  • Test your documentation by generating it on your machine and then manually inspecting it to confirm its correctness

TODO: Generating Local Documentation

This does not work sometimes since we have a configuration file to indicate which directories to be scanned to generate the web reference html files

  $ make doxygen_docs -C ${ROSE_BUILD}/docs/Rose/

Use //TODO[edit]

This is a recommended way to improve your code's comments.

While doing incremental development, it is often to have something you decide to do in the next iterations or you know your current implementation/functions have some limitations to be fixed in the future.

A good way is to immediately put a TODO source comments (// TODO blar blar ..) into the relevant code when you make such kind of decisions so you won't forget here is something you want to do next time.

The TODOs also serve as some handy flags within the code for other people if they want to improve your work after you are gone.

Examples[edit]

Single Line[edit]

Often a brief single line comment is enough

//! Brief description.

Multiple lines[edit]

Doxygen supports comments with multiple lines.

/**
 
   ... text..
 
 */
 
/**
 *
 *  ... text..
 *
 */
 
 
/*******************************//**
 *         text
*********************************/
 
/////////////////////////////////////
///  ... text <= 80 columns in length
//////////////////////////////////////

Combined single line and multiple lines[edit]

Doxygen can generate a brief comment for a function and optionally show detailed comments if users click on the function.

Here are the options to support combined single-line and multiple-line source comments.

Option 1:

/**
 * \brief Brief description.
 *        Brief description continued.
 *
 * [Optional detailed description starts here.]
 */

Option 2:

/**
 \brief Brief description.
        Brief description continued.
 
 [Optional detailed description starts here.]
 */

---

Single line comment followed by multiple line comments':

You may extend an existing single line comment with a multiple line comments (Option 1 or 2). For example:

//! Brief description.
/**
 * Detailed description starts here.
 */


TODO: provide a full, combined example.

Functions[edit]

Rules

  • Except for simple functions like getXX() and setXX(), all other functions should have at least one line comment to explain what it does
  • Avoid global functions and global variables. Try to put them into a namespace.
  • A function should not have more than 100 lines of code. Please refactor big functions into smaller, separated functions.
  • Limit the unconditional printf() so your translator will not print hundreds lines of unnecessary text output when processing multiple input files
    • Use an if condition to control printf() for debugging purposes such as " if ( SgProject::get_verbose() > 0 ) "
  • The beginning part of the function should try to do sanity check for the function parameters.

Comments[edit]

Rules

  • Please follow Doxygen style comments
  • Please explain in sufficient detail how your function works and the steps in the algorithm.
    • Reviewers will read your commented information to understand your algorithm and then read your code to see if the code implements the algorithm correctly and efficiently.

Coding[edit]

Correctly implement the designed/documented algorithms. Future users won't have time to read your code directly to discern what it does.

Code should be efficient in terms of both time and space (memory) complexity.

Please be aware that your translator may handle thousands of statements with even more AST nodes.

Be aware that people other than you may use your code or develop it further. Please make this as easy as possible.

Classes[edit]

Try to use namespace when possible, avoid global variables or classes.

Name Equals Functionality[edit]

Name the class after what it is. If you can't think of what it is that is a clue you have not thought through the design well enough.

  • A class name should be a noun.

Compound names of over three words are a clue your design may be confusing various entities in your system. Revisit your design. Try a CRC card session to see if your objects have more responsibilities than they should.

Explicit Access[edit]

All sections (public, protected, private) should be identified explicitly. Not applicable sections should be left out.

Public Members First[edit]

The parts of a class should be sorted public, protected and private.

The ordering is "most public first" so people who only wish to use the class can stop reading when they reach the protected/private sections.

Class Variables[edit]

Class variables should NOT be declared public.

The concept of C++ information hiding and encapsulation is violated by public variables. Use private variables and access functions instead. One exception to this rule is when the class is essentially a data structure, with no behavior (equivalent to a C struct). In this case it is appropriate to make the class' instance variables public.

Avoid Structs[edit]

Structs are kept in C++ for compatibility with C only, and avoiding them increases the readability of the code by reducing the number of constructs used. Use a class instead.

Statements[edit]

Loops[edit]

Only loop control statements may be included in the for() construction, nothing else is allowed.

//Correct
sum = 0; 
for (i = 0; i < 100; i++) 
  sum += value[i]; sum += value[i];

//Incorrect
 for (i = 0, sum = 0; i < 100; i++) 

This increases maintainability and readability. It also allows future developers to make a clear distinction of what controls and what is contained in the loop.

Loop variables should be initialized immediately before the loop.

Type Conversions[edit]

Type conversions must always be done explicitly. Never rely on implicit type conversion.

  //Correct
  floatValue = static_cast<float>(intValue); 
  //Incorrect 
  floatValue = intValue;

By this, the programmer indicates that he is aware of the different types involved and that the mix is intentional.

Conditionals[edit]

The body of a conditional must be put on a separate line.

 if (isDone) 
 // NOT: if (isDone) doCleanup(); doCleanup();

This is for debugging purposes. When writing on a single line, it is not apparent whether the test is really true or not.

There must be a space separating the keyword if from the condition statement (isDone).

if (isDone)
  ^ space

Complex conditional expressions must be avoided. You must introduce temporary boolean variables instead

//recommended way
bool isFinished = (elementNo < 0) || (elementNo > maxElement); 
bool isRepeatedEntry = elementNo == lastElement; 
if (isFinished || isRepeatedEntry) { : } 

// NOT: if ((elementNo < 0) || (elementNo > maxElement)|| elementNo == lastElement) { : }

By assigning boolean variables to expressions, the program gets automatic documentation. The construction will be easier to read, debug and maintain. When the variables are well named, it also helps future developers understand what each part of the construction is accomplishing.

printf and cout[edit]

All screen output MUST be put into a if statement to be conditionally executed, either via verbose level or other debugging option.

They MUST not print out information by default.

TODO: this can be enforced by a simple Compass checker in the future.

switch[edit]

Carefully differentiate

  • things which are known to be allowed to ignore and
  • things which are not yet handled by the current implementation.
  switch(type->variantT())
 {
    case V_SgTypeDouble:
      {
        ...
      }
      break;
    case V_SgTypeInt:
      {
        ...
      }
      break;
   case V_SgTypeFloat: // things which are known to be allowed to be ignored.
      break;
   default:
    {
     //Things which are not yet explicitly handled
      cerr<<"warning, unhandled node type: "<< type->class_name()<<endl;
    }

assert[edit]

It is encouraged to use assert often to explicitly express and guarantee assumptions used in the code.

Please use ROSE_ASSERT() or assert().

For each occurrence of assertion, you MUST add a printf or cerr message to indicate where in the code and what goes wrong so users can immediately know the cause of the assertion failure, without going through a debugger to find out what went wrong.

Statements To Be Avoided[edit]

The following statements should usually be avoided:

  • Goto statements should not be used. Goto statements violate the idea of structured code. There are very few cases (for instance breaking out of deeply nested structures) where goto should be considered, and only if the equivalent structured counterpart is less readable.
  • Executable statements in conditionals should be avoided. Conditionals with executable statements are very difficult to read.
  File* fileHandle = open(fileName, "w"); 
  if (!fileHandle) { : } 
  // NOT: if (!(fileHandle = open(fileName, "w"))) { : }

Expressions[edit]

Guidelines for readability, simplicity and debuggability.

  • Ternary operators (?:) should be replaced with if/else.
  • Long expressions should be broken up into several simpler statements. Add assertion for each pointer value obtained along the process to assist later debugging.
  • Clever use of operator precedence, shortcut evaluation, assignment expressions, etc. should be rewritten to easy-to-understand alternative forms.
  • Always remember that future programmers will appreciate clear and simple code rather than obfuscated cleverness.

AST Translators[edit]

All ROSE-based translators should call AstTests::runAllTests(project) after all the transformation is done to make sure the translated AST is correct.

This has a higher standard than just correctly unparsed to compilable code. It is common for an AST to go through unparsing correctly but fail on the sanity check.

More information is at Sanity_check

References[edit]

We list some external resources which are influential for us to define ROSE's coding standard

Code Review Process[edit]

Code review using rose-github.llnl.gov
Connection between github and Jenkins

Please note: the URL of the internal github has changed! It is now https://rose-github.llnl.gov/, instead of https://github.llnl.gov/ .

Motivation[edit]

Without code review, developers have:

  • added unreadable contributions which do not conform to any consistent coding styles.
  • added undocumented contributions which cannot be understood by anybody else(essentially useless contributions).
  • added untested contributions (codes without accompanying tests) so the contributions do not work as expected or can be easily broken by other conflicting contributions (another essentially less useful contributions)
  • disabled tests to subvert our stringent Jenkins CI regression tests
  • added files into wrong directories, with improper names
  • committed hundreds of reformatted files
  • re-invented the wheel by implementing features that already exist
  • added 160MB MPI trace files into the git repository

See Phabricator's "Advantages of Review" document (a Facebook project).

Goals[edit]

Our primary goals for code reviewing ROSE are to:

  • share knowledge about the code: coder + reviewer will know the code, instead of just the coder
  • group-study: learn through studying other peoples' code
  • enforce policies for consistent usability and maintainability of ROSE code: documented and tested
  • avoid reinventing the wheel and eliminating unnecessary redundancy
  • safe-guarding the code: disallowing subversive attempts to disable or remove regression tests

Software[edit]

We are currently testing Github Enterprise and looking into the possibility of leveraging Redmine for internal code review.

In the past, we have looked at Google's Gerrit code review system.

Github[edit]

Releases: https://enterprise.github.com/releases

Support: https://support.enterprise.github.com

rosebot[edit]

(Under development)

An automated pull request analyzer to perform various tasks:

  • Automatically add reviewers to Pull Requests based on hierarchical configuration
  • "Pre-receive hook" analyses: file sizes, quantity of files, proprietary source, etc.
  • more...

Developer Checklist[edit]

Read these tips and guidelines before sending a request for code review.

Coding Standards[edit]

Please go to Coding Standard for the complete guideline. Here we only summary some key points.

Your code should be written in a way that makes it easily maintainable and reviewable:

  • write easy to understand code; avoid using exotic techniques which nobody can easily understand.
  • add sufficient documentation (source-code comments, README, etc.) to aid the understandability of your code, your documentation should cover
    • why do you do this (motivation)
    • how do you do it (design and/or algorithm)
    • where are the associated tests (works as expected)
  • before submission of your code for review, make sure
    • you have merged with the latest central repository's master branch without conflicts
    • your working copy can pass local tests via: make, make check, and make distcheck
    • you have fixed all compiler warnings of your code whenever possible
  • submit a logical unit of work (one or more commits); something coherent like a bug fix, an improvement of documentation, an intermediate stage for reaching a big new feature.
  • balance code submissions with a good ratio of [lines of code] and [complexity of code]. A good balance needs to be achieved to make the reviewer's life easier.
    • the time needed to review your code should not exceed 1 hour. Please avoid pushing thousands of lines at a time.
    • Please also avoid pushing any trivial (fixed a typo, commented out a single line etc.) to be reviewed.

One time setup[edit]

Steps for initializing code review:

1. Login to http://rose-github.llnl.gov using your OUN and PAC.

2. Fork your own clone of the ROSE repository from http://rose-github.llnl.gov/rose-compiler/rose.

3. Add Collaborators:

  • Go to http://rose-github.llnl.gov/<your_account>/rose
    • Click Admin
    • Click Collaborators
      • Add candidate code reviewers: liao6, too1. These developers will review and merge your work.
      • Add admins: hudson-rose. This user will automatically synchronize your master branch with /nfs/casc/overture/ROSE/git/ROSE.git:master.

4. Create your public-private SSH key pair using ssh-keygen, and add the public key to your rose-github.llnl.gov account. Refer to Generating SSH Keys or use a public key tat you already have. (rose-github.llnl.gov only supports the SSH protocol for now; HTTPS is not yet supported.)

5. Configure Auto-syncs: Contact the Jenkins administrator (too1 and liao6) to have your repository added to a white-list of repositories to be synced whenever new commits are integrated into ROSE's official master branch.

6. Setup polling job: Contact the Jenkins administrator (too1 and liao6) to have your Github repository polled for new changes on the master branch. When new changes are detected, your master branch will be pushed to the central repository (and added to the Jenkins testing queue) as <oun>-reviewd-rc.

Daily work process[edit]

  • have a local git repo to do your work and submit local commits, you have two choices:
    • clone it from /nfs/casc/overture/rose/rose.git as we usually do before
    • clone your fork on rose-github.llnl.gov to a local repo (only HTTPS is supported via LC)

Note: You may encounter SSL certificate problems. If you do, simply disable SSL verification in cURL using either export GIT_SSL_NO_VERIFY=false or configuring git:

$ git config --global http.sslVerify false
    • don't use branches, use separated git repositories for each of your tasks. So status/progress of one task won't interfere with other tasks.
  • When ready to push your commits, synchronize with the latest rose-compiler/master to resolve merge conflicts, etc.
    • type: git pull origin master # this should always work since master branches on rose-github.llnl.gov are automatically kept up-to-date
    • make sure your local changes can pass 1)make -j8, 2)make check -j8, and 3)make distcheck -j8
  • push your commits to your fork's non-master branch, (like bugfix-rc , featurex-rc, work-status, etc.) You have total freedom in creating any branches in your forked repo, with any names you like
  # If your local repository was cloned from /nfs/casc/overture/ROSE/rose.git. 
  # There is no need to discard it. You can just add the rose-github.llnl's repo as an additional remote repository and push things there:
  git remote add github-llnl-youraccount-rose http://rose-github.llnl.gov/youraccount/rose.git
  git push github-llnl-youraccount-rose HEAD:refs/heads/bugfix-rc
    • It is encouraged to push your work to a remote branch with a -status suffix, which will trigger a pre-screening Jenkins Job: http://hudson-rose-30:8080/view/Status/job/S0-pre-screening-before-code-review/. This is often useful to make sure your pushes can pass a minimum make check rules, including your own, before reviewers spend time on reading your code. Reviewers can also see both your code and your code's actions.
  • add a pull(merge) request to merge bugfix-rc into your own fork's master,
    • please note that the default pull request will use rose-compiler/rose's master as the base branch (destination of the merge). Please change it to be your own fork's master branch instead.
    • Also make sure the source (head) branch of the pull (merge) request is the one your want (bugfix-rc in this example)
    • Double check the diff tab of your pull request only shows the differences you made, without other things brought in from the central repo. Or your own repo's master is out-of-sync with the central repo's master. Notify system admin (too1) for the problem or manually fix it using the troubleshooting section of this page.
  • notify a reviewer that you have a pull request (requesting to merge your bugfix-rc into your master branch)
    • You can assign the pull request to the reviewer so an email notification will be automatically sent to the reviewer
    • Or you can add discussion within the pull request using @revieweraccount. NOTE: please only click "Comment on this issue" once and manually refresh the web page. Github Enterprise has a bug so it cannot automatically shown the newly added comment. bug79
    • Or you can just email the reviewer
  • waiting for reviewer's feedback:

Review results[edit]

  • Completion and Submission To Jenkins
    • If your code passes code review, the reviewer should have merged your bugfix-rc into your master. Jenkins will automatically poll your master and do the testing/merging
  • How To Make Changes
    • To implement changes make local edits, local commits, push to your remote branch, and send a merge request again
  • Taking Code Review Seriously
    • Remember code review is not an attack on you as a person. The purpose of code review is to allow a colleague to evaluate your code. This can take a reasonable amount of time, so respect their efforts and seriously look at your code anew.
    • Look through the reviewer comments and address them or comment the purpose of the code as it stands and wait for response
    • Some comments are mandatory changes, these must be addressed before you will pass code review
    • Some comments are suggestions. You should think about their suggestions carefully. If the reviewer suggests something you should form a rationalization for the difference or consider the implications of changing your code. ROSE is a team effort, we must take our colleagues seriously.
    • DO NOT CLOSE' the pull request. You can push your new commits to the same branch again and comment on the pull request to indicate there are new updates. Please review them again. This will avoid unnecessary repetition.

Benefits of Code Review[edit]

  • Avoiding coding a feature that is already present.
    • Remember you are coding for a user, and we must try our best to write clear code
    • Code coherency is extremely important in a large project. Coherent code allows the user to spent his or her time on their project rather than trying to find an answer in the doxygen page and finding seven or eight ways to do the same thing without knowing the consequences of the different approaches
  • Coding As A Team
    • If every coder hid away and coded by himself without regard to the features ROSE already has, it not only can confuse users but developers as well.
    • At a glance, the ROSE source directory weighs in at almost 1 GB. The compilation directory after make, make check, make install, make installcheck, make distcheck comes in at 19G. Lesson: ROSE is large and the chance of someone knowing everything about ROSE and its functionality is rather slim
    • As a team the size is quite large but manageable so long as the work each person does on their particular section and asks about possible feature duplication and code readability

Reviewer Checklist[edit]

What to look out for as a code reviewer?

  • Be familiar with the current Coding Standard as a general guideline to perform the code review.
  • Allocate up to 1 hour at a time to review approximately 500-1000 lines of code: a longer time may not pay off due to the attention span limits of human brains

What to check[edit]

Six major things to check:

  • Documentation: What are the commits about? Is this reflected in README, source comments, or LaTex files?
  • Style: Does the coding style follow our standard? Is the code clean, robust, and maintainable?
  • Interface: Does the code has a clean and simple interface to be used by users?
  • Algorithm: Does the code have sufficient comments about what algorithm is used? Is the algorithm correct and efficient (space and time complexity)?
  • Implementation: Does the code correctly implement the documented algorithm(s)?
  • Testing: Does the code have the accompanying test translator and input test codes to ensure the contributions do what they are supposed to do?
    • Is Jenkins being configured to trigger these tests (your work may require new pre-requisite software or configure options)? Local tests on developer's workstation do not count.

More details, quick summary from Coding Standard

  • Naming conventions: File and directory names follow our standards; clear and intuitive
    • Directory structure: source code, test code, and documentation files are added into the correct locations
  • Maintainability: clarity of code; can somebody who did not write the code easily understand what the code does?
    • No looong functions: a function with hundreds of lines of code is a no-no
    • Architecture/design: the reasons and motivations for writing the code, and its design.
  • No duplication: similar code may already exist or can be extended
  • Re-use: can part of the code be refactored to be reusable by others?
  • Unit tests: make check rules are associated with each new feature to ensure the new feature will be tested and verified for expected behaviors
  • Sanity: no turning off, or relaxing, other tests to make the developer's commits pass Jenkins. In other words, no cheating.

Commenting[edit]

Reviewer comments should be clearly delimited into these three well-defined sections:

1. Mandatory: the details of the comment must be implemented in a new commit and added to the Pull Request before the code review can be completed.

2. Recommended: the details of the comment could represent a best-practice or, simply, it could be intended to provide some insight to the developer that they may have not thought about.

Both Mandatory and Recommended can be accompanied by the keyword Nitpick:

3. Nitpick: the details of the comment represent a fix that usually involves a spelling/grammatical or coding style correction. The main purpose of the nitpick indication is to let the developer know that you're not trying to be on their case and make their life difficult, but an error is an error, or there's a better way to do something.

Decisions[edit]

Make a clear and definitive decision for the code review:

  • Pass: The code does what it is supposed to do with clear documentation and test cases. Merge and close the pull request.
  • Pass but with future tasks. The commits are accepted. But some additional tasks are needed in the future to improve the code. They can be put into a separate set of commits and pushed later on.
  • Fail. Additional work is needed, such as better naming, better places to put files, more source comments, add regression tests, etc. Notify the developers of the issues and ask for a new set of commits to be pushed addressing the corrections or improvements.

Giving negative feedback[edit]

We directly quote from http://www.mediawiki.org/wiki/Code_review_guide#Giving_negative_feedback

" Here are a few guidelines in the event you need to reject someone's submission or ask them to clean up their work:

  1. Focus your comments on the code and any objectively-observed behavior, not motivations; for example, don't state or imply assumptions about motivating factors like whether the developer was just too lazy or stupid to do things right.
  2. Be empathetic and kind. Recognize that the developer has probably put a lot of work in their idea, and thank them for their contribution if you feel comfortable and sincere in doing so (and try to muster the comfort and sincerity). Most importantly, put yourself in their shoes, and say something that indicates you've done so.
  3. Help them schedule their work. If their idea is a "not yet" kind of idea, try to recommend the best way you know of to get their idea on a backlog (i.e. the backlog most likely to eventually get revisited).
  4. Let them know where they can appeal your decision. For example, if the contributor doesn't have a history of being disruptive or dense, invite them to discuss the issue on wikitech-l.
  5. Be clear. Don't sugarcoat things so much that the central message is obscured.
  6. Most importantly, give the feedback quickly. While tactful is better (and you should learn from past mistakes), you can always apologize for a poorly-delivered comment with a quick followup. Don't just leave negative feedback to someone else or hope they aren't persistent enough to make their contribution stick."

Who should review what[edit]

Ideally, every ROSE contributor should participate in code review as a reviewer at some point so the benefits of peer-review can fully be fulfilled.

However, due to the limited access to our internal github enterprise server, we currently have a centralized review process in which ROSE staff members (liao6, too1) serve as the default code reviewers. They are responsible for either reviewing the code themselves or delegate to other developers who either has better knowledge about the contributions or should be aware of the contributions.

We am actively looking at better options and will gradually expand the pool of reviewers so the reviewing step won't become a bottleneck.

TODO: use rosebot to automatically assign reviewers according to a hierarchical configuration of the source-tree.

What to avoid[edit]

  • Judging code by whether it's what the reviewer would have written
    • Given a problem, there are usually a dozen different ways to solve it. And given a solution, there's a million ways to render it as code.
  • degenerating into nitpicks:
    • perfectionism may hurt the progress. we should allow some non-critical improvements to be done in the next version/commits.
  • feel obligated to say something critical: it is perfectly fine to say "looks good, pass"
  • delay in review: we should not rush it but we should keep in mind that somebody is waiting for the review to be done to move forward

Criticism[edit]

Code reviews often degenerate into nitpicks. Brainstorming and design reviews to be more productive.

  • This makes sense, the early we catch the problems, the better. Design happens earlier. Design should be reviewed. The same idea applies to requirement analysis also.
  • To mitigate this risk, we now have rules for design document in our coding standard.

Troubleshooting[edit]

master is out-of-sync[edit]

The master branch of each developer's git repository (http://rose-github.llnl.gov) should be automatically synchronized with the central git repository's master branch (/nfs/casc/overture/ROSE/git/ROSE.git). In rare cases, it could be out-of-sync. Here is an example to perform a manual synchronization:

1. Clone your Github repository:

$ cd ~/Development/projects/rose
$ git clone git@github.com:<user_oun>/rose.git
Cloning into ROSE...
remote: Counting objects: 216579, done.
remote: Compressing objects: 100% (55675/55675), done.
remote: Total 216579 (delta 159850), reused 211131 (delta 155786)
Receiving objects: 100% (216579/216579), 296.41 MiB | 35.65 MiB/s, done.
Resolving deltas: 100% (159850/159850), done.

2. Add the central repository as a remote repository:

$ git remote add central /nfs/casc/overture/ROSE/git/ROSE.git
$ git fetch central
From /nfs/casc/overture/ROSE/git/ROSE.git
 * [new branch]      master     -> central/master
 ...

3. Push the central master branch to your Github's master branch:

-bash-3.2$ git push central central/master:refs/heads/master
Total 0 (delta 0), reused 0 (delta 0)
To git@rose-github.llnl.gov:<user_oun>/rose.git
   16101fd..563b510  central/master -> master

master cannot be synchronized[edit]

In rare cases, your repository's master branch cannot be automatically synchronized. This is most likely due to merge conflicts. You will receive an error message through an automated email, resembling the following (last updated on 7/24/2012):

To git@rose-github.llnl.gov:lin32/rose.git
! [rejected]        origin/master -> master (non-fast forward)
error: failed to push some refs to 'git@rose-github.llnl.gov:lin32/rose.git'

---

Your master branch at [rose-github.llnl.gov:lin32/rose.git] cannot be automatically updated with [/nfs/casc/overture/ROSE/git/ROSE.git:master]

Please manually force the update:

Add the central repository as a remote, call it "nfs":

  $ git remote add nfs /nfs/casc/overture/ROSE/git/ROSE.git

1. First, try to manually perform a merge in your local repository:

  # 1. Checkout and update your Github's master branch
  $ git checkout master
  $ git pull origin master

  # 2. Merge the central master into your local master
  $ git pull nfs master
  <no merge conflicts>

  # 3. Synchronize your local master to your Github's master
  $ git push origin HEAD:refs/head/master

2. Otherwise, try to resolve the conflict.

3. Finally, if all else fails, force the synchronization:

  $ git push --force origin nfs/master:refs/heads/master

  WARNING: your master branch on Github will be overriden so make sure
  you have sufficient backups, and take precaution.

Please simply follow the email's instructions to force the update of your Github's master branch.

Past Software Experience[edit]

In the past, we have experimented with other code review tools:

Gerrit (Google)[edit]

In short:

  • Gerrit's user interface is not user-friendly (it's complex and therefore, more confusing). This is true, when compared to Github's Pull Request mechanism for code review.
  • Gerrit's remote API was not mature enough to handle our workflow. Additionally, we had to hack several things in order to slightly suit our needs. On the other hand, Github has a great remote API which is easily accessible through Ruby scripting, a very popular language for the domain of web interfaces and development.
  • Gerrit is not as popular as Github, which is important for our project to gain traction. Also, more people are familiar with Github so it makes it easier for them to use.

TODO[edit]

  • TOP-PRIORITY: add pre-screening Jenkins job before manual code review kicks in
  • Research, install, and test Facebook's Phabricator: http://phabricator.org/

Connection to Jenkins[edit]

See Continuous_Integration#Connection_to_Code_Review

References[edit]

Continuous Integration[edit]

ROSE Continuous integration using Git and Jenkins (Code Review Omitted for simpler explanation)

Motivation[edit]

Without automated continuous integration, we had frequent incidents like:

  • Developer A commits something to our central git repository's master branch. The commits contain some bugs which break our build and take a long time to have a fix. Then the central master branch is left to a corrupted state for weeks so nobody can check out/in anything.
  • Developer A does a lot of wonderful work offline for months. But his work later is found to be incompatible with another developer's work. His work has unsolvable merge conflicts.

Overview[edit]

The ROSE project uses a workflow that automates the central principles of continuous integration in order to make integrating the work from different developers a non-event. Because the integration process only integrates with ROSE the changes that passes all tests we encourage all developers to stay in sync with the latest version.

For now, there are three instances of Jenkins

A high level overview of the development model used by ROSE developers.

  • Step 1: Taking advantage of the distributed source code repositories based on git, each developer should first clone his/her own repository from our central git repository (or its mirrors/clones/forks).
  • Step 2: Then a feature or a bugfix can be developed in isolation within the private repository. He can create any number of private branches. Each branch should relate to a feature that this developer is working on and be relatively short-lived. The developer can commit changes to the private repository without maintaining an active connection to the shared repository.
  • Step 3: When work is finished and locally tested (make, make check, and make distcheck -j#n), he can pull the latest commits from the central repo's master branch
  • Step 4: He then can push all accumulated commits within the private repository to his branch within the shared repository. We create a dedicated branch within the central repository for each developer and establish access control of the branch so only an authorized developer can push commits to a particular branch of the shared repository.
  • Step 5-6 (automated): Any commits from a developer’s private repository will not be immediately merged to the master branch of the shared repository.

In fact, we have access control to prevent any developer from pushing commits to the master branch within the shared repository. A continuous integration server called Jenkins is actively monitoring each developer’s branch within the central repository and will initiate comprehensive commit tests upon the branch once new commits are detected. Finally, Jenkins will merge the new commits to the master branch of the central repository if all tests pass. If a single test fails, Jenkins will report the error and the responsible developer should address the error in his private repository and push improved commits again.

As a result, the master branch of the central git repository is mostly stable and can be a good candidate for our external release. On top of the master branch of the central git repository, we further have more comprehensive release tests in Jenkins. If all the release tests pass, an external release based on the master branch will be made available outside.

Tests on Jenkins[edit]

We use Jenkins ( http://hudson-rose-30:8080/ ) to test commits added to developer's release candidate branches at the central git repository.

The tests are organized into three categories

  • Integration: tests used to check if the new commits can pass various "make check" rules, compatibility tests, portability tests, configuration tests, and so on. If all tests pass, the commits will be merged (or integrated) into the master branch of the central repository.
  • Release: tests used to test the updated master branch of the central repository for additional set of tests using external benchmarks. If all tests pass, the head of the master will be released as a stable snapshot for public file package releases(generated by "make dist").
  • Others: for informational purpose now, not being used in our production workflow.

So for each push (one or more commits to a -rc branch), it will go through two stages: Integration test and Release test stage.

It is each developer's responsibility to make sure their commits can pass BOTH stage by fixing any bugs discovered by the tests.


status test using latest master branch[edit]

Example status jobs:

Installed Software Packages[edit]

Here we list software packages installed and used by Jenkins

  • Yices: /export/tmp.hudson-rose/opt/yices/1.0.34

Check Testing Results[edit]

It is possible to manually tracking down how you commits are doing within the test pipeline within Jenkins (http://hudson-rose-30:8080/). But it can be tedious and overwhelming.

So we provide a dashboard ( http://sealavender:4000/) to summarize the commits to your release candidate branch(-rc) and the pass/fail status for each integration tests.

Note: It's possible that all of your testing jobs (finally) pass, but the actual integration is not performed. This typically occurs when one of your jobs have a system failure, for instance, so it has to be manually re-started. If you see that all of your jobs have passed, but your work has not been integrated, please let the Jenkins administrator know.

Frequently Failed Jobs[edit]

See details at ROSE Compiler Framework/Jenkins Failures

Connection to Code Review[edit]

Connection between Github Enterprise and Jenkins

In reality, most LLNL developers are now asked to push things to Github Enterprise for code review first instead of directly pushing to our central git repository. The synchronization between the Github Enterprise's code review repositories and our Central Git repo are automated.

Auto Pull[edit]

Auto pull: we have another Jenkins at (https://hudson-rose-30:8443/jenkins/) which serves as the bridge between Github Enterprise and our main production Jenkins.

  • For each private repositories on Github Enterprise, we have a Jenkins job to monitor the master branch for approved pull (merge) request. If there is any new approved commits, the job will transfer the commits to the central repository's -reviewed-rc branch for that developer.

Configuration of the auto pull job:

  • Source code management
    • git: git@github.llnl.gov:account_name/rose.git
    • branches to be build: github/master
  • Build Trigger: Poll SCM , schedule "* * * * *"
  • Execution shell
##
## Add /nfs as remote
##
## `|| true`: don't error if remote exists
##
git remote add nfs /nfs/casc/overture/ROSE/git/ROSE.git || true
git fetch nfs

##
## Push to /nfs *-rc
##
if [ -n "$(git log --oneline nfs/master..github/master)" ]; then
  git push --force nfs "$GIT_BRANCH":refs/heads/oun-reviewed-rc
fi

Auto Push[edit]

Auto push: A Jenkins job is responsible for propagating latest central master contents to all private repositories on github.llnl.gov

The Job configuration

  • source Code Management:
    • Git: /nfs/casc/overture/ROSE/git/ROSE.git
    • Branches to build: */master
  • Build Trigger: Build after other projects are built: Commit
  • Execute Shell
USERS="\
user1\
user2
"

for user in $USERS; do
  tmpfile="$(mktemp)"
  ( git push git@github.llnl.gov:"$user"/rose.git origin/master:refs/heads/master 2>"$tmpfile" ) || true
  set +e
  cat "$tmpfile"
  cat "$tmpfile" | grep -q "non-fast.*forward"
  if [ $? -eq 0 ]; then
    echo "Sending error email to [${user}@llnl.gov] because their github/master is non-fast-forwardable"
    # email details are omitted here.
  fi
done

TODO[edit]

High priority

  • Add a pre-screening job before manual code review kicks in. the pre-screening job can make sure the code to be reviewed will be compiled with minimum warning messages and with required make check rules to run tests.
  • enable email notification for the final results of each test:
  • incrementally add more compilation tests using external benchmarks to be integration tests.
    • Initial two jobs: spec cpu benchmark + NPB Fortran benchmarks
  • Better integration with Github Enterprise

Third Party software installed for testing in Jenkins.

  • Yices (http://yices.csl.sri.com/)
    • Download Yices1, the lasted version is better.
    • untar the tarball package of yices, then it is YICES_INSTALL, which is name like yices-1.0.34
    • Type --with-yices=YICES_INSTALL with ROSE/configure option
    • setup YICES_INSTALL/lib into LD_LIBRARY_PATH for Linux and DYLD_LIBRARY_PATH for mac users, it is like add Boost/lib into LD_LIBRARY_PATH

References[edit]

  • Files used to generate the figure: feel free to add new versions as new slides: link

Frequently Asked Questions (FAQ)[edit]

We collect a list of frequently asked questions about ROSE, mostly from the rose-public mailing list link

General[edit]

How to search rose-public mailinglist for previously asked questions?[edit]

Use the following command on google search
site:https://mailman.nersc.gov/pipermail/rose-public $(ADD_YOUR_SEARCH_TERM_HERE)

Why can't ROSE staff members answer all my questions?[edit]

It can feel very frustrating when you get no responses to your questions submitted to the rose-public@nersc.gov mailing list. You may wonder why the ROSE staff cannot help neither sometimes.

Here are some possible excuses:

  • They are just as busy as everybody else in the research and development fields. They may be working around the clock to meet deadlines for proposals, papers, project reviews, deliverables, etc.
  • They don't know every corner of their own compiler, given the breadth and depth of contributions made to ROSE by collaborators, former staff members, post-docs, and interns. Moreover, most contributions lack good documentation--something that should be remedied in the future.
  • Some questions are simply difficult and open research and development questions. They may have no clue, either.
  • They just feel lazy sometimes or are taking a thing called vacation.

Possible alternatives to have your questions answered and your problems solved in a timely fashion:

  • Please do you own homework first (e.g. Google).
  • The ROSE team is actively addressing the documentation problem, through an internal code review process to enforce well-documented contributions going forward.
  • Help others to help yourself. Answer questions on the rose-public@nersc.gov mailing list and contribute to this community-editable Wikibook.
  • Find ways to formally collaborate with, or fund, the ROSE team. Things go faster when money is flowing :-) Sad, but true, reality in this busy world.

How many lines of source code does ROSE have?[edit]

Excluding the EDG submodule and all source code comments, the core of ROSE (rose/src) has about 674,000 lines of C/C++ source code as of July 11, 2012.

Including tests, projects, and tutorial directories, ROSE has about 2 Million lines of code.

Some details are shown below:

[rose/src]./cloc-1.56.pl .
    3076 text files.
    2871 unique files.                                          
     716 files ignored.

http://cloc.sourceforge.net v 1.56  T=26.0 s (91.7 files/s, 39573.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            908          75280          93960         354636
C                              123          12010           3717         199087
C/C++ Header                   915          28302          38412         121373
Bourne Shell                    17           3346           4347          25326
Perl                             4            743           1078           7888
Java                            18           1999           4517           7096
m4                               1            747             20           6489
Python                          34           1984           1174           5363
make                           148           1682           1071           3666
C#                              11            899            274           2546
SQL                              1              0              0           1817
Pascal                           5            650             31           1779
CMake                          168           1748           4880           1702
yacc                             3            352            186           1544
Visual Basic                     6            228            421           1180
Ruby                            11            281            181            809
Teamcenter def                   3              3              0            606
lex                              2            103             47            331
CSS                              1             95             32            314
Fortran 90                       1             34              6            244
Tcl/Tk                           2             29              6            212
HTML                             1              8              0             15
-------------------------------------------------------------------------------
SUM:                          2383         130523         154360         744023
-------------------------------------------------------------------------------

How large is ROSE?[edit]

To show top level information only (in MB): du -msl * | sort -nr

170     tests
109     projects
90      src
19      docs
16      winspecific
16      ROSE_ResearchPapers
15      binaries
7       scripts
5       LicenseInformation
4       tutorial
4       autom4te.cache
2       libltdl
2       exampleTranslators
2       configure
2       config
2       ChangeLog

Sort directories by their sizes in MegaBytes

 du -m | sort -nr >~/size.txt
709     .
250     ./.git
245     ./.git/objects
243     ./.git/objects/pack
170     ./tests
109     ./projects
90      ./src
76      ./tests/CompileTests
50      ./tests/RunTests
40      ./tests/RunTests/FortranTests
34      ./tests/RunTests/FortranTests/LANL_POP
29      ./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1
27      ./src/3rdPartyLibraries
23      ./tests/roseTests
23      ./src/frontend
22      ./tests/CompileTests/Fortran_tests
21      ./tests/CompilerOptionsTests
19      ./docs
18      ./tests/CompileTests/RoseExample_tests
18      ./src/midend
18      ./docs/Rose
16      ./winspecific
16      ./ROSE_ResearchPapers
15      ./tests/CompileTests/Fortran_tests/gfortranTestSuite
15      ./binaries/samples
15      ./binaries
14      ./tests/CompileTests/Fortran_tests/gfortranTestSuite/gfortran.dg
14      ./src/roseExtensions
11      ./projects/traceAnalysis
10      ./tests/CompileTests/A++Code
10      ./tests/CompilerOptionsTests/testCpreprocessorOption
10      ./tests/CompilerOptionsTests/A++Code
10      ./src/roseExtensions/qtWidgets
10      ./src/frontend/Disassemblers
10      ./projects/symbolicAnalysisFramework
10      ./projects/SATIrE
10      ./projects/compass
9       ./winspecific/MSVS_ROSE
9       ./tests/RunTests/A++Tests
9       ./tests/roseTests/binaryTests
9       ./src/frontend/SageIII
9       ./projects/symbolicAnalysisFramework/src
9       ./docs/Rose/powerpoints
8       ./winspecific/MSVS_project_ROSETTA_empty
8       ./projects/simulator
7       ./tests/RunTests/FortranTests/LANL_POP_OLD
7       ./tests/CompileTests/Cxx_tests
7       ./src/midend/programTransformation
7       ./src/midend/programAnalysis
7       ./src/3rdPartyLibraries/libharu-2.1.0
7       ./scripts
7       ./projects/symbolicAnalysisFramework/src/mpiAnal
7       ./projects/RTC
6       ./winspecific/MSVS_ROSE/Debug
6       ./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1/ncdap_test
6       ./tests/roseTests/programAnalysisTests
6       ./src/3rdPartyLibraries/ckpt
6       ./src/3rdPartyLibraries/antlr-jars
6       ./projects/SATIrE/src
5       ./tests/RunTests/FortranTests/LANL_POP/pop-distro
5       ./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1/libcf
5       ./tests/CompileTests/ElsaTestCases
5       ./src/ROSETTA
5       ./src/3rdPartyLibraries/qrose
5       ./projects/DatalogAnalysis
5       ./projects/backstroke
5       ./LicenseInformation
5       ./docs/Rose/AstProcessing

To list files based on size

 find . -type f -print0 | xargs -0 ls -s | sort -k1,1rn
241568 ./.git/objects/pack/pack-f366503d291fc33cb201781e641d688390e7f309.pack
13484 ./tests/CompileTests/RoseExample_tests/Cxx_Grammar.h
10240 ./projects/traceAnalysis/vmp-hw-part.trace
6324 ./tests/RunTests/FortranTests/LANL_POP_OLD/poptest.tgz
5828 ./winspecific/MSVS_ROSE/Debug/MSVS_ROSETTA.pdb
4732 ./.git/objects/pack/pack-f366503d291fc33cb201781e641d688390e7f309.idx
4488 ./binaries/samples/bgl-helloworld-mpicc
4488 ./binaries/samples/bgl-helloworld-mpixlc
4080 ./LicenseInformation/edison_group.pdf
3968 ./projects/RTC/tags
3952 ./src/frontend/Disassemblers/x86-InstructionSetReference-NZ.pdf
3908 ./tests/CompileTests/RoseExample_tests/trial_Cxx_Grammar.C
3572 ./winspecific/MSVS_project_ROSETTA_empty/MSVS_project_ROSETTA_empty.ncb
3424 ./src/frontend/Disassemblers/x86-InstructionSetReference-AM.pdf
2868 ./.git/index
2864 ./projects/compassDistribution/COMPASS_SUBMIT.tar.gz
2864 ./projects/COMPASS_SUBMIT.tar.gz
2740 ./ROSE_ResearchPapers/2007-CommunicatingSoftwareArchitectureUsingAUnifiedSingle-ViewVisualization-ICECC
S.pdf
2592 ./docs/Rose/powerpoints/rose_compiler_users.pptx
2428 ./src/3rdPartyLibraries/ckpt/wrapckpt.c
2408 ./projects/DatalogAnalysis/jars/weka.jar
2220 ./scripts/graph.tar
1900 ./src/3rdPartyLibraries/antlr-jars/antlr-3.3-complete.jar
1884 ./src/3rdPartyLibraries/antlr-jars/antlr-3.2.jar
1848 ./src/midend/programTransformation/ompLowering/run_me_defs.inc
1772 ./src/3rdPartyLibraries/qrose/docs/QROSE.pdf
1732 ./tests/CompileTests/Cxx_tests/longFile.C
1724 ./src/midend/programTransformation/ompLowering/run_me_task_defs.inc
1656 ./ChangeLog
1548 ./tests/roseTests/binaryTests/yicesSemanticsExe.ans
1548 ./tests/roseTests/binaryTests/yicesSemanticsLib.ans
1480 ./ROSE_ResearchPapers/1997-ExpressionTemplatePerformanceIssues-IPPS.pdf
1408 ./docs/Rose/powerpoints/ExaCT_AllHands_March2012_ROSE.pptx

...

Compilation[edit]

Cannot download the EDG binary tar ball[edit]

Three possible reasons

  • the website hosting EDG binaries is down (there is a manual way to get the binary)
  • we don't support the platform you use so there is no EDG binary is available for you.
  • you cloned your rose from an un-official repo so the build process cannot figure out the right version of EDG binary for you. (there is a solution mentioned below)

It is possible that the rosecompiler.org website is down for maintenance.

So you may encounter the following error message:

make[3]: Entering directory `/home/leo/workspace/github-rose/buildtree/src/frontend/CxxFrontend' test -d /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries && cp /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries/roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz . || wget http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz --2012-08-05 12:58:29-- http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz Resolving www.rosecompiler.org... 128.55.6.204 Connecting to www.rosecompiler.org|128.55.6.204|:80... failed: No route to host. make[3]: *** [roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz] Error 4

In this case, you should ask for the missing tar ball or find it on our backup location

You don't have to clone the entire edge binary repo since it is big. You can just download the one you need (click raw file link on github.com).

Once you get the bar ball, copy it to your build tree's CxxFrontend subdirectory:

  • buildtree/src/frontend/CxxFrontend

Then you should be able to normally build rose by typing make.

TODO: automate the search using the alternative path to obtain edg binary

Another possible reason is that you cloned your local rose repo from an unofficial repository.

  • In order to maintain the correct matching between rose source and EDG binary, we require a canonical repository to be available.

make[3]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend/Clang'
Unable to find a remote tracking a canonical repository.  Please add a
canonical repository as a remote and ensure it is up to date.  Currently
configured remotes are:

   origin => git@xxx.com/myrose.git

Potential canonical repositories include:

   anything ending with "rose.git" (case insensitive)
Unable to find a remote tracking a canonical repository.  Please add a
canonical repository as a remote and ensure it is up to date.  Currently
configured remotes are:

   origin => git@xxx.com/myrose.git

Potential canonical repositories include:

   anything ending with "rose.git" (case insensitive)
make[3]: Entering directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend'
test -d /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries && cp /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries/roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz  . || wget http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz
--2013-02-15 17:26:42--  http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz
Resolving www.rosecompiler.org... 128.55.6.204
Connecting to www.rosecompiler.org|128.55.6.204|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2013-02-15 17:26:42 ERROR 404: Not Found.

make[3]: *** [roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz] Error 1
make[3]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend'
make: *** [all-recursive] Error 1
make: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src'

Solution: add an official rose repo as an additional remote repo of your local repo

  • add a canonical repository, like the one at github: git add remote official-rose https://github.com/rose-compiler/rose.git
  • git fetch official-rose // to retrieve hash numbers etc in the canonical repository
  • Now you can build rose again. it should find the canonical repo you just added and use it to find a matching EDG binary

How to access EDG or EDG-SAGE connection code?[edit]

From page 5 of http://rosecompiler.org/ROSE_UserManual/ROSE-UserManual.pdf

The connection code that was used to translate EDG’s AST to SAGE III was derived loosely from the EDG C++ source generator and has formed the basis of the SAGE III translator from EDG to SAGE III’s IR.

Under the license we have, the EDG source code and the translation from the EDG AST in distributions are excluded from source release and are made available through a binary format. No part of the EDG work is visible to the user of ROSE. The EDG source are available only to those who have the EDG research or commercial license.

Chapter 2.6 "Getting a Free EDG License for Research Use" of the manual has instructions about how to obtain the EDG license.

Once you obtain the license, please contact the staff members of ROSE to verify your license. After that, they will give you more instructions about how to proceed.

How to speedup compiling ROSE?[edit]

Question It takes hours to compile ROSE, how can I speed up this process?

Answer:

  • if you have multi-core processors, try to use make -j4 (make by using four processes or even more if you like).
  • also try to only build librose.so under src/ by typing make -C src/ -j4
  • Or only try to build the language support you are interested in during configure, such as
    • ../sourcetree/configure --enable-only-c # if you are only interested in C/C++ support
    • ../sourcetree/configure --enable-only-fortran # if you are only interested in Fortran support
    • ../sourcetree/configure --help # show all other options to enable only a few languages.

Can ROSE accept incomplete code?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2011-July/001015.html

ROSE does not handle incomplete code. Though this might be possible in the future. It would be language dependent and likely depend heavily on some of the language specific tools that we use internally. This is however, not really a priority for our work. If you want to for example demonstrate how some of the internal tools we are using or alternative tools that we could use might handle incomplete code, this might be interesting and we could discuss it.

For example, we are not presently using Clang, but if it handled incomplete code that might be interesting for the future. I recall that some of the latest EDG work might handle some incomplete code, and if that is true then that might be interesting as well. I have not attempted to handle incomplete code with OFP, so I am not sure how well that could be expected to work. Similarly, I don't know what the incomplete code handling capabilities of ECJ Java support is either. If you know any of these questions we could discuss this further.

I have some doubts about how much meaningful information can come from incomplete code analysis and so that would worry me a bit. I expect it is very language dependent and there would be likely some constraints on the incomplete code. So understanding the subject better would be an additional requirement for me.

Can ROSE analyze Linux Kernel sources?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2011-April/000856.html

Question: I'm trying to analyze the Linux kernel. I was not sure of the size of the code-base that can be handled by ROSE, and could not find references as to whether it has been tried on the Linux kernel source. As of now I'm trying to run the identity translator on the source, and would like to know if it can be done using ROSE, and if it has been successfully tested before.

Short answer: Not for now

Long answer: We are using EDG 3.3 internally by default and this version of EDG does not handle the GNU specific register modifiers used in the asm() statements of the Linux Kernel code. There might be other problems, but that was at least the one that we noticed in previous work on this some time ago. But we are working on upgrading the EDG frontend to be a more recent version 4.4.

Can ROSE compile C++ Boost library?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2010-November/000544.html

not yet.

I know of a few cases where ROSE can't handle parts of Boost. In each case it is an EDG problem where we are using an older version of EDG. We are trying to upgrade to a newer version of EDG (4.x), but that version's use within ROSE does not include enough C++ support, so it is not ready. The C support is internally tested, but we need more time to work on this.

AST[edit]

How to find XYZ in AST?[edit]

The usually steps to retrieve information from AST are:

  • prepare a simplest (preferrably 5-10 lines only), compilable sample code with the code feature you want to find (e.g array[i][j] if you are curious about how to find use of multi-dimensional arrays in AST), avoid including any headers (#include file.h) to keep the code small.
    • Please note: don't include any headers in the sample code. A header (#include <stdio.h> for example) can bring in thousands of nodes into AST.
  • use dotGeneratorWholeASTGraph to generate a detailed AST dot graph of the input code
  • use zgrviewer-0.8.2's run.sh to visulize the dot graph
  • visually/manually locate the information you want in the dot graph, understand what to look and where to look
  • use code (AST member functions, traversal, SageInteface functions, etc) to retrieve the information.

How to filter out header files from AST traversals?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2010-April/000144.html

Question: I want to exclude functions in #include files from my analysis/transformations during my processing.

By default, AST traversal may visit all AST nodes, including the ones come from headers.

So AST processing classes provide three functions :

  • T traverse (SgNode * node, ..): traverse full AST , nodes which represent code from include files
  • T traverseInputFiles(SgProject* projectNode,..) traverse the subtree of AST which represents the files specified on the command line
  • T traverseWithinFile(SgNode* node,..): only the nodes which represent code of the same file as the start node

Should SgIfStmt::get_true_body() return SgBasicBlock?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2011-April/000930.html

Both true/false bodies were SgBasicBlock before.

Later, we decided to have more faithful representation of both blocked (with {...}) and single-statement (without { ..} ) bodies. So they are SgStatement (SgBasicBlock is a subclass of SgStatement) now.

But it seems like the document has not been updated to be consistent with the change.

You have to check if the body is a block or a single statement in your code. Or you can use the following function to ensure all bodies must be SgBasicBlock.

//A wrapper of all ensureBasicBlockAs*() above to ensure the parent of s is a scope statement with list of statements as children, otherwise generate a SgBasicBlock in between.

SgLocatedNode * SageInterface::ensureBasicBlockAsParent (SgStatement *s)

How to handle #include "header.h", #if, #define etc. ?[edit]

It is called preprocessing info. within ROSE's AST. They are attached before, after, or within a nearby AST node (only the one with source location information.)

An example translator is provided to traverse the input code's AST and dump information about the found preprocessing information. The source code of this translator is https://github.com/rose-compiler/rose/blob/master/exampleTranslators/defaultTranslator/preprocessingInfoDumper.C .

To use the translator:

buildtree/exampleTranslators/defaultTranslator/preprocessingInfoDumper -c main.cxx
-----------------------------------------------
Found an IR node with preprocessing Info attached:
(memory address: 0x2b7e1852c7d0 Sage type: SgFunctionDeclaration) in file
/export/tmp.liao6/workspace/userSupport/main.cxx (line 3 column 1)
-------------PreprocessingInfo #0 ----------- :
classification = CpreprocessorIncludeDeclaration:
  String format = #include "all_headers.h"

relative position is = before

SgClassDeclaration::get_definition() returns NULL?[edit]

If you look at the whole AST graph carefully, you can find defining and non-defining declarations for the same class.

A symbol is usually associated with a non-defining declaration. A class definition is associated with a defining declaration.

You may want to get the defining declaration from the non-defining declaration before you try to grab the definition.

How to add new AST nodes?[edit]

There is a section named "1.7 Adding New SAGE III IR Nodes (Developers Only)" in ROSE Developer’s Guide (http://www.rosecompiler.org/ROSE_DeveloperInstructions.pdf)

But before you decide adding new nodes, you may consider if AstAttribute (user defined objects attached to AST) would be sufficient for your problem.

For example, the 1st version of the OpenMP implementation in ROSE (rose/projects/OpenMP_Translator) started by using AstAttribute to represent information parsed from pragmas. Only in the 2nd version we introduced dedicated AST nodes.

There are two separate steps when new kinds of IR nodes are added into ROSE:

  • First step (declaration): Adding class declaration/implementation into ROSE for the new IR nodes. This step is mostly related to ROSETTA.
  • Second step (creation): Creating those new IR nodes at some point: such as somewhere within frontend, midend, or even backend if desired. So this step is decided case by case.

If the new types of IR come from their counterparts in EDG, then modifications to the EDG/SAGE connection code are needed. If not, the EDG/SAGE connection code may be irrelevant.

If you are trying to add new nodes to represent pragma information, you can create your new nodes without involving EDG or its connection to ROSE. You just parse the pragma string in the original AST and create your own nodes to get a new version of AST. Then it should be done.

How does the AST merge work?[edit]

tests that demonstrate the AST Merge are in the directory:

    tests/CompileTests/mergeAST_tests

(run "make check" to see hundreds of tests go by).

parent vs. scope[edit]

An AST node can have a parent node which is different from the its scope.

For example: the struct declaration's parent is the typedef declaration. But the struct's scope is the scope of the typedef declaration.

typedef struct frame {int x;} s_frame;

Parsing text into AST[edit]

There is some experimental support to parse simple code text into AST pieces. It is not intended to parse entire source codes. But the support should be able to be extended to handle more types of input.

Some documentation about this work:

Example project using the parser building blocks

  • projects/pragmaParsing should work.

Translation[edit]

Can ROSE identityTranslator generate 100% identical output file?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2011-January/000604.html

Questions: Rose identityTranslator performs some modifications, "automatically".

These modifications are:

  • Expanding the assert macro.
  • Adding extra brackets around constants of typedef types (e.g. c=Typedef_Example(12); is translated in the output to c = Typedef_Example((12));)
  • Converting NULL to 0.

How can I avoid these modifications?

Answer: No.

There is no easy way to avoid these changes currently. Some of them are introduced by the cpp preprocessor. Others are introduced by the EDG front end ROSE uses. 100% faithful source-to-source translation may require significant changes to preprocessing directive handling and the EDG internals.

We have had some internal discussion to save raw token strings into AST and use them to get faithful unparsed code. But this effort is still at its initial stage as far as I know.

How to build a tool inserting function calls?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2010-July/000319.html

Question: I am trying to build a tool which insert one or more function calls whenever in the source code there is a function belonging to a certain group (e.g. all functions beginning with foo_*). During the ast traversal, how can I find the right place, i.e., there is a function in ROSE that searches for a string pattern or something similar?

Answers:

  • In Chapter 28 AST Construction of the ROSE tutorial, there are examples to instrument function calls into the AST using traversals or a queryTree. I would approach this by checking the node for the specific SgFunctionDefinition (or whatever you need) and then check the name of the node to find its location.
  • You can
    • use the AST query mechanism to find all functions and store them in a container. e.g Rose_STL_Container<SgNode*> nodeList = NodeQuery::querySubTree(root_node,V_Sg????);
    • Then iterate the container to check each function to see if the function name matches what you want.
    • use SageBuilder namespace's buildFunctionCallStmt() to create a function call statement.
    • use SageInterface namespace's insertStatement () to do the insertion.

How to insert a header into an input file?[edit]

There is an SageInterface function for doing this:

// Insert include "filename" or include <filename> (system header) into the global scope containing the current scope, right after other include XXX.
PreprocessingInfo *     SageInterface::insertHeader (const std::string &filename, PreprocessingInfo::RelativePositionType position=PreprocessingInfo::after, bool isSystemHeader=false, SgScopeStatement *scope=NULL) 

How to copy/clone a function?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2011-April/000919.html

We need to be more specific about the function you want to copy. Is it just a prototype function declaration (non-defining declaration in ROSE's term ) or a function with a definition (defining declaration in ROSE's term)?

  • Copying a non-defining function declaration can be achieved by using the following function:
// Build a prototype for an existing function declaration (defining or nondefining is fine).
SgFunctionDeclaration* SageBuilder::buildNondefiningFunctionDeclaration (const SgFunctionDeclaration *funcdecl, SgScopeStatement *scope=NULL)
  • Copying a defining function declaration is semantically a problem since it introduces redefinition of the same function.

It is at least a hack to first introduce something wrong and later correct it. Here is an example translator to do the hack (copy a defining function, rename it, fix its symbol):


#include <rose.h>
#include <stdio.h>
using namespace SageInterface;

int main(int argc, char** argv)
{
  SgProject* project = frontend(argc, argv);
  AstTests::runAllTests(project);

// Find a defining function named "bar" under project

  SgFunctionDeclaration* func=
findDeclarationStatement<SgFunctionDeclaration> (project, "bar", NULL,
true);
  ROSE_ASSERT (func != NULL);

// Make a copy and set it to a new name
  SgFunctionDeclaration* func_copy =
isSgFunctionDeclaration(copyStatement (func));
  func_copy->set_name("bar_copy");

// Insert it to a scope
  SgGlobal * glb = getFirstGlobalScope(project);
  appendStatement (func_copy,glb);

#if 0  // fix up the missing symbol, this should be optional now since SageInterface::appendStatement() should handle it transparently. 
  SgFunctionSymbol *func_symbol =  glb->lookup_function_symbol
("bar_copy", func_copy->get_type());
  if (func_symbol == NULL);
  {
    func_symbol = new SgFunctionSymbol (func_copy);
    glb ->insert_symbol("bar_copy", func_symbol);
  }
#endif
  AstTests::runAllTests(project);
  backend(project);
  return 0;
}

ROSE's unparser checks for Sg_File_Info objects of AST pieces before it decides to print out text format of the AST pieces. Only the AST coming from the same file of the input file or AST generated by transformation should be unparsed by default. For example, some AST subtrees come from an included header. But it is often not desired to unparse the content of an included header.

If the file info is still the original file info, the solution is to set the copied AST to be transformation-generated:

// Recursively set source position info(Sg_File_Info) as transformation generated.
SageInterface::setSourcePositionForTransformation (SgNode *root) 

Can I transform code within a header file?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2011-May/000971.html

No. ROSE does not unparse AST from headers right now. A summer project tried to do this. But it did not finish.

https://mailman.nersc.gov/pipermail/rose-public/2010-August/000344.html

I guess ROSE does not support writing out changed headers for safety/practical reasons. A changed header has to be saved to another file since writing to the original header is very dangerous (imaging debugging a header translator which corrupts input headers). Then all other files/headers using the changed header have to be updated to use the new header file.

Also all files involved have to be writable by user's translators.

As a result, the current unparser skips subtrees of AST from headers by checking file flags (compiler_generated and/or output_in_code_generation etc.) stored in Sg_File_Info objects.

How to work with formal and actual arguments of functions?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2011-June/001008.html

     //Get the actual arguments
     SgExprListExp* actualArguments = NULL;
     if (isSgFunctionCallExp(callSite))
         actualArguments = isSgFunctionCallExp(callSite)->get_args();
     else if (isSgConstructorInitializer(callSite))
         actualArguments = isSgConstructorInitializer(callSite)->get_args();
     ROSE_ASSERT(actualArguments != NULL);

     const SgExpressionPtrList& actualArgList = 
actualArguments->get_expressions();

     //Get the formal arguments.
     SgInitializedNamePtrList formalArgList;
     if (calleeDef != NULL)
         formalArgList = calleeDef->get_declaration()->get_args();

     //The number of actual arguments can be less than the number of 
formal arguments (with implicit arguments) or greater
     //than the number of formal arguments (with varargs)

How to translate multiple files scattered in different directories of a project?[edit]

Expected behavior of a ROSE Translator:

A translator built using ROSE is designed to act like a compiler (gcc, g++,gfortran ,etc depending on the input file types). So users of the translator only need to change the build system for the input files to use the translator instead of the original compiler.

If the original compiler used by you implicitly include or link anything, you may have to make the include or linking paths explicit after the change. For example, if mpiCC transparently links to /path/to/mpilib.a, you have to add this linking flag into your modified Makefile.

On 07/25/2012 11:20 AM, Fernando Rannou wrote:
> > Hello
> >
> > We are trying to use ROSE to refactor  a big project consisting of
> > several  *.cc and *.hh files, located at various directories. Each
> > class is defined in a *.hh file and implemented in a *.cc file.
> > Classes include (#include) other class definitions. But we have only
> > found single file examples.
> >
> > Is this possible? If so, how?
> >
> >
> > Thanks

Unparsing[edit]

Generate code into different files[edit]

https://mailman.nersc.gov/pipermail/rose-public/2012-August/001742.html Question: I wonder is it possible for ROSE to generate two files (.c and .cl) when it translates C-to-OpenCL ?

Answer: The ROSE outliner has an option to output the generated function into a new file.

https://github.com/rose-compiler/rose/blob/master/src/midend/programTransformation/astOutlining/Outliner.hh

...
// Generate the outlined function into a separated new source file
// -rose:outline:new_file
extern bool useNewFile;
...

You may want to check how this option is used in the outliner source files to get what you want.

Binary Analysis[edit]

How is the binary analysis capability in ROSE?[edit]

Question: how is the binary analysis capability in ROSE? Is it just disassembly? is it possible to associate the binary code with the source if combined with ROSE source code analysis?

Answer:

ROSE has various binary disassemblers (x86, ARM, MIPS, PowerPC) that, like source code analysis, create an internal representation of the binary in the form of an AST. Although the types of AST nodes for source and binaries are largely disjoint, one can analyze the binary AST using concepts similar to source analysis. ROSE has a few binary analyses. Here are some off the top of my head:

  • Control flow graphs, both virtual and using Boost Graph Library.
  • Function call graphs.
  • Operations on control flow graphs: dominator, post-dominator
  • Pointer detection analysis that tries to figure out which memory locations are used as if they were pointers in a higher level language.
  • Instruction partitioning: figuring out how to group instructions into basic blocks, and how to group basic blocks into functions when all you have is a list of instructions. Its accuracy on automatically partitioning stripped, obfuscated code has been shown to be better than the best disassemblers that use debugging info and symbol tables.
  • Instruction semantics for x86. This is an area of active development but supports only 32-bit integer instructions. We plan to add floating point, SIMD, 64-bit, other architectures, and a simpler API. But even as it stands, it is complete enough to simulate entire ELF executables (even "vi"). See next bullet
  • An x86 simulator for ELF executables. This project is able to simulate how the Linux kernel loads an executable, and the various system calls made by the executable. It it complete enough to simulate many Linux programs, but also provides callback points for the user to insert various kinds of analyses. For instance, you could use it to disassemble an entire process after it has been dynamically linked. There are many examples in the projects/simulator directory. In contrast to simulators like Qemu, Bochs, valgrind, VirtualBox, VMware, etc. where speed is a primary design driver, the ROSE simulator is designed to provide user-level access to as many aspects of execution as possible.
  • Plugins for instruction semantics. Instruction semantics is written in such a way that different "semantic domains" can be plugged in. ROSE has a symbolic domain, an interval domain, and a partial-symbolic domain. The symbolic domain can be used in conjunction with an SMT solver (currently supporting Yices). The interval domain is actually sets of intervals, and is binary-arithmetic-aware (i.e., correctly handles overflows, etc on a fixed word size). The partial-symbolic domain uses single-node expressions in order to optimize for speed and size at the expense of accuracy. Users can and have written other domains, and a new API (in the works) will make this even easier.
  • Examples of data-flow analysis (e.g., the pointer analysis already mentioned), but not a well defined framework yet (someone is working on one). Currently, data-flow type analyses are implemented using the instruction semantics support: as each instruction is "executed" the domain in which it executes causes the data to flow in the machine state. Each analysis provides its own flow equation to handle the points where control flow joins from two or more directions; and provides its own "next-instruction" function to iterate over the control flow graph.
  • Clone detection of various formats: various forms of syntactic, including one using locality-sensitive hashing; and semantic clone detection via fuzz testing in a simulator.

By Robb


You ask about combining source and binary analysis... Its certainly possible since ROSE can hold both the binary and source ASTs in memory at the same time. But I'm not aware of any analysis that "sews" them together. We do support parsing DWARF info from ELF executables, so you might be able to use that to sew the two ASTs together.

--Robb

Daily work[edit]

git clone returns error: SSL certificate problem?[edit]

Symptom:

git clone https://github.com/rose-compiler/rose.git
Cloning into rose...
error: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed while accessing https://github.com/rose-compiler/rose.git/info/refs

fatal: HTTP request failed

The reason may be that you are behind a firewall which tweaks the original SSL certification.

Solutions: Tell cURL to not check for SSL certificates:

#Solution 1: Environment variable (temporary)
      $ env GIT_SSL_NO_VERIFY=true git pull

# Solution 2: git-config (permanent)
      # set local configuration
      $ git config --local http.sslVerify false

# Solution 2:  set global configuration
      $ git config --global http.sslVerify false

What is the best IDE for ROSE developers?[edit]

https://mailman.nersc.gov/pipermail/rose-public/2010-April/000115.html

There may not be a widely recognized best integrated development environment. But developers have reported that they are using

  • vim
  • emacs
  • KDevelop
  • Source Navigator
  • Eclipse
  • Netbeans

The thing is that ROSE is huge and has some ridiculously large generated source file (CxxGrammar.h and CxxGrammar.C are generated in the build tree for example). So many code browsers may have trouble in handling ROSE.

Portability[edit]

What is the status for supporting Windows?[edit]

We do maintain some preliminary Windows Support of building ROSE/src to generate librose.so by leveraging cmake. However, the work is not finished.

To build librose under windows, type the following command lines in the top level source tree

 mkdir ROSE-build-cmake
 cd ROSE-build-cmake
 cmake .. -DBOOST_ROOT=${ROSE_TEST_BOOST_PATH}  // Example: boost installation path /opt/boost_1_40_0-inst

https://mailman.nersc.gov/pipermail/rose-public/2011-December/001349.html

We have not finished the Windows work yet. IT is on our list of things to do. It was started and ROSE internally compiles using MS Visual Studio (using project files generated from the Cmake build that we maintain and test within our release process for ROSE) but does not pass our tests. So it is not ready. The distribution of the EDG binaries for Windows is another step that would come after that. We don't know at present when this will be done, it is important, but not a high priority for our DOE specific work, but important for other work. The effort required is something that we could discuss. If you want to call me that would be the best way to proceed. Send me email off of the main list and we can set that up.

https://mailman.nersc.gov/pipermail/rose-public/2011-March/000798.html

Under Windows ROSE uses CMake. This is a project that is currently under development. As of November 2010 we are able to compile and link the src directory. We are also able to run example programs that link against librose and execute the frontend and backend. {\em However, this is an internal capability and not available externally yet since we don't distribute the Windows generated EDG binaries that would be required. Also the current support for Windows is still incomplete, ROSE does not yet pass its internal tests under Windows.}

How-tos[edit]

Quick, short, and focused tutorials about how to do common tasks as a ROSE developer.

Please create a new wikibook page for each how-to topic. Each how-to wiki page should NOT contain any level one (=) or level two(==) heading so it can be included at the correct levels in the print version of this wikibook.

How to write a How-to[edit]

Quick, short, and focused tutorials about how to do common tasks as a ROSE developer. Please create a new wikibook page for each how-to topic. Each how-to wiki page should NOT contain any level one (=) or level two(==) heading so it can be included at the correct levels in the print version of this wikibook.

Create a new page[edit]

==[[ROSE Compiler Framework/How to write a How-to|How to write a How-to]]==
{{:ROSE Compiler Framework/How to write a How-to}}
  • rename three places of the pasted text with the desired page name, for example
==[[ROSE Compiler Framework/How to do XYZ|How to do XYZ]]==
{{:ROSE Compiler Framework/How to do XYZ}}
  • click save page
  • You will see red text trying to link to the not yet existing How to do XYZ page
  • click any of the red text, it will bring you to an editing window to add content of your new how-to page
  • you can now add new content and save it.
    • Again, each how-to wiki page should NOT contain any level one (=) or level two(==) heading so it can be included at the correct levels in the print version of this wikibook.

Insert image to wiki page[edit]

Rules of the content[edit]

  • Only level three headings (===) and higher are allowed in a how-to page. This is necessary for the how-to page to be correctly included into the final one-page print version of this wikibook. Sorry about this restriction.
    • Again, please don't use level one (=) or level two (==) headings in a how-to page!
  • Keep each how-to short and focused. Readers are expected to only spend 30-minutes or much less to quickly learn how to do something using ROSE.
  • After you created a new how-to page and saved your contributions. Please go to the print version to make sure it shows up correctly.
  • please specify the how-to topic is the current practice or the proposed new ways of doing things. So we can have clear guideline for code review for what is mandatory and what is optional.

How to incrementally work on a project[edit]

Developing a big, sophisticated project entails many challenges. To mitigate some of these challenges, we have adopted several best practices: incremental development, code review, and continuous integration.

Here are some tips on how to divide up a big project into smaller, bite-sized pieces so each piece can be incrementally developed, code reviewed, and integrated.

  • Input: define different sets of test inputs based on complexity and difficulty. Tackle simpler sets first.
  • Output: define intermediate results leading to the final output. Often, results A and B are needed to generate C. So the project can have multiple stages, based on the intermediate results.
  • Algorithm: complex compiler algorithms are often just enhanced versions of more fundamental algorithms. Implement the fundamental algorithms first to gain insight and experience. Then, afterward, you can implement the full-blown versions.
  • Language: for projects dealing with multiple languages, focus on one language at a time.
  • Platform: limit the scope of supported platforms: Linux, Ubuntu, OS X (TODO: add reference to ROSE supported platforms)
  • Performance: Start with a basic, working implementation first. Then try to optimize its performance, efficiency.
  • Scope: your translator could first focus on working at a function scope, then grow to handle an entire source file, or even multiple files, at the same time.
  • Skeleton then meat: a project should be created with the major components defined first. Each component can be enriched separately later on.
  • Annotations (manual vs. automated): Performing one compiler task often requires results from many other tasks being developed. Defining source code annotations as the interface between two tasks can decouple these dependencies in a clean manner. The annotations can be first manually inserted. Later the annotations can be automatically generated by the finished analysis.
  • Optional vs. Default: introducing a flag to turn on/off your feature. Make it as a default option when it matures.

How to visualize AST[edit]

Overview[edit]

Three things are needed to visualize ROSE AST:

  • Sample input code: you provide it
  • a dot graph generator to generate a dot file from AST: ROSE provides dot graph generators
  • a visualization tool to open the dot graph: ZGRViewer and Graphviz are used by ROSE developers

If you don't want to install ROSE+ZGRview + Graphvis from scratch, you can directly use ROSE virtual machine image, which has everything you need installed and configured so you can just visualize your sample code.

Sample input code[edit]

Please prepare simplest input code without including any headers so you can get a small enough AST to digest.

Dot Graph Generator[edit]

We provide ROSE_INSTALLATION_TREE/bin/dotGeneratorWholeASTGraph (complex graph) and dotGenerator (a simpler version) to generate a dot graph of the detailed AST of input code.

Tools to generate AST graph in dot format. There are two versions

  • dotGenerator: simple AST graph generator showing essential nodes and edges
  • dotGeneratorWholeASTGraph: whole AST graph showing more details. It provides filter options to show/hide certain AST information.

command line:

 dotGeneratorWholeASTGraph  yourcode.c  // it is best to avoid include any header into your sample code to have a small enough tree to visualize!
dotGeneratorWholeASTGraph --help
   -rose:help                     show this help message
   -rose:dotgraph:asmFileFormatFilter           [0|1]  Disable or enable asmFileFormat filter
   -rose:dotgraph:asmTypeFilter                 [0|1]  Disable or enable asmType filter
   -rose:dotgraph:binaryExecutableFormatFilter  [0|1]  Disable or enable binaryExecutableFormat filter
   -rose:dotgraph:commentAndDirectiveFilter     [0|1]  Disable or enable commentAndDirective filter
   -rose:dotgraph:ctorInitializerListFilter     [0|1]  Disable or enable ctorInitializerList filter
   -rose:dotgraph:defaultFilter                 [0|1]  Disable or enable default filter
   -rose:dotgraph:defaultColorFilter            [0|1]  Disable or enable defaultColor filter
   -rose:dotgraph:edgeFilter                    [0|1]  Disable or enable edge filter
   -rose:dotgraph:expressionFilter              [0|1]  Disable or enable expression filter
   -rose:dotgraph:fileInfoFilter                [0|1]  Disable or enable fileInfo filter
   -rose:dotgraph:frontendCompatibilityFilter   [0|1]  Disable or enable frontendCompatibility filter
   -rose:dotgraph:symbolFilter                  [0|1]  Disable or enable symbol filter
   -rose:dotgraph:emptySymbolTableFilter        [0|1]  Disable or enable emptySymbolTable filter
   -rose:dotgraph:typeFilter                    [0|1]  Disable or enable type filter
   -rose:dotgraph:variableDeclarationFilter     [0|1]  Disable or enable variableDeclaration filter
   -rose:dotgraph:variableDefinitionFilter      [0|1]  Disable or enable variableDefinitionFilter filter
   -rose:dotgraph:noFilter                      [0|1]  Disable or enable no filtering
Current filter flags' values are: 
         m_asmFileFormat = 0 
         m_asmType = 0 
         m_binaryExecutableFormat = 0 
         m_commentAndDirective = 1 
         m_ctorInitializer = 0 
         m_default = 1 
         m_defaultColor = 1 
         m_edge = 1 
         m_emptySymbolTable = 0 
         m_expression = 0 
         m_fileInfo = 1 
         m_frontendCompatibility = 0 
         m_symbol = 0 
         m_type = 0 
         m_variableDeclaration = 0 
         m_variableDefinition = 0 
         m_noFilter = 0 


Dot Graph Visualization[edit]

To visualize the generated dot graph, you have to install

Please note that you have to configure ZGRViewer to have correct paths to some commands it uses. You can do it from its configuration/setting menu item. Or directly modify the text configuration file (.zgrviewer).

One example configuration is shown below (cat .zgrviewer)

<?xml version="1.0" encoding="UTF-8"?>
<zgrv:config xmlns:zgrv="http://zvtm.sourceforge.net/zgrviewer">
    <zgrv:directories>
        <zgrv:tmpDir value="true">/tmp</zgrv:tmpDir>
        <zgrv:graphDir>/home/liao6/svnrepos</zgrv:graphDir>
        <zgrv:dot>/home/liao6/opt/graphviz-2.18/bin/dot</zgrv:dot>
        <zgrv:neato>/home/liao6/opt/graphviz-2.18/bin/neato</zgrv:neato>
        <zgrv:circo>/home/liao6/opt/graphviz-2.18/bin/circo</zgrv:circo>
        <zgrv:twopi>/home/liao6/opt/graphviz-2.18/bin/twopi</zgrv:twopi>
        <zgrv:graphvizFontDir>/home/liao6/opt/graphviz-2.18/bin</zgrv:graphvizFontDir>
    </zgrv:directories>
    <zgrv:webBrowser autoDetect="true" options="" path=""/>
    <zgrv:proxy enable="false" host="" port="80"/>
    <zgrv:preferences antialiasing="false" cmdL_options=""
        highlightColor="-65536" magFactor="2.0" saveWindowLayout="false"
        sdZoom="false" sdZoomFactor="2" silent="true"/>
    <zgrv:plugins/>
    <zgrv:commandLines/>
</zgrv:config>

You have to configure the run.sh script to have correct path also

cat run.sh

#!/bin/sh

# If you want to be able to run ZGRViewer from any directory,
# set ZGRV_HOME to the absolute path of ZGRViewer's main directory
# e.g. ZGRV_HOME=/usr/local/zgrviewer

ZGRV_HOME=/home/liao6/opt/zgrviewer-0.8.1

java -jar $ZGRV_HOME/target/zgrviewer-0.8.1.jar "$@"

Example session[edit]

A complete example

# make sure the environment variables(PATH, LD_LIBRARY_PATH) for the installed rose are correctly set
which dotGeneratorWholeASTGraph
~/workspace/masterClean/build64/install/bin/dotGeneratorWholeASTGraph

# run the dot graph generator
dotGeneratorWholeASTGraph -c ttt.c

#see it
which run.sh
~/64home/opt/zgrviewer-0.8.2/run.sh

run.sh ttt.c_WholeAST.dot

How to create a translator[edit]

Translator basically converts one AST to another version of AST. The translation process may add, delete, or modify the information stored in AST.

Overview[edit]

A ROSE-based translator usually has the following steps

  1. Search for the AST nodes you want to translate.
  2. Perform the translation action on the found AST nodes. This action can be one of two major variants
  • Updating the existing AST nodes
  • Creating new AST nodes to replace the original ones. This is usually cleaner approach than patching up existing AST and is better supported by SageBuilder and SageInterface functions.
  • Deep copying existing AST subtrees to duplicate the code. May expression subtrees should not be shared. So deep copy them is required to get the correct AST.
  • Optionally update other related information for the translation.

First Step[edit]

Get familiar with the ASTs before and after your translation. So you know for sure what your code will deal with and what AST you code will generate.

The best way is to prepare simplest sample codes and carefully examine the whole dot graphs of them.

Design considerations[edit]

It is usually a good idea to

  • separate the searching step from the translation step so one search (traversal) can be reused by all sorts of translations.
  • When design the order of searching and translation, be careful about if the translation will negatively impact on the searching
    • Please void pre-order traversal since you may end up modifying AST nodes to be visited later on, similar to the effect of iterator invalidation.
    • please use post-order, or reverse order of pre-order for your traversal hooked up with translation

Searching for the AST node[edit]

There are multiple ways to find things you want to translate in AST.

AST Query[edit]

  • Via AST Query: Node query returns a list of AST nodes in the same type. This is often enough to simple translations
Rose_STL_Container<SgNode*> ProgramHeaderStatementList = NodeQuery::querySubTree (project,V_SgProgramHeaderStatement);
for (Rose_STL_Container<SgNode*>::iterator i = ProgramHeaderStatementList.begin(); i != ProgramHeaderStatementList.end(); i++)
{
    SgProgramHeaderStatement* ProgramHeaderStatement = isSgProgramHeaderStatement(*i);
    ...
}


More information about AST Query can be found at "6 Query Library" of the ROSE User Manual pdf.

AST Traversal[edit]

  • Through AST traversal: walks through whole AST using different orders (pre-order or post order). Post-order traversal is recommended to avoid modifying things the traversal will hit later on (similar problem as iterator invalidation in C++)
    • The AST traversal gives visit() functions to hook up your translation functions. A switch statement is can be used for handling different types of AST node.
class f2cTraversal : public AstSimpleProcessing
{
  public:
    virtual void visit(SgNode* n);
};

void f2cTraversal::visit(SgNode* n)
{
  switch(n->variantT())
  {
    case V_SgSourceFile:
      {
        SgFile* fileNode = isSgFile(n);
        translateFileName(fileNode);
      }
      break;
    case V_SgProgramHeaderStatement:
      {
        ...
      }
      break;
    default:
      break;
  }
}

More information about AST Traversal can be found at "7 AST Traversal" of the ROSE User manual pdf online.

Performing Translation[edit]

Before you write your translator, please read Chapter 32 AST Construction of ROSE tutorial pdf documentation (http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf). It contains essential information for any translation writers.

The translations you want to do often depend on the types of the AST nodes you visit. For example you can have a set of translation functions defined in your namespace

  • void translateForLoop(SgForLoop* n)
  • void translateFileName(SgFile* n)
  • void translateReturnStatement(SgReturnStmt* n), and so on

Other tips

Updating Tree[edit]
  • You might need to handle some details, like removing symbol, updating parent, and symbol table.
  • Be careful to use deepDelete() and deepCopy(). Some information might not be updated properly. For example, deepDelete might not update your symbol table.

Verify the correctness[edit]

You can use wholeAST graph to verify your translation.

All ROSE-based translators should call AstTests::runAllTests(project) after all the transformation is done to make sure the translated AST is correct.

This has a higher standard than just correctly unparsed to compilable code. It is common for an AST to go through unparsing correctly but fail on the sanity check.

More information is at Sanity_check

Sample translators[edit]

Here we list a few sample translators which can grow to more sophisticated ones you want.

Find pragmas[edit]

/*
toy code
by Liao, 12/14/2007
*/
#include "rose.h"
#include <iostream>
using namespace std;

class visitorTraversal : public AstSimpleProcessing
{
  protected:
    virtual void visit(SgNode* n);
};

void visitorTraversal::visit(SgNode* node)
{
  if (node->variantT() == V_SgPragmaDeclaration) {
      cout << "pragma!" << endl;
  }
}

int main(int argc, char * argv[])
{
  SgProject *project = frontend (argc, argv);
  visitorTraversal myvisitor;
  myvisitor.traverseInputFiles(project,preorder);

  return backend(project);
}

Loop transformation[edit]

SageInterface namespace (http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageInterface.html) has many translation functions, such as those for loops.

For example, there is a loop tiling function defined in https://github.com/rose-compiler/rose/blob/master/src/frontend/SageIII/sageInterface/sageInterface.C :

//     Tile the n-level (starting from 1) loop of a perfectly nested loop nest using tiling size s.
bool     loopTiling (SgForStatement *loopNest, size_t targetLevel, size_t tileSize)


An example Test translator is provided to test this function:

And it has a test input file:

How to build your translator[edit]

See How to set up the makefile for a translator

How to create a cross-language translator[edit]

In this HOW-to, it presents the steps of generating a cross-language translator. We will use Fortran to C translator as an example here.

Change the sourcefile information[edit]

  • change the output file name. The suffix name has to be changed with this following function.
void SgFile::set_unparse_output_filename (std::string unparse_output_filename ) 
  • change the output language type.
void SgFile::set_outputLanguage(SgFile::outputLanguageOption_enum outputLanguage)       
  • Set the output to be target-language only.
 We use set_C_only for the Fortran to C translation.  This process might be optional.
void SgFile::set_C_only(bool C_only)

Identify language-dependent AST node[edit]

  • Example: ROSE AST uses different AST nodes to present a loop in C and Fortran. The following two figures represent the same loop for different languages.
C uses SgForStatement for the for loops.
C SgForStatement
Fortran uses SgFortranDo for the do loops.
Fortran SgFortranDo

Implement the translation functions[edit]

  • Use the wholeAST as reference to implement the translation function.
  • Generate the new AST node by copy required information from the original AST node.
  • Remove the original node, and make sure the parent/child relationship in AST is setup properly.

Testing output code[edit]

  • If compiler is available to test the output code, run the backend to generate object by the backend compiler.
  • If compiler is not available for the target language, make sure output codes can be generated from the testing cases. It is suggested to run the compilation tests for all the testing output.

How to set up the makefile for a translator[edit]

In this How-to, you will create a makefile to compile and test your own custom ROSE translator.

You may want to first look at "How-to install ROSE": ROSE Compiler Framework/Installation.

Environment variables[edit]

You must have the proper environment variable set so you translator can find the librose.so during execution.

export LD_LIBRARY_PATH=${ROSE_INSTALL}/lib:${BOOST_INSTALL}/lib:$LD_LIBRARY_PATH

Translator Code[edit]

Here is a simplest ROSE translator.

// ROSE translator example: identity translator.
//
// No AST manipulations, just a simple translation:
//
//    input_code > ROSE AST > output_code

#include <rose.h>

int main (int argc, char** argv)
{
    // Build the AST used by ROSE
    SgProject* project = frontend(argc, argv);

    // Run internal consistency tests on AST
    AstTests::runAllTests(project);

    // Insert your own manipulations of the AST here...

    // Generate source code from AST and invoke your
    // desired backend compiler
    return backend(project);
}

Makefile[edit]

Here is a sample makefile. Please make sure replacing some leading spaces of make rules with leading Tabs if you copy & paste this sample.

## A sample Makefile to build a ROSE tool.
##
## Important: remember that Makefile rules must contain tabs:
##
##     <target>: [ <dependency > ]*
##         [ <TAB> <command> <endl> ]+
## 
## Please replace space with TAB if you copy & paste this file to create your Makefile!!

## ROSE installation contains
##   * libraries, e.g. "librose.la"
##   * headers, e.g. "rose.h"
ROSE_INSTALL=/path/to/rose/installation

## ROSE uses the BOOST C++ libraries, the --prefix path
BOOST_INSTALL=/path/to/boost/installation

## Your translator
TRANSLATOR=my_translator
TRANSLATOR_SOURCE=$(TRANSLATOR).cpp

## Input testcode for your translator
TESTCODE=input_code_ifs.cpp

#-------------------------------------------------------------
# Makefile Targets
#-------------------------------------------------------------

all: $(TRANSLATOR)

# compile the translator and generate an executable
# -g is recommended to be used by default to enable debugging your code
$(TRANSLATOR): $(TRANSLATOR_SOURCE)
        g++ -g $(TRANSLATOR_SOURCE) -o $(TRANSLATOR) -I$(BOOST_INSTALL)/include -I$(ROSE_INSTALL)/include -L$(ROSE_INSTALL)/lib -lrose

# test the translator
check: $(TRANSLATOR)
        ./$(TRANSLATOR) -c -I. -I$(ROSE_INSTALL)/include $(TESTCODE) 

clean:
        rm -rf $(TRANSLATOR) *.o rose_* *.dot

A complete example[edit]

The sample Makefile prepared within ROSE virtual machine image.

demo@ubuntu:~/myTranslator$ cat makefile 
## A sample Makefile to build a ROSE tool.
##
## Important: remember that Makefile recipes must contain tabs:
##
##     <target>: [ <dependency > ]*
##         [ <TAB> <command> <endl> ]+
## So you have to replace spaces with Tabs if you copy&paste this file from a browser!


## ROSE installation contains
##   * libraries, e.g. "librose.la"
##   * headers, e.g. "rose.h"
ROSE_INSTALL=/home/demo/opt/rose-inst

## ROSE uses the BOOST C++ libraries
BOOST_INSTALL=/home/demo/opt/boost-1.40.0

## Your translator
TRANSLATOR=myTranslator
TRANSLATOR_SOURCE=$(TRANSLATOR).cpp

## Input testcode for your translator
TESTCODE=hello.cpp

#-------------------------------------------------------------
# Makefile Targets
#-------------------------------------------------------------

all: $(TRANSLATOR)

# compile the translator and generate an executable
# -g is recommended to be used by default to enable debugging your code
# Note: depending on the version of boost, you may have to use something like -I $(BOOST_ROOT)/include/boost-1_40 instead. 
$(TRANSLATOR): $(TRANSLATOR_SOURCE)
        g++ -g $(TRANSLATOR_SOURCE) -I$(BOOST_INSTALL)/include -I$(ROSE_INSTALL)/include -L$(ROSE_INSTALL)/lib -lrose -o $(TRANSLATOR)

# test the translator
check: $(TRANSLATOR)
        ./$(TRANSLATOR) -c -I. -I$(ROSE_INSTALL)/include $(TESTCODE) 

clean:
        rm -rf $(TRANSLATOR) *.o rose_* *.dot


demo@ubuntu:~/myTranslator$ make check
g++ -g myTranslator.cpp -I/home/demo/opt/boost-1.40.0/include -I/home/demo/opt/rose-inst/include -L/home/demo/opt/rose-inst/lib -lrose -o myTranslator
./myTranslator -c -I. -I/home/demo/opt/rose-inst/include hello.cpp 

How to debug a translator[edit]

It is rare that your translator will just work after your finish up coding. Using gdb to debug your code is indispensable to make sure your code works as expected. This page shows examples of how to debug your translator.

Preparations[edit]

First and foremost, make sure your ROSE installation and your translator was built with -g and without GCC optimizations turned on. This will ensure all debug information will be best preserved.

You can use "make V=1" to check the actual compilation options. If the default ROSE's configure options are not desirable, you can turn off optimizations by:

 configure --with-C_OPTIMIZE=no --with-CXX_OPTIMIZE=no 

Before you debug your own translators, you may want to doublecheck if ROSE's builtin translator (identityTranslator) can handle your input code properly. If not, you should report the bug to the ROSE team.

If identityTranslator can handle it but your customized translator cannot. The problem may be caused by the customizations you introduced in your translators.

Another thing is to reduce your input code to be as small as possible so it can just trigger the error you are interested in. This will simplify the bug hunting process dramatically. It is very difficult to debug a translator processing thousands of lines of code.

A translator not built by ROSE's build system[edit]

If the translator is built using a makefile without using libtool. The debugging steps of your translator are just classic steps to use gdb.

  • Make sure your translator is compiled with the GNU debugging option -g so there is debugging information in your object codes

These are the steps of a typical debugging session:

1. Set a break point

2. Examine the execution path to make sure the program goes through the path that you expected

3. Examine the local data to validate their values

# how to print out information about a AST node
#-------------------------------------
(gdb) print n
$1 = (SgNode *) 0xb7f12008

# Check the type of a node
#-------------------------------------
(gdb) print n->sage_class_name()
$2 = 0x578b3af "SgFile"

(gdb) print n->get_parent()
$7 = (SgNode *) 0x95e75b8  

# Convert a node to its real node type then call its member functions
#---------------------------
(gdb) isSgFile(n)->getFileName ()

#-------------------------------------
# When displaying a pointer to an object, identify the actual (derived) type of the object 
# rather than the declared type, using the virtual function table. 
#-------------------------------------
(gdb) set print object on
(gdb) print astNode
$6 = (SgPragmaDeclaration *) 0xb7c68008


# unparse the AST from a node
# Only works for AST pieces with full scope information
# It will report error if scope information is not available at any ancestor level.
#-------------------------------------
(gdb) print n->unparseToString()

# print out Sg_File_Info 
#-------------------------------------
(gdb) print n->get_file_info()->display()

A translator shipped with ROSE[edit]

ROSE turns on -O2 and -g by default so the translators shipped with ROSE should already have some debugging information available. But some variables may be optimized away. To preserve the max debugging information, you may have to reconfigure/recompile rose to turn off optimizations.

../sourcetree/configure --with-C_OPTIMIZE=no --with-CXX_OPTIMIZE=no  ...


ROSE uses libtool so the executables in the build tree are not real -- they're simply wrappers around the actual executable files. You have two choices:

  • Find the real executable in the .lib directory then debug the real executables there
  • Use libtool command line as follows:
$ libtool --mode=execute gdb --args ./built_in_translator file1.c

If you can set up alias command alias debug='libtool --mode=execute gdb -args' then all your debugging sessions can be as simple as

$ debug ./built_in_translator file1.c


The remaining steps are the same as a regular gdb session with the typical operations, such as breakpoints, printing data, etc.

Example 1: Fixing a real bug in ROSE

1. Reproduce the reported bug:

$ make check
...
./testVirtualCFG \
    --edg:no_warnings -w -rose:verbose 0 --edg:restrict \
    -I$ROSE/tests/CompileTests/virtualCFG_tests/../Cxx_tests \
    -I$ROSE/sourcetree/tests/CompileTests/A++Code \
    -c $ROSE/sourcetree/tests/CompileTests/virtualCFG_tests/../Cxx_tests/test2001_01.C

...
lt-testVirtualCFG: $ROSE/src/frontend/SageIII/virtualCFG/virtualCFG.h:111:
    VirtualCFG::CFGEdge::CFGEdge(VirtualCFG::CFGNode, VirtualCFG::CFGNode):
    Assertion `src.getNode() != __null && tgt.getNode() != __null' failed.

Ah, so we've failed an assertion within the virtualCFG.h header file on line 111:

Assertion `src.getNode() != __null && tgt.getNode() != __null' failed

And the error was produced by running the lt-testVirtualCFG libtool executable translator, i.e. the actual translator name is testVirtualCFG (without the lt- prefix).

2. Run the same translator command line with Libtool to start a GDB debugging session:

$ libtool --mode=execute gdb --args ./testVirtualCFG \
    --edg:no_warnings -w -rose:verbose 0 --edg:restrict \
    -I$ROSE/tests/CompileTests/virtualCFG_tests/../Cxx_tests \
    -I$ROSE/sourcetree/tests/CompileTests/A++Code \
    -c $ROSE/sourcetree/tests/CompileTests/virtualCFG_tests/../Cxx_tests/test2001_01.C

GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5_8.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from ${ROSE_BUILD_TREE}tests/CompileTests/virtualCFG_tests/.libs/lt-testVirtualCFG...done.
(gdb)

The GDB session has started, and we're provided with a command line prompt to begin our debugging.

3. Let's run the program, which will hit the failed assertion:

(gdb) r
Starting program: \
    ${ROSE_BUILD_TREE}/tests/CompileTests/virtualCFG_tests/.libs/lt-testVirtualCFG \
    --edg:no_warnings -w -rose:verbose 0 --edg:restrict \
    -I${ROSE}/tests/CompileTests/virtualCFG_tests/../Cxx_tests \
    -I../../../../sourcetree/tests/CompileTests/A++Code
    -c   ${ROSE}/tests/CompileTests/virtualCFG_tests/../Cxx_tests/test2001_01.C
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
[Thread debugging using libthread_db enabled]
lt-testVirtualCFG: ${ROSE}/src/frontend/SageIII/virtualCFG/virtualCFG.h:111: 

VirtualCFG::CFGEdge::CFGEdge(VirtualCFG::CFGNode, VirtualCFG::CFGNode): Assertion `src.getNode() != __null && tgt.getNode() != __null' failed.

Program received signal SIGABRT, Aborted.
0x0000003752230285 in raise () from /lib64/libc.so.6

Okay, we've reproduced the problem in our GDB session.

4. Let's check the backtrace to see how we wound up at this failed assertion:

(gdb) bt
#0  0x0000003752230285 in raise () from /lib64/libc.so.6
#1  0x0000003752231d30 in abort () from /lib64/libc.so.6
#2  0x0000003752229706 in __assert_fail () from /lib64/libc.so.6

#3  0x00002aaaad6437b2 in VirtualCFG::CFGEdge::CFGEdge (this=0x7fffffffb300, src=..., tgt=...)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/virtualCFG.h:111
#4  0x00002aaaad643b60 in makeEdge<VirtualCFG::CFGNode, VirtualCFG::CFGEdge> (from=..., to=..., result=...)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/memberFunctions.C:82
#5  0x00002aaaad62ef7d in SgReturnStmt::cfgOutEdges (this=0xbfaf10, idx=1)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/memberFunctions.C:1471
#6  0x00002aaaad647e69 in VirtualCFG::CFGNode::outEdges (this=0x7fffffffb530)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/virtualCFG.C:636
#7  0x000000000040bf7f in getReachableNodes (n=..., s=...) at ${ROSE}/tests/CompileTests/virtualCFG_tests/testVirtualCFG.C:13
...

5. Next, we'll move backwards (or upwards) in the program to get to the point of assertion:

(gdb) up
#1  0x0000003752231d30 in abort () from /lib64/libc.so.6

(gdb) up
#2  0x0000003752229706 in __assert_fail () from /lib64/libc.so.6

(gdb) up
#3  0x00002aaaad6437b2 in VirtualCFG::CFGEdge::CFGEdge (this=0x7fffffffb300, src=..., tgt=...)
     at ${ROSE}/src/frontend/SageIII/virtualCFG/virtualCFG.h:111
111         CFGEdge(CFGNode src, CFGNode tgt): src(src), tgt(tgt) \
                   { assert(src.getNode() != NULL && tgt.getNode() != NULL); }

Okay, so the assertion is inside of a constructor for CFGEdge:

CFGEdge(CFGNode src, CFGNode tgt): src(src), tgt(tgt) \
{
    assert(src.getNode() != NULL && tgt.getNode() != NULL);  # This is the failed assertion
}

Unfortunately, we can't tell at a glance which of the two conditions in the assertion is failing.

6. Figure out why the assertion is failing:

Let's examine the two conditions in the assertion:

(gdb) p src.getNode()
$1 = (SgNode *) 0xbfaf10

So src.getNode() is returning a non-null pointer to an SgNode. How bout tgt.getNode()?

(gdb) p tgt.getNode()
$2 = (SgNode *) 0x0

Ah, there's the culprit. So for some reason, tgt.getNode() is returning a null SgNode pointer (0x0).

From here, we used the GDB up command to backtrace in the program to figure out where the node returned by tgt.getNode() was assigned a NULL value.

We eventually found a call to SgReturnStmt::cfgOutEdges which returns a variable, called enclosingFunc. In the source code, there's currently no assertion to check the value of enclosingFunc, and that's why we received the assertion later on in the program. As a side note, it is good practice to add assertions as soon as possible in your source code so in times like this, we don't have to spend time unnecessarily back-tracing.

After adding the assertion for enclosingFunc, we run the program again to reach this new assertion point:

lt-testVirtualCFG: ${ROSE}sourcetree/src/frontend/SageIII/virtualCFG/memberFunctions.C:1473: \
    virtual std::vector<VirtualCFG::CFGEdge, std::allocator<VirtualCFG::CFGEdge> > \
    SgReturnStmt::cfgOutEdges(unsigned int): \

    Assertion `enclosingFunc != __null' failed.

Okay, it's failing so we know that the assignment to enclosingFunc is NULL.

# enclosingFunc is definitely NULL (0x0)
(gdb) p enclosingFunc
$1 = (SgFunctionDefinition *) 0x0

# What is the current context?
(gdb) p this
$2 = (SgReturnStmt * const) 0xbfaf10

Okay, we're inside of an SgReturnStmt object. Let's set a break point where enclosingFunc is being assigned to:

Breakpoint 1, SgReturnStmt::cfgOutEdges (this=0xbfaf10, idx=1) at ${ROSE}/src/frontend/SageIII/virtualCFG/memberFunctions.C:1472
1472              SgFunctionDefinition* enclosingFunc = SageInterface::getEnclosingProcedure(this);

So this is the line we're examining:

SgFunctionDefinition* enclosingFunc = SageInterface::getEnclosingProcedure(this);

So the NULL value must be coming from SageInterface::getEnclosingProcedure(this);.

After code reviewing the function getEnclosingProcedure, we discovered a flaw in the algorithm.

The function tries to return a SgNode which is the enclosing procedure of the specified type, SgFunctionDefinition. However, upon checking the function's state at the point of return, we see that it incorrectly detected a SgBasicBlack as the enclosing procedure for the SgReturnStmt.

(gdb) p parent->class_name()
$12 = {static npos = 18446744073709551615,
   _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7cd0e8 "SgBasicBlock"}}

Specifically, the last piece: 0x7cd0e8 "SgBasicBlock".

But this is wrong because we're looking for SgFunctionDefinition, not SgBasicBlock.

Upon further examination, we figured out that the function simply returned the first enclosing node it found, and not the first enclosing node that matched the user's criteria.

We added the necessary logic to make the function complete, tested it to verify its correctness, and then called it a day.

Phew! Not so bad, right?

How to add a new project directory[edit]

Most code development that is layered above the ROSE library starts out its life as a project in the projects directory. Some projects are eventually refactored into the ROSE library once they mature. This chapter describes how one adds a new project to ROSE.

Required Files[edit]

A ROSE project encapsulates a complete program or set of related programs that use the ROSE library. Each project exists as a subdirectory of the ROSE "projects" directory and should include files "README", "config/support-rose.m4", "Makefile.am", and any necessary source files, scripts, tests, etc.

  • The "README" should provide an explanation about the project purpose, algorithm, design, implementation, etc.
  • The "support-rose.m4" integrates the project into the ROSE build system in a manner that allows the project to be an optional component (they can be disabled, renamed, deleted, or withheld from distribution without changing any ROSE configuration files). Most older projects are lacking this file and are thus more tightly coupled with the build system.
  • The "Makefile.am" serves as the input to the GNU automake system that ROSE employs to generate Makefiles.
  • Each project should also include all necessary source files, documentation, and test cases.

Setting up support-rose.m4[edit]

The "config/support-rose.m4" file integrates the project into the ROSE configure and build system. At a minimum, it should contain a call to the autoconf AC_CONFIG_FILES macro with a list of the project's Makefiles (without the ".am" extension) and its doxygen configuration file (without the ".in" extension). It may also contain any other necessary autoconf checks that are not already performed by ROSE's main configure scripts, including code to enable/disable the project based on the availability of the project's prerequisites.

Here's an example:

dnl List of all makefiles and autoconf-generated                          -*- autoconf -*-
dnl files for this project
AC_CONFIG_FILES([projects/DemoProject/Makefile
                 projects/DemoProject/gui/Makefile
                 projects/DemoProject/doxygen/doxygen.conf
                ])

dnl Even if this project is present in ROSE's "projects" directory, we might not have the
dnl prerequisites to compile this project.  Enable the project's makefiles by using the
dnl ROSE_ENABLE_projectname automake conditional.  Many prerequisites have probably already
dnl been tested by ROSE's main configure script, so we don't need to list them here again
dnl (although it usually doesn't hurt).
AC_MSG_CHECKING([whether DemoProject prerequisites are satisfied])
if test "$ac_cv_header_gcrypt_h" = "yes"; then
        AC_MSG_RESULT([yes])
        rose_enable_demo_project=yes
else
        AC_MSG_RESULT([no])
        rose_enable_demo_project=
fi
AM_CONDITIONAL([ROSE_ENABLE_DEMO_PROJECT], [test "$rose_enable_demo_project" = yes])

Since all configuration for the project is encapsulated in the "support-rose.m4" file, renaming, disabling, or removing the project is trivial: a project can be renamed simply by renaming its directory, it can be disabled by renaming/removing "support-rose.m4", or it can be removed by removing its directory. The "build" and "configure" scripts should be rerun after any of these changes.

Since projects are self-encapsulated and optional parts of ROSE, they need not be distributed with ROSE. This enables end users to drop in their own private projects to an existing ROSE source tree without modifying any ROSE files, and it allows ROSE developers to work on projects that are not distributed publicly. Any project directory that is not part of ROSE's main Git repository will not be distributed (this includes not distributing Git submodules, although the submodule's placeholder empty directory will be distributed).

Setting up Makefile.am[edit]

Each project should have at least one Makefile.am, each of which is processed by GNU automake and autoconf to generate a Makefile. See documentation for automake for details about what these files should contain. Some important variables and targets are:

  • include $(top_srcdir)/config/Makefile.for.ROSE.includes.and.libs: This brings in the definitions from the higher level Makefiles and is required by all projects. It should be near the top of the Makefile.am.
  • SUBDIRS: This variable should contain the names all the project's subdirectories that have Makefiles. It may be omitted if the project's only Makefile is in that project's top-level directory.
  • INCLUDES: This would have the the flags that need to be added during compilation (flags like -I$(top_srcdir)/projects/RTC/include). Your flags should be placed before $(ROSE_INCLUDES) to ensure the correct files are found. This brings in all the necessary headers from the src directory to your project.
  • lib_*: These variables/targets are necessary if you are creating a library from your project, which can be linked in with other projects or the src directory later. This is the recommended way of handling projects.
  • EXTRA_DIST: These are the files that are not listed as being needed to build the final object (like source and header files), but must still be in the ROSE tarball distribution. This could include README or configuration files, for example.
  • check-local: This is the target that will be called from the higher level Makefiles when make check is called.
  • clean-local: Provides you with a step to perform manual cleanup of your project, for instance, if you manually created some files (so Automake won't automatically clean them up).

A basic example[edit]

Many projects start as a translator, analyzer or optimizer, which takes into input code and generate output.

A basic sample commit which adds a new project directory into ROSE: https://github.com/rose-compiler/rose/commit/edf68927596960d96bb773efa25af5e090168f4a

Please look through the diffs so you know what files to be added and changed for a new project.

Essentially, a basic project should contain

  • a README file explaining what this project is about, algorithm, design, implementation, etc
  • a translator acts as a driver of your project
  • additional source files and headers as needed to contain the meat of your project
  • test input files
  • Makefile.am to
    • compile and build your translator
    • contain make check rule so your translator will be invoked to process your input files and generate expected results

To connect your project into ROSE's build system, you also need to

  • Add one more subdir entry into projects/Makefile.am for your project directory
  • Add one line into config/support-rose.m4 for EACH new Makefile (generated from each Makefile.am) used by your projects.

Installing project targets[edit]

Install your project's content to a separate directory within the user's specified --prefix location. The reason behind this is that we don't want to pollute the core ROSE installation space. By doing so, we can reduce the complexity and confusion of the ROSE installation tree, while eliminating cross-project file collisions. It also keeps the installation tree modular.

Example[edit]

This example uses a prefix for installation. It also maintains Semantic Versioning.

From projects/RosePoly:

  ## 1. Version your project properly (http://semver.org/)
  rosepoly_API_VERSION=0.1.0

  ## 2. Install to separate directory
  ##
  ##    Installation tree should resemble:
  ##
  ##    <--prefix>
  ##    |--bin      # ROSE/bin
  ##    |--include  # ROSE/include
  ##    |--lib      # ROSE/lib
  ##    |
  ##    |--<project>-<version>
  ##       |--bin      # <project>/bin
  ##       |--include  # <project>/include
  ##       |--lib      # <project>/lib
  ##
  exec_prefix=${prefix}/rosepoly-$(rosepoly_API_VERSION)

  ## Installation/include tree should resemble:
  ##   |--<project>-<version>
  ##      |--bin      # <project>/bin
  ##      |--include  # <project>/include
  ##         |--<project>
  ##      |--lib      # <project>/lib
  librosepoly_la_includedir = ${exec_prefix}/include/rosepoly

Generate Doxygen Documentation[edit]

0. Install Doxygen tool

Using MacPorts for Apple's Mac OS:

  $ port install doxygen
 
  # set path to MacPort's bin/
  # ...

Using one of the LLNL machines:

  $ export PATH=/nfs/apps/doxygen/latest/bin:$PATH


1. Create a Doxygen configuration file

  $ doxygen -g
 
Configuration file `Doxyfile' created.
 
Now edit the configuration file and enter
 
  doxygen Doxyfile
 
to generate the documentation for your project


2. Customize the configuration file (Doxyfile):

...

# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in
# documentation are documented, even if no documentation was available.
# Private class members and static file members will be hidden unless
# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES

EXTRACT_ALL            = YES

...

# If the value of the INPUT tag contains directories, you can use the
# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp
# and *.h) to filter out the source-files in the directories. If left
# blank the following patterns are tested:
# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh
# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py
# *.f90 *.f *.for *.vhd *.vhdl

FILE_PATTERNS          = *.cpp *.hpp

# The RECURSIVE tag can be used to turn specify whether or not subdirectories
# should be searched for input files as well. Possible values are YES and NO.
# If left blank NO is used.

RECURSIVE              = YES

...


3. Generate the Doxygen documentation

  # Invoke from your top-level directory
  $ doxygen Doxyfile


4. View and verify the HTML documentation

  $ firefox html/index.html &

5. Add target to your Makefile.am to generate the documentation

.PHONY: docs
docs:
    doxygen Doxyfile # TODO: should be $(DOXYGEN)

How to fix a bug[edit]

If you are trying to fix a bug ( your own or a bug assigned to you to fix). Here are high level steps to do the work

Reproduce the bug[edit]

You can only fix a bug when you can reproduce it. This step may be more difficult than it sounds. In order to reproduce a bug, you have to

  • find a proper input file
  • find a proper translator: a translator shipped with ROSE is easy to find. But be patient and sincere when you ask for a translator written by users.
  • find a similar/identical software and hardware environment: a bug may only appear on a specific platform when a specific software configuration is used

Possible results for this step:

  • You can reproduce the bug reliably. Bingo! Go to the next step.
  • You cannot reproduce the bug. Either the bug report is invalid or you have to keep trying.
  • You can reproduce the bug once a while (random errors). Oops. This is kind of difficult situation.

Find causes of the bug[edit]

Once you can reproduce the bug. You have to identify the root cause of the bug using a debugger like gdb.

Common steps involved

  • simplify the input code as much as possible: It can be very hard to debug a problem with a huge input. Always try to prepare the simplest possible code which can just trigger the bug.
    • Often, you have to use a binary search approach to narrow down the input code: only use half of the input at a time to try. Recursively cut the input file into two parts until no further cut is possible while you can still trigger the bug.
  • forward tracking: for the translator, it usually takes input and generate intermediate results before the final output is generated. Using a debugger to set break points at each critical stages of the code to check if the intermediate results are what you expect.
  • backwards tracking: similar to the previous techniques. But you just back tracking the problem.

Fix the bug[edit]

Any bug fix commit should contain

  • a regression test: so make check rules can make sure the bug is actually fixed and no further code changes will make the bug relapse.

How to add a ROSE commandline option[edit]

Often a feature added into ROSE comes with a set of command line options. These options can enable and customize the feature.

For example, the OpenMP support in ROSE is disabled by default. A special option is need to enable it. Also, the support can be as little as simply parsing the OpenMP directive or as complex as translating into multithreaded code.

This HOWTO quickly go through key steps to add options.

internal flags[edit]

Options need to be stored somewhere. There are two choices for the storage, as a data member of SgProject v. a data member of SgFile. If the option can be as specific as per file, it is recommended to add a new data member to SgFile to save the option value.

For example, here is a command line option to turn on the UPC language support:

ROSE/src/ROSETTA/src/support.C // add a date member for SgFile

    // Liao (6/6/2008): Support for UPC model of C , 6/19/2008: add support for static threads compilation
    File.setDataPrototype         ( "bool", "UPC_only", "= false",
                                    NO_CONSTRUCTOR_PARAMETER, BUILD_ACCESS_FUNCTIONS, NO_TRAVERSAL, NO_DELETE);

ROSETTA process this information to automatically generate a member and the corresponding member access functions (set/get_member()).

process the option[edit]

Command line options should be handled within src/frontend/SageIII/sage_support/cmdline.cpp .

File level options are handled by void SgFile::processRoseCommandLineOptions ( vector<string> & argv )


Example code for processing the -rose:openmp option

     set_openmp(false);
     ROSE_ASSERT (get_openmp() == false);
       ...
     if ( CommandlineProcessing::isOption(argv,"-rose:","(OpenMP|openmp)",true) == true )
        {
          if ( SgProject::get_verbose() >= 1 )
               printf ("OpenMP option specified \n");
          set_openmp(true);
         //side effect for enabling OpenMP, define the macro as required
           argv.push_back("-D_OPENMP");
        }


ROSE commandline options should be removed after being processed, to avoid confusing the backend compiler

SgFile::stripRoseCommandLineOptions ( vector<string>& argv ) should have the code to strip off the option.

use the option[edit]

In your code, you can use the automatically generated access functions to set/retrieve the stored option values.

For example

  if (sourceFile->get_openmp())
     //... do something here ....

document the option[edit]

Any option should be explained by the online help output.

Please add brief help text for your option in void SgFile::usage ( int status ) of ./src/frontend/SageIII/sage_support/cmdline.cpp:

Lessons Learned[edit]

Here we collect things to do due to some past lessons.

Do Not Format/Indent other people's code[edit]

Lesson:

  • A developer tried to understand a staff member's source code. But he found that the code's indentation was not right for him. So he re-formatted the source files and committed the changes. Later, the staff member found that his code was changed too much and he could not read it anymore.
  • Even worse, people will have difficulties in merging changes mixed with indentation changes and real changes.

Solution:

  • Please don't reformat code you do not own or will not maintain.

Physical locations matter[edit]

Lesson

  • we had a student who was assigned a desk which was in a deep corner of a big room. The desk was also far away from other interns. As a result, that student had less interactions with others. He had to solve problems with less help.

Solution:

  • Locations MATTER! Sit closer to people you should interact often. Make your desk/office accessible to others. Physically isolated office/desk may have very negative impact on your productivity.

Choose your development platform carefully[edit]

Lesson

  • Somehow new inters were assigned Mac OS X machines by default. But some of them may not be familiar with Apple machines or even dislike Mac OS X's user interface, including keyboard, window system, etc (a love-hate thing for Apple products). So they felt stuck with an uncomfortable development platform. We had interns who could not type smoothly on Mac keyboard even after one month. This is unnecessary.

Solution

  • Provide choice up front: Linux or Mac OS X. Reminder people that they have freedom to choose the platform they personally enjoy.

Use different git repositories for different tasks[edit]

Lesson:

  • A developer used different branches of the same git repository to do different tasks: fixing bugs, adding a new feature, and documenting something. Later on he found that he could not commit and push the work for one task since the changes for other tasks are not ready.

Solution:

  • using separated git repositories for different tasks. So the status of one task won't interfere with the progress of other tasks.

Introducing software dependencies very carefully[edit]

Lesson

  • ROSE did not depend on boost C++ library in the beginning. But later on, some developers saw the benefits of Boost and advocated for it. Eventually, Boost becomes the required software to use ROSE.
  • But Boost library has its disadvantages: hard to install (just see how many boost issues on our public mailing list), lack of backward compatibility (codes using older version of boost break on new versions), huge header files with complex C++ templates slowing down compilation or even breaking some compilers.
  • We still have internal debates about what to do with Boost. It is often a painful and emotional process.

Solution:

  • Introducing big software dependency very carefully. Or you will get stuck easily.
  • At least ask people who advocate for new software dependency to be responsible for maintaining it for 5 years and providing an option to turn it off at the same time.

Create Exacting Tests Early and Often[edit]

Lesson:

  • A developer created tests that were too broad, mostly because they were included late in development. This led to passes that should not have passed, that is passing all tests even though the code had been broken.

Solution:

  • Make sure that tests check results carefully. This is made much easier by making sure your functions have precisely ONE intention. E.g. if you need to transform data and operate on the transformed data, split the transformation and the operation into two functions (at least).

Keep Code Readable While Coding[edit]

Lesson:

  • A developer wrote code without commenting initially, then came back to the code and had to go through the arduous task of understanding

his own unreadable code.

Solution:

  • Keep variable and function names meaningful. Do full documentation as you go, do not leave it for later.

Think Before You Code[edit]

Lesson:

  • A developer wrote code without minding the structure. This led to bloated and unreadable code that would have to be

refactored several times.

Solution:

  • A programmer must code AND design, not just code. Well structured code is much easier to read then badly structured code

Remember The User[edit]

Lesson: A developer wrote the code without knowing what the users actually needed. This led to serious refactoring that could have been avoided, or at least made simpler, if he had concentrated on the user at all times.

Solution: Whenever possible ask users for their input. It will save you a lot of trouble in the long run.

The User is Paramount[edit]

Lesson: A developer wrote a rather obtuse component without understanding exactly what the user might want this for

Solution: At the very least check that the input and output are what the user wanted, this will save much time and aggravation

references[edit]

http://www.projectsmart.co.uk/lessons-learned.html

Testing[edit]

ROSE uses Jenkins to implement a contiguous integration software development process. It leverages a range of software packages to test its correctness, robustness, and performance.

make check rules[edit]

we leverage make check rules to do internal testing.

check exist status of pipeline command[edit]

In bash scripting, we can use pipelines | as follows:

  • command1 | command2 : the output of each command in the pipeline is connected to the input of the next command

each command is executed in its own subshell, exit status: the last command's exit status

To catch any command's return code, please use ${PIPESTATUS[0]}

For example: Using pipeline will only return the last command 'folds status. we add a test to catch the first command's return status

 ../autoPar -c $(srcdir)/$(@:.o=.c) | fold >$(@:.o=.out); test ${PIPESTATUS[0]} = 0

Benchmarks[edit]

The software used by the ROSE's Jenkins include:

  • SPEC CPU 2006 benchmark: a subset is supported for now
  • SPEC OMP benchmark: a subset is supported for now
  • NAS parallel benchmark: developed by NASA Ames Research Center. Both C (customized version) and OpenMP versions are used
  • Plum Hall C and C++ Validation Test Suites: a subset is supported for now
  • Jt++ - Java conformance testing: http://modena.us/

Modena Jt++ Test Suite[edit]

1. Clone the Modena test suite repository:

  $ git clone ssh://rose-dev@rose-git/modena

2. Autotools setup

  $ cd modena
  $ ./build.sh
  + libtoolize --force --copy --ltdl --automake
  + aclocal -I ./acmacros -I ./acmacros/ac-archive -I /usr/share/aclocal
  + autoconf
  + automake -a -c
  configure.ac:4: installing `./install-sh'
  configure.ac:4: installing `./missing'

3. Environment bootstrap

  $ source /nfs/apps/python/latest/setup.sh

4. Build and test!

  $ mkdir buildTree
  $ cd buildTree
  $ ../configure \
          --with-sqlalchemy=${HOME}/opt/python/sqlalchemy/0.7.5/lib64/python2.4/site-packages \
          --with-target-java-interpreter=java \
          --with-target-java-compiler=testTranslator \
          --with-target-java-compiler-flags="-ecj:1.6" \
          --with-host-java-compiler-flags="-source 1.6"

Jenkins[edit]

Using External Benchmarks[edit]

The way we set it up is to

  • In the benchmark, we change the benchmark's build system to call the ROSE tool (identityTranslator or your RTED tool) installed.
  • In the Jenkins test job,
    • Build and install the tested ROSE, prepare environment variables.
    • Go to the benchmark with modified build system. Build and run the benchmark.

Basically, the test job should simulate how a ROSE tool would be used by end-users, not by tweaking ROSE for each different benchmarks.

NAS Parallel Benchmarks[edit]

We have NPB as part of our regression tests.

To get the benchmark:

git clone rose-dev@rosecompiler1.llnl.gov:testsuite/npb-c-parallel.git

Within the benchmark, there is a make.def file with configuration about which compiler and options to use. The configuration should be correct but some path may be changed to point to your version of ROSE.

After that, type "make suite" to build the entire benchmark suite. Or type "make mg class=A" to just build the benchmark in question.

Git[edit]

Introduction[edit]

The ROSE project has been through multiple stages of source content management, starting from CVS, then subversion, and now Git.

Git becomes the official source code version control software due to its unique features, including

  • Distributed source code management. Developers can have a self-contained local repository to do their work anywhere they want, without the need for active connection to a central repository.
  • Easy merge. Merging using Git is as simple as it can get.
  • Backup. Since easy clone of our central repository can serve as a standalone repository. We no longer worry too much about losing the central repository.
  • Integrity. Hashing algorithm used by Git ensures that you will get out what you have put into the repository.

Many other prominent software projects have also been through the similar switch from Subversion to Git, including

A more comprehensive list of Git users is given by https://git.wiki.kernel.org/index.php/GitProjects

In summary, Git IS the state-of-the-art for source code management.

git 1.7.10 or later for github.com[edit]

github requires git 1.7.10 or later to avoid HTTPS cloning errors, as mentioned at https://help.github.com/articles/https-cloning-errors

Ubuntu 10.04's package repository has git 1.7.0.4. So building later version of git is needed. But you still need an older version of git to get the latest version of git.

 apt-get install git-core

Now you can clone the latest git

 git clone https://github.com/git/git.git

Install all prerequisite packages needed to build git from source files(assuming you already installed GNU tool chain with GCC compiler, make, etc.)

 sudo apt-get install gettext zlib1g-dev asciidoc libcurl4-openssl-dev
 $ cd git  # enter the cloned git directory
 $ make configure ;# as yourself
 $ ./configure --prefix=/usr ;# as yourself
 $ make all doc ;# as yourself
 # make install install-doc install-html;# as root

Converting from a Subversion user[edit]

If you're coming from a centralized system, you may have to unlearn a few of the things you've become accustomed to.

  • For example, you generally don't checkout out a branch from a central repo, but rather clone a copy of the entire repository for your own local use.
  • Also, rather than using small, sequential integers to identify revisions, Git uses a cryptographic hash (SHA1), although in general you only need to ever write the first few characters of the hash--just enough to uniquely identify a revision.
  • Finally, the biggest thing to get used to: ALL(!) work is done on local branches--there's no such thing in the DSCM world as working directly on a central branch, or checking your work directly into a central branch.

Having said that, distributed revision control is a superset of centralized revision control, and some projects, including ROSE, set up a centralized repository as a policy choice for sharing code between developers. When a developer works on ROSE, they generally clone from this central location, and when they've made changes, they generally push those changes back to the same central location.

Git Convention[edit]

Name and Email[edit]

Before you commit your local changes, you MUST ensure that you have correctly configured your author and email information (on all of your machines). Having a recognizable and consistent name and email will make it easier for us to evaluate the contributions that you've made to our project.

Guidelines:

  • Name: You MUST use your official name you commonly use for work/business, not nickname or alias which cannot be easily recognized by co-workers, managers, or sponsors.
  • Email: You MUST use your email commonly used for work. It can be either your company email or your personal email (gmail) if you DO commonly use that personal email for business purpose.

To check if your author and email are configured correctly:

  $ git config user.name
  <your name>
 
  $ git config user.email
  <your email>

Alternatively, you can just type the following to list all your current git configuration variables and values, including name and email information.

  $ git config -l


To set your name and email:

  $ git config --global user.name "<Your Name>"
  $ git config --global user.email "<your@email.com>"

Branch Naming Convention[edit]

All developer central repository branches should be named using the following pattern

  • LOGIN-PURPOSE-OPTION
    • NAME is typically a login name or surname.
    • PURPOSE is a single-word description of the type of work performed on that branch, such as "bugfixes".
    • OPTION is information for ROSE robots with regards to your branch.
      • -test Changes to the branch are automatically tested
      • -rc Changes are tested and if they pass then they're merged into the "master" branch (like "trunk" in Subversion).
  • EXAMPLE:
    • The "matzke-bugfixes-rc" branch is "owned" by Robb Matzke (i.e., he's the one that generally makes changes to that branch), it probably contains only bug fixes or minor edits, and it's being automatically tested and merged into the master branch for eventual release to the public.

Commit messages[edit]

It is important to have concise and accurate commit messages to help code reviewers do their work.

Example commit message, excerpt from link

(Binary Analysis) SMT solver statistics; documentation

* Replaced the SMT class-wide number-of-calls statistic with a
  more flexible and extensible design that also tracks the amount
  of I/O between ROSE and the SMT solver.  The new method tracks
  statistics on a per-solver basis as well as a class-wide basis, and
  allows the statistics to be reset at artibrary points by the user.

* More documentation for the new memory cell, memory state, and X86
  register state classes.
  • (Required) Summary: the first line of the commit message is a one line summary (<50 words) of the commit. Start the summary with a topic, enclosed in parentheses, to indicate the project, feature, bugfix, etc. that this commit represents.
  • (Optional) Use a bullet-list (using an asterisk, *) for each item to elaborate on the commit

Also see http://spheredev.org/wiki/Git_for_the_lazy#Writing_good_commit_messages.

Push[edit]

Creating and deleting branches on the remote repository is accomplished with git-push.

This is its general form:

$ git push <remote> <source-ref>:<destination-ref>
  • When you clone a repository, the default <remote> is called "origin"
  • The <source-ref> is the branch in your local repository (cloned from <remote>) that you want to create or synchronize with the <remote>
  • The <destination-ref> is the branch that you want to create on the <remote>

Create remote branch[edit]

Example:

$ git remote -v
origin  https://github.com/rose-compiler/rose.git (fetch)
origin  https://github.com/rose-compiler/rose.git (push)

$ git branch
* master

# Method 1
$ git push origin master:refs/heads/master

# Method 2 - The currently checked out branch (see git-branch) is also called the <tt>HEAD</tt>
$ git push origin HEAD:refs/heads/master

# Method 3 - Git is pretty smart -- if you only specify one name, it will use it as both
# the source and destination.
$ git push origin master

Delete remote branch[edit]

Deleting a remote branch is simply a matter of specifying nothing as the <source-ref>. To delete the branch my-branch, issue this git-push command:

$ git push origin :refs/heads/my-branch

Rebase[edit]

It is recommended to rebase your branch before pushing your work. So your local commits will be moved to the head of the latest master branch, instead of being interleaved with commits from master.

git pull origin master
git rebase master

From http://gitready.com/intermediate/2009/01/31/intro-to-rebase.html

Rebase helps to cut up commits and slice them into any way that you want them served up, and placed exactly where you want them. You can actually rewrite history with this command, be it reordering commits, squashing them into bigger ones, or completely ignoring them if you so desire.

Why is this helpful?

  • One of the most common use cases is that you’ve been working on your own features/fixes/etc in separate branches. Instead of creating ugly merge commits for every change that is brought back into the master branch, you could create one big commit and let rebase handle attaching it.
  • Another frequent use of rebase is to pull in changes from a project and keep your own modifications in line. Usually by doing merges, you’ll end up with a history in which commits are interleaved between upstream and your own. Doing a rebase prevents this and keeps the order in a more sane state.

Modifying a submodule[edit]

ROSE uses submodule to link to EDG files.

the default checked out version of submodule is on a ghost branch

  [youraccount@yourmachine:~/rose/src/frontend/CxxFrontend/EDG]git branch
  * (no branch)
    master

You have to create a local branch before you can change the submodule, you should create a new branch based on the ghost branch, which may or may not correspond to the remote master. In our settings, we push EDG changes to non-master branches so most likely the ghost branch is tied to a non-master branch.

$ git checkout -b fix-up

Do you changes then

Always commit and push the submodule changes first

  $ git commit -a -m "Updated the submodule from within the superproject."  // commit locally
  $ git push origin HEAD:refs/heads/your-account/edg4x-rc  // push to your own remote branch

Finally change the super project's link to the changed submodule

  $ cd ..           # back down to its parent repository's path
  $ git add EDG  # Please note: NEVER use "git add EDG/"  !!!! , this will add all files under EDG/ to the super project!!
  $ git commit -m "Updated submodule EDG."
  $ git push

References[edit]

Lattices[edit]

Introduction[edit]

Lattices are mathematical structures. They can be used as a general way to express an order among objects. This data can be exploited in data flow analysis.

Lattices can describe transformations effected by basic blocks on data flow values also known as flow functions.

Lattices can describe data flow frameworks when instantiated as algebraic structures consisting of a set of data flow values, a set of flow functions, and a merge operator.

Poset[edit]

Partial ordering: \le

A partial ordering is a binary relation \le over a set P which is reflexive, antisymmetric and transitive, i.e.

  • Reflexive x<=x
  • Anti-Symmetric, if x\le y, y\le x then x=y
  • Transitive: if  x\le y, y\le z then  x \le z

Partial orders should not be confused with total orders. A total order is a partial order but not vice versa. In a total order any two elements in the set P can be compared. This is not required in a partial order. Two elements that can be compared are said to be comparable

A partially ordered set, also known as a poset, is a set with a partial order.

Given a poset there may exist an infimum or a supremum. However, not all posets contain these.

Given a poset P with set X and order  \le :

An infimum of a subset S of X is an element a of X such that

  •  a\le x for all x in S and
  • for all y in X, if for all x in S,  y \le x then  y\le a

The dual of this notion is the supremum which has the definition of infimum if you switch  \le with  \ge

If we simply pick an element of X that satisfies the first condition we have a lower bound. The second condition ensures that we have (if it exists) the unique greatest lower bound. Similarly for suprema.

A lattice is a particular kind of poset. In particular, a lattice L is a poset P(X, \le where For any two elements of the lattice a and b, the set {a, b} has a join and a meet

The join and meet operations MUST satisfy the following conditions

  • 1) The join and meet must commute
  • 2) The join and meet are associative
  • 3) The join and meet are idempotent, that is, x join itself or x meet itself are both x.

If the lattice contains a meet it is a meet-semilattice, if a lattice contains a join it is a join-semilattice, similarly there exists a meet-semilattice

(Definitions obtained from wikipedia with minimal modification)

Lattice Definition[edit]

Definition of a Lattice (L, \land ,  \lor )

  • L is a poset under \le such that
    • Every pair of elements has a unique greatest lower bound (meet) and least upper bound (join)
    • Not every poset is a lattice: greatest lower bounds and least upper bounds need not exist in a poset.

Infinite vs. Finite lattices[edit]

  • Infinite: An infinite lattice does not contain an 0 (bottom) or 1 (top) element, even though every pair of elements contains a greatest lower bound and a least upper bound on the entire underlying set. By the definition of unbounded or infinite sets we know that given X an unbounded set given any x in X we can find an x' that is greater than x (under some ordering, in this case the lattice). Similarly for greatest lower bounds.
  • a finite/bounded lattice: the underlying set itself has a greatest lower bound and a least upper bound, For now we will call the greatest lower bound 0 and the least upper bound 1.
    • if a\le x, for all x in L, then a is the 0 element of L, \bot, recall that this is a unique element
    • if a\ge x for all x from L, then a is the 1 element of L, \top


Meet  \land is a binary operation such that a  \land b take the greatest lower bound of the set (this is guaranteed by the definition lattice.

Similarly Join  \lor returns the least upper bound of the set, guaranteed to exist by the definition of a lattice.

To recap, a lattice L is a triple {X, \land, \lor} composed of a set, a Meet function, and a Join function

Properties of Meet and \land.

  • We refer to the \lor as \lor and the \land as J
  • Closure: If x and y belong to L, then there exists a unique z and a unique w from L such that x \lor y = z, and x \land y = w
  • Commutativity: for all x, y in L, x \lor y = y meet x, x \land y = y \land x:
  • Associativity: (x \lor y) \lor z = x \lor (y \lor z), similarly in the \land operation
  • There are two unique elements of L called bottom ( _|_ ), and top (T) , such that for all x, x \lor _|_ = _|_ and x \land T = T
  • Many lattices, with some exceptions, notably the lattice corresponding to constant propagatioin, are also distributive: x \lor y \landz = (x \landz) \lor (y \landz)

Lattices and partial order:

x \sqsubseteq y if and only if x \sqcap y = x

A strictly ascending chain is a sequence of elements of a set X such that, for x_i in X, x_1, x_2, ..., x_n has the property \bot = x_1 \sqsubset x_2 \sqsubset ... \sqsubset x_n = \top . The greatest is the chain with final index n such that n is the greatest such final index among all strictly ascending chains.

The height of a lattice is defined as the length of the longest strictly ascending chain it contains.

If a data-flow analysis lattice has a finite height and a monotonic flow function then we know that the associated data flow analysis algorithm will terminate.

  • Example: If the greatest strictly ascending chain of a lattice L is finite and it takes finitely many steps to reach the top, we can infer that the associated data flow algorithm terminates.

(wikipedia used for definitions)

Example: Bit vector Lattices[edit]

  • The elements of the set are bit vectors
  • The bottom is the 0 vector
  • The top is a 1 vector
  • Meet is a bitwise And
  • Join is a bitwise Or

BV^n denotes the lattice of bit vectors of length n.

Constructing complex lattices from multiple less complex lattices

  • Example: The product operation which combines (concatenates) lattices elementwise
    • The product of two lattices L1 and L2 with meet operators M1, M2, respectively: L1 x L2
    • The elements in the lattice: {<x1, x2> | x1 from L1, x2 from L2}
  • The meet operator: <x1, x2> M <y1, y2> = <x1 M y1, x2 M y2>
  • The join operation: <x1, x2> J <y1, y2> = <x1 J y1, x2 J y2>
  • Example:
    • BV^n is the product of n copies of the trivial bit vector attice BV^1 with bottom 0 and top 1

Graphical Representation BV^3

          111
     /     |    \
110       101      011
 |    x        x   \
100       010      001
    \     |     /
          000


Here meet and join operators induce a partial order on the lattice elements

x is less than or equal to (<=) y if an only if x M y = x

For the BV^3: 000<= 010 <= 101<=111


The partial order on the lattice is:

  • Transitive x <= y and y <= z, then x <=z
  • Antisymmetric: if x<=y and y<=x, then x = y
  • Reflexive: for all x: x<=x:

The height of the lattice is the length of its longest strictly ascending chain:

  • The maximal n such that there exists a strictly ascending chain x1, x2, ..., xn such that
  • Bottom = x1 < x2 < xn = Top

For BV^3 lattice, height = 4

Monotonic Functions[edit]

A monotonic function is a function that preserves an ordering.

Examples[edit]

A function f from L to itself, f: L -> L, is monotonic if for all x, y from L, x<=y ==> f(x)<=f(y)

f: BV^3 -> BV^3: f (<x1 x2 x3>) -> <x1 1 x3>

Lattice Tuples[edit]

Simple analyses may require complex lattices:

  • Problem:
    • Reaching Constants: V 2^(v*c) where v is the number of variables and c is the constants
  • Solution:
    • Construct a tuple of lattices where each lattice corresponds to a variable

V = constant U {Top, Bottom}


integer value: ICP[edit]

This is used in constant propagation Elements: Top, Bottom, Integers, Booleans

  • n M Bottom = Bottom
  • n J Top = Top
  • n J n = n M n = n
  • Integers and booleans m,n, if m != n, then m M n = Bottom, m J n = Top
    • The lattice has three levels: the top element, all other elements, the bottom element
    • Join operation: Higher level to lower level
    • Meet operation: Lower level to higher level

Relevance to data flow analysis[edit]

A lattice provides a set of flow values to a particular data flow analysis.

Lattices are used to argue the existence of a solution obtainable through fixed-point iteration

  • At each program point a lattice represents an IN[p] or OUT[p] set (flow value)
  • meet: merge flow values, e.g. set union, deal with control flow branches merge
  • Top usually represents the best information (initial flow value). Note people can use top to represent worst-base information also!!
  • The bottom value represents the worst-base information
  • if BOTTOM <= x <= y <= TOP , then x is a conservative approximation of y. e.g. x is a superset

e.g. liveness analysis[edit]

bitvector for all variables x_1, x_2, ..., x_n

First step: design the lattice values

  • top value: empty set {}, initial value, knowing nothing
  • bottom value: all set {x_1, x_2, ..., x_n}: max possible value, knowing every variable is live

n = 3, 3 variable case: a flow value==> a set of live variable at a point

S = {v1, v2, v3}

value set: 2^3 = { empty, {v1},{v2}, {v3}, {v1, v2}, {v1,v3}, {v2, v3}, {v1, v2, ve} }

Design lattice

  • top value, best case: none live { T } // top
  • bottom value, worst ase: all live {v1, v2, v3}

Design meet: set Union (Or operation): bring the value down to the bottom, context insensitive

  • design partial order <= --> \supseteq


In between, a partial order: inferior/conservative solutions are lower on the lattice

         Top
      /    |   \
   {v1}   {v2}  {v3}
    |    x      x   |  
{v1, v2}  {v1,v3}  {v2,v3}
      \     |      /
      {v1, v2, v3} = Bottom


Flow function F:  f_n(X) = Gen_n \cup (X -Kill_n), \forall_n }

reaching definition[edit]

Value: 2^n n = number of all definitions

top: empty set: knowing nothing, initial value

bottom: all set: all definitions are reaching definition

Meet operation: set union: bring down the levels of values, from unknowing to knowing

C++ Programming[edit]

ROSE is written in C++. Some users have suggested to mention the major C++ programming techniques used in ROSE so they can have more focused learning experiences as C++ beginners.


Design Patterns: ROSE uses some common design patterns

Good API Design[edit]

Google: "How to Design a Good API and Why it Matters" by Joshua Bloch

TODO: convert from Markdown

Characteristics of a Good API[edit]

  • Easy to learn
  • Easy to use, even without documentation
  • Hard to misuse
  • Easy to read and maintain code that uses it
  • Sufficiently powerful to satisfy requirements
  • Easy to extend
  • Appropriate to audience

The Process of API Design[edit]

  • Gather true requirements in the form of use-cases
  • Start with a short 1-page specification
    • Agility trumps completeness
    • Collect a lot of feedback
  • Use your API early and often

[T]he repetition of a very short development cycle: first the developer writes a failing automated test case that defines a desired improvement or new function, then produces code to pass that test and finally refactors the new code to acceptable standards.

    • Doubles as examples/tutorials and unit tests
  • Maintain realistic expectations
    • You won't be able to please everyone... aim to displease everyone equally
    • Expect to evolve API; mistakes happen; real-world usage is necessary

General Principles[edit]

  > [A] measurement of actual performance [power / weight]
  • Don't give users a gun to shoot themselves with
    • Information hiding: minimize the accessibility of everything

Documentation Matters[edit]

  • Class: what an instance represents
* Method: contract between method and calling client (preconditions, postconditions, and side-effects)
* Parameter: indicate units, form, ownership

Pre- and Post- Conditions

  • The precondition statement indicates what must be true before the function is called.
  • The postcondition statement indicates what will be true when the function finishes its work.
  /// \post <return_value>.empty() == false
  

API vs. Implementation[edit]

Implementation details should not impact the API. Don't let implementation details "leak" into the API.

Performance

  • Design for usability, refactor for performance
  • Do not warp the API to gain performance
  • Effects of API design decisions on performance are real and permanent:
    • Component.getSize() returns Dimension
    • Dimension is mutable
    • Each getSize call must allocate Dimension
    • Causes millions of needless object allocations

"Harmonize"[edit]

  • API must coexist peacefully with platform
    • Do what is customary (standard)
    • Avoid obsolete parameter and return types
    • Mimic patterns in core APIs and language
    • Take advantage of API-friendly features: generics, varargs, enums, default arguments
  • Don't make the client do anything the module could do
    • Reduce need for boilerplate code
  • Don't violate the [Principle of Least Astonishment](http://en.wikipedia.org/wiki/Principle_of_least_astonishment)
  > The design should match the user's experience, expectations, and mental models...aims to exploit users' pre-existing knowledge as a way to minimize the learning curve
  • Provide programmatic access to all data available in string form => no client string parsing necessary
  • Overload with care: ambiguous overloadings

Names Matter[edit]

  • Largely self-explanatory (avoid cryptic abbreviations)
  • Be consistent (e.g. same word means same thing)
  • Strive for symmetry
  • Should read like [prose](http://en.wikipedia.org/wiki/Prose)
 > [T]he most typical form of language, applying ordinary grammatical structure and natural flow of speech rather than rhythmic structure (as in traditional poetry).
      if (car.speed() > 2 * SPEED_LIMIT)
        generateAlert("Watch out for cops!");
    

Input Parameters[edit]

  • interface types over classes: flexibility, performance
  • most specific possible type: moves error from runtime to compile time
  • use double (64 bits) rather than float (32 bits): precision loss is real, performance loss negligible
  • consistent ordering:
      #include <string.h>
      char *strcpy (char *dest, char *src);
      void bcopy   (void *src,  void *dst, int n); // bad!
    
  • short parameter lists: 3 or fewer; more and users will have to refer to docs; identically typed params harmful
    • Two techniques for shortening: 1) break up method, 2) create helper class to hold parameters

Return Values[edit]

  • Avoid values that demand exceptional processing
 > For example, return a `zero-length array` or `empty collection`, not `null`

Exceptions[edit]

  • don't force client to use exceptions for control flow
  • don't fail silently
  • favor unchecked exceptions
  • include failure-capture diagnostic information
  • Fail fast: report errors ASAP
    • Compile time: static typing, generics
    • Run time: error on first bad method invocation (should be failure-atomic)

Who is using ROSE[edit]

We are aware of the following ROSE users (people who write their own ROSE-based tools). They are the reason of the ROSE's existence. Feel free to add your name if you are using ROSE.

Universities[edit]

  • University of California, San Diego, CUDA code generator link
  • University of Utah, compiler-based parameterized code transformation for autotuning
  • University of Oregon, performance tool TAU
  • University of Wyoming, OpenMP error checking
  • Tokyo Institute of Technology
  • RENCI (RENaissance Computing Institute)
  • Indian Institute of Technology Kanpur

DOE national laboratories[edit]

  • Argonne National Laboratory, performance modeling

Companies[edit]

  • Samsung: its research center at San Jose uses ROSE for multicore research and development.

TODO List[edit]

What is missing (so you can help if you want)

How to backup/mirror this wikibook?[edit]

Just in case this website is down, how to download a backup of this wiki book?

How to set up a mirror wiki website containing the wikibook of ROSE?

Maintain the print version[edit]

It is possible that new chapters are added but they are not reflected in the one-page print version. So periodical synchronization is needed by including more chapters or re-arranging their order in the one-page print version.

Observations:

  • A print version is similar to a source file with included contents, each included chapter will have a first level of heading
  • Because the first level heading (=) is used by the print version page to include all chapters, all included pages/chapters should NOT contain any first level heading.

With the basic understanding of how this work, you can now edit the print version's wiki page:

  • Print version

More at: http://en.wikibooks.org/wiki/Help:Print_versions

Maintain the better pdf file[edit]

The pdf version automatically generated from the print version page is rudimentary. It has no table of content and pagination etc.

So we used a manual process to generate better pdf file. We need to occasionally repeat this process to have a up-to-date and better pdf file.

Here are the manual steps:

  • Use your web browser to open and save the print version to your own computer as "web page complete"
  • use the HTML-compatible word processor of your choice to open the html file, convert html to a format the word processor, and add paginate the book.
    • In Microsoft Word, this can done by
      • opening the saved HTML file
      • saving it to a word file
      • adding table of content by selecting Insert > Field > Index and Tables > TOC or Preferences-> Table of contents for Word 2012 or later.
      • adding page numbers to the footer
      • save it to a pdf file with a name like ROSE_Compiler_Framework.pdf
      • upload to wikibooks

To add a link to your wikibook page, insert

{{PDF version|pdf file name without .pdf|size kb, number pages|file description}}

For example

{{PDF version|ROSE_Compiler_Framework|840 kb, 48 pages|ROSE_Compiler_Framework}}

More background about pdf verions: at: http://en.wikibooks.org/wiki/Help:Print_versions

Documentation Alternatives[edit]

1. Google Docs: comments, different output formats, easy collaboration 2. AsciiDoc

Sandbox[edit]

Some common tricks to write things on wikibooks/wikipedia (both are using the mediawiki software).

How to create a new page[edit]

Usually you have to start a new page from an existing wikipage.

Go to the wiki page you want to have a link to the new page you want to create

  • click the edit tab the existing page
  • at the place you want to have a link to the new page, use
     [[ROSE_Compiler_Framework/name of the page]]
    
    .
  • If there is already a page with the desired name. It will become a link to the page.
  • If not, the link is red so you can click the red link to enter editing model to add content to the page.

Please link the new page to the print version of this wikibook so it can be visible in the print out.

How to do XYZ in wiki?[edit]

The best way is to goto en.wikipedia.com and find a page with the output you want. Then pretend to edit the page (by clicking edit) to see the source used to generate the output.

For example, you want to know how C++ syntax highlighting is obtained in wikibook. Go to en.wikipedia.com and find the page for C++. There must be sample code snippet.

Then you pretend to edit it to see the source: http://en.wikipedia.org/w/index.php?title=C%2B%2B&action=edit&section=6

You will see the source code generating the syntax highlighting:

<source lang="cpp">

# include <iostream>

int main()
{
   std::cout << "Hello, world!\n";
}
</source>

How to add comments which are only visible to editor, not readers of a page?[edit]

Use the HTML comments: for example, the following comment will not show up in the paper rendered. But it is visible to editor to reminder why things are done in certain way.

<!-- Please keep the pixel size to 400 so they are clean in the pdf version, Thanks!  -->
[[File:Rose-compiler-code-review-1.png|thumb|400px|Code review using github.llnl.gov]]

Syntax highlighting[edit]

Copied from http://en.wikipedia.org/w/index.php?title=C%2B%2B&action=edit&section=6

<source lang="cpp">

# include <iostream>

int main()
{
   std::cout << "Hello, world!\n";
}
</source>

Can generate the following highlighted code:

# include <iostream>
 
int main()
{
   std::cout << "Hello, world!\n";
}

Math formula[edit]

You can pretend to edit this section to see how math formula are written.

More resources are at

\sum_{j=1}^N (Si,j) =1


 \begin{matrix}
 \log_2(n!) & = & \log_2(n) + \log_2(n-1) + \log_2(n-2) + ... + \log_2(1) \\
            &  &  \qquad \ \ \ \ \log_2(n) + \log_2(n) + \log_2(n) + ... + \log_2(n) \\
            &  &  \qquad \ \ \ \ \qquad \ \ \ \ \qquad \ \ \ \ \qquad \ \ \ \ \qquad \ \ \ \ \ \ \ n log_2(n)
\end{matrix}

 \begin{align}
\log_2(n!) &= \log_2(n) + \log_2(n-1) + \log_2(n-2) + ... + \log_2(1) \\
&< \log_2(n) + \log_2(n) + \log_2(n) + ... + \log_2(n) \\
&= n log_2(n)
\end{align}


\begin{array}{lcr}
  z        & = & a \\
  f(x,y,z) & = & x + y + z     
\end{array}


  \operatorname{erfc}(x) =
  \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt =
  \frac{e^{-x^2}}{x\sqrt{\pi}}\sum_{n=0}^\infty (-1)^n \frac{(2n)!}{n!(2x)^{2n}}