Torque
Torque is an open source resource manager based on the original PBS project (http://www.pbsworks.com/). It is responsible for starting, deleting and monitoring jobs, and thus supports a scheduler, which could not manage jobs without these functions. Torque ships with its own scheduler (pbs_sched), but other schedulers can be used as well. Torque is flexible enough to be used for space planning, but it is mostly used on clusters. The following describes how to install and configure Torque for simple jobs on a cluster. To get the latest version of Torque, do not use the Ubuntu package; use the package from the following website instead: http://www.adaptivecomputing.com/products/open-source/torque/.
Download Torque
Download the source archive on the master (version 4.1.4 is used here) and unpack it:
$ tar -xzvf torque-4.1.4.tar.gz
$ cd torque-4.1.4/
It is best to stay in this directory while configuring and installing.
Configure and install the package on the master
Set Directory
By default, make install installs all files into its default installation directories. You can specify a different directory for the files by appending --prefix=$directoryname to ./configure. If you do not want to change anything, you can skip this step.
Set Library Folder
Create a new file:
$ sudo nano /etc/ld.so.conf.d/torque.conf
Write the path to the Torque libraries into this file. This is the lib subdirectory of the installation directory; with /home defined as the directory (via --prefix), for example, it would be /home/lib. Then enter the following command:
$ sudo ldconfig
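Instead of editing the file interactively, the entry can also be written from a script. The following is a minimal sketch: the function name write_ld_conf is hypothetical (it is not part of Torque), and it assumes the libraries were installed into the lib subdirectory of the installation directory.

```shell
#!/usr/bin/env bash
# write_ld_conf PREFIX [CONF_FILE]  (hypothetical helper, not part of Torque)
# Writes "$PREFIX/lib" as the only line of CONF_FILE, assuming Torque's
# libraries were installed there by "make install".
write_ld_conf() {
  local prefix="${1%/}"                                 # drop one trailing slash
  local conf_file="${2:-/etc/ld.so.conf.d/torque.conf}" # default target file
  printf '%s/lib\n' "$prefix" > "$conf_file"            # overwrite with one line
}
```

For the /home example, run write_ld_conf /home from a root shell and then ldconfig; afterwards ldconfig -p | grep torque should list the Torque library.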
Run Configure
Before running configure, install the packages build-essential, libssl-dev and libxml2-dev with this command:
$ sudo apt-get install build-essential libssl-dev libxml2-dev
If you now run ./configure, you get an error claiming that libxml2-devel is not installed. This is a bug in Torque and can be fixed with the following steps.
First, two lines in the configure.ac file need to be changed (see screenshot):
$ sudo nano configure.ac
The minus marks the line that needs to be changed; the plus shows how the line should read after the change. Because the file is long, it is best to search for a keyword from the line to be changed.
After that, run autoconf:
$ sudo autoconf
Then edit the configure file:
$ sudo nano configure
Again, search for the line marked in yellow and, at its end (red rectangle), change the -1 to -l.
Now run ./configure; it should finish without errors:
$ ./configure
Finally, run make and make install:
$ make
$ sudo make install
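The configure, make and make install steps can also be collected in a small shell function, for example to repeat them with a different prefix. This is only a sketch: build_torque is a hypothetical name, it assumes the configure.ac/configure fix has already been applied, and it uses sudo only for make install.

```shell
#!/usr/bin/env bash
# build_torque SRC_DIR [PREFIX]  (hypothetical helper)
#   SRC_DIR : unpacked source tree, e.g. torque-4.1.4 (after the configure fix)
#   PREFIX  : optional installation directory passed to configure as --prefix
build_torque() {
  local src_dir="$1" prefix="${2:-}"
  (
    # Subshell: the caller's working directory is left unchanged.
    cd "$src_dir" || exit 1
    # Pass --prefix only when a directory was given; otherwise keep the default.
    if [[ -n "$prefix" ]]; then
      ./configure --prefix="$prefix" || exit 1
    else
      ./configure || exit 1
    fi
    make || exit 1
    sudo make install
  )
}
```

Example: build_torque ~/torque-4.1.4 for the default location, or build_torque ~/torque-4.1.4 /home for the /home setup.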
By default, make install creates the directory /var/spool/torque. This directory is referred to as TORQUE_HOME. Various subfolders are created there that are used to configure and run the program.
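After make install, a quick check can confirm that TORQUE_HOME was set up. The sketch below (hypothetical function check_torque_home) only tests for the two subfolders server_priv and mom_priv that are used later in this guide, assuming both server and mom were built on the master; without an argument it assumes the default /var/spool/torque.

```shell
#!/usr/bin/env bash
# check_torque_home [DIR]  (hypothetical helper)
# Returns 0 if DIR/server_priv and DIR/mom_priv exist, 1 otherwise.
check_torque_home() {
  local home="${1:-/var/spool/torque}" sub missing=0
  for sub in server_priv mom_priv; do
    if [[ ! -d "$home/$sub" ]]; then
      echo "missing: $home/$sub" >&2   # report each missing subfolder
      missing=1
    fi
  done
  return "$missing"
}
```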
Install Torque on the Nodes
Create packages
Torque can create packages that contain the chosen configuration and can then be installed on the nodes. Use the command make packages for this.
The packages are stored in torque-4.1.4/ and must be copied from there to a shared directory that the nodes can access. In our case this is the /home directory.
$ cp torque-package-mom-linux-i686.sh /home
On the nodes only the mom-linux package is needed. All others are optional.
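Copying the package can be scripted as well. The sketch below uses a wildcard so it also matches architectures other than i686; the pattern torque-package-mom-linux-*.sh is inferred from the package name above, and copy_mom_package is a hypothetical helper.

```shell
#!/usr/bin/env bash
# copy_mom_package SRC_DIR DEST_DIR  (hypothetical helper)
# Copies every torque-package-mom-linux-*.sh from SRC_DIR into DEST_DIR.
copy_mom_package() {
  local src_dir="$1" dest_dir="$2" pkg found=0
  for pkg in "$src_dir"/torque-package-mom-linux-*.sh; do
    [[ -e "$pkg" ]] || continue          # an unmatched glob stays literal; skip it
    cp -- "$pkg" "$dest_dir"/ || return 1
    found=1
  done
  if (( found == 0 )); then
    echo "no torque-package-mom-linux-*.sh found in $src_dir" >&2
    return 1
  fi
}
```

Example: copy_mom_package ~/torque-4.1.4 /home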
Install Package
On each node, navigate to the directory to which you copied the package and install it with the following command:
Configure Torque
Initialise serverdb
The directory TORQUE_HOME/server_priv contains the configuration and other information used by the pbs_server service. To initialise the file serverdb, run the following command:
Afterwards, pbs_server needs to be restarted.
The server properties can be displayed with the following command:
$ sudo qmgr -c 'p s'
Specify Nodes
So that pbs_server knows which computers in the network are nodes, create a new file nodes in the directory TORQUE_HOME/server_priv:
$ sudo nano nodes
In this file, the nodes are listed by name. Normally it is sufficient to write the names into the file, but you can also set special properties for each node. The syntax is:
NodeName[:ts] [np=] [gpus=] [properties]
[:ts]: This option marks the node as timeshared. Such nodes are listed by the server but are not allocated any jobs.
[np=]: This option specifies how many virtual processors the node has.
[gpus=]: This option specifies how many GPUs the node has.
[properties]: This option allows you to assign a name that identifies the node. It must start with a letter.
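For illustration, a nodes file can be generated with a small shell function that writes one node entry per line in the syntax above. The node names node01 and node02, their values and the property bignode are made up; write_nodes_file is a hypothetical helper and needs write access to TORQUE_HOME/server_priv (normally root).

```shell
#!/usr/bin/env bash
# write_nodes_file TORQUE_HOME ENTRY...  (hypothetical helper)
# Each ENTRY is one line of the nodes file, e.g. "node01 np=2".
write_nodes_file() {
  local torque_home="$1"; shift
  local nodes_file="$torque_home/server_priv/nodes" entry
  : > "$nodes_file" || return 1            # create or truncate the file
  for entry in "$@"; do
    printf '%s\n' "$entry" >> "$nodes_file"
  done
}
```

Example: write_nodes_file /var/spool/torque "node01 np=2" "node02 np=4 gpus=1 bignode" produces a nodes file with the two lines node01 np=2 and node02 np=4 gpus=1 bignode.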
The number of processors can also be detected automatically:
$ sudo qmgr -c 'set server auto_node_np = True'
This sets the server attribute auto_node_np to True.
To configure the nodes, the file config has to be created in the directory TORQUE_HOME/mom_priv:
$ sudo nano config
This file is the same on all nodes and should contain the following:
In addition, the line $usecp *:/home /home must be added to it. This ensures that the output files of finished jobs are stored in a specific directory (here the shared /home). Otherwise, an error occurs when running the command tracejob.
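The $usecp line can also be added from a script, so that it is not duplicated when the step is repeated. A minimal sketch; ensure_usecp is a hypothetical helper, and its default path assumes TORQUE_HOME is /var/spool/torque.

```shell
#!/usr/bin/env bash
# ensure_usecp [CONFIG_FILE]  (hypothetical helper)
# Appends the $usecp line to the mom config unless it is already present.
ensure_usecp() {
  local config="${1:-/var/spool/torque/mom_priv/config}"
  local line='$usecp *:/home /home'        # single quotes: no variable expansion
  touch "$config" || return 1
  # -x: whole-line match, -F: fixed string, -q: no output
  grep -qxF -- "$line" "$config" || printf '%s\n' "$line" >> "$config"
}
```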
Execute Job
For a job to run, at least four services must be started. On the master these are pbs_server, pbs_sched and trqauthd; on the nodes it is pbs_mom:
$ sudo trqauthd
$ sudo pbs_server
$ sudo pbs_sched
On each node:
$ sudo pbs_mom
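To verify that the services are actually running, one can look for their processes. This sketch assumes pgrep (from procps) is available; check_services is a hypothetical helper that prints one status line per service and returns non-zero if any of them is missing.

```shell
#!/usr/bin/env bash
# check_services NAME...  (hypothetical helper)
# Uses "pgrep -x" to look for a process with exactly the given name.
check_services() {
  local svc status=0
  for svc in "$@"; do
    if pgrep -x "$svc" > /dev/null; then
      echo "$svc: running"
    else
      echo "$svc: NOT running"
      status=1
    fi
  done
  return "$status"
}
```

On the master, call check_services trqauthd pbs_server pbs_sched; on a node, check_services pbs_mom.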
Run Job
The command qsub [file name], executed on the master, submits a job. To run a job, you need a Bash script. A simple example is a script that displays the date, waits 10 seconds and then displays the date again. The result is stored in the directory on the master from which the job was submitted.
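A job script matching this description could look as follows (the file name date.sh is only an example):

```shell
#!/bin/bash
# Example Torque job: print the date, wait 10 seconds, print the date again.
date
sleep 10
date
```

Submit it with $ qsub date.sh. The job's standard output and error are usually written to files named after the job with .o and .e plus the job number appended, e.g. date.sh.o followed by the number.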
Useful Commands
Torque provides several commands for tracing running jobs, which are very useful for troubleshooting.
, executed on the master, shows if a node is active or not. With the command
a list of running or finished jobs is displayed.
It shows the number of each job, which node it uses and whether the job has started, is in progress or has already ended.
A very useful command for debugging is
$ tracejob [job number]
This Torque command searches the log files of pbs_server, pbs_mom and the scheduler and summarizes the entries for the given job, which gives a quick overview.