Installing Torque 6.0.1 on CentOS 7

This guide covers installing Torque 6.0.1 with its native scheduler (pbs_sched), using either rsh or ssh. The instructions are adapted from Adaptive Computing's guide.

Prerequisites

Download Torque

Clone the 6.0.1 branch of the source from GitHub and generate the configure script:
# git clone https://github.com/adaptivecomputing/torque.git -b 6.0.1 6.0.1
# cd 6.0.1
# ./autogen.sh

Install Torque on headnode - with rsh

Configure, make, and install:
# ./configure --with-rcp=rcp
# make
# make install

Or install Torque on headnode - with ssh

Configure, make, and install:
# ./configure
# make
# make install

Configure Torque on headnode

Make sure Torque is using the hostname on the external network (e.g., ernst.chem.hope.edu):
# echo [correct_hostname] > /var/spool/torque/server_name
Configure the library path:
# echo "/usr/local/lib" > /etc/ld.so.conf.d/torque.conf
# ldconfig
Populate /var/spool/torque/server_priv/nodes. A basic example follows:
      node01 np=1
      node02 np=1
Start the trqauthd daemon:
# cp contrib/systemd/trqauthd.service /usr/lib/systemd/system/
# systemctl enable trqauthd.service
# systemctl start trqauthd.service
Initialize serverdb:
# ./torque.setup root
# qterm
Start pbs_server:
# cp contrib/systemd/pbs_server.service /usr/lib/systemd/system/
# systemctl enable pbs_server.service
# systemctl start pbs_server.service
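
For clusters with more than a few nodes, the entries for /var/spool/torque/server_priv/nodes can be generated rather than typed. The following is a minimal sketch, assuming the nodes follow the nodeNN naming pattern of the example above; gen_nodes is a hypothetical helper, not part of Torque.

```shell
#!/bin/sh
# gen_nodes COUNT NP: print COUNT lines of the form "nodeNN np=NP".
# The nodeNN pattern is an assumption taken from the example nodes file;
# adjust the printf format to match your own hostnames.
gen_nodes() {
    count=$1
    np=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'node%02d np=%s\n' "$i" "$np"
        i=$((i + 1))
    done
}

# Preview the entries for two single-core nodes.
gen_nodes 2 1
```

Review the output before redirecting it into the nodes file, since pbs_server reads that file when it starts.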

Install Torque MOMs on compute nodes

On the server, in the source directory, build packages for the nodes:
# make packages
Copy contrib/systemd/pbs_mom.service to /usr/lib/systemd/system/ on all compute nodes:
# bpush contrib/systemd/pbs_mom.service /usr/lib/systemd/system/
Copy torque-package-mom-linux-x86_64.sh and torque-package-clients-linux-x86_64.sh to all compute nodes and install them:
# bpush torque-package-mom-linux-x86_64.sh
# bpush torque-package-clients-linux-x86_64.sh
# bexec ./torque-package-mom-linux-x86_64.sh --install
# bexec ./torque-package-clients-linux-x86_64.sh --install
Configure the compute node library paths:
# bpush /etc/ld.so.conf.d/torque.conf /etc/ld.so.conf.d/
# bexec /sbin/ldconfig
Make sure the nodes are using the headnode's hostname on the internal node network for the server name (e.g., ernst00):
# bexec 'echo [correct_hostname] > /var/spool/torque/server_name'
Start the pbs_mom service:
# bexec systemctl enable pbs_mom.service
# bexec systemctl start pbs_mom.service
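
A wrong server_name on a compute node is easy to miss, so it may help to check the file before starting pbs_mom. The sketch below assumes the file holds the server hostname on its first line, as written by the echo command above; check_server_name is a hypothetical helper.

```shell
#!/bin/sh
# check_server_name FILE EXPECTED
# Returns 0 if the first line of FILE equals EXPECTED,
# 2 if FILE is not readable, and 1 otherwise.
check_server_name() {
    file=$1
    expected=$2
    [ -r "$file" ] || return 2
    actual=$(head -n 1 "$file")
    [ "$actual" = "$expected" ]
}

# Usage on a compute node (ernst00 is the example hostname from this guide):
#   check_server_name /var/spool/torque/server_name ernst00 && echo "server_name OK"
```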

Configure the scheduler

On the headnode, copy the scheduler service file to the correct location:
# cp contrib/systemd/pbs_sched.service /usr/lib/systemd/system/
Enable and start the scheduler:
# systemctl enable pbs_sched.service
# systemctl start pbs_sched.service

Test the system

Verify that you can ssh or rsh from each compute node to the headnode as the user who will run the jobs.
Make sure all nodes are reporting:
# pbsnodes -a
As a non-root user, run a test interactive job:
$ qsub -I
Exit from the resulting shell and run a job that returns something:
$ echo "date" | qsub
If successful, two files STDIN.oXX and STDIN.eXX should appear in your working directory. If not, you should receive mail with an error report.
Look at a job while it is running:
$ echo "sleep 10" | qsub
$ qstat
This should show a running job in the queue.
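
On a larger cluster the full pbsnodes -a listing is long, and a small filter can pick out the nodes that are not reporting. This is a sketch only: it assumes each node name starts in the first column and is followed by indented "state = ..." lines, and down_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# down_nodes: read `pbsnodes -a` output on stdin and print "NODE STATE"
# for every node whose state includes down, offline, or unknown.
# Assumed layout: node name in column one, attributes indented below it.
down_nodes() {
    awk '
        /^[^ \t]/         { node = $1 }
        /^[ \t]+state = / { if ($3 ~ /down|offline|unknown/) print node, $3 }
    '
}

# Usage on the headnode:
#   pbsnodes -a | down_nodes
```

No output means every node reported a state other than down, offline, or unknown.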

Systemd (CentOS 7+, Ubuntu 15+, ...)

If installing Torque on a newer Linux OS running systemd, disable the default behavior of creating a private /tmp directory for services, which breaks the qsub/qstat commands.
Edit /usr/lib/systemd/system/httpd.service (CentOS, Debian, Ubuntu) or /etc/systemd/system/httpd.service (SUSE) and set:
PrivateTmp=false
Restart the daemons:
$ sudo systemctl daemon-reload
$ sudo systemctl restart httpd
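
To confirm whether a unit file still enables the private /tmp directory, a simple grep is usually enough. The sketch below assumes the setting appears on its own line in the unit file and is written in lowercase; private_tmp_enabled is a hypothetical helper.

```shell
#!/bin/sh
# private_tmp_enabled UNIT_FILE
# Returns 0 if UNIT_FILE sets PrivateTmp to a true value (1, yes, true, on).
# Only lowercase values are matched here.
private_tmp_enabled() {
    grep -Eq '^[[:space:]]*PrivateTmp[[:space:]]*=[[:space:]]*(1|yes|true|on)[[:space:]]*$' "$1"
}

# Usage:
#   private_tmp_enabled /usr/lib/systemd/system/httpd.service \
#       && echo "PrivateTmp is still enabled"
```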

Uninstall Torque

Stop and disable the services:
# bexec systemctl stop pbs_mom.service
# systemctl stop pbs_sched.service
# systemctl stop pbs_server.service
# systemctl stop trqauthd.service
# bexec systemctl disable pbs_mom.service
# systemctl disable pbs_sched.service
# systemctl disable pbs_server.service
# systemctl disable trqauthd.service
Remove added files:
# bexec rm -f /usr/lib/systemd/system/pbs_mom.service
# rm -f /usr/lib/systemd/system/pbs_sched.service
# rm -f /usr/lib/systemd/system/pbs_server.service
# rm -f /usr/lib/systemd/system/trqauthd.service
# bexec rm -f /etc/ld.so.conf.d/torque.conf
# rm -f /etc/ld.so.conf.d/torque.conf
# make uninstall
On the compute nodes, delete the files listed by:
# ./torque-package-mom-linux-x86_64.sh -l
and by:
# ./torque-package-clients-linux-x86_64.sh -l