Installing Torque 6.0.1 on CentOS 7
This guide covers installing Torque 6.0.1 with its native scheduler, both with rsh and with ssh. The instructions are modified from Adaptive Computing's guide.
Prerequisites
- Set up either rsh or passwordless ssh between the headnode and compute nodes in both directions.
- Install btools (or be prepared to do things manually on compute nodes).
- Install libtool, openssl-devel, libxml2-devel, boost-devel, gcc, gcc-c++, and git.
- Allow all network traffic between headnode and compute nodes.
- Share /home with all nodes over NFS.
- Ensure all nodes (including the headnode) have identical /etc/hosts files including an entry for each node.
- Ensure all nodes know all users (bsync accomplishes this).
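The /etc/hosts prerequisite can be spot-checked before going further. The following is a minimal sketch, not part of the original procedure: the function name missing_hosts is hypothetical, and it only checks that each node name given appears as a host name or alias in one hosts file. It does not compare files across nodes.

```shell
# Report which of the given node names are missing from a hosts file.
# Usage: missing_hosts HOSTS_FILE NODE...
# Prints each missing name on its own line; returns 1 if any are missing.
# (missing_hosts is a hypothetical helper, not a Torque or btools command.)
missing_hosts() {
    hosts_file=$1
    shift
    missing=0
    for node in "$@"; do
        # Strip comments, then look for the name as a whole field after the address.
        if ! awk -v n="$node" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$hosts_file"; then
            echo "$node"
            missing=1
        fi
    done
    return $missing
}
```

For example, missing_hosts /etc/hosts node01 node02 prints nothing and returns 0 when both names are present.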
Download Torque
Clone the source from GitHub:
# git clone https://github.com/adaptivecomputing/torque.git -b 6.0.1 6.0.1
# cd 6.0.1
# ./autogen.sh
Install Torque on headnode - with rsh
Configure, make, and install:
# ./configure --with-rcp=rcp
# make
# make install
or Install Torque on headnode - with ssh
Configure, make, and install:
# ./configure
# make
# make install
Configure Torque on headnode
Make sure Torque is using the hostname on the external network (e.g., ernst.chem.hope.edu):
# echo [correct_hostname] > /var/spool/torque/server_name
Configure the library path:
# echo "/usr/local/lib" > /etc/ld.so.conf.d/torque.conf
# ldconfig
Populate /var/spool/torque/server_priv/nodes. A basic example follows:
node01 np=1
node02 np=1
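For clusters with many sequentially named nodes, the entries can be generated rather than typed. This is a minimal sketch that assumes node names follow the prefix-plus-two-digit pattern used in the example and that every node has the same np value. The function name gen_nodes is hypothetical.

```shell
# Print nodes-file lines for sequentially numbered hosts.
# Usage: gen_nodes PREFIX COUNT NP
# (gen_nodes is a hypothetical helper; adjust the name pattern to your site.)
gen_nodes() {
    prefix=$1
    count=$2
    np=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-pad the node number to two digits, e.g. node01.
        printf '%s%02d np=%d\n' "$prefix" "$i" "$np"
        i=$((i + 1))
    done
}
```

Running gen_nodes node 2 1 reproduces the two lines above. Redirect the output to /var/spool/torque/server_priv/nodes once it looks right.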
Start the trqauthd daemon:
# cp contrib/systemd/trqauthd.service /usr/lib/systemd/system/
# systemctl enable trqauthd.service
# systemctl start trqauthd.service
Initialize serverdb:
# ./torque.setup root
# qterm
Start pbs_server:
# cp contrib/systemd/pbs_server.service /usr/lib/systemd/system/
# systemctl enable pbs_server.service
# systemctl start pbs_server.service
Install Torque MOMs on compute nodes
On the server, in the source directory, build packages for the nodes:
# make packages
Copy contrib/systemd/pbs_mom.service to /usr/lib/systemd/system/ on all compute nodes:
# bpush contrib/systemd/pbs_mom.service /usr/lib/systemd/system/
Install torque-package-mom-linux-x86_64.sh and torque-package-clients-linux-x86_64.sh to all compute nodes:
# bpush torque-package-mom-linux-x86_64.sh
# bpush torque-package-clients-linux-x86_64.sh
# bexec ./torque-package-mom-linux-x86_64.sh --install
# bexec ./torque-package-clients-linux-x86_64.sh --install
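Without btools, the same push-and-install steps can be done per node with scp and ssh. The sketch below only prints the commands so they can be reviewed first. It assumes node names are the first field of each line in the nodes file, that scp to the remote home directory is acceptable, and that the hypothetical function name print_deploy suits your setup.

```shell
# Print (do not run) the per-node copy and install commands.
# Usage: print_deploy NODES_FILE PACKAGE...
# (print_deploy is a hypothetical helper standing in for bpush/bexec.)
print_deploy() {
    nodes_file=$1
    shift
    # First field of each non-blank, non-comment line is the node name.
    for node in $(awk '!/^[ \t]*(#|$)/ { print $1 }' "$nodes_file"); do
        for pkg in "$@"; do
            echo "scp $pkg $node:"
            echo "ssh $node ./$pkg --install"
        done
    done
}
```

Once the printed commands look correct, pipe the output to sh to execute them.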
Configure the compute node library paths:
# bpush /etc/ld.so.conf.d/torque.conf /etc/ld.so.conf.d/
# bexec /sbin/ldconfig
Make sure the nodes are using the headnode's hostname on the internal node network for the server name (e.g., ernst00):
# bexec 'echo [correct_hostname] > /var/spool/torque/server_name'
Start the pbs_mom service:
# bexec systemctl enable pbs_mom.service
# bexec systemctl start pbs_mom.service
Configure the scheduler
On the headnode, copy the scheduler service file to the correct location:
# cp contrib/systemd/pbs_sched.service /usr/lib/systemd/system/
Enable and start the scheduler:
# systemctl enable pbs_sched.service
# systemctl start pbs_sched.service
Test the system
Verify that you can ssh or rsh from the compute node to the headnode as the user that is to be running the jobs.
Make sure all nodes are reporting:
# pbsnodes -a
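The full pbsnodes -a listing is long on larger clusters. The sketch below condenses it to one line per node. It assumes the usual layout in which each node name starts in the first column and its attributes, including a "state = ..." line, are indented beneath it. The function name node_states is hypothetical.

```shell
# Read `pbsnodes -a` output on stdin and print "NODE STATE" per node.
# (node_states is a hypothetical helper; it assumes node names are
# unindented and attribute lines are indented.)
node_states() {
    awk '
        /^[^ \t]/         { node = $1 }        # unindented line: node name
        /^[ \t]+state = / { print node, $3 }   # indented "state = ..." line
    '
}
```

Use it as pbsnodes -a | node_states. A healthy idle node should report free. pbsnodes -l is a built-in alternative that lists only nodes that are down or offline.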
As a non-root user, run a test interactive job:
$ qsub -I
Exit from the resulting shell and run a job that returns something:
$ echo "date" | qsub
If successful, two files STDIN.oXX and STDIN.eXX should appear in your working directory. If not, you should receive mail with an error report.
Look at a job while it is running:
$ echo "sleep 10" | qsub
$ qstat
The output should show a running job in the queue.
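The XX in STDIN.oXX is the numeric part of the job ID that qsub prints. The sketch below derives the expected output file name from that ID. It assumes a plain (non-array) job and the default job name STDIN used for jobs read from standard input. The function name job_output_file is hypothetical.

```shell
# Derive the expected output file name from the ID that qsub prints.
# Usage: job_output_file JOB_ID [JOB_NAME]
# (job_output_file is a hypothetical helper; JOB_NAME defaults to STDIN.)
job_output_file() {
    job_id=$1
    job_name=${2:-STDIN}
    # Keep only the sequence number before the first dot.
    echo "${job_name}.o${job_id%%.*}"
}
```

For example, capture the ID with job_id=$(echo "date" | qsub), and after the job finishes read the result with cat "$(job_output_file "$job_id")".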
Systemd (CentOS 7+, Ubuntu 15+, ...)
If installing Torque on a newer Linux OS running systemd, disable the default behavior of creating a "private" /tmp directory for services, which breaks the qsub/qstat commands.
Edit /usr/lib/systemd/system/httpd.service (CentOS, Debian, Ubuntu) or /etc/systemd/system/httpd.service (SUSE) and set:
PrivateTmp=false
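To confirm the edit took, the unit file can be checked directly. This is a minimal sketch that reads a single unit file and reports the last PrivateTmp assignment in it. It does not account for drop-in override files, and the function name private_tmp_value is hypothetical.

```shell
# Print the last PrivateTmp value set in a unit file ("unset" if absent).
# Usage: private_tmp_value UNIT_FILE
# (private_tmp_value is a hypothetical helper; drop-in files are not read.)
private_tmp_value() {
    awk -F= '
        /^[ \t]*PrivateTmp[ \t]*=/ { gsub(/[ \t]/, "", $2); value = $2 }
        END { print (value == "" ? "unset" : value) }
    ' "$1"
}
```

For example, private_tmp_value /usr/lib/systemd/system/httpd.service should print false after the edit. systemctl show -p PrivateTmp httpd reports the value systemd itself has loaded.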
Restart the daemons:
$ sudo systemctl daemon-reload
$ sudo systemctl restart httpd
Uninstall Torque
Stop and disable the services:
# bexec systemctl stop pbs_mom.service
# systemctl stop pbs_sched.service
# systemctl stop pbs_server.service
# systemctl stop trqauthd.service
# bexec systemctl disable pbs_mom.service
# systemctl disable pbs_sched.service
# systemctl disable pbs_server.service
# systemctl disable trqauthd.service
Remove added files:
# bexec rm -f /usr/lib/systemd/system/pbs_mom.service
# rm -f /usr/lib/systemd/system/pbs_sched.service
# rm -f /usr/lib/systemd/system/pbs_server.service
# rm -f /usr/lib/systemd/system/trqauthd.service
# bexec rm -f /etc/ld.so.conf.d/torque.conf
# rm -f /etc/ld.so.conf.d/torque.conf
# make uninstall
On the compute nodes, delete files listed by:
# ./torque-package-mom-linux-x86_64.sh -l
and by:
# ./torque-package-clients-linux-x86_64.sh -l
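Deleting the listed files by hand is tedious, so the listing can be turned into rm commands for review. The sketch below is built on an unverified assumption about the listing format: one absolute path per line, with directory entries ending in "/". Check the actual -l output before relying on it. The function name print_removals is hypothetical, and nothing is deleted until the printed commands are run.

```shell
# Read a file listing on stdin and print (not run) an rm command per file.
# ASSUMPTION: one absolute path per line; entries ending in "/" are directories.
# (print_removals is a hypothetical helper, not part of the Torque packages.)
print_removals() {
    while IFS= read -r path; do
        case $path in
            / | /*/) ;;                             # root or directory entry: skip
            /*)      printf 'rm -f -- %s\n' "$path" ;;  # absolute file path
            *)       ;;                             # anything else: ignore
        esac
    done
}
```

For example, ./torque-package-mom-linux-x86_64.sh -l | print_removals prints the candidate commands. Paths containing spaces would need quoting before execution.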