WebMO Help - Batch Queues

External Batch Queues

WebMO Enterprise allows jobs to be submitted to an external batch queue and then run on compute nodes.

A batch queue is a job control system for scheduling and running jobs on remote compute nodes. A batch queue coordinates all computationally intensive jobs submitted on a system, including both WebMO and non-WebMO jobs.

Batch queues are commonly found on computer clusters, in which users submit jobs on a head node that are then run on compute nodes.

The built-in WebMO queue coordinates only the WebMO jobs run by a single WebMO installation. An external batch queue coordinates computationally intensive jobs from multiple WebMO and non-WebMO sources.

Requesting Batch Queue Resources

If an external batch queue is installed and enabled, WebMO users can request a specific batch queue from the Choose Engine page and/or computational resources from the Advanced tab of the Job Options page.

Managing Batch Queues

The system administrator must install an external batch queue on the system and its associated compute nodes before WebMO can use it.

The WebMO admin user administers batch queues with the Batch Queue Manager. The admin user can:

Preparing for Batch Queues

Batch queue software must be installed on the head node/web server and on the compute nodes before WebMO can use it. Common batch queue systems include PBS, Torque/Maui, Sun Grid Engine, LSF, and SLURM.

To enable interaction with queuing systems, the following criteria must be met:

Some queueing systems require additional configuration:

The script ".webmo_profile" in <webmoUserDir>, if it exists, will be sourced as part of the script that is sent to PBS/SLURM/etc. This script serves the same purpose as a typical .profile or .bash_profile script that is normally sourced during login. It allows the system administrator to load additional modules, adjust the path, etc. to better configure the environment, in particular for MPI. Note that this script can use the WEBMO_ENGINE and WEBMO_ENGINE_VERSION environment variables to configure the environment for a particular computational engine.
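As an illustration, a ".webmo_profile" might select an environment module based on the engine being run. The following is only a sketch: the engine names ("gaussian", "orca"), the module names, and the helper function webmo_module_for are hypothetical, and the actual values of WEBMO_ENGINE and WEBMO_ENGINE_VERSION on a given installation should be checked before relying on them.

```shell
# Sketch of a .webmo_profile. Engine names and module names are assumptions;
# adjust them to match the local installation.

# Map an engine name and version to a (hypothetical) environment module name.
webmo_module_for() {
    engine="$1"
    version="$2"
    case "$engine" in
        gaussian) echo "gaussian/${version:-default}" ;;
        orca)     echo "orca/${version:-default}" ;;
        *)        echo "" ;;   # unknown engine: no module
    esac
}

webmo_module=$(webmo_module_for "${WEBMO_ENGINE:-}" "${WEBMO_ENGINE_VERSION:-}")

# Load the module only if a module name was chosen and the site
# provides the 'module' command in this shell.
if [ -n "$webmo_module" ] && command -v module >/dev/null 2>&1; then
    module load "$webmo_module"
fi
```

Keeping the mapping in one function makes it easy to add engines later without touching the loading logic.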

Before enabling a batch queue, verify that the webmo system user can submit jobs to the batch queuing system from the command line:
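A minimal sketch of such a check is shown here, assuming either SLURM (sbatch) or a PBS-style system (qsub) is on the PATH of the webmo user. The function names detect_submit_cmd and submit_test_job are hypothetical helpers, and the trivial "hostname" job is only an example; site-specific options such as a partition or account may also be required.

```shell
# Run as the webmo system user.

# Report which submit command (if any) is available on the PATH.
detect_submit_cmd() {
    for cmd in sbatch qsub; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    echo "none"
    return 1
}

# Submit a trivial job that just prints the compute node's hostname.
submit_test_job() {
    case "$(detect_submit_cmd)" in
        sbatch) sbatch --wrap="hostname" ;;   # SLURM: wrap a one-line command
        qsub)   echo "hostname" | qsub ;;     # PBS-style: read script from stdin
        *)      echo "No supported submit command found on PATH" >&2
                return 1 ;;
    esac
}
```

Calling submit_test_job should print a job identifier if submission works; the job can then be followed with the queue's own status command (for example, squeue on SLURM).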

Troubleshooting Batch Queues

1. Check the batch queue (PBS, SLURM, etc.) logs. Was the job submitted? What UID was it submitted under? Why was the job terminated?

2. Run the WebMO batch script from the command line. The EXACT WebMO script that was submitted to the batch queue is stored in the job directory (~webmo/webmo/<username>/<job number>) as pbs_script.sh. The EXACT command used to submit the job is contained in a comment at the top of the script. Therefore, an excellent diagnostic test is to log in at the command line as system user "webmo" and use that EXACT same command to submit pbs_script.sh. This command will involve sudo if system users are set up. It should work without a password if passwordless sudo is correctly set up. Since this is the same process used by WebMO, it will work (or fail) in the same way as when WebMO submits the job, but problems are much easier to diagnose from the command line.

A sample command line session follows:

$ cat /home/webmo/webmo/smith/163/pbs_script.sh
#!/bin/sh
# Submitted using: /usr/bin/sudo -u smith /usr/bin/sbatch -J WebMO_163 -o /home/webmo/webmo/smith/163/pbs_stdout -e /home/webmo/webmo/smith/163/pbs_stderr -p 'hour' --nodes=1 --tasks-per-node=1
...
$ /usr/bin/sudo -u smith /usr/bin/sbatch -J WebMO_163 -o /home/webmo/webmo/smith/163/pbs_stdout -e /home/webmo/webmo/smith/163/pbs_stderr -p 'hour' --nodes=1 --tasks-per-node=1 /home/webmo/webmo/smith/163/pbs_script.sh
Submitted batch job 25687
$ squeue -u smith
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
25687 hour WebMO_163 smith R 0:03 1 hpc201
$ tail /home/webmo/webmo/smith/163/output.out
...
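
To avoid retyping the submit command, it can be pulled out of the script's header comment. The sketch below assumes the comment begins with "# Submitted using: " as in the sample session above; the function name webmo_submit_cmd is a hypothetical helper, not part of WebMO.

```shell
# Print the submit command recorded in the header comment of a WebMO batch script.
webmo_submit_cmd() {
    script="$1"
    # Keep only the first "# Submitted using: " line, with the prefix stripped.
    sed -n 's/^# Submitted using: //p' "$script" | head -n 1
}
```

For example, webmo_submit_cmd /home/webmo/webmo/smith/163/pbs_script.sh prints the recorded command. As in the sample session, the path to pbs_script.sh is then appended to that command before running it as the webmo user.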