Posted on Wednesday, August 01, 2018 - 8:23 pm:
I am attempting to submit jobs via SLURM from a VM running WebMO to an HPC cluster running SLURM. I seem to be running into a problem with the qsub command, and I can't identify the cause from the run_log.
Initially I was getting the following error:
Value "t1small" invalid for option p (number expected)
Unknown option: nodes
Unknown option: tasks-per-node
qsub [-a start_time] [-A account] [-b y|n] [-e err_path] [-I]
     [-l resource_list] [-m mail_options] [-M user_list] [-N job_name]
     [-o out_path] [-p priority] [-pe shm task_cnt] [-P wckey]
     [-q destination] [-r y|n] [-v variable_list] [-V] [-wd workdir]
     [-W additional_attributes] [-h] [script]
I was able to resolve the first three messages by changing "-p '$queue'" to "-q '$queue'" in public_html/cgi-bin/webmo/daemon_pbs.cgi. I also changed "--nodes=$nodes --tasks-per-node=$ppn" to "-l nodes=$nodes:ppn=$ppn" in the same line. However, I'm not sure what's generating the usage message, and the job fails immediately.
If I cd into the job directory, I can run qsub manually from the command line, using the options listed in the comments at the top of pbs_script.sh.
webmo % pwd
webmo % /opt/scyld/slurm/bin/qsub -o /home/webmo/webmo/loforbes/3/pbs_stdout.test -e /home/webmo/webmo/loforbes/3/pbs_stderr.test -q 't1small' -l nodes=1:ppn=1 pbs_script.sh
webmo % qstat
Job id Name Username Time Use S Queue
------------------- ---------------- --------------- -------- - ---------------
245272 ARCTIC4 nobody 03:06:26 R t2standard
245795 MOM6-CCS1 nobody 00:05:37 R t2standard
245805 MOM6-CCS1 nobody 00:03:49 R t2standard
245807 myscript.sbatch nobody 00:03:33 R t2standard
245827 pbs_script.sh webmo 00:00:00 R t1small
webmo % cat run_log
Executing script: ./run_gaussian.cgi
Creating working directory: /center1/COMPCHEM/webmo/webmo-1945/3
Script execution node: n7
Job execution node(s):
Executing command: /usr/local/pkg/gaussian/gaussian-09.D.01/g09/g09
I can see Gaussian running on the cluster compute node, and the job eventually finishes properly, although WebMO doesn't show it as completed. I'm not sure what to try next to diagnose the usage message.
-There are uncountably more irrational fears than rational ones. -P. Dolan
Liam Forbes firstname.lastname@example.org ph: 907-450-8618 fax: 907-450-8601
UAF Research Computing Systems Senior HPC Engineer CISSP
Posted on Thursday, August 02, 2018 - 9:52 am:
Note that for SLURM you should NOT use qsub. As the 'Batch Queue Manager' page requests, provide the path to 'sbatch' rather than qsub. (The 'qsub' script is only provided as an attempt at compatibility with Torque; it is not really intended for production use.)
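For illustration, here is a minimal sketch of a native submission equivalent to the manual qsub test in the first post. It assumes the same job directory, the t1small partition, and one node with one task. The function name submit_native and the DRY_RUN switch are hypothetical, and the output file names are copied from that test, not from WebMO's own configuration. The sketch also shows why the original command failed: qsub reads -p as a priority, while sbatch reads it as the partition.

```shell
#!/bin/sh
# Sketch only (not WebMO's own code): submit a WebMO job script with native
# sbatch options instead of the PBS-style qsub wrapper.
# submit_native and DRY_RUN are hypothetical names used for illustration.
submit_native() {
  jobdir=$1     # WebMO job directory containing pbs_script.sh
  partition=$2  # SLURM partition; qsub's "-q queue" corresponds to sbatch's "-p"
  nodes=$3      # PBS "nodes=N" corresponds to --nodes=N
  ppn=$4        # PBS "ppn=M" corresponds to --ntasks-per-node=M

  # Rebuild the positional parameters as the sbatch argument list.
  set -- sbatch \
    -o "$jobdir/pbs_stdout.test" \
    -e "$jobdir/pbs_stderr.test" \
    -p "$partition" \
    --nodes="$nodes" \
    --ntasks-per-node="$ppn" \
    pbs_script.sh

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command line instead of submitting it.
    echo "$*"
  else
    # Submit from inside the job directory, as in the manual qsub test.
    (cd "$jobdir" && "$@")
  fi
}

DRY_RUN=1 submit_native /home/webmo/webmo/loforbes/3 t1small 1 1
```

With DRY_RUN=1 the function only prints the sbatch command line; without it, the function submits from inside the job directory. Within WebMO itself, the actual fix is simply to give the Batch Queue Manager the path to sbatch; the sketch only makes the option mapping explicit.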
Posted on Thursday, August 02, 2018 - 12:31 pm:
Ah! Now that you point it out, I see the difference on the web page. I backed out my changes and used the native SLURM commands. It works! Thank you.