|Posted on Wednesday, June 27, 2012 - 5:39 am: |
i have webmo enterprise running with sge.
it can successfully submit jobs if the group permission is "webserver uid". the job runs fine and can be seen with qstat.
however, when the group permission is "webmo username uid", it only creates the job script, input files, etc., but never actually gets queued into sge.
i followed the instructions on how to install the enterprise version; i've made sure the sudo settings are correct (no requiretty, etc.); and all the suexec file permissions are set properly.
i do have selinux disabled; the server is running rocks 6 (centos 6).
anybody have any ideas as to what's wrong? i'm guessing the "sudo -u username ..." is faulty; but i really can't find anything in the http logs or in the webmo/errors file.
am i missing something else?
Post Number: 264
|Posted on Wednesday, June 27, 2012 - 10:07 am: |
1) The SGE logs, to see if the job was ever even submitted
2) The sudo logs (probably in /var/log/messages or /var/log/secure, which is where CentOS writes the authpriv syslog facility), to see if the sudo command was rejected for some reason
NOTE that using the setting "webmo username uid" is perhaps redundant with "webserver uid". If you have installed WebMO under the public_html directory of the "webmo" account (as recommended) and are using suexec (which is enabled by default on rocks), then WebMO is ALREADY running under the "webmo" account. As such, there is no need to use sudo.
|Posted on Wednesday, June 27, 2012 - 5:46 pm: |
thanks for the quick reply!
1) the sge logs seem to show that the job was not submitted at all. however, both webmo and the user can run sge jobs from the command line. in fact, if i just copy the sudo command being issued by the cgi script (as shown in the pbs_script.sh file), the job gets submitted and runs just fine.
2) no errors in the sudo logs that i can find.
when the group permission is set to "webserver uid", the jobs get submitted and run fine. if qstat is issued on the terminal, it shows the job running under the webmo user.
but what i want to do is have it show as running under the login name of the user, not the webmo user---just like the one in the enterprise instructions.
am i missing something here?
Post Number: 268
|Posted on Thursday, June 28, 2012 - 3:13 pm: |
I did a bit of playing around. The issue is that you need to add the variable "SGE_ROOT" to the env_keep list of variables in the /etc/sudoers configuration file.
By default, sudo clears most environmental variables for security reasons. This variable needs to be set for qsub to work. I will update the documentation.
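The effect described above can be imitated without touching sudo at all. The sketch below uses `env -i`, which starts a child process with an empty environment. That is only loosely comparable to sudo's env_reset (sudo keeps a small set of variables plus whatever is listed in env_keep), and the SGE_ROOT value is a made-up path for illustration, not necessarily the one on your cluster.

```shell
# Hypothetical install path, for illustration only.
SGE_ROOT=/opt/gridengine
export SGE_ROOT

# Child started with an empty environment: SGE_ROOT is gone, which is
# what qsub would see if sudo dropped the variable.
scrubbed=$(env -i /bin/sh -c 'echo "${SGE_ROOT:-unset}"')

# Child started with SGE_ROOT passed through explicitly, similar in spirit
# to listing the variable in env_keep.
kept=$(env -i SGE_ROOT="$SGE_ROOT" /bin/sh -c 'echo "${SGE_ROOT:-unset}"')

echo "scrubbed: $scrubbed"
echo "kept:     $kept"
```

If qsub works from a normal login shell but fails under sudo, comparing the two environments this way is a quick first check.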
|Posted on Thursday, June 28, 2012 - 10:06 pm: |
i have this in my sudoers file (had to put it there to be able to submit jobs to sge):
Defaults env_keep += "SGE_CELL SGE_ARCH SGE_EXECD_PORT SGE_QMASTER_PORT SGE_ROOT"
unfortunately, it still doesn't work.
i think it's an issue with sudo and suexec---but i can't find out what. the log files are not showing anything.
Post Number: 269
|Posted on Thursday, June 28, 2012 - 10:11 pm: |
Did you define sge_qmaster and sge_execd in /etc/services? There is a note about this in the Enterprise documentation. (If you define it in /etc/services, you SHOULDN'T need to define SGE_QMASTER_PORT and SGE_EXECD_PORT in sudoers.)
I just set this up on a freshly installed Rocks cluster, and aside from what is documented in our existing Enterprise support doc, the ONLY thing I had to add for SGE was to define SGE_ROOT in /etc/sudoers. That said, our configs may differ.
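One way to check the /etc/services point above is a small shell helper. This is a minimal sketch: `has_service` is a hypothetical name, and it assumes the usual services file layout where each entry starts with the service name followed by whitespace.

```shell
# has_service is a hypothetical helper, not part of WebMO or SGE.
# It succeeds if the named service has an entry in a services file.
has_service() {
  name=$1
  file=${2:-/etc/services}   # defaults to the system services file
  # An entry starts with the service name followed by whitespace.
  grep -Eq "^${name}[[:space:]]" "$file"
}

# Usage (not run here):
#   has_service sge_qmaster && has_service sge_execd && echo "both defined"
```

If both entries are present, the port variables should not be needed in the sudoers env_keep line, as noted above.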
|Posted on Thursday, June 28, 2012 - 11:54 pm: |
you are correct! it finally worked. i removed the SGE_QMASTER_PORT and SGE_EXECD_PORT in the sudoers; now everything works! so simple, yet...
thanks very much for the help!
Post Number: 270
|Posted on Friday, June 29, 2012 - 10:04 am: |
Thanks for your help in tracking this down. I have updated the documentation to reflect what we have learned about SGE / sudo!
|Posted on Friday, June 29, 2012 - 12:21 pm: |
well, it was all too good to be true...
i was able to submit and run a job to sge. it will show up in the queue, then SOMETIMES will either run to completion (these are very short test jobs) or mysteriously vanish. it's so intermittent that it seemed totally random---until i looked at the job accounting of sge.
sge accounting showed that the vanished jobs DO get submitted, but are killed sometime before they can be executed.
it does not seem to be sge's fault...
i managed to dig up the subroutine in webmo that processes/checks for queued/running jobs. turns out no parameter is passed to qstat---which, in my settings (i guess the default Rocks setting), will only show jobs being run by the user issuing the qstat command. and because the qstat command is issued by user webmo, while the qsub was sudo'ed to be run by another user, qstat returns nothing.
i solved this by passing -u '*' as an option to qstat, which lists all jobs in the queue. Now, all is well.
I don't know how time consuming this could be to parse when there are a lot of jobs running. but it works in my setup.
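The difference can be checked from a terminal with a small filter. This is a sketch rather than WebMO's own logic: `job_listed` is a hypothetical helper, the job ID 1234 is a placeholder, and it assumes the job ID is the first column of qstat's default output.

```shell
# job_listed is a hypothetical helper: it reads qstat output on stdin and
# succeeds if the given job ID appears in the first column.
job_listed() {
  awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Usage (not run here), as the webmo user, for a job submitted via sudo -u:
#   qstat        | job_listed 1234   # only sees the caller's own jobs
#   qstat -u '*' | job_listed 1234   # sees jobs from every user
```

If the first form fails and the second succeeds while the job is still queued, the polling side is simply not seeing the other user's job.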
for the record, my setup is:
Rocks 6/GE 6.2u5, all settings are rocks default.
Post Number: 271
|Posted on Friday, June 29, 2012 - 12:27 pm: |
You beat me to the punch. I discovered one more important change needed for RECENT versions of SGE when running under sudo; I was just about to post this.
You need to add the line:
$qstatOptions = '-u "*"' if ($externalBatchQueue eq 'sge');
right below where the qstatOptions variable is defined. This is because the most recent versions of qstat display jobs ONLY from the user running qstat, not from ALL users. As such, WebMO doesn't see the running jobs and thinks something is awry.