|Posted on Wednesday, November 15, 2017 - 5:33 am: |
Hello WebMO team!
Our HPC team and I are having memory-management issues with Gaussian 09 and WebMO:
Jobs fail with the error message
slurmstepd: error: Exceeded step memory limit at some point.
slurmstepd: error: Exceeded job memory limit at some point.
This happens even though the job input file requested plenty of memory:
#N B3LYP/Gen pseudo=read freq scf=(xqc, maxcycle=512) nosymm
The administrator reports that even though 1.8 GB of RAM is available per core (28.8 GB on a 16-core machine), the job gets cancelled at 3.48 GB of total RAM utilization.
We fail to find a place to request the proper amount of memory.
In the help file for GAMESS there seems to be an option to choose the amount of memory, but this doesn't seem to be available for Gaussian?
Please note that we are using the "Execute Input File" route to create our jobs, because we need to use ECPs for our transition metal complexes, which can't be selected in the Job Creation tool.
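For reference, our input headers look roughly like this (the %mem and %nprocshared values below are illustrative placeholders, not our production settings):

```
%nprocshared=16
%mem=16GB
#N B3LYP/Gen pseudo=read freq scf=(xqc,maxcycle=512) nosymm

title line

0 1
...coordinates...

...basis set and ECP definitions for Gen / pseudo=read...
```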
Perhaps helpfully: if I request too much memory in the Gaussian input file, the following error occurs:
Gaussian 09: ES64L-G09RevD.01 24-Apr-2013
Will use up to 16 processors via shared memory.
#N B3LYP/gen pseudo=read opt=calcfc freq scf=(xqc, maxcycle=512)
galloc: could not allocate memory.: Cannot allocate memory
galloc: could not allocate memory.
Error: segmentation violation
rax 0000000000000000, rbx 000000000086ea40, rcx ffffffffffffffff
rdx 0000000000001e01, rsp 00007ffe784d10e8, rbp 00007ffe784d1188
rsi 000000000000000b, rdi 0000000000001e01, r8 00002ba1b34ba300
r9 0000000000000000, r10 00007ffe784d0e00, r11 0000000000000206
r12 0000000000b40a80, r13 00007ffe784d1190, r14 0000000000a95458
Post Number: 577
|Posted on Wednesday, November 15, 2017 - 11:09 am: |
This is a memory limit imposed by SLURM on the running jobs. Asking Gaussian for more memory won't help, since SLURM denies the request. You need to request more memory for the job from SLURM as an argument (--mem, I believe) to sbatch when the job is submitted. You will need to add this argument in the 'daemon_pbs.cgi' script.
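As a sketch, the submit line in daemon_pbs.cgi would gain a memory ceiling along these lines (the exact numbers and the job script name are site-specific assumptions; --mem takes megabytes by default):

```
# hypothetical: cap the whole job at ~28.8 GB on a 16-core node
sbatch --mem=28800 --ntasks=1 --cpus-per-task=16 job_script.sh

# or equivalently, cap per core at 1.8 GB
sbatch --mem-per-cpu=1800 --cpus-per-task=16 job_script.sh
```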
|Posted on Thursday, November 16, 2017 - 5:01 am: |
Our administrator is very unhappy with this proposal, as it would always request a large amount of memory.
As a compromise he suggested increasing the --mem-per-cpu setting.
His question was if it was possible for WebMO to actually use the requested amount of memory as specified in the input file.
I personally don't fully understand the causality here, i.e. which of the three software packages (SLURM, Gaussian, or WebMO) is responsible for this.
Post Number: 578
|Posted on Thursday, November 16, 2017 - 9:50 am: |
The daemon doesn't have access to the memory requested by the input file. Note that the SLURM --mem does not actually request any memory. It is a ceiling imposed upon the job.
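That said, if one wanted to modify the daemon to honor the input file's request, the parsing step might look like this sketch (the function name, unit table, and default are all assumptions for illustration; this is not how WebMO's daemon actually works):

```python
import re

# Hypothetical helper: extract a "%mem=..." Gaussian Link 0 directive from an
# input file's text and convert it to a SLURM --mem value in megabytes.
_UNITS_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024, "MW": 8, "GW": 8 * 1024}

def gaussian_mem_to_slurm_mb(input_text, default_mb=2048):
    """Return a megabyte count suitable for sbatch --mem, else a fallback."""
    match = re.search(r"^%mem=(\d+)\s*([A-Za-z]+)", input_text,
                      re.IGNORECASE | re.MULTILINE)
    if not match:
        return default_mb
    amount, unit = int(match.group(1)), match.group(2).upper()
    return int(amount * _UNITS_MB.get(unit, 1))

# A 16 GB request becomes --mem=16384
print(gaussian_mem_to_slurm_mb("%mem=16GB\n#N B3LYP/gen pseudo=read"))
```

The daemon could then pass the returned value to sbatch as `--mem=<value>`, falling back to the default when no %mem line is present.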
|Posted on Thursday, November 16, 2017 - 10:36 am: |
That's a helpful bit of information. So far things seem to work with an increased --mem-per-cpu setting, but we will revisit your suggestion when necessary.
|Posted on Thursday, November 23, 2017 - 9:35 am: |
Is there any way to get more specific with this?
We have nodes with different amounts of RAM (32, 64, 128, and 256 GB) and different numbers of CPUs (8 or 16), and we would like to be able to set the requested memory according to the capability of the node running the job.
Alternatively it would help to have a user configurable setting, as is the original intention of the %mem keyword in Gaussian input files.
This would scale usefully with a custom mem-per-cpu setting for each type of available node.
The appropriate node type can be specified under "Constraints" on the "Execute Input File" page.
Is this something the daemon script interfacing with SLURM could take into account when deciding how much memory to request?
Unfortunately, there seems to be no "one size fits all" pre-set. According to our admin, setting the mem-per-cpu value too high would automatically disqualify a job from the low-memory nodes, which is of course also undesirable.
Post Number: 579
|Posted on Monday, November 27, 2017 - 10:48 am: |
One can specify SLURM "constraints" both on the "Execute Input File" page and on the advanced job options page. In both cases, these are fed into the "--constraint" option on the SLURM command line. This is normally used to request, e.g., a machine with a certain "capability".
Often one defines a "highmem" capability for nodes with large memory, etc. This is how most HPC centers handle such situations. Using "%mem" is of course specific to Gaussian!
You are free to reuse the "Capabilities" field for your own purposes by modifying daemon_pbs.cgi and feeding that field into a different command line argument.
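For example, assuming the cluster defines a "highmem" feature on its large-memory nodes (a site-specific assumption), the resulting submit line might look like:

```
# request a large-memory node and a matching per-core ceiling
sbatch --constraint=highmem --mem-per-cpu=8000 job_script.sh
```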