WebMO Help - Cloning Engines

Overview

Each supported computational engine within WebMO uses a set of engine-specific data, HTML, and script files, which are listed under Required Files.

Adding a new computational engine to WebMO simply requires creating the requisite set of files. No modification of the 'main' WebMO source code is required. WebMO detects the additional computational engine automatically from the presence of engine-specific configuration files (details below). These files are dynamically loaded and used by WebMO when engine-specific functions are required.

clone_engine.pl

The script clone_interface.pl in the WebMO.install/scripts directory is designed to assist in creating the new set of files for another engine. It clones a computational chemistry engine with a new name, e.g., uses "gaussian" to create "gaussian2".

Select the original engine most similar to the new engine for which you wish to create a new interface. Clone the new engine. Follow the instructions to hand-edit the new engine *.int file and update the text as appropriate. You should then see entries for both engines, e.g., "gaussian" and "gaussian2" everywhere, from "Choose Computational Engine" to the "Interface Manager".

This script is useful both for installing two versions of the same engine and for providing a nice starting point for interfaces to new programs.

Conventions

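For the remainder of this document, the following placeholder conventions are used in file names and locations: $engine stands for the name of the computational engine (e.g., "gaussian"), and $cgiBase, $htmlBase, and $userBase stand for the corresponding base directories of the WebMO installation.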

Required Files

Before giving a detailed description of each of the various files, it is useful to list and give a brief description of each of the required elements.

File type      File name                Location              Description
Data           $engine.int              $cgiBase/interfaces   The computational engine configuration file; contains information specific to that particular computational engine
Data           $engine.tmpl             $cgiBase/interfaces   The job template file, from which calculation types are defined and input files are generated
HTML           $engine.html             $htmlBase             Defines the various HTML form elements that determine the available job options
HTML (v12.1+)  $engine.js               $htmlBase/javascript  Associated JavaScript specific to the job options
HTML           ${engine}mgr_admin.html  $htmlBase             For the web-based configuration of the engine
Script         $engine.cgi              $cgiBase              Parses the HTML form data associated with $engine.html
Script         run_$engine.cgi          $cgiBase              Handles the details of executing the computational engine
Script         ${engine}mgr_admin.cgi   $cgiBase              Facilitates the web-based configuration of the computational engine
Script         parse_$engine.cgi        $cgiBase              Parses the raw text output of the engine to a standard WebMO format

Although this list looks rather intimidating, most of the required files are trivial to generate through simple modifications of the corresponding file from a pre-existing computational engine. Usually, only minor modifications are required, e.g., changing variable names or the command line arguments passed to the computational engine. These modifications are usually possible even for someone who has only a basic understanding of HTML and Perl.

The exception is the parse_$engine.cgi file, which handles the (sometimes complex) parsing of the text output generated by the computational engine. In general, this can be a complicated process, which frequently requires a detailed understanding of Perl and regular expressions. It is often possible to find an existing computational engine that structures its output in a similar manner and to work from that starting point, but sometimes it is necessary to start from scratch.

Data Files

$engine.int

This configuration file, located in the $cgiBase/interfaces directory, holds all of the saved configuration options that WebMO uses to run the computational engine. The existence of this file is also what WebMO uses to dynamically determine that an engine is supported; if this file is removed, or its extension is changed, WebMO will not use the computational engine.
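The detection rule described above can be sketched as follows. This is an illustration in Python, not WebMO's own Perl code; the function name list_engines and its directory argument are hypothetical.

```python
from pathlib import Path

def list_engines(interfaces_dir):
    """Return the engine names implied by the *.int files in interfaces_dir."""
    # An engine counts as present only if a file named <engine>.int exists;
    # a renamed file such as nwchem.int.bak does not match the pattern.
    return sorted(p.stem for p in Path(interfaces_dir).glob("*.int") if p.is_file())
```

Under this sketch, renaming gaussian2.int to gaussian2.int.bak would make "gaussian2" disappear from the returned list, mirroring the behavior described for WebMO.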

In particular, this file may save the directory of the engine, the path to the executable, version number, etc. Each variable is stored in the format, variable=value. For example,
      nwchemVersion="5.0"
The value must be surrounded by quotation marks if it is a string (and can optionally be quoted for numeric values as well). The contents of this file are arbitrary, but should be restricted to engine-specific configuration variables. Although the names of the variables are also arbitrary, convention dictates that variable names should be prefixed with the name of the computational engine, to prevent conflicts among engines.
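To make the variable=value format concrete, here is a minimal sketch of a reader for such lines. It is written in Python for illustration and is not how WebMO itself loads the file; the function name parse_int_file and the variable nwchemProcs are hypothetical, while nwchemVersion is the example given above.

```python
import re

# Matches lines such as:  nwchemVersion="5.0"   or   nwchemProcs=4
_LINE = re.compile(r'^\s*(\w+)\s*=\s*(?:"([^"]*)"|(\S*))\s*$')

def parse_int_file(text):
    """Parse variable=value lines into a dict mapping names to string values."""
    settings = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # skip blank or unrecognized lines
        name, quoted, bare = m.groups()
        # Quoted values lose their quotation marks; unquoted values are kept as-is.
        settings[name] = quoted if quoted is not None else bare
    return settings
```

For example, parse_int_file('nwchemVersion="5.0"') yields a dictionary with the single entry nwchemVersion mapped to the string 5.0.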

Along with these arbitrary user-defined variables, the configuration file also defines a few 'required' variables that are used throughout WebMO.

$engine.tmpl

The job template file, located in the $cgiBase/interfaces directory, defines the various calculation types supported by this particular computational engine. This file is divided into sections, and each section corresponds to a particular calculation type that is supported by this engine. This file is then used to build the Job Options page and to create the appropriate input file. The format of the file is as follows:

The bulk of the template file consists of input files for each calculation type. Obviously, the exact contents of the input file vary from job to job (e.g., different geometry, charge, basis set, etc.). For this reason, the input file contains a variety of variables, each of which begins with a dollar sign. The existence of such a variable in the template triggers a variable expansion, replacing the variable name with its contents.

Many such variables are defined by WebMO, and a list of the standard ones can be found in the standard WebMO documentation. In general, each variable in the template corresponds to an HTML form variable of the same name, defined in the corresponding 'Job Options' page ($engine.html). Thus, it is possible to add variables specific to your particular computational engine by defining a field on the HTML form of the job options page; a corresponding variable of the same name will automatically be made available in the job template.
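The idea of dollar-variable expansion can be illustrated with a short sketch. Note that this uses Python's string.Template as a stand-in, not the template engine WebMO actually uses, and the template fragment and the variable names charge and basisSet are hypothetical.

```python
from string import Template

# Hypothetical template fragment; real WebMO templates and variable names differ.
template_text = "charge $charge\nbasis $basisSet\n"

def expand(template, values):
    """Replace each $name in the template with the matching entry of values."""
    return Template(template).substitute(values)

# The values would come from the form fields of the Job Options page.
input_file = expand(template_text, {"charge": 0, "basisSet": "6-31G(d)"})
```

In this sketch a variable that appears in the template but has no matching form value raises a KeyError, which makes a misspelled field name easy to spot.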

The input file can also contain conditional expressions, which greatly enhance the power and flexibility of the template system. As of the current version of WebMO, the Perl 'Template Toolkit' package is used for this purpose. The syntax for conditional statements is simple, and the reader is referred to any of the existing template files for a large number of examples. Complete documentation for the Template Toolkit is available online at:
      http://template-toolkit.org/

HTML Files

$engine.html

This HTML file contains the source of the 'Job Options' page associated with the computational engine. It is strongly suggested that you simply copy / modify the source from an existing engine, both for reasons of consistency and expediency.

Change all occurrences of the engine name to reflect that of the new engine. Also change the list of available job options to reflect the capabilities of the engine. The VALUES of the options in the drop-down boxes may need to be altered to reflect the keywords used by that particular program.

In general, no other changes should be required.

$engine.js

This JavaScript file contains the JavaScript needed by the 'Job Options' page associated with the computational engine. It is strongly suggested that you simply copy / modify the JavaScript from an existing engine, both for reasons of consistency and expediency.

There are only a few elements which likely need to be modified:

In general, no other changes should be required.

${engine}mgr_admin.html

This HTML file contains the source of the 'Interface Manager' page associated with the computational engine. It is strongly suggested that you simply copy / modify the source from an existing engine, both for reasons of consistency and expediency.

There are only a few elements which likely need to be modified:

Script Files

The following Perl scripts define the code necessary for WebMO to interface with the new computational engine. Each file contains a variety of engine-specific subroutines that are used by WebMO to run the engine jobs, and process the output. Make sure to follow the naming conventions EXACTLY, as WebMO dynamically loads the required modules BY NAME; deviations from the convention will result in errors.

$engine.cgi

As above, it is recommended to copy / modify an existing example, changing any appearances of the engine name (including in subroutine names!) to reflect the new engine. The following subroutines are required:

run_${engine}.cgi

As above, it is recommended to copy / modify an existing example, changing any appearances of the engine name (including in subroutine names!) to reflect the new engine.

The most important aspect to mention here is that environment variables can be set by modifying the Perl hash %ENV. This is often important when setting up an environment in which to execute the engine.

Engines are run by first fork()ing a new process, writing the PID of the child process (which eventually will contain the engine) to a file, and then exec()ing the computational engine. This process allows WebMO to obtain the PID of the computational engine process so that it can be monitored by the WebMO daemon.

When the engine is executed, it is vital to direct the program to read from the appropriate input file and write to the appropriate output file. By convention, the input file is 'input.inp' (the extension can be modified, as determined in $engine.cgi) and the output file is 'output.log'; both files are located in the $userBase directory, in the tree corresponding to the current user. These file locations are typically passed to the computational program on the command line, but conventions vary.
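The launch sequence described above (set up the environment, fork, record the child's PID, exec the engine with its input and output redirected) can be sketched as follows. This is a Python illustration of the pattern, not WebMO's Perl implementation. The function name launch_engine, the PID file name "pid", and the choice to have the parent write the PID file and to feed the files through standard input and output are all assumptions; many engines take the file names on the command line instead.

```python
import os

def launch_engine(argv, job_dir, extra_env):
    """Fork, record the child's PID, then exec the engine in the child."""
    pid = os.fork()
    if pid == 0:
        # Child: prepare the environment and redirections, then exec the engine.
        try:
            os.environ.update(extra_env)      # analogous to assigning to %ENV in Perl
            os.chdir(job_dir)
            fd_in = os.open("input.inp", os.O_RDONLY)
            fd_out = os.open("output.log", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd_in, 0)                 # engine reads input.inp on stdin
            os.dup2(fd_out, 1)                # engine writes output.log on stdout
            os.execvp(argv[0], argv)          # does not return on success
        finally:
            os._exit(127)                     # reached only if setup or exec failed
    # Parent: save the child's PID so that a monitoring process can find it.
    with open(os.path.join(job_dir, "pid"), "w") as fh:
        fh.write(str(pid))
    return pid
```

Because exec replaces the child process image without changing its PID, the number recorded by the parent is also the PID of the running engine, which is what makes later monitoring possible.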

${engine}mgr_admin.cgi

As above, it is recommended to copy / modify an existing example, changing any appearances of the engine name (including in subroutine names!) to reflect the new engine.

Beyond updating appearances of the engine name, almost no changes should be required. The default minimalist implementation simply updates the variables in the engine configuration file to reflect the changes made through the 'Interface Manager' web-based configuration tool.

parse_${engine}.cgi

As above, it is recommended to copy / modify an existing example. However, unlike in most of the above cases, dramatic changes are likely to be required for each computational engine.

This file contains a sequence of subroutines, which are called in turn when parsing the text output created by the computational engine. In general, these subroutines write the value of the parsed property, in a well-defined format, to a file handle that is passed to the subroutine. This information is later used on the 'View Job' page to visualize the results.

For each property defined in the 'properties' array of the process_${engine}_output subroutine, a corresponding process_${engine}_$property subroutine is called. It is the responsibility of this subroutine to determine whether the corresponding property exists in the output file and, if so, to parse the results to a well-defined format. With a few exceptions, the subroutine is passed the handle of the properties file (to which the parsed results are to be written) and an array of the output file contents (one line per array entry).
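The name-based dispatch described above can be sketched as follows. This is a Python illustration only; the engine name "myengine", the property names, the "Total energy" output line, and the "energy=" record format are all hypothetical and do not reflect WebMO's actual properties file format.

```python
def process_myengine_energy(out, lines):
    # Scan backwards so that the LAST occurrence in the output is the one parsed.
    for line in reversed(lines):
        if line.startswith("Total energy"):
            out.write("energy=" + line.split()[-1] + "\n")
            return

def process_myengine_dipole(out, lines):
    pass  # property not present in this output: write nothing

# Handlers are looked up by name, so the naming convention must be followed exactly.
HANDLERS = {f.__name__: f for f in (process_myengine_energy, process_myengine_dipole)}

def process_myengine_output(out, lines, properties=("energy", "dipole")):
    """Call process_myengine_<property> for each listed property."""
    for prop in properties:
        HANDLERS["process_myengine_" + prop](out, lines)
```

A misspelled handler name in this sketch fails with a KeyError at lookup time, which is the analogue of the errors that result when the naming convention is not followed.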

In addition to the user-specified properties, WebMO requires some special 'properties' to be parsed for each job. Thus, the following subroutines MUST exist:

The parsing process is aided by a variety of pre-defined parsing functions, which can be used to search through the file to locate various strings, etc. These functions are defined in parse_output.cgi, and this file is normally require()d at the top of the parse_${engine}.cgi file in order to use them. Of particular interest is
      search_from_beginning($regexp, \@logfileText)
which can be used to search for the first match to the given regular expression (when passing a constant string, use ' delimiters around the string rather than " to avoid having to escape the common \ in your regexp), starting from the first line in the output file (i.e. first element of the array). Also,
      search_from_end($regexp, \@logfileText)
does the same thing, but searches for the first match starting from the END of the file. By convention, WebMO usually parses the LAST occurrence of a property from the file.

In both cases, the functions return the array index (i.e. line number - 1, since the array is zero based) corresponding to the first match, OR -1 IF NO MATCH WAS FOUND.
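The described behavior of these two functions can be restated as a short sketch. This is a Python re-statement of the semantics given above, not the Perl source in parse_output.cgi, and the sample log lines used with it are made up.

```python
import re

def search_from_beginning(regexp, lines):
    """Index of the first line matching regexp, or -1 if no line matches."""
    for i, line in enumerate(lines):
        if re.search(regexp, line):
            return i
    return -1

def search_from_end(regexp, lines):
    """Index of the last line matching regexp, or -1 if no line matches."""
    for i in range(len(lines) - 1, -1, -1):
        if re.search(regexp, lines[i]):
            return i
    return -1

log = ["SCF energy -1.0", "geometry step", "SCF energy -2.0"]
first = search_from_beginning(r"SCF energy", log)
last = search_from_end(r"SCF energy", log)
```

Callers should always compare the result against -1 before using it as an index, since in both Python and Perl an index of -1 silently refers to the last array element.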

Also potentially useful are
      search_forward($regexp, $start, \@logfileText)
      search_backward($regexp, $start, \@logfileText)
which accomplish the same thing, but do not start the search until the specified starting index. This can be useful for iterating a search until the last occurrence is found, at which point the function returns -1.
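The iteration pattern mentioned above can be sketched as follows, again in Python rather than Perl. Whether the real search_forward treats the starting index as inclusive is an assumption here, and the helper all_matches is hypothetical.

```python
import re

def search_forward(regexp, start, lines):
    """Index of the first match at or after start, or -1 (start assumed inclusive)."""
    for i in range(max(start, 0), len(lines)):
        if re.search(regexp, lines[i]):
            return i
    return -1

def all_matches(regexp, lines):
    """Collect every matching index by restarting the search after each hit."""
    hits = []
    i = search_forward(regexp, 0, lines)
    while i != -1:
        hits.append(i)
        i = search_forward(regexp, i + 1, lines)  # resume just past the last hit
    return hits
```

The last element of the returned list, if any, is the index of the final occurrence, which is the one WebMO conventionally parses.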

Beyond this advice, the parsing of properties is difficult to generalize. Progress can be made by modifying the parsing of the corresponding property from a different computational engine in which the property (e.g., a table of normal modes) is formatted in a similar manner. Other times one must start from scratch. No attempt is made to fully document the format of a parsed property (since MANY properties are parsed by WebMO), but the format can easily be deduced from the existing examples.