Using the LoadLeveler batch control system

To access nodes of the NIST SP2 outside of the three-node interactive pool, users must submit their jobs to the batch system, IBM LoadLeveler. NIST is currently running Version 1.3 of LoadLeveler. The commands used to interact with the batch system are given below.

Batch commands:

llsubmit command.file
Submit a job to the batch system. The argument to this command is the name of a file containing special LoadLeveler commands which describe the user's program and machine requirements.

llq [-l]
Query a job's status in the batch queue.

Query the status of the node pools.

llcancel batchid
Kill a job that appears to be hung (in the queue or while processing) or is otherwise not performing as expected. The argument to this command is the four-digit job identifier of the job to be cancelled, which can be obtained from the output of the "llq" command.

Display a current list of the available batch classes on the SP2.

Local version of llclasses giving a short listing of available batch classes.

For more details on each of these commands, see the local LoadLeveler documentation.
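
As an illustration of the workflow above, the following sketch writes a small command file and shows, in comments, how it might be submitted and tracked. The keywords executable, output, error, class, and queue are standard LoadLeveler command file keywords, but the class name "short", the file names, and the program name "my_prog" are hypothetical placeholders. Check the local documentation for the classes and keywords that are valid on the SP2.

```shell
#!/bin/sh
# Sketch only: write a minimal LoadLeveler command file.
# The class "short" and the program "my_prog" are hypothetical placeholders.
write_command_file() {
    cat > "$1" <<'EOF'
# @ executable = my_prog
# @ output     = my_prog.out
# @ error      = my_prog.err
# @ class      = short
# @ queue
EOF
}

write_command_file command.file

# On the SP2 the file would then be handed to the batch system:
#   llsubmit command.file    # submit the job
#   llq                      # check its status and note the job identifier
#   llcancel batchid         # cancel the job if it appears to be hung
```

The batch commands are left as comments so that the sketch can be run on a machine without LoadLeveler installed; it only produces the file command.file.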

Batch job preparation utilities

Since both PVM(e) and MPI typically require the user to play an interactive role in setting up the parallel environment, a job submitted through LoadLeveler must mimic the steps a user would take in an interactive session. This is accomplished via a shell script, which is invoked when the LoadLeveler job becomes active on the queue. This shell script can be user-defined, but for routine usage the user may benefit from the prewritten generic scripts. These scripts use information provided in the LoadLeveler command file to invoke the appropriate environment control tool (either POE (MPI) or PVM), set the correct number of processors, copy files to and from the local node address space, and start the user's MPMD or SPMD program(s) with the correct arguments and/or input redirection.

The scripts do require that certain program information be correctly specified in the LoadLeveler command file, so a user interface has been provided which automates the generation of this command file. This utility is invoked by typing extoll at the Unix prompt in the directory where your executables are stored. (Note that this utility supersedes "X-llcreate" and "llcreate".) The program prompts the user for specific information about the parallel program, and then creates a command file that includes the appropriate generic script to execute the user program on the nodes. For more information, see the Extoll documentation.
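
For users who prefer to write their own job script, the steps described above might look roughly like the following sketch. It is not the generic script provided at NIST: the scratch directory, file names, program name, and processor count are all hypothetical, and the use of the POE "-procs" option to set the processor count is an assumption about how the job would be started. By default the sketch runs in a dry-run mode that only prints each command.

```shell
#!/bin/sh
# Sketch of a user-defined job script; not the generic script shipped at NIST.
# All paths, file names, and the processor count are hypothetical.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
NPROCS=4
WORKDIR=/tmp/my_job          # assumed node-local scratch directory
PROG=my_prog

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

run_job() {
    run mkdir -p "$WORKDIR"
    run cp "$PROG" input.dat "$WORKDIR"     # copy files to the local node
    run cd "$WORKDIR"
    run poe "./$PROG" -procs "$NPROCS"      # start the SPMD program under POE
    run cp output.dat "$HOME"               # copy results back
}

run_job
```

Setting DRY_RUN=0 would execute the commands for real, which only makes sense on a node where POE and the named files actually exist.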