Batch scheduler

Overview
The OOMMF Batch System (OBS) supports complex scheduling of multiple batch jobs through two files, batchmaster.tcl and batchslave.tcl, plus a user-defined task script. The task script may be modeled after the included simpletask.tcl or multitask.tcl sample scripts.

The OBS has been designed to control multiple sequential and concurrent micromagnetic simulations, but the scheduling scripts batchmaster.tcl and batchslave.tcl are completely general and may be used to schedule other types of jobs as well.

When batchmaster.tcl is run, it sources a task script that should modify the global object $TaskInfo to tell the master what tasks to perform and, optionally, how to launch slaves to perform those tasks. This is detailed in the Batch task scripts section. The master then launches all the specified slaves, initializes each with a slave initialization script, and feeds tasks sequentially from the task list to the slaves. When a slave completes a task it reports back to the master and is given the next unclaimed task. If there are no more tasks, the slave is shut down. When all the tasks are complete, the master prints a summary of the tasks and exits.

Note: At present, launching and controlling jobs on machines other than the local machine requires a working rsh command or equivalent.

Launching
The batch scheduler is launched by the command line:

tclsh app/oommf/oommf.tcl batchmaster [-notk] <task_script.tcl> \
      [server_host [server_port]]
where
-notk
tells the master not to load Tk or display a (blank) window,
<task_script.tcl>
is the user-defined task (job) definition file,
server_host
specifies the network address for the master to use (default is localhost),
server_port
is the port address for the master (default is 0, which selects an arbitrary open port).

The required <task_script.tcl> should be based on the included example scripts. If slaves are to be run on remote machines, then server_host must be set to the local machine's network name, and the $TaskInfo methods AppendSlave and ModifyHostList must be called from inside the task script, as described in the Batch task scripts section.

Batch master and slave specifics

The communication protocol between batchmaster.tcl and batchslave.tcl is evolving and is not described here. Check the source code for the latest details.

The command line to start the master script is shown above. The slave, which is launched by the master using instructions specified in the task script, takes a command line of the form

tclsh app/oommf/oommf.tcl batchslave $server(host) $server(port) \
      $slaveid $passwd [aux_script [aux_args]]
The arguments $server(host) through $passwd are provided by the master and should be specified in the task script using the %connect_info percent substitution token. The aux_script is a script for the slave to source before processing any commands from the master. Typically this will be the micromagnetic batch mode script, batchsolve.tcl. The aux_args are additional arguments that will be passed to aux_script.


Batch task scripts

The master script creates an instance of a BatchTaskObj object with the name $TaskInfo. The task script uses method calls to this object to set up tasks to be performed. The only required call is to the AppendTask method, e.g.,
$TaskInfo AppendTask A "BatchTaskRun taskA.mif"
This method expects two arguments: a label for the task (here "A") and a script to accomplish the task. The script is passed across a network socket from the master to the slave and is then interpreted by the slave. (In particular, keep in mind that the file system seen by the script is that of the machine on which the slave process is running.)

This example uses the default batchsolve.tcl procs to run the simulation defined by the taskA.mif MIF file. If you want to make changes to the MIF problem specifications on the fly, you will need to modify the default procs. This is done by creating a slave initialization script, via the call

$TaskInfo SetSlaveInitScript { <insert script here> }
The slave initialization script does global initializations, and also generally redefines the SolverTaskInit proc; optionally the BatchTaskRelaxCallback and SolverTaskCleanup procs may be redefined as well. At the start of each task SolverTaskInit is called by BatchTaskRun (in batchsolve.tcl), at each equilibrium BatchTaskRelaxCallback is executed, and at the end of each task SolverTaskCleanup is called. The first and third are passed the arguments that were passed to BatchTaskRun. A simple SolverTaskInit proc could be
 proc SolverTaskInit { args } {
    global mif basename outtextfile
    set A [lindex $args 0]
    set outbasename "$basename-A$A"
    $mif SetA $A
    $mif SetOutBaseName $outbasename
    set outtextfile [open "$outbasename.odt" "a+"]
    puts $outtextfile [GetTextData header \
          "Run on $basename.mif, with A=[$mif GetA]"]
 }
This proc receives the exchange constant A for this task on the argument list, and makes use of the global variables mif and basename. (Both should be initialized in the slave initialization script outside the SolverTaskInit proc.) It then stores the requested value of A in the mif object, sets up the base filename to use for output, and opens a text file to which tabular data will be appended. The handle to this text file is stored in the global outtextfile, which is closed by the default SolverTaskCleanup proc. A corresponding task script could be
$TaskInfo AppendTask "A=13e-12 J/m" "BatchTaskRun 13e-12"
which runs a simulation with A set to 13e-12 J/m. The SolverTaskInit proc shown above is adapted from the multitask.tcl sample script.

If you want to run more than one task at a time, then the $TaskInfo method AppendSlave will have to be invoked. This takes the form

$TaskInfo AppendSlave <spawn count> <spawn command>
where <spawn command> is the command to launch the slave process, and <spawn count> is the number of slaves to launch with this command. (Typically <spawn count> should not be larger than the number of processors on the target system.) The default value for this item (which gets overwritten with the first call to $TaskInfo AppendSlave) is
 1 {exec %tclsh %oommf batchslave -notk %connect_info batchsolve.tcl}
This uses the OOMMF bootstrap program to launch the batchslave application, with connection information provided by the master, and using the auxiliary script batchsolve.tcl. The batchmaster script provides several percent-style substitutions useful in slave launch scripts:
%tclsh
is the Tcl shell to use,
%oommf
is the absolute path to the OOMMF bootstrap program on the master machine,
%connect_info
is the connection information needed by the batchslave application,
%oommf_root
is the path to the OOMMF root directory on the master machine,
%%
is interpreted as a single percent.
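To make the substitution rules concrete, the following stand-alone sketch mimics them with the Tcl string map command. It is an illustration only and is not the code that batchmaster.tcl uses. The proc name ExpandSpawnCommand and every value passed to it (the shell path, the OOMMF paths, and the connection string) are hypothetical placeholders.

```tcl
# Illustrative only: a stand-in for the percent substitution that
# batchmaster performs on slave launch commands. The proc name and
# the values passed to it below are hypothetical.
proc ExpandSpawnCommand { cmd tclsh oommf connect_info oommf_root } {
    # string map tries the keys in list order at each position, so
    # %% and %oommf_root are listed ahead of the shorter %oommf.
    return [string map [list \
            %%            % \
            %oommf_root   $oommf_root \
            %oommf        $oommf \
            %tclsh        $tclsh \
            %connect_info $connect_info] $cmd]
}

# The default spawn command quoted above, used here as the template.
set template {exec %tclsh %oommf batchslave -notk %connect_info batchsolve.tcl}
set expanded [ExpandSpawnCommand $template \
        /usr/bin/tclsh /opt/oommf/app/oommf/oommf.tcl \
        "localhost 1234 0 secret" /opt/oommf]
puts $expanded
```

The placeholder connection string follows the host, port, slave id, and password order of the batchslave command line shown earlier; the real values are generated by the master.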

To launch slaves on a remote host, use rsh in the spawn command, e.g.,

 $TaskInfo AppendSlave 1 {exec rsh foo tclsh oommf/app/oommf/oommf.tcl \
      batchslave -notk %connect_info batchsolve.tcl}
This example assumes tclsh is in the execution path on the remote machine foo, and that OOMMF is installed in the directory oommf under your home directory. In addition, you will have to add the machine foo to the host connect list with
$TaskInfo ModifyHostList +foo
and batchmaster must be run with the network interface specified as the server host (instead of the default localhost), e.g.,
tclsh app/oommf/oommf.tcl batchmaster multitask.tcl bar
where bar is the name of the local machine.

This may seem a bit complicated, but the examples in the next section should make things clearer.


Sample task scripts

The first sample task script is a simple example that runs the 3 micromagnetic simulations described by the MIF files taskA.mif, taskB.mif and taskC.mif. It is launched with the command
tclsh app/oommf/oommf.tcl batchmaster simpletask.tcl
This example uses the default slave launch script, so a single slave is launched on the current machine, and the 3 simulations will be run sequentially. Also, no slave init script is given, so the default procs in batchsolve.tcl are used. Output will be magnetization states and tabular data for each equilibrium state, stored in files on the local machine with base names as specified in the MIF files.



# FILE: simpletask.tcl
#
# This is a sample batch task file.  Usage example:
#
#  tclsh app/oommf/oommf.tcl batchmaster simpletask.tcl

# Form task list
$TaskInfo AppendTask A "BatchTaskRun taskA.mif"
$TaskInfo AppendTask B "BatchTaskRun taskB.mif"
$TaskInfo AppendTask C "BatchTaskRun taskC.mif"
Figure 1: Sample task script simpletask.tcl.
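For a longer list of MIF files the same task list can be generated in a loop. The following is a minimal sketch, assuming the MIF files follow the task<label>.mif naming pattern of the sample. The RecordCall proc and the recorded variable are hypothetical stand-ins that exist only so the fragment can be tried in a plain tclsh; when the script is sourced by batchmaster, the real $TaskInfo object is used and the stand-in is skipped.

```tcl
# Stand-in for trying this fragment outside batchmaster. RecordCall
# and recorded are hypothetical names, not part of OOMMF.
if {![info exists TaskInfo]} {
    set TaskInfo RecordCall
    set recorded {}
    proc RecordCall { args } {
        global recorded
        lappend recorded $args
    }
}

# Form the task list: one task per label, each running task<label>.mif
foreach label {A B C} {
    $TaskInfo AppendTask $label "BatchTaskRun task$label.mif"
}
```

With the labels A, B and C this appends the same three tasks as simpletask.tcl.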


The second task script is a more complicated example running concurrent processes on two machines. This script should be run with the command

tclsh app/oommf/oommf.tcl batchmaster multitask.tcl bar
where bar is the name of the local machine.

Near the top of the multitask.tcl script several Tcl variables (RMT_MACHINE through A_list) are defined; these are used farther down in the script. The remote machine is specified as foo, which is used in the $TaskInfo AppendSlave and $TaskInfo ModifyHostList commands.

There are two AppendSlave commands, one to run two slaves on the local machine, and one to run a single slave on the remote machine (foo). The latter changes to a specified working directory before launching the batchslave application on the remote machine. (For this to work you must have rsh configured properly. In the future it may be possible to launch remote commands using the OOMMF account server application, thereby lessening the reliance on system commands like rsh.)

Below this the slave init script is defined. The Tcl regsub command is used to place the task script defined value of BASEMIF into the init script template. The init script is run on the slave when the slave is first brought up. It first reads the base MIF file into a newly created mms_mif instance. (The MIF file needs to be accessible by the slave process, irrespective of which machine it is running on.) Then replacement SolverTaskInit and SolverTaskCleanup procs are defined. The new SolverTaskInit interprets its first argument as a value for the exchange constant A. Note that this is different from the default SolverTaskInit proc, which interprets its first argument as the name of a MIF file to load. With this task script, a MIF file is read once when the slave is brought up, and then each task redefines only the value of A for the simulation (and corresponding changes to the output filenames and data table header).
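The template substitution step can be tried on its own in a plain tclsh. The sketch below is a cut-down illustration, not the full init script: the variable names template and filled are chosen here for the example, and the template text is only stored as a string and never evaluated.

```tcl
# Value chosen by the task script (same name as in multitask.tcl).
set BASEMIF taskA

# Braces defer all substitution, so $mif and $basename reach the
# slave untouched; __BASEMIF__ marks the one spot to fill in now.
set template {
    set basename __BASEMIF__
    $mif Read [FindFile ${basename}.mif]
}

# regsub treats & and \ in the replacement text specially, so this
# approach assumes BASEMIF contains neither character.
regsub -all -- __BASEMIF__ $template $BASEMIF filled
puts $filled
```

Using a placeholder token rather than a double-quoted string lets the master insert one value while leaving every other $ and [ ] in the script for the slave to evaluate.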

Finally, the Tcl loop structure

foreach A $A_list {
    $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
}
is used to build up a task list consisting of one task for each value of A in A_list (defined at the top of the task script). For example, the first value of A is 10e-13, so the first task will have the label A=10e-13 and the corresponding script is BatchTaskRun 10e-13. The value 10e-13 is passed on by BatchTaskRun to the SolverTaskInit proc, which has been redefined to process this argument as the value for A, as described above.

There are 6 tasks in all, and 3 slave processes, so the first three tasks will run concurrently in the 3 slaves. As each slave finishes it will be given the next task, until all the tasks are complete.
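The hand-out order can be illustrated with a toy model in plain Tcl. This is a sketch of the policy described above, not the batchmaster implementation: the task labels follow A_list, but the run times (in arbitrary units) are made-up values, and the variable names are chosen for this example only.

```tcl
# Toy model of the dispatch policy; not the batchmaster code.
# Each entry is {label runtime}; the run times are invented.
set tasks {
    {A=10e-13 5} {A=10e-14 3} {A=10e-15 4}
    {A=10e-16 2} {A=10e-17 6} {A=10e-18 1}
}
set slave_count 3

# free_at(i) holds the time at which slave i next becomes idle.
for {set i 0} {$i < $slave_count} {incr i} {
    set free_at($i) 0
}

set schedule {}
foreach task $tasks {
    set label   [lindex $task 0]
    set runtime [lindex $task 1]
    # The next unclaimed task goes to whichever slave is idle first
    # (ties go to the lowest-numbered slave in this model).
    set pick 0
    for {set i 1} {$i < $slave_count} {incr i} {
        if {$free_at($i) < $free_at($pick)} {
            set pick $i
        }
    }
    lappend schedule [list $label $pick $free_at($pick)]
    puts "slave $pick starts $label at t=$free_at($pick)"
    incr free_at($pick) $runtime
}
```

In this model the first three tasks start at once on the three slaves, and each later task starts as soon as some slave finishes, which is why the assignment of tasks to slaves depends on how long each simulation takes.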



# FILE: multitask.tcl
#
# This is a sample batch task file.  Usage example:
#
#   tclsh app/oommf/oommf.tcl batchmaster multitask.tcl hostname [port]
#
# Task script configuration
set RMT_MACHINE   foo 
set RMT_TCLSH      tclsh
set RMT_OOMMF      "/path/to/oommf/app/oommf/oommf.tcl"
set RMT_WORK_DIR   "/path/to/oommf/app/mmsolve/data"
set BASEMIF taskA
set A_list { 10e-13 10e-14 10e-15 10e-16 10e-17 10e-18 }

# Slave launch commands
$TaskInfo ModifyHostList +$RMT_MACHINE
$TaskInfo AppendSlave 2 "exec %tclsh %oommf batchslave -notk \
        %connect_info batchsolve.tcl"
$TaskInfo AppendSlave 1 "exec rsh $RMT_MACHINE \
        cd $RMT_WORK_DIR \\\;\
        $RMT_TCLSH $RMT_OOMMF batchslave -notk %connect_info batchsolve.tcl"

# Slave initialization script (with batchsolve.tcl proc
# redefinitions)
set init_script {
    # Initialize solver. This is run at global scope
    set basename __BASEMIF__      ;# Base mif filename (global)
    mms_mif New mif
    $mif Read [FindFile ${basename}.mif]
    # Redefine TaskInit and TaskCleanup proc's
    proc SolverTaskInit { args } {
        global mif outtextfile basename
        set A [lindex $args 0]
        set outbasename "$basename-A$A"
        $mif SetA $A
        $mif SetOutBaseName $outbasename
        set outtextfile [open "$outbasename.odt" "a+"]
        puts $outtextfile [GetTextData header \
                "Run on $basename.mif, with A=[$mif GetA]"]
        flush $outtextfile
    }
    proc SolverTaskCleanup { args } {
        global outtextfile
        close $outtextfile
    }
}
# Substitute $BASEMIF in for __BASEMIF__ in above script
regsub -all -- __BASEMIF__ $init_script $BASEMIF init_script
$TaskInfo SetSlaveInitScript $init_script

# Create task list
foreach A $A_list {
    $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
}
Figure 2: Advanced sample task script multitask.tcl.




OOMMF Documentation Team
February 23, 2000