10.2 OOMMF 2D Micromagnetic Solver Batch System

10.2.2 2D Micromagnetic Solver Batch Scheduling System

Overview
The OOMMF Batch System (OBS) supports complex scheduling of multiple batch jobs through two applications, batchmaster and batchslave. The user launches batchmaster and provides it with a task script, a Tcl script that describes the set of tasks for batchmaster to accomplish. The work itself is done by instances of batchslave that are launched by batchmaster. The task script may be modeled after the included simpletask.tcl or multitask.tcl sample scripts.

The OBS has been designed to control multiple sequential and concurrent micromagnetic simulations, but batchmaster and batchslave are completely general and may be used to schedule other types of jobs as well.

10.2.2.1 Master Scheduling Control: batchmaster

The application batchmaster is launched by the command line:

tclsh oommf.tcl batchmaster [standard options] task_script \
      [host [port]]

task_script

is the user defined task (job) definition Tcl script,

host

specifies the network address for the master to use (default is localhost),

port

is the port address for the master (default is 0, which selects an arbitrary open port).

When batchmaster is run, it sources the task script. Tcl commands in the task script should modify the global object $TaskInfo to inform the master what tasks to perform and, optionally, how to launch slaves to perform those tasks. The easiest way to create a task script is to modify one of the included example scripts. More detailed instructions are in the Batch Task Scripts section below.

After sourcing the task script, batchmaster launches all the specified slaves, initializes each with a slave initialization script, and then feeds tasks sequentially from the task list to the slaves. When a slave completes a task it reports back to the master and is given the next unclaimed task. If there are no more tasks, the slave is shut down. When all the tasks are complete, the master prints a summary of the tasks and exits.
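The dispatch rule described above (each slave that finishes a task is given the next unclaimed task) can be illustrated with a small standalone simulation. This is only a sketch: the proc name SimulateSchedule and the task durations are invented for illustration, and the code neither uses nor reproduces the actual batchmaster implementation.

```tcl
# Illustrative simulation only; not part of OOMMF.
# durations: flat list of label/duration pairs (durations are made up).
# Returns a list of {label slave_index start_time} triples.
proc SimulateSchedule {durations slave_count} {
    # free_at(i) is the simulated time at which slave i becomes idle.
    for {set i 0} {$i < $slave_count} {incr i} {
        set free_at($i) 0
    }
    set assignment {}
    foreach {label duration} $durations {
        # The next unclaimed task goes to whichever slave is idle first.
        set best 0
        for {set i 1} {$i < $slave_count} {incr i} {
            if {$free_at($i) < $free_at($best)} {
                set best $i
            }
        }
        lappend assignment [list $label $best $free_at($best)]
        set free_at($best) [expr {$free_at($best) + $duration}]
    }
    return $assignment
}

set schedule [SimulateSchedule {A 5 B 2 C 4 D 1 E 3 F 2} 3]
foreach entry $schedule {
    puts "task [lindex $entry 0]: slave [lindex $entry 1], start t=[lindex $entry 2]"
}
```

With three simulated slaves, tasks A, B and C start immediately, and task D goes to the slave that ran B because that slave becomes idle first.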

When the task script launches and controls jobs off the local machine, with slaves running on remote machines, the command line argument host must be set to the local machine’s network name, and the $TaskInfo methods AppendSlave and ModifyHostList must be called from inside the task script. Furthermore, OOMMF does not currently supply any methods for launching jobs on remote machines, so a task script that requests the launching of jobs on remote machines requires a working ssh command or equivalent. (Details are given in the Batch Task Scripts section below.)

10.2.2.2 Task Control: batchslave

The application batchslave may be launched by the command line:

tclsh oommf.tcl batchslave [standard options] \
   host port id password [auxscript [arg ...]]

host, port

Host and port at which to contact the master to serve.

id, password

ID and password to send to the master for identification.

auxscript arg …

The name of an optional script to source (which actually performs the task the slave is assigned), and any arguments it needs.

In normal operation, the user does not launch batchslave. Instead, instances of batchslave are launched by batchmaster as instructed by a task script. Although batchmaster may launch any slaves requested by its task script, by default it launches instances of batchslave.

The function of batchslave is to make a connection to a master program, source the auxscript, and pass it the list of arguments arg .... It then receives commands from the master and evaluates them, making use of the facilities provided by auxscript. Each command is typically a long-running one, such as solving a complete micromagnetic problem. When each command is complete, batchslave reports back to its master program and asks for the next command. When the master program has no more commands, batchslave terminates.

Inside batchmaster, each instance of batchslave is launched by evaluating a Tcl command. This command is called the spawn command, and it may be redefined by the task script in order to completely control which slave applications are launched and how they are launched. When batchslave is to be launched, the spawn command might be:

exec tclsh oommf.tcl batchslave -tk 0 -- $server(host) $server(port) \
   $slaveid $passwd batchsolve.tcl -restart 1 &

The Tcl command exec is used to launch subprocesses. When the last argument to exec is &, the subprocess runs in the background. The rest of the spawn command should look familiar as the command line syntax for launching batchslave.

The example spawn command above cannot be completely provided by the task script, however, because parts of it are only known by batchmaster. Because of this, the task script should define the spawn command using “percent variables” which are substituted by batchmaster. Continuing the example, the task script provides the spawn command:

exec %tclsh %oommf batchslave -tk 0 %connect_info \
   batchsolve.tcl -restart 1

batchmaster replaces %tclsh with the path to tclsh, and %oommf with the path to the OOMMF bootstrap application. It also replaces %connect_info with the five arguments from -- through $passwd that tell batchslave the hostname and port where batchmaster is waiting for it to report, and the ID and password it should pass back. In this example, the task script instructs batchslave to source the file batchsolve.tcl and pass it the arguments -restart 1. Finally, batchmaster always appends the argument & to the spawn command so that all slave applications are launched in the background.

The communication protocol between batchmaster and batchslave is evolving and is not described here. Check the source code for the latest details.

10.2.2.3 Batch Task Scripts

The application batchmaster creates an instance of a BatchTaskObj object with the name $TaskInfo. The task script uses method calls to this object to set up tasks to be performed. The only required call is to the AppendTask method, e.g.,

$TaskInfo AppendTask A "BatchTaskRun taskA.mif"

This method expects two arguments, a label for the task (here “A”) and a script to accomplish the task. The script will be passed across a network socket from batchmaster to a slave application, and then the script will be interpreted by the slave. In particular, keep in mind that the file system seen by the script will be that of the machine on which the slave process is running.

This example uses the default batchsolve.tcl procs to run the simulation defined by the taskA.mif MIF 1.x file. If you want to make changes to the MIF problem specifications on the fly, you will need to modify the default procs. This is done by creating a slave initialization script, via the call

$TaskInfo SetSlaveInitScript { <insert script here> }

The slave initialization script does global initializations, and also usually redefines the SolverTaskInit proc; optionally the BatchTaskIterationCallback, BatchTaskRelaxCallback and SolverTaskCleanup procs may be redefined as well. At the start of each task, SolverTaskInit is called by BatchTaskRun (in batchsolve.tcl); after each iteration, BatchTaskIterationCallback is executed; at each control point, BatchTaskRelaxCallback is run; and at the end of each task, SolverTaskCleanup is called. SolverTaskInit and SolverTaskCleanup are passed the arguments that were passed to BatchTaskRun. A simple SolverTaskInit proc could be

proc SolverTaskInit { args } {
   global mif basename outtextfile
   set A [lindex $args 0]
   set outbasename "$basename-A$A"
   $mif SetA $A
   $mif SetOutBaseName $outbasename
   set outtextfile [open "$outbasename.odt" "a+"]
   puts $outtextfile [GetTextData header \
         "Run on $basename.mif, with A=[$mif GetA]"]
}

This proc receives the exchange constant A for this task on the argument list, and makes use of the global variables mif and basename. (Both should be initialized in the slave initialization script outside the SolverTaskInit proc.) It then stores the requested value of A in the mif object, sets up the base filename to use for output, and opens a text file to which tabular data will be appended. The handle to this text file is stored in the global outtextfile, which is closed by the default SolverTaskCleanup proc. A corresponding task script could be

$TaskInfo AppendTask "A=13e-12 J/m" "BatchTaskRun 13e-12"

which runs a simulation with A set to 13×10^-12 J/m. This example is taken from the multitask.tcl sample script. (For commands accepted by mif objects, see the file mmsinit.cc. Another object that can be gainfully manipulated is solver, which is defined in solver.tcl.)
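The calling sequence described earlier in this section (SolverTaskInit at task start, BatchTaskIterationCallback after each iteration, BatchTaskRelaxCallback at each control point, SolverTaskCleanup at task end) can be mimicked with a mock driver. The sketch below is not the BatchTaskRun proc from batchsolve.tcl: MockBatchTaskRun, the logging stubs, and the counts of two control points with three iterations each are all assumptions made for illustration.

```tcl
# Mock driver for illustration; NOT the real BatchTaskRun.
set call_log {}

# Stub procs that only record that they were called.
proc SolverTaskInit {args} {
    global call_log
    lappend call_log "init $args"
}
proc BatchTaskIterationCallback {} {
    global call_log
    lappend call_log iteration
}
proc BatchTaskRelaxCallback {} {
    global call_log
    lappend call_log relax
}
proc SolverTaskCleanup {args} {
    global call_log
    lappend call_log "cleanup $args"
}

proc MockBatchTaskRun {args} {
    # Init and cleanup receive the same arguments as the driver.
    eval SolverTaskInit $args
    # Hypothetical problem: 2 control points, 3 iterations each.
    for {set cp 0} {$cp < 2} {incr cp} {
        for {set it 0} {$it < 3} {incr it} {
            BatchTaskIterationCallback
        }
        BatchTaskRelaxCallback
    }
    eval SolverTaskCleanup $args
}

MockBatchTaskRun 13e-12
puts [join $call_log \n]
```

Running a mock of this kind is one way to check that replacement procs accept the expected arguments before a long simulation is started.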

If you want to run more than one task at a time, then the $TaskInfo method AppendSlave will have to be invoked. This takes the form

$TaskInfo AppendSlave <spawn count> <spawn command>

where <spawn command> is the command to launch the slave process, and <spawn count> is the number of slaves to launch with this command. (Typically <spawn count> should not be larger than the number of processors on the target system.) The default value for this item (which gets overwritten with the first call to $TaskInfo AppendSlave) is

 1 {Oc_Application Exec batchslave -tk 0 %connect_info batchsolve.tcl}

The Tcl command Oc_Application Exec is supplied by OOMMF and provides access to the same application-launching capability that is used by the OOMMF bootstrap application. Using a <spawn command> of Oc_Application Exec instead of exec %tclsh %oommf saves the spawning of an additional process. The default <spawn command> launches the batchslave application, with connection information provided by batchmaster, and using the auxscript batchsolve.tcl.

Before evaluating the <spawn command>, batchmaster applies several percent-style substitutions useful in slave launch scripts: %tclsh, %oommf, %connect_info, %oommf_root, and %%. The first is the Tcl shell to use, the second is an absolute path to the OOMMF bootstrap program on the master machine, the third is connection information needed by the batchslave application, the fourth is the path to the OOMMF root directory on the master machine, and the last is interpreted as a single percent. batchmaster automatically appends the argument & to the <spawn command> so that the slave applications are launched in the background.
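The percent-style substitutions listed above can be pictured with the following sketch, which expands a spawn command template using the Tcl string map command. It is an illustration only, not batchmaster's own code: the proc name ExpandSpawnCommand and every path, port, ID and password value are hypothetical stand-ins for information that batchmaster determines at run time.

```tcl
# Illustrative only: every value below is a made-up stand-in.
# Longer keys precede their prefixes (%oommf_root before %oommf)
# because string map tries the keys in list order at each position.
set subst_map [list \
    %%            %                              \
    %tclsh        /usr/bin/tclsh                 \
    %oommf_root   /opt/oommf                     \
    %oommf        /opt/oommf/oommf.tcl           \
    %connect_info {-- localhost 15136 3 secret}]

proc ExpandSpawnCommand {template map} {
    # One left-to-right pass, so "%%" yields a literal percent that
    # is not rescanned for further substitutions.
    set cmd [string map $map $template]
    # Mimic the trailing & that puts the slave in the background.
    append cmd " &"
    return $cmd
}

set template {exec %tclsh %oommf batchslave -tk 0 %connect_info batchsolve.tcl}
set expanded [ExpandSpawnCommand $template $subst_map]
puts $expanded
```

Printing the expanded command rather than evaluating it makes it easy to inspect what a given template would launch.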

To launch batchslave on a remote host, use ssh in the spawn command, e.g.,

$TaskInfo AppendSlave 1 {exec ssh foo tclsh oommf/oommf.tcl \
      batchslave -tk 0 %connect_info batchsolve.tcl}

This example assumes that tclsh is in the execution path on the remote machine foo, and that OOMMF is installed in the directory oommf under your home directory on that machine. In addition, you will have to add the machine foo to the host connect list with

$TaskInfo ModifyHostList +foo

and batchmaster must be run with the network interface specified as the server host (instead of the default localhost), e.g.,

tclsh oommf.tcl batchmaster multitask.tcl bar

where bar is the name of the local machine.

This may seem a bit complicated, but the examples in the next section should make things clearer.

10.2.2.4 Sample Task Scripts

The first sample task script is a simple example that runs the 3 micromagnetic simulations described by the MIF 1.x files taskA.mif, taskB.mif and taskC.mif. It is launched with the command

tclsh oommf.tcl batchmaster simpletask.tcl

This example uses the default slave launch script, so a single slave is launched on the current machine, and the 3 simulations will be run sequentially. Also, no slave initialization script is given, so the default procs in batchsolve.tcl are used. Output will be magnetization states and tabular data at each control point, stored in files on the local machine with base names as specified in the MIF files.

 

Figure 10.1: Sample task script simpletask.tcl.
# FILE: simpletask.tcl
#
# This is a sample batch task file.  Usage example:
#
#   tclsh oommf.tcl batchmaster simpletask.tcl
#
# Form task list
$TaskInfo AppendTask A "BatchTaskRun taskA.mif"
$TaskInfo AppendTask B "BatchTaskRun taskB.mif"
$TaskInfo AppendTask C "BatchTaskRun taskC.mif"

 

The second sample task script builds on the previous example by defining BatchTaskIterationCallback and BatchTaskRelaxCallback procedures in the slave init script. The first is set up to write tabular data every 10 iterations, while the second writes tabular data on each control point event. The data is written to the output file specified by the Base Output Filename entry in the input MIF files. Note that there is no magnetization vector field output in this example. This task script is launched the same way as the previous example:

tclsh oommf.tcl batchmaster octrltask.tcl

 

Figure 10.2: Task script with iteration output octrltask.tcl.
# FILE: octrltask.tcl
#
# This is a sample batch task file, with expanded output control.
# Usage example:
#
#        tclsh oommf.tcl batchmaster octrltask.tcl
#
# "Every" output selection count
set SKIP_COUNT 10

# Initialize solver. This is run at global scope
set init_script {
    # Text output routine
    proc MyTextOutput {} {
        global outtextfile
        puts $outtextfile [GetTextData data]
        flush $outtextfile
    }
    # Change control point output
    proc BatchTaskRelaxCallback {} {
        MyTextOutput
    }
    # Add output on iteration events
    proc BatchTaskIterationCallback {} {
        global solver
        set count [$solver GetODEStepCount]
        if { ($count % __SKIP_COUNT__) == 0 } { MyTextOutput }
    }
}

# Substitute $SKIP_COUNT in for __SKIP_COUNT__ in above "init_script"
regsub -all -- __SKIP_COUNT__ $init_script $SKIP_COUNT init_script
$TaskInfo SetSlaveInitScript $init_script

# Form task list
$TaskInfo AppendTask A "BatchTaskRun taskA.mif"
$TaskInfo AppendTask B "BatchTaskRun taskB.mif"
$TaskInfo AppendTask C "BatchTaskRun taskC.mif"

 

The third task script is a more complicated example running concurrent processes on two machines. This script should be run with the command

tclsh oommf.tcl batchmaster multitask.tcl bar

where bar is the name of the local machine.

Near the top of the multitask.tcl script several Tcl variables (RMT_MACHINE through A_list) are defined; these are used farther down in the script. The remote machine is specified as foo, which is used in the $TaskInfo AppendSlave and $TaskInfo ModifyHostList commands.

There are two AppendSlave commands, one to run two slaves on the local machine, and one to run a single slave on the remote machine (foo). The latter changes to a specified working directory before launching the batchslave application on the remote machine. (For this to work you must have ssh configured properly.)

Below this the slave initialization script is defined. The Tcl regsub command is used to place the task script defined value of BASEMIF into the init script template. The init script is run on the slave when the slave is first brought up. It first reads the base MIF file into a newly created mms_mif instance. (The MIF file needs to be accessible by the slave process, irrespective of which machine it is running on.) Then replacement SolverTaskInit and SolverTaskCleanup procs are defined. The new SolverTaskInit interprets its first argument as a value for the exchange constant A. Note that this is different from the default SolverTaskInit proc, which interprets its first argument as the name of a MIF 1.x file to load. With this task script, a MIF file is read once when the slave is brought up, and then each task redefines only the value of A for the simulation (and corresponding changes to the output filenames and data table header).
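The placeholder technique used for the init script can be tried in isolation. The sketch below assumes nothing beyond standard Tcl; the value run&1 is a hypothetical name chosen to show one caveat of regsub, namely that & and backslash have special meaning in its replacement argument. For plain names such as taskA the two approaches shown give the same result.

```tcl
# Template with a placeholder, in the style of the sample task scripts.
set template {set basename __BASEMIF__}

# regsub works for plain names such as taskA ...
regsub -all -- __BASEMIF__ $template taskA via_regsub
puts $via_regsub          ;# set basename taskA

# ... but & in the replacement stands for the matched text.
regsub -all -- __BASEMIF__ $template "run&1" surprising
puts $surprising          ;# set basename run__BASEMIF__1

# string map inserts the replacement text literally.
set via_map [string map [list __BASEMIF__ "run&1"] $template]
puts $via_map             ;# set basename run&1
```

If a substituted value might contain & or a backslash, string map is one way to avoid the surprise; the sample scripts use simple names, for which regsub is sufficient.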

Finally, the Tcl loop structure

foreach A $A_list {
    $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
}

is used to build up a task list consisting of one task for each value of A in A_list (defined at the top of the task script). For example, the first value of A is 10e-13, so the first task will have the label A=10e-13 and the corresponding script is BatchTaskRun 10e-13. The value 10e-13 is passed on by BatchTaskRun to the SolverTaskInit proc, which has been redefined to process this argument as the value for A, as described above.
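Before launching batchmaster, it can be helpful to see which labels and scripts such a loop generates. The sketch below does this with a stand-in for $TaskInfo that merely records AppendTask calls. TaskInfoStub and recorded_tasks are hypothetical names invented for this illustration and are not part of OOMMF; the real $TaskInfo object is created by batchmaster.

```tcl
# Hypothetical stand-in for the object that batchmaster provides.
# It records AppendTask calls and ignores every other method.
set recorded_tasks {}
proc TaskInfoStub {method args} {
    global recorded_tasks
    if {[string equal $method AppendTask]} {
        set label  [lindex $args 0]
        set script [lindex $args 1]
        lappend recorded_tasks [list $label $script]
    }
}
set TaskInfo TaskInfoStub

# Same loop as in the task script.
set A_list { 10e-13 10e-14 10e-15 10e-16 10e-17 10e-18 }
foreach A $A_list {
    $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
}

# Show what would be handed to the slaves.
foreach task $recorded_tasks {
    puts "[lindex $task 0] -> [lindex $task 1]"
}
```

The first line printed is A=10e-13 -> BatchTaskRun 10e-13, matching the label and script described above.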

There are 6 tasks in all, and 3 slave processes, so the first three tasks will run concurrently in the 3 slaves. As each slave finishes it will be given the next task, until all the tasks are complete.

 

Figure 10.3: Advanced sample task script multitask.tcl.
# FILE: multitask.tcl
#
# This is a sample batch task file.  Usage example:
#
#   tclsh oommf.tcl batchmaster multitask.tcl hostname [port]
#
# Task script configuration
set RMT_MACHINE   foo
set RMT_TCLSH      tclsh
set RMT_OOMMF      "/path/to/oommf/oommf.tcl"
set RMT_WORK_DIR   "/path/to/oommf/app/mmsolve/data"
set BASEMIF taskA
set A_list { 10e-13 10e-14 10e-15 10e-16 10e-17 10e-18 }

# Slave launch commands
$TaskInfo ModifyHostList +$RMT_MACHINE
$TaskInfo AppendSlave 2 "exec %tclsh %oommf batchslave -tk 0 \
        %connect_info batchsolve.tcl"
$TaskInfo AppendSlave 1 "exec ssh $RMT_MACHINE \
        cd $RMT_WORK_DIR \\\;\
        $RMT_TCLSH $RMT_OOMMF batchslave -tk 0 %connect_info batchsolve.tcl"

# Slave initialization script (with batchsolve.tcl proc
# redefinitions)
set init_script {
    # Initialize solver. This is run at global scope
    set basename __BASEMIF__      ;# Base mif filename (global)
    mms_mif New mif
    $mif Read [FindFile ${basename}.mif]
    # Redefine TaskInit and TaskCleanup procs
    proc SolverTaskInit { args } {
        global mif outtextfile basename
        set A [lindex $args 0]
        set outbasename "$basename-A$A"
        $mif SetA $A
        $mif SetOutBaseName $outbasename
        set outtextfile [open "$outbasename.odt" "a+"]
        puts $outtextfile [GetTextData header \
                "Run on $basename.mif, with A=[$mif GetA]"]
        flush $outtextfile
    }
    proc SolverTaskCleanup { args } {
        global outtextfile
        close $outtextfile
    }
}
# Substitute $BASEMIF in for __BASEMIF__ in above script
regsub -all -- __BASEMIF__ $init_script $BASEMIF init_script
$TaskInfo SetSlaveInitScript $init_script

# Create task list
foreach A $A_list {
    $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
}