Overview
The OBS provides support for complex scheduling of multiple batch jobs
via the two files batchmaster.tcl and batchslave.tcl plus
a user-defined task script. The task script may be modeled after the
included simpletask.tcl or multitask.tcl sample scripts.
The OBS has been designed to control multiple sequential and concurrent micromagnetic simulations, but the scheduling scripts batchmaster.tcl and batchslave.tcl are completely general and may be used to schedule other types of jobs as well.
When batchmaster.tcl is run, it sources a task script
that should modify the global object $TaskInfo to inform the
master what tasks to perform and optionally how to launch slaves to
perform those tasks. This is detailed in the Batch task
scripts section. After this,
batchmaster.tcl launches all the specified slaves, initializes
each with a slave initialization script, and then feeds tasks
sequentially from the task list to the slaves. When a slave completes
a task it reports back to the master and is given the next unclaimed
task. If there are no more tasks, the slave is shut down. When all
the tasks are complete, the master prints a summary of the tasks and
exits. Note: At present, launching and controlling jobs on machines
other than the local machine requires a working rsh
command or equivalent.
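The hand-out of tasks described above can be sketched as a small, self-contained Tcl model. This is only an illustration under stated assumptions: it runs in a single process with no sockets, the task labels, the slave ids and the NextTask proc are hypothetical, and a round-robin loop stands in for "whichever slave reports back first". It is not the code used by batchmaster.tcl.

```tcl
# Hypothetical in-process model of the master's task hand-out.
# No sockets or real slaves are involved; all names are illustrative.
set tasks  {A B C D E}        ;# task labels, in task-list order
set slaves {slave0 slave1}    ;# pretend slave ids
set next 0                    ;# index of the next unclaimed task
set log {}                    ;# record of {slave task} assignments

# Give the next unclaimed task to a slave, or shut the slave down
# when the task list is exhausted. Returns 1 if a task was assigned.
proc NextTask {slave} {
    global tasks next log
    if {$next >= [llength $tasks]} {
        lappend log [list $slave shutdown]
        return 0
    }
    lappend log [list $slave [lindex $tasks $next]]
    incr next
    return 1
}

# Round-robin stands in for slaves reporting back as they finish.
set active $slaves
while {[llength $active] > 0} {
    set still {}
    foreach s $active {
        if {[NextTask $s]} { lappend still $s }
    }
    set active $still
}
foreach entry $log { puts $entry }
```

Each slave keeps claiming tasks until none remain, at which point it is recorded as shut down; the loop ends once every slave has been shut down.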
Launching
The batch scheduler is launched by the command line:
    tclsh app/oommf/oommf.tcl batchmaster [-notk] <task_script.tcl> \
        [server_host [server_port]]

where arguments in square brackets are optional.
The required <task_script.tcl> should be based on the included example scripts. If slaves are to be run on remote machines, then server_host must be set to the local machine's network name, and the $TaskInfo methods AppendSlave and ModifyHostList will need to be called from inside the task script. (See the Batch task scripts section for details.)
The command line to start the master script is shown above. The slave, which is launched by the master using instructions specified in the task script, takes a command line of the form
    tclsh app/oommf/oommf.tcl batchslave $server(host) $server(port) \
        $slaveid $passwd [aux_script [aux_args]]

The arguments $server(host) through $passwd are provided by the master and should be specified in the task script using the %connect_info percent substitution token. The aux_script is a script for the slave to source before processing any commands from the master. Typically this will be the micromagnetic batch mode script, batchsolve.tcl. The aux_args are additional arguments that will be passed to aux_script.
Batch task scripts
A task script adds tasks to the task list with the $TaskInfo method AppendTask, e.g.,

    $TaskInfo AppendTask A "BatchTaskRun taskA.mif"

This method expects two arguments, a label for the task (here "A") and a script to accomplish the task. The script will be passed across a network socket from the master to the slave, and then the script will be interpreted by the slave. (In particular, keep in mind that the file system seen by the script will be that of the machine on which the slave process is running.)
This example uses the default batchsolve.tcl procs to run the simulation defined by the taskA.mif MIF file. If you want to make changes to the MIF problem specifications on the fly, you will need to modify the default procs. This is done by creating a slave initialization script, via the call
    $TaskInfo SetSlaveInitScript { <insert script here> }

The slave initialization script does global initializations, and also generally redefines the SolverTaskInit proc; optionally the BatchTaskRelaxCallback and SolverTaskCleanup procs may be redefined as well. At the start of each task SolverTaskInit is called by BatchTaskRun (in batchsolve.tcl), at each equilibrium BatchTaskRelaxCallback is executed, and at the end of each task SolverTaskCleanup is called. The first and third are passed the arguments that were passed to BatchTaskRun. A simple SolverTaskInit proc could be

    proc SolverTaskInit { args } {
        global mif basename outtextfile
        set A [lindex $args 0]
        set outbasename "$basename-A$A"
        $mif SetA $A
        $mif SetOutBaseName $outbasename
        set outtextfile [open "$outbasename.odt" "a+"]
        puts $outtextfile [GetTextData header \
            "Run on $basename.mif, with A=[$mif GetA]"]
    }

This proc receives the exchange constant A for this task on the argument list, and makes use of the global variables mif and basename. (Both should be initialized in the slave initialization script outside the SolverTaskInit proc.) It then stores the requested value of A in the mif object, sets up the base filename to use for output, and opens a text file to which tabular data will be appended. The handle to this text file is stored in the global outtextfile, which is closed by the default SolverTaskCleanup proc. A corresponding task script could be

    $TaskInfo AppendTask "A=13e-12 J/m" "BatchTaskRun 13e-12"

which runs a simulation with A set to 13e-12 J/m. This example is taken from the multitask.tcl sample script.
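The call sequence described above (task script evaluated by the slave, arguments forwarded to SolverTaskInit and SolverTaskCleanup) can be illustrated with a mock. This is a sketch under assumptions: MockBatchTaskRun and the trace variable are hypothetical names introduced here, no solver is run, and the real BatchTaskRun in batchsolve.tcl does considerably more.

```tcl
# Mock of the call sequence only. The real BatchTaskRun lives in
# batchsolve.tcl and also runs the solver; nothing here touches OOMMF.
set trace {}

proc SolverTaskInit {args} {
    global trace
    lappend trace "init A=[lindex $args 0]"
}

proc SolverTaskCleanup {args} {
    global trace
    lappend trace "cleanup A=[lindex $args 0]"
}

# Hypothetical stand-in: forwards its own arguments to the init and
# cleanup procs, with a placeholder where the simulation would run.
proc MockBatchTaskRun {args} {
    global trace
    eval [linsert $args 0 SolverTaskInit]
    lappend trace "solve"
    eval [linsert $args 0 SolverTaskCleanup]
}

# The task script reaches the slave as a string and is evaluated there.
set task_script "MockBatchTaskRun 13e-12"
eval $task_script
foreach line $trace { puts $line }
```

Because the task script is just a string evaluated by the slave, changing what the first argument means (a MIF filename or a value of A) only requires redefining SolverTaskInit in the slave initialization script.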
If you want to run more than one task at a time, then the $TaskInfo method AppendSlave will have to be invoked. This takes the form
    $TaskInfo AppendSlave <spawn count> <spawn command>

where <spawn command> is the command to launch the slave process, and <spawn count> is the number of slaves to launch with this command. (Typically <spawn count> should not be larger than the number of processors on the target system.) The default value for this item (which gets overwritten with the first call to $TaskInfo AppendSlave) is

    1 {exec %tclsh %oommf batchslave -notk %connect_info batchsolve.tcl}

This uses the OOMMF bootstrap program to launch the batchslave application, with connection information provided by the master, and using the auxiliary script batchsolve.tcl. The batchmaster script provides several percent-style substitutions useful in slave launch scripts: %tclsh, %oommf, %connect_info, %oommf_root, and %%. The first is the Tcl shell to use, the second is the absolute path to the OOMMF bootstrap program on the master machine, the third is connection information needed by the batchslave application, the fourth is the path to the OOMMF root directory on the master machine, and the last is interpreted as a single percent.
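To make the percent substitutions concrete, the following sketch expands the default spawn command with the Tcl string map command. It is an illustration only: the paths, host name, port, slave id and password are made-up values, and batchmaster.tcl may well perform the substitution differently.

```tcl
# Illustrative values only; the real ones are supplied by batchmaster.
# %oommf_root is listed before %oommf so that the longer token is
# tried first at each position.
set subst [list \
    %%            %                              \
    %tclsh        /usr/bin/tclsh                 \
    %oommf_root   /opt/oommf                     \
    %oommf        /opt/oommf/app/oommf/oommf.tcl \
    %connect_info "bar 1234 0 secret"]

set template {exec %tclsh %oommf batchslave -notk %connect_info batchsolve.tcl}
set cmd [string map $subst $template]
puts $cmd
```

The four fields substituted for %connect_info here mirror the batchslave arguments $server(host), $server(port), $slaveid and $passwd.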
To launch slaves on a remote host, use rsh in the spawn command, e.g.,
    $TaskInfo AppendSlave 1 {exec rsh foo tclsh oommf/app/oommf/oommf.tcl \
        batchslave -notk %connect_info batchsolve.tcl}

This example assumes tclsh is in the execution path on the remote machine foo, and OOMMF is installed under your home directory. In addition, you will have to add the machine foo to the host connect list with

    $TaskInfo ModifyHostList +foo

and batchmaster must be run with the network interface specified as the server host (instead of the default localhost), e.g.,

    tclsh app/oommf/oommf.tcl batchmaster multitask.tcl bar

where bar is the name of the local machine.
This may seem a bit complicated, but the examples in the next section should make things clearer.
Sample task scripts
The first sample task script, simpletask.tcl, is run with the command

    tclsh app/oommf/oommf.tcl batchmaster simpletask.tcl

This example uses the default slave launch script, so a single slave is launched on the current machine, and the 3 simulations will be run sequentially. Also, no slave init script is given, so the default procs in batchsolve.tcl are used. Output will be magnetization states and tabular data for each equilibrium state, stored in files on the local machine with base names as specified in the MIF files.

    # FILE: simpletask.tcl
    #
    # This is a sample batch task file.  Usage example:
    #
    #   tclsh app/oommf/oommf.tcl batchmaster simpletask.tcl

    # Form task list
    $TaskInfo AppendTask A "BatchTaskRun taskA.mif"
    $TaskInfo AppendTask B "BatchTaskRun taskB.mif"
    $TaskInfo AppendTask C "BatchTaskRun taskC.mif"
The second task script is a more complicated example running concurrent processes on two machines. This script should be run with the command
    tclsh app/oommf/oommf.tcl batchmaster multitask.tcl bar

where bar is the name of the local machine.
Near the top of the multitask.tcl script several Tcl variables (RMT_MACHINE through A_list) are defined; these are used farther down in the script. The remote machine is specified as foo, which is used in the $TaskInfo AppendSlave and $TaskInfo ModifyHostList commands.
There are two AppendSlave commands, one to run two slaves on the local machine, and one to run a single slave on the remote machine (foo). The latter changes to a specified working directory before launching the batchslave application on the remote machine. (For this to work you must have rsh configured properly. In the future it may be possible to launch remote commands using the OOMMF account server application, thereby lessening the reliance on system commands like rsh.)
Below this the slave init script is defined. The Tcl regsub command is used to place the task script defined value of BASEMIF into the init script template. The init script is run on the slave when the slave is first brought up. It first reads the base MIF file into a newly created mms_mif instance. (The MIF file needs to be accessible by the slave process, irrespective of which machine it is running on.) Then replacement SolverTaskInit and SolverTaskCleanup procs are defined. The new SolverTaskInit interprets its first argument as a value for the exchange constant A. Note that this is different from the default SolverTaskInit proc, which interprets its first argument as the name of a MIF file to load. With this task script, a MIF file is read once when the slave is brought up, and then each task redefines only the value of A for the simulation (and corresponding changes to the output filenames and data table header).
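The template substitution can be tried on its own, outside of OOMMF. The sketch below is a reduced illustration: the init script body is a hypothetical two-line stand-in for the real one, but the regsub call has the same form as in multitask.tcl. The braces around the template prevent Tcl from substituting variables when the script is defined, which is why a placeholder such as __BASEMIF__ is needed.

```tcl
set BASEMIF taskA

# Braces suppress variable substitution here, so $BASEMIF cannot be
# written directly inside the template; a placeholder is used instead.
set init_script {
    set basename __BASEMIF__
    set mif_file "${basename}.mif"
}

# Replace every occurrence of the placeholder, storing the result
# back into init_script.
regsub -all -- __BASEMIF__ $init_script $BASEMIF init_script
puts $init_script

# Evaluating the result shows what the slave would see.
eval $init_script
puts $mif_file
```

Note that regsub gives special meaning to & and backslash sequences in the replacement string, so a base name containing those characters would need extra quoting.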
Finally, the Tcl loop structure
    foreach A $A_list {
        $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
    }

is used to build up a task list consisting of one task for each value of A in A_list (defined at the top of the task script). For example, the first value of A is 10e-13, so the first task will have the label A=10e-13 and the corresponding script is BatchTaskRun 10e-13. The value 10e-13 is passed on by BatchTaskRun to the SolverTaskInit proc, which has been redefined to process this argument as the value for A, as described above.
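The labels and scripts produced by this loop can be inspected without running batchmaster by substituting a stub for the $TaskInfo object. In the sketch below, FakeTaskInfo and task_list are hypothetical names introduced for illustration; the stub only records AppendTask calls and ignores every other method.

```tcl
# Stand-in for the master's $TaskInfo object (hypothetical; it only
# records AppendTask calls and ignores all other methods).
set task_list {}
proc FakeTaskInfo {method args} {
    global task_list
    if {$method eq "AppendTask"} {
        set label  [lindex $args 0]
        set script [lindex $args 1]
        lappend task_list [list $label $script]
    }
}
set TaskInfo FakeTaskInfo

set A_list { 10e-13 10e-14 10e-15 10e-16 10e-17 10e-18 }
foreach A $A_list {
    $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
}

# Print one {label script} pair per task.
foreach t $task_list { puts $t }
```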
There are 6 tasks in all, and 3 slave processes, so the first three tasks will run concurrently in the 3 slaves. As each slave finishes it will be given the next task, until all the tasks are complete.
    # FILE: multitask.tcl
    #
    # This is a sample batch task file.  Usage example:
    #
    #   tclsh app/oommf/oommf.tcl batchmaster multitask.tcl hostname [port]
    #
    # Task script configuration
    set RMT_MACHINE   foo
    set RMT_TCLSH     tclsh
    set RMT_OOMMF     "/path/to/oommf/app/oommf/oommf.tcl"
    set RMT_WORK_DIR  "/path/to/oommf/app/mmsolve/data"
    set BASEMIF taskA
    set A_list { 10e-13 10e-14 10e-15 10e-16 10e-17 10e-18 }

    # Slave launch commands
    $TaskInfo ModifyHostList +$RMT_MACHINE
    $TaskInfo AppendSlave 2 "exec %tclsh %oommf batchslave -notk \
        %connect_info batchsolve.tcl"
    $TaskInfo AppendSlave 1 "exec rsh $RMT_MACHINE \
        cd $RMT_WORK_DIR \\\;\
        $RMT_TCLSH $RMT_OOMMF batchslave -notk %connect_info batchsolve.tcl"

    # Slave initialization script (with batchsolve.tcl proc
    # redefinitions)
    set init_script {
        # Initialize solver. This is run at global scope
        set basename __BASEMIF__      ;# Base mif filename (global)
        mms_mif New mif
        $mif Read [FindFile ${basename}.mif]
        # Redefine TaskInit and TaskCleanup proc's
        proc SolverTaskInit { args } {
            global mif outtextfile basename
            set A [lindex $args 0]
            set outbasename "$basename-A$A"
            $mif SetA $A
            $mif SetOutBaseName $outbasename
            set outtextfile [open "$outbasename.odt" "a+"]
            puts $outtextfile [GetTextData header \
                "Run on $basename.mif, with A=[$mif GetA]"]
            flush $outtextfile
        }
        proc SolverTaskCleanup { args } {
            global outtextfile
            close $outtextfile
        }
    }

    # Substitute $BASEMIF in for __BASEMIF__ in above script
    regsub -all -- __BASEMIF__ $init_script $BASEMIF init_script
    $TaskInfo SetSlaveInitScript $init_script

    # Create task list
    foreach A $A_list {
        $TaskInfo AppendTask "A=$A" "BatchTaskRun $A"
    }