To this end, it is good practice to write programs (whenever
possible) in a style that can also be run on one processor.
We include a "conversion to SPMD" example
to illustrate how a typical MPMD program pair can be converted into
an SPMD program.
This step should eliminate any basic indexing errors, and ensure the
serial program logic is correct.
In Fortran, use IMPLICIT NONE if you don't already.
This helps catch mistyped variable names by flagging them as undeclared.
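
As a minimal sketch of the single-processor style described above, the hypothetical helper below divides n loop iterations among nprocs tasks. The names block_range, partial_sum, rank, and nprocs are assumptions made for illustration and are not part of PVM; in a real program, rank and nprocs would be derived from the task IDs. With nprocs set to 1, the range covers the whole loop, so the same source can be checked serially before any messages are exchanged.

```c
#include <assert.h>

/* Hypothetical helper (not a PVM call): compute the half-open range
 * [*lo, *hi) of iterations owned by task `rank` out of `nprocs` tasks.
 * The first n % nprocs tasks each receive one extra iteration. */
static void block_range(int rank, int nprocs, int n, int *lo, int *hi)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && n >= 0);
    int base  = n / nprocs;
    int extra = n % nprocs;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

/* Sum the integers lo .. hi-1; stands in for the real per-task work. */
static long partial_sum(int lo, int hi)
{
    long s = 0;
    for (int i = lo; i < hi; i++)
        s += i;
    return s;
}
```

Calling block_range(0, 1, n, &lo, &hi) gives the full range 0..n, which is the serial case. Adding up partial_sum over every rank for a larger nprocs should give the same total, which is a quick way to catch indexing errors.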
Debug your PVM code on your own workstation, using multiple tasks,
but a single host.
This step will make a coarse pass at eliminating message-passing errors.
Examine code carefully, looking for:
Unmatched sends and recvs
Message tags that don't match
Non-unique message tags; that is, messages which aren't
properly distinguished. (This is a problem especially
when receiving with wildcard (-1) parameters.)
Messages that are unpacked upon receipt in a different order
than they were packed before sending. The ordering must be identical.
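
The packing rule in the last item can be illustrated without PVM. The sketch below uses a hypothetical byte buffer in place of a PVM message; msgbuf, pack_bytes, unpack_bytes, pack_result, and unpack_result are illustrative names, not PVM routines, and they stand in for calls such as pvm_pkint() and pvm_upkint(). As with a PVM message, items come back out in the order they went in, so the receiver has to unpack in the sender's order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a PVM message buffer (not a PVM type). */
typedef struct {
    unsigned char data[256];
    size_t wpos;   /* next byte to write */
    size_t rpos;   /* next byte to read  */
} msgbuf;

/* Append n bytes to the buffer. */
static void pack_bytes(msgbuf *m, const void *src, size_t n)
{
    assert(m->wpos + n <= sizeof m->data);
    memcpy(m->data + m->wpos, src, n);
    m->wpos += n;
}

/* Remove the next n bytes from the buffer, in the order they were packed. */
static void unpack_bytes(msgbuf *m, void *dst, size_t n)
{
    assert(m->rpos + n <= m->wpos);
    memcpy(dst, m->data + m->rpos, n);
    m->rpos += n;
}

/* Sender side: pack the count first, then the values. */
static void pack_result(msgbuf *m, int count, const double *vals)
{
    pack_bytes(m, &count, sizeof count);
    pack_bytes(m, vals, (size_t)count * sizeof *vals);
}

/* Receiver side: unpack in the same order, count first, then the values.
 * Returns the count, or -1 if it does not fit in the caller's array. */
static int unpack_result(msgbuf *m, double *vals, int max)
{
    int count;
    unpack_bytes(m, &count, sizeof count);
    if (count < 0 || count > max)
        return -1;
    unpack_bytes(m, vals, (size_t)count * sizeof *vals);
    return count;
}
```

If the receiver read the doubles before the count, the bytes of the count would be interpreted as part of a double and every later item would be shifted. This is the kind of silent corruption that a mismatched pack and unpack order tends to produce.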
Use pvm_catchout() to route node stderr/stdout to your
console window. (Note: This is not available with the current release of
PVMe on the SP2. Node output from PVMe is automatically routed to stdout
of the main process.)
Use the PVMDEBUG (Fortran) or
PvmTaskDebug (C) parameter in the spawn
call. This generates an xterm running "dbx" for each spawned process.
(Make sure code is compiled with -g.)
If possible, test to see whether the program behaves differently on
the interactive nodes of the SP2 than it does on a cluster of workstations.
If the program runs on a cluster, but fails regularly or
sporadically on the SP2, then the bug is most likely to be
highly sensitive to communication timing, and therefore difficult
to track down. You may wish to seek the assistance of the
scientific consulting staff (975-2968).
Finally, when all else fails...
Try debugging through the batch queue on the SP2 itself.
(This feature will be available with an upcoming version of the
Make sure that your program(s) are compiled with the
-g compiler option to
xlf or xlc.
See the llcreate documentation
for help in generating a LoadLeveler command file that will enable debugging.
Copy the debugger and
debugger2 scripts to your working directory.
Submit as usual, but only when the queues are empty and
you are free to wait by your console for your program to run.