The phase-field method is a mathematical technique, grounded in
thermodynamics, for describing the process of phase transition in a
material (e.g., from liquid to solid). One of the distinguishing
characteristics of the phase-field approach is that the interface
between phases is diffuse. This is in contrast to methods that
assume a sharp interface between phases, that is, each point in the
material is either fully solid or fully liquid, with the interface
described as a moving two-dimensional surface, complete with associated
boundary conditions. The phase field is an order parameter,
introduced into the model, that holds the phase value, from 0.0 (pure
liquid) to 1.0 (pure solid), at each point in the volume of material.
Although the phase field does not necessarily correspond directly to
some physical material parameter (although it can), the introduction
of this order parameter into the model is a useful mathematical
technique commonly used in materials science. The phase-field method
reduces to the sharp-interface picture of a phase transition in the
limit where the interface thickness goes to zero.
One advantage of using the phase-field method to model phase
transitions, compared to the more commonly used sharp-interface
method, is that explicit tracking of the moving surface, the
liquid-solid interface, is completely avoided. Instead, the phase
of each point in the simulated volume is computed at each time step.
The liquid-solid interface can be determined, and viewed, after the
computation completes by post-calculating isosurfaces of constant
phase field (say, at a value of 1/2). In addition, the phase-field
method incorporates many types of physical phenomena without ad hoc
introduction.
The phase-field method has been parallelized.

Why Parallelize the Phase-Field Method?
In contrast to the sharp-interface method, the phase-field method
updates the state of every point in the simulated volume on each
iteration. This requires full three-dimensional arrays to hold
the phase field as well as the other modeled parameters, resulting in a
large memory requirement for a phase-field based simulator.
Full-resolution three-dimensional simulations can require a large
amount of memory, 50 GB to 100 GB, as well as a large number of
floating-point operations per time step. Single-processor computers
and small SMPs with 16 to 32 processors simply do not contain enough
main memory to hold these arrays, so a parallel version is necessary.
It is this large memory requirement that has, until recently,
prevented the regular use of the phase-field method. Now that large
parallel machines and clusters are commonly available, the method has
become feasible.


How is the Parallelization Realized?
The simulator we have developed, which uses the phase-field method to
model the solidification of an alloy, is implemented as a portable
parallel C program using MPI (Message Passing Interface) for
interprocessor communication.
The three-dimensional arrays used in this simulator are distributed over
the available processors such that each processor can complete one
iteration of the simulation in approximately the same amount of time
as the other processors. In the simplest case, if all processors are
equally powerful and equally loaded, the array elements can be
distributed evenly among the processors. Our implementation
distributes the arrays in blocks along one axis, although two- or
three-dimensional distributions are possible. At the end of each
iteration, processors holding adjacent array elements exchange those
elements for use in the computation of the following iteration.
The load balance between the processors is monitored within the
application, and the distribution of the arrays is adjusted, between
iterations, whenever a significant imbalance is detected. Both the
frequency of these load-balancing operations and the threshold of what
is considered a "significant" load imbalance are easily configured by
the user.
The distribution and load balancing of the arrays, as well as the
exchanging of neighboring array elements, are managed by routines in
the utility library DParLib, an MPI-based library that simplifies the
coding of data-parallel style programs.


What is the Performance of the Parallel Code?
Our dendritic growth simulator, bin3d, has been run on all of our local
parallel machines; it exhibits excellent performance and scales well as
we increase the number of processors used for a given simulation size.
The simulation updates each point in a uniform three-dimensional grid
of points using a simple finite-difference approximation
to the partial differential equations that describe the system.
The bin3d simulator uses eight three-dimensional arrays of
size n x n x n, each holding
double-precision floating-point values. A near-term goal is to
perform simulations on grids of size 1000 x 1000 x 1000.
Figure 1 shows the resulting per-processor memory requirement
as a function of the number of processors used, for n=500 and
n=1000.

Fig 1: Memory requirement per node for bin3d. 
An analysis of the bin3d algorithm shows that the expected run time
for this simulator is O(n^4), where the computational grid is of size
n x n x n.
This reflects the fact that each grid point is updated on each time
step of the simulation, for an order O(n^3)
computation on each iteration,
and the number of iterations required for the liquid-solid interface
to reach the edge of the volume increases linearly with n.
Assuming a perfect speedup with the number of processors, this
expected execution time
can be modeled with the equation T(n,p) = c * n^4 / p, where n
is the number of computational grid points along each of the
three axes, p
is the number of processors used, and c is a machine-dependent
constant determined by curve fitting.
Using several sets of test runs on our local machines, including
an IBM SP with 200 MHz POWER3 processors and a Beowulf cluster of 333 MHz
Pentium II processors, we have plotted actual execution times
against a curve fitted to our expected performance model T(n,p).

Fig 2: Execution time on 24 200-MHz POWER3 processors. The plotted
points are actual times; the curve is fitted to the T(n,p) function.


Fig 3: Execution time on 16 333-MHz Pentium II processors.
The plotted points are actual times; the curve is fitted to
the T(n,p) function.

Using the T(n,p) functions determined from Figs. 2 and 3, we next
show the results of a series of timing tests for a fixed problem
size of n=250 and a varying number of processors.
These results are plotted along with the prediction from the corresponding
T(n,p) function determined previously.
The results, which show good agreement with the predicted
values, appear in Figs. 4 and 5. Some error appears as the
ratio n/p becomes small and the
communication overhead becomes significant compared
to the computational load on each processor.

Fig 4: Execution time, on 200 MHz IBM POWER3 processors,
for bin3d with n=250.
The plotted points are actual times and the curve is the function
T(n,p) from Fig. 2.


Fig 5: Execution time, on 333 MHz Pentium II processors,
for bin3d with n=250.
The plotted points are actual times and the curve is the function
T(n,p) from Fig. 3.

Finally, we use the T(n,p) functions to predict the execution time
for our target problem of n=1000 on both the IBM SP and the
Beowulf cluster of PCs. Assuming each processor has 1 GiB of main
memory, the memory requirement plot in Fig. 1 shows that we
need at least 70 processors to perform our full-sized
simulation. Figures 6 and 7 show that this simulation, using
70 processors, would require
approximately 4 days to complete on the IBM SP and approximately 10
days on the PC cluster. Of course, these times will be reduced as
we upgrade the processing nodes in our parallel machines to more
current technology, and also if we simply use more processors.

Fig 6: Predicted execution time for bin3d
on 200 MHz IBM POWER3 processors for problems of size n=500
and n=1000.


Fig 7: Predicted execution time for bin3d
on 333 MHz Pentium II processors for problems of size n=500
and n=1000.






Modeling Dendritic Growth in Metallic Alloys


Papers/Presentations

William L. George and James A. Warren, A Parallel 3D Dendritic Growth Simulator Using the Phase-field Method,
Journal of Computational Physics, 177,
2002,
pp. 264-283.


James S. Sims, William L. George, Steven G. Satterfield, Howard K. Hung, John G. Hagedorn, Peter M. Ketcham, Terence J. Griffin, Stanley A. Hagstrom, Julien C. Franiatte, Garnett W. Bryant, W. Jaskolski, Nicos S. Martys, Charles E. Bouldin, Vernon Simmons, Olivier P. Nicolas, James A. Warren, Barbara A. am Ende, John E. Koontz, B. James Filla, Vital G. Pourprix, Stefanie R. Copley, Robert B. Bohn, Adele P. Peskin, Yolanda M. Parker and Judith E. Devaney, Accelerating Scientific Discovery Through Computation
and Visualization II,
NIST Journal of Research, 107 (3),
May-June 2002,
pp. 223-245.


James S. Sims, John G. Hagedorn, Peter M. Ketcham, Steven G. Satterfield, Terence J. Griffin, William L. George, Howland A. Fowler, Barbara A. am Ende, Howard K. Hung, Robert B. Bohn, John E. Koontz, Nicos S. Martys, Charles E. Bouldin, James A. Warren, David L. Feder, Charles W. Clark, B. James Filla and Judith E. Devaney, Accelerating Scientific Discovery Through Computation
and Visualization,
NIST Journal of Research, 105 (6),
November-December 2000,
pp. 875-894.



