

Baseline measurements

Fast Ethernet Bandwidth

Network performance using the 3Com Superstack III Fast Ethernet switch, with 3Com 905TX PCI network cards under LAM 6.1.

Myrinet Bandwidth over UDP

Network performance using Myrinet over UDP with MPI (LAM 6.1).

Fortran 90 Benchmarks

Execution Times (secs)

Benchmark                SPARC20 / SunOS   SPARC20 / Solaris   Intel P6 / Linux 2.0.28   IBM-SP2 / AIX
Banded Cholesky solver   5.2               4.1                 1.1*                      1.6
GAUSS90 (400x400)        14.6              13.3                3.8*                      11.4
heat (2D ADI)            0.89              0.76                0.20                      0.38
complex operations       14.7              14.4                4.9                       11.3

Sparse matrix kernels

Numerical performance for sparse matrix-vector products using the WEST0156 matrix from the Harwell-Boeing collection (see Matrix Market). Figures use 1 and 10 right-hand sides, respectively.
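As an illustration of the kernel being timed, the following is a minimal sketch of a sparse matrix product with one or several right-hand sides. It assumes compressed sparse row (CSR) storage and plain Python lists; the function name `csr_matmat` and the small 3x3 matrix are hypothetical stand-ins, not the WEST0156 matrix or the benchmark code itself.

```python
def csr_matmat(n_rows, indptr, indices, data, X):
    """Return Y = A * X, where A is stored in CSR form.

    X is a list of rows, one row per matrix column, with k entries each
    (k = number of right-hand sides).
    """
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        yrow = Y[i]
        # Nonzeros of row i live in data[indptr[i]:indptr[i + 1]].
        for p in range(indptr[i], indptr[i + 1]):
            a = data[p]
            xrow = X[indices[p]]
            for j in range(k):
                yrow[j] += a * xrow[j]
    return Y


# Hypothetical example matrix (not WEST0156):
#   [[2, 0, 1],
#    [0, 3, 0],
#    [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]

one_rhs = [[1.0], [2.0], [3.0]]                # 1 right-hand side
two_rhs = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]  # 2 right-hand sides

y1 = csr_matmat(3, indptr, indices, data, one_rhs)
y2 = csr_matmat(3, indptr, indices, data, two_rhs)
print(y1)
print(y2)
```

With several right-hand sides, each nonzero of the matrix is loaded once and reused across all columns of X, which is one reason the multi-vector case can run at a higher rate than the single-vector case.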

Phase-field algorithm for solidification modeling

Fortran 77 finite-difference application (1200x600)
(Cray optimized)

Machine                                 Execution time (secs)
Cray C-90 (1 processor)                 9.7                     50.1%
Alpha EV6 (500 MHz 21264)               23.8                    99.0%
Pentium III (500MHz)                    34.2                    95.1%
Intel Celeron (500MHz/66MHz SDRAM)      45.4                    99.9%
Pentium II (400MHz/100MHz SDRAM)        46.5                    99.9%
Pentium II (333MHz)                     55.8                    99.9%
Sun Ultra Sparc 60 (400 MHz)            58.65                   97.9%
DEC Alpha (300MHz/Fortran)              65.0                    99.9%
Pentium Pro (200MHz)                    74.2                    99.9%
SGI Power-Challenge (1 node)            74.5                    97.5%
Sun Ultra 2                             87.4                    99.1%
DEC Alpha (333MHz 21164)                94.6                    99.9%
IBM SP2 (1 node)                        279.1                   49.4%

compilation flags

Cray (Cray Fortran)                                        cf -Zv -Wf "-o agress -e mcx -a static"
IBM SP2                                                    xlf -O3
Intel Celeron (Linux 2.2.12, f77 2.91.66)                  g77 -O3 -funroll-loops
Intel P6 (Linux 2.0.28, g77 2.7.2)                         g77 -O3 -funroll-loops
SGI Power Challenge                                        f77 -O4
Sun Ultra Sparc 60 (400 MHz), Solaris                      f77 -O
DEC Alpha / Linux 2.0 (333MHz 21164)                       f2c | gcc -O4 -funroll-loops
DEC Alpha/Windows NT 4.0 (300MHz 21164) Digital Fortran    f77 /fast
Pentium II (333/400 MHz) PowerStation Fortran 4.0          /Ox /Zp4 /G5
Sun Ultra 2                                                f2c | gcc -O4 -funroll-loops

3-D Helmholtz Solver

Execution Times (secs)
64x64x64 grid

JazzNetII (4 nodes/fast-ethernet)       10.4 secs
Connection Machine CM-5 (32 nodes)      12.0 secs
Jazznet (4 nodes / Myrinet-TCP/IP)      14.6 secs
JazzNetII (1 node)                      24.2 secs
Cray C-90 (1 node)                      70.47 secs
Sun SPARC20 (1 node)                    82.09 secs
Sun Cluster (4 nodes)                   112.5 secs

The Parallel Hierarchical Adaptive MultiLevel Project

Fortran 90 finite-element adaptive multigrid application

Performance of multigrid with fixed problem size

computer (vertices)    nproc   computation   communication   total         speedup   efficiency
                               time (sec.)   time (sec.)     time (sec.)
Sun (16K)              1       51.54         0.00            51.54         -         -
                       2       31.46         0.96            32.42         1.59      .80
                       4       16.90         0.90            17.80         2.90      .73
                       8       9.06          1.08            10.14         5.08      .64
PPro (16K)             1       2.94          0.00            2.94          -         -
                       2       1.30          0.20            1.50          1.96      .98
                       4       0.72          0.19            0.91          3.23      .81
SP2-ethernet (64K)     1       14.42         0.00            14.42         -         -
                       2       7.02          0.58            7.60          1.90      .95
                       4       3.58          0.79            4.37          3.30      .82
                       8       1.85          0.63            2.48          5.81      .73
                       16      0.96          0.93            1.89          7.63      .48
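The speedup and efficiency columns above appear to follow the usual fixed-size definitions: speedup is the one-processor total time divided by the p-processor total time, and efficiency is speedup divided by the number of processors. The sketch below recomputes them for the Sun rows under that assumption; the function names are illustrative only, and the last digit of some efficiencies can differ from the table because the table seems to divide the already rounded speedup.

```python
def speedup(t1, tp):
    """Fixed-size speedup: one-processor time over p-processor time."""
    return t1 / tp


def efficiency(t1, tp, nproc):
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(t1, tp) / nproc


# Total times (sec.) taken from the Sun (16K) rows of the table above.
sun_total = {1: 51.54, 2: 32.42, 4: 17.80, 8: 10.14}

t1 = sun_total[1]
for nproc in sorted(sun_total):
    tp = sun_total[nproc]
    print(f"{nproc:2d}  {speedup(t1, tp):5.2f}  {efficiency(t1, tp, nproc):4.2f}")
```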

Performance of multigrid with scaled problem size

computer (vertices)           nproc   computation   communication   total         scaled    scaled
                                      time (sec.)   time (sec.)     time (sec.)   speedup   efficiency
Sun (16K per proc)            1       51.54         0.00            51.54         -         -
                              2       54.28         1.48            55.76         1.85      .92
                              4       64.70         2.68            67.38         3.06      .78
                              8       61.40         4.16            65.56         6.29      .79
PPro (16K per proc)           1       2.94          0.00            2.94          -         -
                              2       2.65          1.53            4.18          1.41      .70
                              4       2.61          0.65            3.26          3.60      .90
SP2-ethernet (64K per proc)   1       14.42         0.00            14.42         -         -
                              2       14.97         1.04            16.01         1.80      .90
                              4       14.18         1.77            15.95         3.62      .90
                              8       14.24         2.92            17.16         6.72      .84
                              16      14.58         5.51            20.09         11.48     .72
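The scaled figures above are consistent with a scaled speedup of p times the one-processor total time divided by the p-processor total time, since the problem grows with the number of processors; scaled efficiency is then that value divided by p. This definition is inferred from the tabulated numbers rather than stated on the page, so the sketch below is an assumption, with hypothetical function names.

```python
def scaled_speedup(t1, tp, nproc):
    """Scaled speedup when the problem size grows in proportion to nproc.

    The p-processor run does nproc times the work of the one-processor run,
    so the equivalent serial time is taken as nproc * t1.
    """
    return nproc * t1 / tp


def scaled_efficiency(t1, tp, nproc):
    """Scaled efficiency: scaled speedup per processor (equals t1 / tp)."""
    return scaled_speedup(t1, tp, nproc) / nproc


# Total times (sec.) from the SP2-ethernet (64K per proc) rows above.
sp2_total = {1: 14.42, 2: 16.01, 4: 15.95, 8: 17.16, 16: 20.09}

t1 = sp2_total[1]
for nproc in sorted(sp2_total):
    tp = sp2_total[nproc]
    s = scaled_speedup(t1, tp, nproc)
    e = scaled_efficiency(t1, tp, nproc)
    print(f"{nproc:2d}  {s:6.2f}  {e:4.2f}")
```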

Optimal shape design in viscous flows

Coarse Grid Calculations

Machine (4 nodes)   Elapsed time (minutes)
                    start 1   start 2   start 3
Jazznet             68.7      67.3      68.0
IBM SP2             124.3     113.4     116.3
Sun SPARC20         302.5     321.3     301.2

Dynamics of Material Flow in High-Speed Machining

Tim Burns, ITL, Matt Davies & Chris Evans, MEL, NIST

Single-node performance numbers for MATLAB application

PC 200 MHz P6, Windows NT 4.0           2.7 hours
Jazznet: 200 MHz P6, Linux 2.0.28       2.3 hours
SUN SPARCstation 20                     7.0 hours