The NIST Sparse BLAS (v. 0.9)

Performance Studies

Preliminary Performance Results:

Performance measurements are given for simple matrix-vector and matrix-matrix multiplies for several sparsity patterns. The measurements reflect compiler optimization only (-O3 and loop unrolling). Typical performance for problems with large blocksize and multiple right-hand sides is about 17 Mflops on a Sun Sparc 20 and about 27 Mflops on an IBM RS6000 Model 590. The "Lite" interface provides no measurable performance gain, except for some very small problems. The blocked schemes (BSR and VBR) begin to pay off once the blocksize exceeds roughly 5 to 10; for smaller blocksizes, the point-wise (CSR) scheme is more efficient.
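To make the point-wise versus blocked comparison concrete, here is a minimal sketch of the two kinds of kernel for a single right-hand side. It is illustrative only and is not the NIST Sparse BLAS interface: the function names, the 0-based indexing, and the row-major block layout are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical point-wise kernel: y += A*x with A in CSR form (0-based). */
void csr_matvec(int n, const int *rowptr, const int *colind,
                const double *val, const double *x, double *y)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        /* One indirect lookup of x per stored nonzero. */
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[colind[k]];
        y[i] += sum;
    }
}

/* Hypothetical blocked kernel: y += A*x with A in BSR form.
 * nb is the number of block rows, b the blocksize. Block k is assumed to be
 * stored as a dense b-by-b row-major array starting at val + k*b*b. */
void bsr_matvec(int nb, int b, const int *browptr, const int *bcolind,
                const double *val, const double *x, double *y)
{
    assert(nb >= 0 && b > 0);
    for (int bi = 0; bi < nb; bi++) {
        double *yb = y + (size_t)bi * b;
        for (int k = browptr[bi]; k < browptr[bi + 1]; k++) {
            const double *blk = val + (size_t)k * b * b;
            /* One indirect lookup per block, then a dense b-by-b multiply. */
            const double *xb = x + (size_t)bcolind[k] * b;
            for (int r = 0; r < b; r++) {
                double sum = 0.0;
                for (int c = 0; c < b; c++)
                    sum += blk[r * b + c] * xb[c];
                yb[r] += sum;
            }
        }
    }
}
```

In the blocked kernel the index lookup is paid once per block rather than once per nonzero, and the inner loops have a fixed trip count that a compiler can unroll. With very small blocks those inner loops are too short for this to outweigh the extra loop overhead, which is one plausible reading of the crossover reported above.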

Also available in postscript form:
Preliminary Performance Studies, July 1996
(34K gzipped postscript file, 9 pages)

The test matrices were generated by reading sparsity patterns from Harwell-Boeing files and using each pattern as the block structure of a matrix with a given blocksize. The results shown are for matrix-vector and matrix-matrix multiplications only, rather than a full DAXPY, since we are interested in the efficiency of the sparse code. Source code for the performance testers is available from the authors.
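The construction described above can be sketched as follows: every entry of the n-by-n pattern becomes a dense b-by-b block, giving an (n*b)-by-(n*b) matrix. The function below builds the point-wise CSR index arrays for that expanded matrix. The name expand_pattern, the 0-based CSR input, and the error convention are assumptions for illustration; the actual testers may build their matrices differently, and how they fill in the numerical values is not stated here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: expand an n-by-n sparsity pattern (CSR, 0-based)
 * into the CSR structure of the (n*b)-by-(n*b) matrix in which every
 * pattern entry is a dense b-by-b block.
 * Returns 0 on success, -1 on allocation failure.
 * The caller frees *out_rowptr and *out_colind. */
int expand_pattern(int n, int b, const int *rowptr, const int *colind,
                   int **out_rowptr, int **out_colind)
{
    assert(n >= 0 && b > 0);
    int nnz = rowptr[n];
    int *rp = malloc(((size_t)n * b + 1) * sizeof *rp);
    int *ci = malloc((size_t)nnz * b * b * sizeof *ci);
    if (!rp || (!ci && nnz > 0)) {
        free(rp);
        free(ci);
        return -1;
    }

    int pos = 0;
    for (int i = 0; i < n; i++) {
        for (int r = 0; r < b; r++) {          /* each point row in block row i */
            rp[i * b + r] = pos;
            for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                for (int c = 0; c < b; c++)    /* each column of block k */
                    ci[pos++] = colind[k] * b + c;
        }
    }
    rp[n * b] = pos;                           /* equals nnz * b * b */

    *out_rowptr = rp;
    *out_colind = ci;
    return 0;
}
```

For example, expanding a 137 by 137 pattern with 411 entries at blocksize 5 would give a 685 by 685 matrix with 411 * 25 = 10275 stored entries.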

The current test results are for the following matrix patterns:

IMPCOL C (137 by 137, 411 nonzeros) Ethylene plant model
WEST0156 (156 by 156, 371 nonzeros) Simple chemical plant model
GRE 115 (115 by 115, 421 nonzeros)

The experimental parameters are the blocksize and the number of right-hand sides. We present results of testing on a Sun Sparc 20 and an IBM RS6000 Model 590. Each data point represents the average result of four runs with the same parameters.
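As a rough guide to how a data point of this kind could be computed, the sketch below converts a measured time into Mflops and averages several runs. It assumes the usual count of two floating-point operations (one multiply, one add) per stored nonzero per right-hand side, and it uses the standard clock() timer. The function names are hypothetical, and whether the original testers averaged times or rates is not stated here.

```c
#include <assert.h>
#include <time.h>

/* Mflops for a sparse matrix times nrhs dense vectors, assuming
 * 2 flops (multiply + add) per stored nonzero per right-hand side. */
double spmm_mflops(long nnz, int nrhs, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)nnz * (double)nrhs / seconds / 1.0e6;
}

/* Arithmetic mean of n values, e.g. the rates from repeated runs. */
double mean(const double *v, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Average CPU seconds per call of kernel(arg) over reps calls.
 * For small matrices, reps should be large enough that the total time
 * is well above the resolution of clock(). */
double time_kernel(void (*kernel)(void *), void *arg, int reps)
{
    assert(reps > 0);
    clock_t t0 = clock();
    for (int i = 0; i < reps; i++)
        kernel(arg);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

A data point in the style described above would then be mean(rates, 4), where each entry of rates is spmm_mflops(nnz, nrhs, t) for one timed run with the same blocksize and number of right-hand sides.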

Source code for performance testers:

Source code for NIST Sparse BLAS performance testing routines (39K gzipped shar file)

Last updated: July 25, 1996 by KAR.