The NIST Sparse BLAS (v. 0.9)
Performance Studies
Preliminary Performance Results:
Performance measurements are given for
simple matrix-vector and matrix-matrix multiplies for several
sparsity patterns. The measurements reflect compiler
optimization (-O3 and loop unrolling) only. Typical performance
for problems with large blocksize and multiple right-hand sides
is about 17 Mflops on a Sun Sparc 20 and about 27 Mflops on
an IBM RS6000 Model 590.
The "Lite" interface provides no measurable performance gain,
except for some very small problems. The blocked schemes
(BSR and VBR) begin to pay off when blocksize is greater than
5 or 10; for smaller blocksizes, the pointwise (CSR) scheme is
more efficient.
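The trade-off between the pointwise and blocked schemes can be illustrated with a minimal sketch of the two matrix-vector kernels. This is not the NIST Sparse BLAS code; the function names and array layouts (`row_ptr`, `col_idx`, `brow_ptr`, `bcol_idx`, row-major `blocks`) are assumptions chosen for illustration. The point it shows is that CSR reads one column index per scalar entry, while BSR reads one column index per block and then runs a dense inner loop, so the per-block overhead is only amortized once the blocks are large enough.

```python
def csr_matvec(n_rows, row_ptr, col_idx, vals, x):
    """Compute y = A x for A stored pointwise in CSR (hypothetical layout)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # One column-index lookup per stored scalar entry.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def bsr_matvec(n_block_rows, b, brow_ptr, bcol_idx, blocks, x):
    """Compute y = A x for A stored as dense b-by-b blocks (hypothetical layout).

    blocks[k] holds the b*b values of block k in row-major order.
    """
    y = [0.0] * (n_block_rows * b)
    for bi in range(n_block_rows):
        # One column-index lookup per block, then a dense b-by-b product.
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            bj = bcol_idx[k]
            blk = blocks[k]
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += blk[r * b + c] * x[bj * b + c]
                y[bi * b + r] += acc
    return y
```

Both kernels produce the same result for the same matrix; only the indexing overhead and the regularity of the inner loop differ.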
Also available in PostScript form:
Preliminary Performance Studies, July 1996
(34K gzipped PostScript file, 9 pages)
The test matrices used in the following tests were generated
by reading sparsity patterns from Harwell-Boeing
files, and using these patterns as the block structure for a matrix
of given blocksize. The results shown are for matrix-vector and
matrix-matrix multiplications only, rather than a full DAXPY, since
we are interested in the efficiency of the sparse code.
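The construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' test generator: the function name `pattern_to_bsr`, the coordinate-list input, and the use of seeded random values to fill each block are all hypothetical, since the page does not say how block entries were chosen. Each nonzero position in the pattern becomes one dense block of the given blocksize.

```python
import random


def pattern_to_bsr(n_block_rows, pattern, b, seed=0):
    """Expand a sparsity pattern into block-row storage.

    pattern: iterable of (i, j) positions of the pattern's nonzeros.
    b:       blocksize; every pattern nonzero becomes a dense b-by-b block.
    Block values are seeded pseudo-random numbers (an assumption).
    """
    rng = random.Random(seed)
    rows = [[] for _ in range(n_block_rows)]
    for i, j in pattern:
        rows[i].append(j)

    brow_ptr = [0]   # start of each block row in bcol_idx / blocks
    bcol_idx = []    # block-column index of each stored block
    blocks = []      # row-major values of each stored block
    for cols in rows:
        for j in sorted(set(cols)):
            bcol_idx.append(j)
            blocks.append([rng.random() for _ in range(b * b)])
        brow_ptr.append(len(bcol_idx))
    return brow_ptr, bcol_idx, blocks
```

Under this construction, a 137 by 137 pattern with 411 nonzeros and blocksize 10 would give a 1370 by 1370 matrix with 41,100 stored entries.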
Source code for the performance testers is
available from the authors.
The current test results are for the following matrix patterns:

IMPCOL C (137 by 137, 411 nonzeros) Ethylene plant model

WEST0156 (156 by 156, 371 nonzeros) Simple chemical plant model

GRE 115 (115 by 115, 421 nonzeros)
The experimental parameters are blocksize and number of right-hand sides.
We present results of testing on a Sun Sparc 20 and an IBM RS6000.
Each data point represents the average result of 4 runs with the same parameters.
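A minimal sketch of how such a data point might be computed is given here. The page does not state the flop-counting convention or whether times or rates were averaged, so both choices are assumptions: two floating-point operations (one multiply, one add) per stored entry per right-hand side, and an average of wall-clock times over the runs. The names `mflops` and `average_time` are hypothetical.

```python
import time


def mflops(stored_entries, n_rhs, seconds):
    """Mflop rate, assuming 2 flops per stored entry per right-hand side."""
    return 2.0 * stored_entries * n_rhs / seconds / 1.0e6


def average_time(kernel, runs=4):
    """Average wall-clock time in seconds of kernel() over the given runs."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        kernel()
        total += time.perf_counter() - start
    return total / runs
```

A caller would wrap one multiply with fixed parameters in `kernel`, pass the averaged time to `mflops`, and repeat for each combination of blocksize and number of right-hand sides.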
Last updated: July 25, 1996 by KAR.