SPBLASTK Distribution/Installation Notes
--------------------------------------------------------------------------
*** For installation instructions, skip to Section 1.
--------------------------------------------------------------------------
NIST Sparse BLAS
Toolkit Implementation
NIST SPARSE BLAS v. x.x
--------------------------------------------------------------------------
Authors:
Karin A. Remington and Roldan Pozo
National Institute of Standards and Technology
Based on the interface standard proposed in:
"A Revised Proposal for a Sparse BLAS Toolkit" by
S. Carney and K. Wu -- University of Minnesota
M. Heroux and G. Li -- Cray Research
R. Pozo and K.A. Remington -- NIST
Contact:
Karin A. Remington, email: [email protected]
--------------------------------------------------------------------------
Contents:
0. Release Notes
1. Installation Instructions
2. Toolkit Interface
3. Developer's Interface
4. Source Code Generation
-----------------------------
Section 0. Release Notes
-----------------------------
What's included:
The package includes support for the "BASIC" Toolkit, including
matrix-multiply and triangular solve routines for the following
sparse matrix formats:
csr - compressed sparse row
csc - compressed sparse column
coo - coordinate
bsr - block sparse row
bsc - block sparse column
bco - block coordinate
vbr - variable block row
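As an illustration of the first format in the list, the sketch below
stores a matrix in compressed sparse row form and multiplies it by a
vector. The names csr_matrix and csr_matvec and the 0-based indexing are
assumptions made for this example; they are not the Toolkit's own data
layout or calling sequence.

```c
/* Hypothetical CSR container (0-based indices); not the Toolkit's layout. */
typedef struct {
    int m;               /* number of rows */
    int n;               /* number of columns */
    const double *val;   /* nonzero values, stored row by row */
    const int *col;      /* column index of each nonzero */
    const int *rowptr;   /* row i occupies val[rowptr[i]] .. val[rowptr[i+1]-1];
                            the array has m+1 entries */
} csr_matrix;

/* y <- A*x for a CSR matrix A (illustrative only). */
void csr_matvec(const csr_matrix *A, const double *x, double *y)
{
    int i, k;
    for (i = 0; i < A->m; i++) {
        double sum = 0.0;
        for (k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            sum += A->val[k] * x[A->col[k]];
        y[i] = sum;
    }
}
```

The column-oriented (csc) format swaps the roles of rows and columns,
and the block formats store small dense blocks in place of scalar
entries.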
What's * NOT * included:
The following are **NOT** included in this release:
-- support for triangular solves for the block coordinate (bco) scheme
-- support for non-contiguous block storage in the block formats
What's required:
Minimum: ANSI C compiler
12 MB Free disk space
Optional: Fortran compiler (for testing fortran interfaces)
AWK and SED (for re-generating kernel source code)
Testing:
The testing directories contain both matrix-multiply and triangular
solve testers for each supported storage scheme. C and Fortran
testers are both included, and can be used as examples for library usage.
This distribution has been tested under the following OS/compiler
configurations:
sunos4.1.4: gcc 2.7.0, gcc 2.7.2 and acc 3.0.1
sunsolaris2.4: gcc 2.7.0 (no RANLIB, see makefile.def)
AIX.1.1: xlc
sgi-irix5.3: gcc 2.7.0
Bug reports:
Please send bug reports to [email protected].
---------------------------------------
Section 1. Installation Instructions
---------------------------------------
The installation of the Sparse BLAS Toolkit is automated with
the "make" utility. To use "make" to build the library:
1. Edit the file ./makefile.def to reflect your system setup:
- The minimum installation requires an ANSI C compiler.
- An extended installation which includes Fortran
callable routines and testers is available.
If the presence of a Fortran compiler is indicated in
the makefile.def file, the extended version will be installed.
- The archival process by default uses "ranlib". If this
is not available on your system, set HASRANLIB to 'f'.
2. Type:
"make install" (**) to build the library AND make and run
the C and Fortran testers
"make installc" to build the library AND make and run the C testers
"make library" to build the archive file ./lib/libsptk.a
(tests are not built)
"make testc" to build and run the C testers
(library must be pre-built)
"make testf77" (**) to build and run the Fortran testers
(library must be pre-built)
(**) requires a Fortran compiler
3. For space-saving cleanup, type "make clean" to remove all .o files.
--------------------------------
Section 2. Toolkit Interface
--------------------------------
The Toolkit interface, along with the decision trees for
calling the proper kernel routine for a given set of input
values, is implemented in the files
./src_tkc/_xxxmm_c.c and _xxxsm_c.c (C bindings)
./src_tkf/_xxxmm_f.c and _xxxsm_f.c (Fortran bindings)
where:
xxx is the matrix storage format (csr, csc, coo, etc.)
mm indicates matrix multiply routine
sm indicates triangular solve routine
**********************************************************************
* For a complete description of the Sparse BLAS Toolkit interface, *
* see: "A Revised Proposal for a Sparse BLAS Toolkit", an article by *
* S. Carney, M. Heroux, G. Li, R. Pozo, K. Remington and K. Wu. *
* http://www.cray.com/products/applications/support/scal/spblastk.ps *
**********************************************************************
---------------------------------------
Section 3. Developer's Interface
---------------------------------------
FILE STRUCTURE:
The FILE structure for the internal routines of the Sparse BLAS
Toolkit keys filenames to storage format and computation type.
The filenames follow these two templates:
multiply: _xxxyml.c
triangular solve: _xxxytsl.c
where:
xxx is the matrix storage format (csr, csc, coo, etc.)
y v - single column result ( n = 1 )
m - multiple column result ( n > 1 )
ROUTINES:
The routines in the NIST Sparse BLAS library follow a naming
convention which encodes specific kernels drawn from the generic
routine.
The source for the library is divided into separate files for
each storage format and matrix or vector computation combination.
The following files are used in this distribution:
dbcomml.c dbscvtsl.c dcoomml.c dcscvtsl.c dutil.c
dbcovml.c dbsrmml.c dcoovml.c dcsrmml.c dvbrmml.c
dbscmml.c dbsrmts.c dcscmml.c dcsrmts.c dvbrmts.c
dbscmts.c dbsrmtsl.c dcscmts.c dcsrmtsl.c dvbrmtsl.c
dbscmtsl.c dbsrvml.c dcscmtsl.c dcsrvml.c dvbrvml.c
dbscvml.c dbsrvts.c dcscvml.c dcsrvts.c dvbrvts.c
dbscvts.c dbsrvtsl.c dcscvts.c dcsrvtsl.c dvbrvtsl.c
VECTOR/MATRIX MULTIPLY ROUTINES:
Each MULTIPLY file contains either the vector or the matrix
"lite" kernel routines for the following 6 kernels. (dxxxvml.c
contains the vector routines, dxxxmml.c contains the matrix
or multiple right-hand-side routines.)
CAB = C <- A*B
CABC = C <- A*B + C
CaAB = C <- alpha*A*B
CaABC = C <- alpha*A*B + C
CABbC = C <- A*B + beta*C
CaABbC = C <- alpha*A*B + beta*C
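To make the relationship among these six kernels concrete, the sketch
below writes the most general case (CaABbC) and the simplest case (CAB)
for a single-column operand in CSR storage. The function names and
argument lists are hypothetical and 0-based indexing is assumed; the
library's actual kernels follow the naming convention and calling
sequences described in this section.

```c
/* Hypothetical general kernel: c <- alpha*A*b + beta*c (CaABbC). */
void sketch_csr_CaABbC(int m, const double *val, const int *col,
                       const int *rowptr, double alpha,
                       const double *b, double beta, double *c)
{
    int i, k;
    for (i = 0; i < m; i++) {
        double sum = 0.0;
        for (k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * b[col[k]];
        c[i] = alpha * sum + beta * c[i];
    }
}

/* Hypothetical special case: c <- A*b (CAB).
   The alpha and beta arguments, and the work they imply, are gone. */
void sketch_csr_CAB(int m, const double *val, const int *col,
                    const int *rowptr, const double *b, double *c)
{
    int i, k;
    for (i = 0; i < m; i++) {
        double sum = 0.0;
        for (k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * b[col[k]];
        c[i] = sum;
    }
}
```

The other four kernels fall between these two: each drops either the
alpha scaling, the beta scaling, or the accumulation into C.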
In the cases where storage formats do not allow directly calling
an alternate kernel for performing the transpose multiplication
(all except CSR and CSC), the following kernels are also included:
CATB = C <- A'*B
CATBC = C <- A'*B + C
CaATB = C <- alpha*A'*B
CaATBC = C <- alpha*A'*B + C
CATBbC = C <- A'*B + beta*C
CaATBbC = C <- alpha*A'*B + beta*C
For each of these kernels, there is a basic vector/matrix multiply,
and a skew symmetric vector/matrix multiply:
void XXX_Mult__TYPE
void XXXskew_Mult__TYPE
For the non-transpose kernels, there is also a symmetric vector/matrix
multiply routine:
void XXXsymm_Mult__TYPE
Calling sequences for these routines are similar to the Toolkit
interface, but with meaningless arguments for each special case
eliminated. See the User's Guide or the include header files for
specific calling sequences.
VECTOR/MATRIX TRIANGULAR SOLVE ROUTINES:
Each TRIANGULAR SOLVE file contains either the vector or the matrix
"lite" kernel routines for the following 24 kernels. (dxxxvtsl.c
contains the vector routines, dxxxmtsl.c contains the matrix
or multiple right-hand-side routines.)
CAB = C <- A*B
CaAB = C <- alpha*A*B
CABC = C <- A*B + C
CaABC = C <- alpha*A*B + C
CABbC = C <- A*B + beta*C
CaABbC = C <- alpha*A*B + beta*C
CDAB = C <- DL*A*B
CaDAB = C <- alpha*DL*A*B
CDABC = C <- DL*A*B + C
CaDABC = C <- alpha*DL*A*B + C
CDABbC = C <- DL*A*B + beta*C
CaDABbC = C <- alpha*DL*A*B + beta*C
CADB = C <- A*DR*B
CaADB = C <- alpha*A*DR*B
CADBC = C <- A*DR*B + C
CaADBC = C <- alpha*A*DR*B + C
CADBbC = C <- A*DR*B + beta*C
CaADBbC = C <- alpha*A*DR*B + beta*C
CDADB = C <- DL*A*DR*B
CaDADB = C <- alpha*DL*A*DR*B
CDADBC = C <- DL*A*DR*B + C
CaDADBC = C <- alpha*DL*A*DR*B + C
CDADBbC = C <- DL*A*DR*B + beta*C
CaDADBbC = C <- alpha*DL*A*DR*B + beta*C
In the cases where storage formats do not allow directly calling
an alternate kernel for performing the transpose multiplication
(all except CSR and CSC), transpose kernels are also included.
For each of these kernels, there are two unit-diagonal triangular
solve routines, and for point-entry formats there are also two
non-unit-diagonal triangular solve routines.
XXX_TriangSlvUU__TYPE (Upper triangular, Unit diag.)
XXX_TriangSlvLU__TYPE (Lower triangular, Unit diag.)
XXX_TriangSlvUD__TYPE (Upper triangular, non-unit Diag.)
XXX_TriangSlvLD__TYPE (Lower triangular, non-unit Diag.)
Calling sequences for these routines are similar to the Toolkit
interface, but with meaningless arguments for each special case
eliminated. See the User's Guide or the include header files for
specific calling sequences.
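As a rough illustration of the lower triangular, unit-diagonal case
(the LU variant above), the sketch below performs forward substitution
on a CSR matrix for a single right-hand side. The function name, the
argument list, the 0-based indexing, and the assumption that the unit
diagonal is implicit (any stored entry on or above the diagonal is
ignored) are choices made for this example only; this is not the
library's kernel code.

```c
/* Hypothetical sketch: solve L*c = b by forward substitution, where L is
   lower triangular with an implicit unit diagonal, stored in CSR form. */
void sketch_csr_trislv_lu(int m, const double *val, const int *col,
                          const int *rowptr, const double *b, double *c)
{
    int i, k;
    for (i = 0; i < m; i++) {
        double sum = b[i];
        for (k = rowptr[i]; k < rowptr[i + 1]; k++) {
            /* Use only strictly lower entries; c[col[k]] is already known. */
            if (col[k] < i)
                sum -= val[k] * c[col[k]];
        }
        c[i] = sum;   /* unit diagonal: no division needed */
    }
}
```

A non-unit-diagonal variant would additionally divide each row's result
by the stored diagonal entry, and an upper triangular variant would run
the row loop from the last row to the first.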
--------------------------------------------------------------------------
-----------------------------------
Section 4. Source code generation
-----------------------------------
The SRC_GEN directory contains generic source files:
bcomm.c bsrmm.c cscmm.c csrmts.c
bscmm.c bsrmts.c cscmts.c vbrmm.c
bscmts.c coomm.c csrmm.c vbrmts.c
along with generator scripts for creating the NIST Sparse BLAS kernel
routines from these generic source files.
These source files are used as "master files", and are written in such
a way that special case routines can be generated by relatively simple
shell scripts which use "sed" and "awk" for text replacement.
The approach saves considerable programming effort by generating most
source files automatically, and reduces errors by ensuring that
any changes are propagated throughout all of the related source code.
The master files provide working source code for the most general
version of the kernel routine. This is where real programming effort
should be expended to optimize the library. The code is commented
with tags which can be used to selectively delete code for special
case routines. The "rules" for creating each special case file
are defined in the SRC_GEN/kernels subdirectory. The kernels subdirectory
contains the files
CAB CADBbC CDADBC CaADB CaDABbC
CABC CDAB CDADBbC CaADBC CaDADB
CABbC CDABC CaAB CaADBbC CaDADBC
CADB CDABbC CaABC CaDAB CaDADBbC
CADBC CDADB CaABbC CaDABC
one representing each of the specializations from the generic master
code, along with kernel files for the master codes. Each of these
kernel files contains pointers to appropriate "Definition" files,
in the directory SRC_GEN/Defs, which are used to build up the
sed script for the text replacement to generate the kernel routines.
For typical use, these kernel and definition files would never have
to be touched. Many modifications (say for optimization) can be made
to the master source files without requiring any change whatsoever
to the file generation mechanism. The only source code changes which
would affect code generation would be those which alter the
relationship between the comment tags and the related source.
A more detailed explanation of the mechanism, and requirements
for modifications, will be forthcoming in the 1.0 release.
After making any necessary changes to these "master" source files,
the library source files may be generated via the "create" script
(automated in the "make" process in this directory with "make install"
or "make re-install").
** IMPORTANT NOTE **
Any changes to source for any routines below the Toolkit interface
layer ** MUST ** be made in the ../SRC_GEN directory to be retained and
propagated to all appropriate kernel routines.
(Changes to the Toolkit interface routines, however, should be made
directly in the directory ../src_tk[c|f].)
** IMPORTANT NOTE **
--------------------------------------------------------------------------