--------------------------------------------------------------------------
*** For installation instructions, skip to Section 1.
--------------------------------------------------------------------------

              NIST Sparse BLAS Toolkit Implementation

                     NIST SPARSE BLAS v. x.x

--------------------------------------------------------------------------
Authors:  Karin A. Remington and Roldan Pozo
          National Institute of Standards and Technology

Based on the interface standard proposed in:
   "A Revised Proposal for a Sparse BLAS Toolkit" by
       S. Carney and K. Wu         -- University of Minnesota
       M. Heroux and G. Li         -- Cray Research
       R. Pozo and K.A. Remington  -- NIST

Contact:  Karin A. Remington, email: kremington@nist.gov
--------------------------------------------------------------------------

Contents:
   0. Release Notes
   1. Installation Instructions
   2. Toolkit Interface
   3. Developer's Interface
   4. Source Code Generation

-----------------------------
  Section 0.  Release Notes
-----------------------------

What's included:

   The package includes support for the "BASIC" Toolkit, including
   matrix-multiply and triangular solve routines for the following
   sparse matrix formats:

       csr - compressed sparse row
       csc - compressed sparse column
       coo - coordinate
       bsr - block sparse row
       bsc - block sparse column
       bco - block coordinate
       vbr - variable block row

What's * NOT * included:

   The following is **NOT** included in this release:

       -- support for triangular solves for the block coordinate
          (bco) scheme
       -- support for non-contiguous block storage in the block formats

What's required:

   Minimum:   ANSI C compiler
              12 MB free disk space
   Optional:  Fortran compiler (for testing Fortran interfaces)
              AWK and SED (for re-generating kernel source code)

Testing:

   The testing directories contain both matrix-multiply and triangular
   solve testers for each supported storage scheme.  C and Fortran
   testers are both included, and can be used as examples for library
   usage.
   This distribution has been tested under the following OS/compiler
   configurations:

       sunos4.1.4:     gcc 2.7.0, gcc 2.7.2 and acc 3.0.1
       sunsolaris2.4:  gcc 2.7.0  (no RANLIB, see makefile.def)
       AIX.1.1:        xlc
       sgi-irix5.3:    gcc 2.7.0

Bug reports:

   Please send bug reports to kremington@nist.gov.

---------------------------------------
  Section 1.  Installation Instructions
---------------------------------------

The installation of the Sparse BLAS Toolkit is automated with the "make"
utility.  To use "make" to build the library:

   1. Edit the file ./makefile.def to reflect your system setup:

      - The minimum installation requires an ANSI C compiler.
      - An extended installation, which includes Fortran callable
        routines and testers, is available.  If the presence of a
        Fortran compiler is indicated in the makefile.def file, the
        extended version will be installed.
      - The archival process uses "ranlib" by default.  If this is not
        available on your system, set HASRANLIB to 'f'.

   2. Type:

      "make install"  (**)  to build the library AND make and run the
                            C and Fortran testers
      "make installc"       to build the library AND make and run the
                            C testers
      "make library"        to build the archive file ./lib/libsptk.a
                            (tests are not built)
      "make testc"          to build and run the C testers
                            (library must be pre-built)
      "make testf77"  (**)  to build and run the Fortran testers
                            (library must be pre-built)

      (**) requires a Fortran compiler

   3. For space-saving cleanup, type "make clean" to remove all .o files.

--------------------------------
  Section 2.  Toolkit Interface
--------------------------------

The Toolkit interface, along with the decision trees for calling the
proper kernel routine for a given set of input values, is implemented in
the files

   ./src_tkc/_xxxmm_c.c and _xxxsm_c.c   (C bindings)
   ./src_tkf/_xxxmm_f.c and _xxxsm_f.c   (Fortran bindings)

   where:  xxx  is the matrix storage format (csr, csc, coo, etc.)
           mm   indicates matrix multiply routine
           sm   indicates triangular solve routine

**********************************************************************
* For a complete description of the Sparse BLAS Toolkit interface,   *
* see: "A Revised Proposal for a Sparse BLAS Toolkit", an article by *
* S. Carney, M. Heroux, G. Li, R. Pozo, K. Remington and K. Wu.      *
* http://www.cray.com/products/applications/support/scal/spblastk.ps *
**********************************************************************

---------------------------------------
  Section 3.  Developer's Interface
---------------------------------------

FILE STRUCTURE:

   The file structure for the internal routines of the Sparse BLAS
   Toolkit keys filenames to storage format and computation type.
   The filenames follow these two templates:

       multiply:          _xxxyml.c
       triangular solve:  _xxxytsl.c

   where:  xxx  is the matrix storage format (csr, csc, coo, etc.)
           y    v - single column result    ( n = 1 )
                m - multiple column result  ( n > 1 )

ROUTINES:

   The routines in the NIST Sparse BLAS library follow a naming
   convention which encodes specific kernels drawn from the generic
   routine.  The source for the library is divided into separate files,
   one for each combination of storage format and vector or matrix
   computation.  The following files are used in this distribution:

       dbcomml.c   dbscvtsl.c  dcoomml.c   dcscvtsl.c  dutil.c
       dbcovml.c   dbsrmml.c   dcoovml.c   dcsrmml.c   dvbrmml.c
       dbscmml.c   dbsrmts.c   dcscmml.c   dcsrmts.c   dvbrmts.c
       dbscmts.c   dbsrmtsl.c  dcscmts.c   dcsrmtsl.c  dvbrmtsl.c
       dbscmtsl.c  dbsrvml.c   dcscmtsl.c  dcsrvml.c   dvbrvml.c
       dbscvml.c   dbsrvts.c   dcscvml.c   dcsrvts.c   dvbrvts.c
       dbscvts.c   dbsrvtsl.c  dcscvts.c   dcsrvtsl.c  dvbrvtsl.c

VECTOR/MATRIX MULTIPLY ROUTINES:

   Each MULTIPLY file contains either all of the vector or all of the
   matrix "lite" kernel routines for the following 6 kernels.
   (dxxxvml.c contains the vector routines, dxxxmml.c contains the
   matrix or multiple right-hand-side routines.)

       CAB      = C <- A*B
       CABC     = C <- A*B + C
       CaAB     = C <- alpha*A*B
       CaABC    = C <- alpha*A*B + C
       CABbC    = C <- A*B + beta*C
       CaABbC   = C <- alpha*A*B + beta*C

   In the cases where storage formats do not allow directly calling an
   alternate kernel for performing the transpose multiplication (all
   except CSR and CSC), the following kernels are also included:

       CATB     = C <- A'*B
       CATBC    = C <- A'*B + C
       CaATB    = C <- alpha*A'*B
       CaATBC   = C <- alpha*A'*B + C
       CATBbC   = C <- A'*B + beta*C
       CaATBbC  = C <- alpha*A'*B + beta*C

   For each of these kernels, there is a basic vector/matrix multiply
   and a skew symmetric vector/matrix multiply:

       void XXX_ Mult_ _TYPE
       void XXXskew_ Mult_ _TYPE

   For the non-transpose kernels, there is also a symmetric
   vector/matrix multiply routine:

       void XXXsymm_ Mult_ _TYPE

   Calling sequences for these routines are similar to the Toolkit
   interface, but with the arguments that are meaningless in each
   special case eliminated.  See the User's Guide or the include header
   files for specific calling sequences.

VECTOR/MATRIX TRIANGULAR SOLVE ROUTINES:

   Each TRIANGULAR SOLVE file contains either all of the vector or all
   of the matrix "lite" kernel routines for the following 24 kernels.
   (dxxxvtsl.c contains the vector routines, dxxxmtsl.c contains the
   matrix or multiple right-hand-side routines.)

       CAB      = C <- A*B                  CaAB      = C <- alpha*A*B
       CABC     = C <- A*B + C              CaABC     = C <- alpha*A*B + C
       CABbC    = C <- A*B + beta*C         CaABbC    = C <- alpha*A*B + beta*C
       CDAB     = C <- DL*A*B               CaDAB     = C <- alpha*DL*A*B
       CDABC    = C <- DL*A*B + C           CaDABC    = C <- alpha*DL*A*B + C
       CDABbC   = C <- DL*A*B + beta*C      CaDABbC   = C <- alpha*DL*A*B + beta*C
       CADB     = C <- A*DR*B               CaADB     = C <- alpha*A*DR*B
       CADBC    = C <- A*DR*B + C           CaADBC    = C <- alpha*A*DR*B + C
       CADBbC   = C <- A*DR*B + beta*C      CaADBbC   = C <- alpha*A*DR*B + beta*C
       CDADB    = C <- DL*A*DR*B            CaDADB    = C <- alpha*DL*A*DR*B
       CDADBC   = C <- DL*A*DR*B + C        CaDADBC   = C <- alpha*DL*A*DR*B + C
       CDADBbC  = C <- DL*A*DR*B + beta*C   CaDADBbC  = C <- alpha*DL*A*DR*B + beta*C

   In the cases where storage formats do not allow directly calling an
   alternate kernel for performing the transpose multiplication (all
   except CSR and CSC), transpose kernels are also included.

   For each of these kernels, there are two unit-diagonal triangular
   solve routines, and for point-entry formats there are also two
   non-unit-diagonal triangular solve routines.

       XXX_ TriangSlvUU_ _TYPE   (Upper triangular, Unit diag.)
       XXX_ TriangSlvLU_ _TYPE   (Lower triangular, Unit diag.)
       XXX_ TriangSlvUD_ _TYPE   (Upper triangular, non-unit Diag.)
       XXX_ TriangSlvLD_ _TYPE   (Lower triangular, non-unit Diag.)

   Calling sequences for these routines are similar to the Toolkit
   interface, but with meaningless arguments for each special case
   eliminated.  See the User's Guide or the include header files for
   specific calling sequences.

--------------------------------------------------------------------------

-----------------------------------
  Section 4.  Source Code Generation
-----------------------------------

The SRC_GEN directory contains generic source files:

       bcomm.c   bsrmm.c   cscmm.c   csrmts.c
       bscmm.c   bsrmts.c  cscmts.c  vbrmm.c
       bscmts.c  coomm.c   csrmm.c   vbrmts.c

along with generator scripts for creating the NIST Sparse BLAS kernel
routines from these generic source files.

These source files are used as "master files", and are written in such a
way that special case routines can be generated by relatively simple
shell scripts which use "sed" and "awk" for text replacement.  The
approach saves considerable programming effort by generating most source
files automatically, and reduces errors by ensuring that any changes are
propagated throughout all of the related source code.

The master files provide working source code for the most general
version of the kernel routine.  This is where real programming effort
should be expended to optimize the library.  The code is commented with
tags which can be used to selectively delete code for special case
routines.  The "rules" for creating each special case file are defined
in the SRC_GEN/kernels subdirectory.  The kernels subdirectory contains
the files

       CAB     CADBbC  CDADBC   CaADB    CaDABbC
       CABC    CDAB    CDADBbC  CaADBC   CaDADB
       CABbC   CDABC   CaAB     CaADBbC  CaDADBC
       CADB    CDABbC  CaABC    CaDAB    CaDADBbC
       CADBC   CDADB   CaABbC   CaDABC

one representing each of the specializations from the generic master
code, along with kernel files for the master codes.  Each of these
kernel files contains pointers to appropriate "Definition" files, in the
directory SRC_GEN/Defs, which are used to build up the sed script for
the text replacement to generate the kernel routines.

For typical use, these kernel and definition files would never have to
be touched.  Many modifications (say for optimization) can be made to
the master source files without requiring any change whatsoever to the
file generation mechanism.  The only source code changes which would
affect code generation would be those which alter the relationship
between the comment tags and the related source.  A more detailed
explanation of the mechanism, and requirements for modifications, will
be forthcoming in the 1.0 release.
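
As a rough illustration of the master-file idea, the sketch below shows
a general CSR vector kernel for c <- alpha*A*b + beta*c next to the CAB
special case that results from deleting the alpha and beta statements.
This is NOT the NIST source: the function names, the zero-based index
arrays, the single "pntr" row-pointer array, and the "TAG:" comment
spelling are all assumptions made for this example only.

```c
#include <assert.h>

/* Hypothetical "master" kernel: c <- alpha*A*b + beta*c, with A held in
 * compressed sparse row form.  val[] holds the nonzeros, indx[] their
 * column indices (0-based in this sketch), and row i occupies positions
 * pntr[i] .. pntr[i+1]-1.  The TAG comments mark the statements that a
 * generator script could delete or replace for a special case. */
void csr_vecmult_CaABbC(int m, const double *val, const int *indx,
                        const int *pntr, const double *b, double *c,
                        double alpha, double beta)
{
    int i, j;
    for (i = 0; i < m; i++) {
        double t = 0.0;
        for (j = pntr[i]; j < pntr[i + 1]; j++)
            t += val[j] * b[indx[j]];
        t *= alpha;                  /* TAG: alpha (dropped for CAB) */
        c[i] = t + beta * c[i];      /* TAG: beta  (becomes c[i] = t) */
    }
}

/* What the CAB specialization (c <- A*b) would look like once the
 * tagged alpha and beta code has been removed. */
void csr_vecmult_CAB(int m, const double *val, const int *indx,
                     const int *pntr, const double *b, double *c)
{
    int i, j;
    for (i = 0; i < m; i++) {
        double t = 0.0;
        for (j = pntr[i]; j < pntr[i + 1]; j++)
            t += val[j] * b[indx[j]];
        c[i] = t;
    }
}

/* Checks on a 3x3 example that the specialization agrees with the
 * master kernel called with alpha = 1 and beta = 0. */
void csr_sketch_check(void)
{
    /* A = [1 0 2; 0 3 0; 4 0 5], stored row by row */
    double val[]  = {1.0, 2.0, 3.0, 4.0, 5.0};
    int    indx[] = {0, 2, 1, 0, 2};
    int    pntr[] = {0, 2, 3, 5};
    double b[]    = {1.0, 1.0, 1.0};
    double c1[]   = {0.0, 0.0, 0.0};
    double c2[]   = {0.0, 0.0, 0.0};
    int i;

    csr_vecmult_CaABbC(3, val, indx, pntr, b, c1, 1.0, 0.0);
    csr_vecmult_CAB(3, val, indx, pntr, b, c2);
    for (i = 0; i < 3; i++)
        assert(c1[i] == c2[i]);
}
```

The point of the design is that only the general routine is written and
tuned by hand; each special case differs from it by a few removed
statements, so an optimization to the inner loop carries over to every
generated kernel.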

After making any necessary changes to these "master" source files, the
library source files may be generated via the "create" script (automated
in the "make" process in this directory with "make install" or
"make re-install").

** IMPORTANT NOTE **

   Any changes to source for any routines below the Toolkit interface
   layer ** MUST ** be made in the ../SRC_GEN directory to be retained
   and propagated to all appropriate kernel routines.  Changes to the
   Toolkit interface routines, however, should be made directly in the
   directory ../src_tk[c|f].

** IMPORTANT NOTE **

--------------------------------------------------------------------------