Tentative Schedule for the Workshop
===================================

Monday June 3, 2002
Conference Room: Lipman Wolfe, 3rd floor

 8:30 --  9:00   Breakfast (in conference room)
 9:00 --  9:10   Welcome
 9:10 -- 10:00   Ulrich Kulisch
10:00 -- 10:10   short break
10:10 -- 11:00   Clemens Roothaan (1)
11:00 -- 11:50   Yonghong Yan
11:50 --  1:10   Lunch (buffet in Club Room, 1st floor)
 1:10 --  2:00   Masaaki Shimasaki
 2:00 --  2:50   Mo Mu
 2:50 --  3:10   tea break
 3:10 --  4:00   Wayne Enright
 4:00 --  4:50   Patrick Gaffney

(Dinner on your own)

Tuesday June 4, 2002
Conference Room: Lipman Wolfe, 3rd floor

 8:30 --  9:00   Breakfast (in conference room)
 9:00 --  9:50   Greg Henry
 9:50 -- 10:40   Xintian Wu
10:40 -- 11:00   break
11:00 -- 11:40   Clemens Roothaan (2)
11:40 -- 12:30   Rencang Li
12:30 --  1:10   Lunch (buffet in Club Room, 1st floor)
 1:10 --  2:00   Dragan Mirkovic
 2:00 --  2:50   John Harrison
 2:50 --  3:40   Peter Tang
 3:40            Wrap up and farewell

==================
Title and Abstract
==================

Ulrich Kulisch
Universitat Karlsruhe

Hardware Support for Interval Arithmetic and Basic Features of
Mathematical Software

Interval mathematics has been developed to a high standard during the
last few decades. It provides methods which deliver results with
guarantees. However, the arithmetic available on existing processors
makes these methods very slow, and as a result they are not widely used
in the scientific computing community as a whole. The talk will show
that, on superscalar processors, interval operations can be made as fast
as simple floating-point operations at only modest hardware cost. The
second part of the talk discusses how computer arithmetic and programming
languages, the more basic features of mathematical software, should be
updated to allow easier development of better mathematical software.
Program samples will be used to illustrate the achievements.
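As a flavor of what guaranteed-result interval arithmetic involves (and
of why software emulation is slow compared with hardware support), here
is a minimal Python sketch of interval addition and multiplication with
outward rounding. It is purely illustrative and not taken from the talk.

```python
import math

class Interval:
    """Closed interval [lo, hi] with outward rounding.

    A software sketch only: each result bound is widened by one ulp
    with math.nextafter so the true result is always enclosed.
    Hardware support would provide the directed roundings directly.
    """
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # [a,b] + [c,d] = [a+c, b+d], rounded outward.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        # The product interval is spanned by the four endpoint products.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(p), -math.inf),
                        math.nextafter(max(p), math.inf))

    def __contains__(self, x):
        return self.lo <= x <= self.hi
```

Note the overhead: one software interval multiply costs four
floating-point multiplies, a min/max selection, and two rounding
adjustments, which is exactly the gap the proposed hardware support
would close.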
For further details see the paper: Advanced Arithmetic for the Digital
Computer: Design of Arithmetic Units,
ftp://ftp.iam.uni-karlsruhe.de/pub/documents/kulisch/tsukuba.ps.gz

-----------------

Clemens Roothaan (1)
Hewlett-Packard

Minimax Polynomials: New Chebyshev and Remez Engines

For many transcendental functions, traditional calculations use a
combination of argument reduction, table lookup, and an approximating
polynomial in terms of the reduced argument. A minimax polynomial has the
property that the maximum possible error for any valid reduced argument
is a minimum compared with using any other polynomial of the same degree.
It is well known that a linear combination of Chebyshev polynomials of
the first kind, called the Chebyshev expansion, can come close to
achieving this desirable objective. Traditional methods to determine the
optimized coefficients in this expansion are quite laborious, and often
burdened with additional approximations. In the method presented here,
the Chebyshev expansion is formulated as a linear transformation of the
Taylor expansion, and vice versa. Using standard tools of linear algebra,
general formulas are obtained for the error term and for corrections to
the Taylor expansion coefficients, yielding an effective Chebyshev
expansion as a modified Taylor expansion.

For a given degree and maximum reduced argument, the real solution to the
minimax problem is the Remez polynomial. Unlike the Chebyshev solution,
the Remez solution requires an iterative process, and much higher
precision. A quadratically convergent Remez scheme is in place and has
been used extensively. It should also be noted that the Remez process can
be adapted to special demands more easily than the Chebyshev process.
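To make the Chebyshev route concrete, here is a small Python/NumPy sketch
that builds a near-minimax polynomial from samples at the Chebyshev
nodes via discrete orthogonality. This is the standard textbook
construction, not the talk's Taylor-transformation method; the choice of
function (exp) and degree (5) is arbitrary and for illustration only.

```python
import numpy as np

def cheb_coeffs(f, deg):
    """Coefficients c_0..c_deg of the Chebyshev expansion of f on
    [-1, 1], computed from samples at the Chebyshev nodes using the
    discrete orthogonality of cos(j*theta) at equispaced angles."""
    n = deg + 1
    k = np.arange(n)
    theta = np.pi * (k + 0.5) / n
    y = f(np.cos(theta))                      # samples at Chebyshev nodes
    c = np.array([(2.0 / n) * np.sum(y * np.cos(j * theta))
                  for j in range(n)])
    c[0] /= 2.0                               # standard T_0 normalization
    return c

# Near-minimax degree-5 approximation of exp on [-1, 1].
c = cheb_coeffs(np.exp, 5)
x = np.linspace(-1.0, 1.0, 1001)
approx = np.polynomial.chebyshev.chebval(x, c)
max_err = np.max(np.abs(approx - np.exp(x)))
```

For exp at degree 5 the maximum error of this expansion is already
within a few percent of the true minimax (Remez) error, which is why the
Chebyshev expansion "comes close" to the optimum without iteration.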
-----------------

Yonghong Yan
Oregon Graduate Institute

The Expectation Maximization Algorithm and Its Application in Speech

In statistical approaches to speech recognition, such as Hidden Markov
Models (HMMs), one of the key issues is how to model speech events
accurately and robustly. Given the nature of speech production, not all
the needed information is observable. The Expectation Maximization (EM)
algorithm is used for model parameter estimation: by iteratively
maximizing the expectation of the log likelihood of the complete data,
the likelihood of the incomplete data can be maximized. In this talk, an
introduction to the EM algorithm will be given, and its application to
acoustic modeling and speaker adaptation will be discussed.

-----------------

Masaaki Shimasaki
Kyoto University

Fast Linear Equation Solvers in High Performance Electromagnetic Field
Analysis

Solving linear equations plays a crucial role in high performance
electromagnetic field analysis. We describe the forms and characteristics
of the systems of linear equations arising in electromagnetic field
analysis with the Finite Element Method (FEM). Properties of ICCG and its
parallelization are discussed in the context of electromagnetic field
analyses. Although the current applicability of the multigrid approach is
rather limited in electromagnetic field analysis in comparison with ICCG,
the multigrid method is important because it is quite fast when applied
to very large scale problems. We discuss the algebraic multigrid method
in finite element electromagnetic field analysis.

-----------------

Mo Mu
Hong Kong University of Science and Technology

PDE.Mart: A Networked PDE Solving Environment

PDE.Mart is a networked problem-solving environment (PSE) for solving
partial differential equations. It consists of three components: a GUI, a
Server, and a LIB.
The GUI is a browser-enabled Java applet running on client machines; the
Server is a Java application running on our PDE.Mart server; and the LIB
contains solver building parts in native languages such as Fortran, C,
and C++, with some in Java itself.

-----------------

Wayne Enright
University of Toronto

Fast Contouring of Approximate Solutions of PDEs

Visualizing and displaying approximate solutions of PDEs on high
resolution screens and other devices has become a significant part of
scientific computation. On the other hand, the prime focus of numerical
analysts has been, and still is, the development of effective numerical
methods for generating accurate approximate solutions on relatively
coarse meshes (covering the domain of interest). Displaying or
visualizing the results of such methods at the resolution required by
current display devices often demands significant computational effort,
and we are investigating efficient algorithms for carrying out these
tasks. In particular, in this presentation we will consider the
generation of contour plots and show how the cost of this task can be
minimized. More specifically, we have developed and implemented (in
MATLAB) an approach that directly generates the contour lines associated
with a function u(x,y), without having to generate a surface map (or
fine-mesh approximation) as a preliminary computation. Our approach is
based on directly approximating, with a reliable continuous ODE solver,
an associated initial value problem whose known exact solution is the
contour line of interest. The overall cost of this approach will be shown
to be optimal, and numerical evidence will be presented to show its
effectiveness on a variety of problems.
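The idea of contouring as an initial value problem can be sketched as
follows: along the direction (-u_y, u_x), which is orthogonal to the
gradient of u, the value of u stays constant, so integrating that ODE
traces a level curve. The Python/NumPy sketch below uses a bare-bones
fixed-step RK4 integrator as a stand-in for the reliable continuous ODE
solver of the talk; the test function u = x^2 + y^2 is a hypothetical
example, chosen because its contour through (1, 0) is the unit circle.

```python
import numpy as np

def trace_contour(grad_u, p0, t_end, n_steps):
    """Trace the level curve of u through p0 by integrating
    (x', y') = (-u_y, u_x) with classical RK4. Solutions of this ODE
    keep u constant, so the trajectory is the contour line itself."""
    def f(p):
        ux, uy = grad_u(p)
        return np.array([-uy, ux])
    h = t_end / n_steps
    p = np.array(p0, dtype=float)
    path = [p.copy()]
    for _ in range(n_steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(p.copy())
    return np.array(path)

# u(x, y) = x^2 + y^2: the contour through (1, 0) is the unit circle,
# and with grad u = (2x, 2y) the trajectory has angular speed 2, so
# t_end = pi traverses the full circle.
path = trace_contour(lambda p: (2 * p[0], 2 * p[1]),
                     (1.0, 0.0), np.pi, 2000)
```

Note that no surface map or fine mesh of u is ever built: only the
gradient along the trajectory is evaluated, which is the source of the
cost savings the abstract describes.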
-----------------

Patrick Gaffney
Bergen Software Services International A/S

Optimization from a Java Interface: Numerical Methods Framework

-----------------

Greg Henry
Intel

-----------------

Xintian Wu
Intel

Fast beam search in speech recognition using Intel(R) IPP

Beam search is one of the core algorithms in statistical speech
recognition. The algorithm searches all possible word sequences for the
most likely word sequence that matches the given speech observations. The
match involves calculating likelihood probabilities of acoustics and
syntax (language model). The talk analyzes the computation of the beam
search algorithm and presents a cache strategy to reduce the likelihood
calculation. The cache algorithm effectively uses the Intel(R) Integrated
Performance Primitives (IPP) to speed up its calculation. Intel(R) IPP is
a library designed for signal processing, speech/audio processing, and
much more. Assembly code is used in this library to squeeze computation
power out of the latest Intel processors.

-----------------

Clemens Roothaan (2)
Hewlett-Packard

A Vector Transcendental Math Library for the Itanium Architecture

The Vector Math Library (VML) presented here was conceived in 1990, in
parallel with the development of the Wide Word Architecture by an HP Labs
team. By early 1994, the Wide Word project became the joint
responsibility of HP and Intel; the resulting design is now known by the
name Itanium. The functions addressed in the VML project are: Reciprocal,
Division, Square Root and Reciprocal Square Root; Exponential, Logarithm
and Power Function; Trigonometric and Inverse Trigonometric Functions;
and Hyperbolic and Inverse Hyperbolic Functions. Single and double
precision codes for these functions will be released soon in the public
domain. In the asymptotic range of long vectors, the execution time per
function evaluation ranges from 3.5 machine cycles for a single precision
reciprocal to 19.5 cycles for the double precision power function.
The precision achieved is better than 0.5005 ulps for single precision
and 0.501 ulps for double precision. In order to exploit the IA-64
architecture for vector functions, the algorithms for the functions at
hand must be universal algorithms, i.e., the computational procedure must
be identical for all arguments. For some functions, the traditional
algorithms in the textbooks are indeed universal, but for quite a few
they are not. In every case where a new universal algorithm was needed, a
satisfactory solution was obtained. The codes are designed so that no
interrupts occur to handle special cases; all exception and error
handling is accomplished inside the loop by housekeeping instructions
which are hidden in the floating-point pipeline.

-----------------

Rencang Li
Hewlett-Packard

The ABC Conjecture and Correctly Rounded Reciprocal Square Root

(Joint work with Ernie Croot and Hui June Zhu, University of California,
Berkeley)

Existing results suggest that to get the correctly rounded reciprocal
square root $y=1/\sqrt x$ in a floating-point number system with $p$
significant bits, one may have to compute up to $3p+1$ leading bits of
$y$. However, numerical evidence indicates that the actual number may be
as small as $2p$ plus a few more bits. In this talk we will show that
this is indeed true, assuming the ABC conjecture holds. The conjecture is
widely considered the next biggest one in number theory after Fermat's
Last Theorem. Our technique extends to other algebraic functions.

-----------------

Dragan Mirkovic
University of Houston

Automatic Code Generation for FFT Algorithms

The discrete Fourier transform (DFT) is an essential computation in a
wide variety of scientific and engineering fields. The fast method for
computing the DFT, the fast Fourier transform (FFT), is one of the most
important computational developments of this century.
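As background for the code-generation discussion that follows, the
divide-and-conquer structure that FFT libraries factor into composable
blocks can be sketched with the textbook radix-2 recursion. This
illustrative Python is unrelated to the talk's actual generator, which
selects among many such factorizations at run time.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of
    two. Each level splits the DFT into two half-size DFTs (even and
    odd samples) combined by twiddle factors -- the kind of sparse
    factorization of the DFT matrix that code generators compose."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Twiddle factors exp(-2*pi*i*k/n) applied to the odd half.
    t = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + t[k] for k in range(n // 2)] +
            [even[k] - t[k] for k in range(n // 2)])
```

A real library replaces this one fixed recursion with a search over
split sizes and specialized straight-line blocks, which is where the
automatic tuning described below comes in.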
The applications of the FFT include digital signal processing for radar,
sonar and radio astronomy, image analysis in X-ray crystallography,
electron microscopy and various modalities of medical imaging, data
compression, and the numerical solution of partial differential
equations. The importance of the FFT in these applications has provided a
strong motivation for the development of highly optimized
implementations. On the other hand, the growing complexity of modern
microprocessor architectures, with multi-level memory hierarchies and
instruction-level parallelism, has made performance tuning increasingly
difficult. In this presentation we will discuss an efficient automatic
performance tuning method for fast Fourier transforms. The method is
based on a dynamic construction of the FFT algorithm: it uses a library
of composable blocks of code, each computing a part of the transform, and
selects the optimal combination of these blocks at run time. The blocks
of code in the library are automatically generated and optimized using a
special-purpose compiler. We will describe the basic structure of the
code generator and the use of different splitting methods needed for
sparse factorization of the DFT matrix.

-----------------

John Harrison
Intel

Formal Verification of Mathematical Algorithms

Algorithms for computing floating-point approximations to standard
mathematical functions (e.g. basic operations like division and
transcendentals like "exp" and "sin") are an important part of most
computing environments. The algorithms may be realized in hardware,
microcode, software, or some mixture thereof, but are usually considered
among the most fundamental components of a computer system. These
algorithms, especially those that are aggressively designed to optimize
performance, can be quite complicated, relying on various results from
pure mathematics and subtly exploiting the special features of
floating-point operations.
This means that they are relatively difficult to design without error,
while the cost to a company like Intel of a failure, especially at the
hardware level, can be huge. Traditional approaches to ensuring the
correctness of hardware and software usually rely on extensive testing.
However, for many functions this is not realistically feasible, even if a
reliable source of comparison is available. An alternative is to prove by
detailed mathematical analysis that the algorithm obeys a formal
specification (e.g. correct rounding or an error bound). Since such
proofs can be tedious and error-prone, it seems wise to check them, or
even partly generate them, with a reliable mechanical proof-checker. We
report on our experience of using the HOL Light theorem prover to
formally prove correct some key floating-point algorithms used in the
Intel Itanium architecture. We will sketch how some of the formal proofs
are conducted, and draw some general conclusions concerning the promise
of this approach.

-----------------

Peter Tang
Intel

A Novel Approach to Tight Bounds and Statistical Information on Rounding
Errors

Tight rounding-error analysis of small-scale computations is still too
difficult or time-consuming for practicing software engineers to carry
out routinely in their production work. This is especially the case in
fixed-point calculation, which, compared with floating-point computation,
can have data widths and rounding methods that vary drastically. To
promote quality fixed-point computational software, we have developed an
approach that is capable of automatically deriving tight error bounds as
well as statistical distributions of errors for linear transformations.
These include the important cases of the FFT, DCT, filtering, and
windowing.
The main idea is to model each computed quantity as its corresponding
ideal value contaminated by two additive error terms: the first is
correlated with the input data, while the second is statistically
independent of the input data. We carry numerically a bound on the first
term as a function of the norm of the input data, and, also numerically,
the probability density function of the second term. The propagation of
these terms can be computed numerically, and hence this information for
the final result can be obtained automatically through operator
overloading. The error model, the propagation of the error terms, and
some numerical examples will be presented in this talk.
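A toy version of such operator-overloading error propagation might look
as follows in Python. It tracks only a data-independent additive term (a
worst-case bound plus a variance under the uniform rounding-error
model), whereas the talk's model also carries the data-correlated term;
all names here are hypothetical illustrations, not the talk's actual
framework.

```python
class ErrTerm:
    """A value carried together with a worst-case error bound and an
    error variance, propagated automatically through arithmetic by
    operator overloading."""
    def __init__(self, value, bound=0.0, var=0.0):
        self.value, self.bound, self.var = value, bound, var

    def __add__(self, other):
        # Worst-case bounds add; variances add assuming independence.
        return ErrTerm(self.value + other.value,
                       self.bound + other.bound,
                       self.var + other.var)

    def scale(self, c):
        # Exact scaling by a constant: bound scales by |c|, variance
        # by c^2 (as needed for linear transformations).
        return ErrTerm(c * self.value, abs(c) * self.bound,
                       c * c * self.var)

def quantize(x, q):
    """Round x to the nearest multiple of the quantum q, attaching the
    rounding error's worst-case bound (q/2) and its variance (q^2/12,
    the uniform-error model)."""
    return ErrTerm(round(x / q) * q, q / 2.0, q * q / 12.0)

# Summing four quantized inputs: the bound and variance of the final
# result accumulate automatically through the overloaded "+".
q = 1.0 / 256.0
xs = [0.1, 0.2, 0.3, 0.4]
s = quantize(xs[0], q)
for x in xs[1:]:
    s = s + quantize(x, q)
```

The worst-case bound (here 4 * q/2) is typically pessimistic; the
variance supports the kind of statistical error statement the talk
advocates alongside the bound.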