Tentative Schedule for the Workshop
===================================

Monday June 3, 2002
Conference Room: Lipman Wolfe, 3rd floor

 8:30 --  9:00   Breakfast (in conference room)
 9:00 --  9:10   Welcome
 9:10 -- 10:00   Ulrich Kulisch
10:00 -- 10:10   short break
10:10 -- 11:00   Clemens Roothaan (1)
11:00 -- 11:50   Yonghong Yan
11:50 --  1:10   Lunch (buffet in Club Room, 1st floor)
 1:10 --  2:00   Masaaki Shimasaki
 2:00 --  2:50   Mo Mu
 2:50 --  3:10   tea break
 3:10 --  4:00   Wayne Enright
 4:00 --  4:50   Patrick Gaffney

(Dinner on your own)

Tuesday June 4, 2002
Conference Room: Lipman Wolfe, 3rd floor

 8:30 --  9:00   Breakfast (in conference room)
 9:00 --  9:50   Greg Henry
 9:50 -- 10:40   Xintian Wu
10:40 -- 11:00   break
11:00 -- 11:40   Clemens Roothaan (2)
11:40 -- 12:30   Rencang Li
12:30 --  1:10   Lunch (buffet in Club Room, 1st floor)
 1:10 --  2:00   Dragan Mirkovic
 2:00 --  2:50   John Harrison
 2:50 --  3:40   Peter Tang
 3:40            Wrap up and farewell

==================
Title and Abstract
==================

Ulrich Kulisch
Universitat Karlsruhe

Hardware Support for Interval Arithmetic and Basic Features of
Mathematical Software

Interval mathematics has been developed to a high standard during the
last few decades. It provides methods which deliver results with
guarantees. However, the arithmetic available on existing processors
makes these methods very slow, and as a result they are not widely used
in the scientific computing community as a whole. The talk will show
that, on superscalar processors, interval operations can be made as fast
as simple floating-point operations at only modest hardware cost. The
second part of the talk discusses how computer arithmetic and programming
languages, the more basic features of mathematical software, should be
updated to allow easier development of better mathematical software.
Program samples will be used to illustrate the achievements.
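As a flavor of what guaranteed-result interval arithmetic involves (and
of why software emulation is slow compared with hardware support), here
is a minimal Python sketch of interval addition and multiplication with
outward rounding. It is purely illustrative and not taken from the talk.

```python
import math

class Interval:
    """Closed interval [lo, hi] with outward rounding.

    A software sketch only: each result bound is widened by one ulp
    with math.nextafter so the true result is always enclosed.
    Hardware support would provide the directed roundings directly.
    """
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # [a,b] + [c,d] = [a+c, b+d], rounded outward.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        # The product interval is spanned by the four endpoint products.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(p), -math.inf),
                        math.nextafter(max(p), math.inf))

    def __contains__(self, x):
        return self.lo <= x <= self.hi
```

Note the overhead: one software interval multiply costs four
floating-point multiplies, a min/max selection, and two rounding
adjustments, which is exactly the gap the proposed hardware support
would close.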
For further details see the paper: Advanced Arithmetic for the Digital
Computer: Design of Arithmetic Units,
ftp://ftp.iam.uni-karlsruhe.de/pub/documents/kulisch/tsukuba.ps.gz

-----------------

Clemens Roothaan (1)
Hewlett-Packard

Minimax Polynomials: New Chebyshev and Remez Engines

For many transcendental functions, traditional calculations use a
combination of argument reduction, table lookup, and an approximating
polynomial in terms of the reduced argument. A minimax polynomial has the
property that the maximum possible error for any valid reduced argument
is a minimum compared with using any other polynomial of the same degree.
It is well known that a linear combination of Chebyshev polynomials of
the first kind, called the Chebyshev expansion, can come close to
achieving this desirable objective. Traditional methods to determine the
optimized coefficients in this expansion are quite laborious, and often
burdened with additional approximations. In the method presented here,
the Chebyshev expansion is formulated as a linear transformation of the
Taylor expansion, and vice versa. Using standard tools of linear algebra,
general formulas are obtained for the error term and for corrections to
the Taylor expansion coefficients, yielding an effective Chebyshev
expansion as a modified Taylor expansion.

For a given degree and maximum reduced argument, the real solution to the
minimax problem is the Remez polynomial. Unlike the Chebyshev solution,
the Remez solution requires an iterative process, and much higher
precision. A quadratically convergent Remez scheme is in place and has
been used extensively. It should also be noted that the Remez process can
be adapted to special demands more easily than the Chebyshev process.
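To make the Chebyshev route concrete, here is a small Python/NumPy sketch
that builds a near-minimax polynomial from samples at the Chebyshev
nodes via discrete orthogonality. This is the standard textbook
construction, not the talk's Taylor-transformation method; the choice of
function (exp) and degree (5) is arbitrary and for illustration only.

```python
import numpy as np

def cheb_coeffs(f, deg):
    """Coefficients c_0..c_deg of the Chebyshev expansion of f on
    [-1, 1], computed from samples at the Chebyshev nodes using the
    discrete orthogonality of cos(j*theta) at equispaced angles."""
    n = deg + 1
    k = np.arange(n)
    theta = np.pi * (k + 0.5) / n
    y = f(np.cos(theta))                      # samples at Chebyshev nodes
    c = np.array([(2.0 / n) * np.sum(y * np.cos(j * theta))
                  for j in range(n)])
    c[0] /= 2.0                               # standard T_0 normalization
    return c

# Near-minimax degree-5 approximation of exp on [-1, 1].
c = cheb_coeffs(np.exp, 5)
x = np.linspace(-1.0, 1.0, 1001)
approx = np.polynomial.chebyshev.chebval(x, c)
max_err = np.max(np.abs(approx - np.exp(x)))
```

For exp at degree 5 the maximum error of this expansion is already
within a few percent of the true minimax (Remez) error, which is why the
Chebyshev expansion "comes close" to the optimum without iteration.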
-----------------

Yonghong Yan
Oregon Graduate Institute

The Expectation Maximization Algorithm and Its Application in Speech

In statistical approaches to speech recognition, such as Hidden Markov
Models (HMMs), one of the key issues is how to model speech events
accurately and robustly. Given the nature of speech production, not all
the needed information is observable. The Expectation Maximization (EM)
algorithm is used for model parameter estimation: by iteratively
maximizing the expectation of the log likelihood of the complete data,
the likelihood of the incomplete data can be maximized. In this talk, an
introduction to the EM algorithm will be given, and its application to
acoustic modeling and speaker adaptation will be discussed.

-----------------

Masaaki Shimasaki
Kyoto University

Fast Linear Equation Solvers in High Performance Electromagnetic Field
Analysis

Solving linear equations plays a crucial role in high performance
electromagnetic field analysis. We describe the forms and characteristics
of the systems of linear equations arising in electromagnetic field
analysis with the Finite Element Method (FEM). Properties of ICCG and its
parallelization are discussed in the context of electromagnetic field
analyses. Although the current applicability of the multigrid approach is
rather limited in electromagnetic field analysis in comparison with ICCG,
the multigrid method is important because it is quite fast when applied
to very large scale problems. We discuss the algebraic multigrid method
in finite element electromagnetic field analysis.

-----------------

Mo Mu
Hong Kong University of Science and Technology

PDE.Mart: A Networked PDE Solving Environment

PDE.Mart is a networked problem-solving environment (PSE) for solving
partial differential equations. It consists of three components: a GUI, a
Server, and a LIB.
The GUI is a browser-enabled Java applet running on client machines; the
Server is a Java application running on our PDE.Mart server; and the LIB
contains solver building parts in native languages such as Fortran, C,
and C++, with some in Java itself.

-----------------

Wayne Enright
University of Toronto

Fast Contouring of Approximate Solutions of PDEs

Visualizing and displaying approximate solutions of PDEs on high
resolution screens and other devices has become a significant part of
scientific computation. On the other hand, the prime focus of numerical
analysts has been, and still is, the development of effective numerical
methods for generating accurate approximate solutions on relatively
coarse meshes (covering the domain of interest). Displaying or
visualizing the results of such methods at the resolution required by
current display devices often demands significant computational effort,
and we are investigating efficient algorithms for carrying out these
tasks. In particular, in this presentation we will consider the
generation of contour plots and show how the cost of this task can be
minimized. More specifically, we have developed and implemented (in
MATLAB) an approach that directly generates the contour lines associated
with a function u(x,y), without having to generate a surface map (or
fine-mesh approximation) as a preliminary computation. Our approach is
based on directly approximating, with a reliable continuous ODE solver,
an associated initial value problem whose known exact solution is the
contour line of interest. The overall cost of this approach will be shown
to be optimal, and numerical evidence will be presented to show its
effectiveness on a variety of problems.
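The idea of contouring as an initial value problem can be sketched as
follows: along the direction (-u_y, u_x), which is orthogonal to the
gradient of u, the value of u stays constant, so integrating that ODE
traces a level curve. The Python/NumPy sketch below uses a bare-bones
fixed-step RK4 integrator as a stand-in for the reliable continuous ODE
solver of the talk; the test function u = x^2 + y^2 is a hypothetical
example, chosen because its contour through (1, 0) is the unit circle.

```python
import numpy as np

def trace_contour(grad_u, p0, t_end, n_steps):
    """Trace the level curve of u through p0 by integrating
    (x', y') = (-u_y, u_x) with classical RK4. Solutions of this ODE
    keep u constant, so the trajectory is the contour line itself."""
    def f(p):
        ux, uy = grad_u(p)
        return np.array([-uy, ux])
    h = t_end / n_steps
    p = np.array(p0, dtype=float)
    path = [p.copy()]
    for _ in range(n_steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(p.copy())
    return np.array(path)

# u(x, y) = x^2 + y^2: the contour through (1, 0) is the unit circle,
# and with grad u = (2x, 2y) the trajectory has angular speed 2, so
# t_end = pi traverses the full circle.
path = trace_contour(lambda p: (2 * p[0], 2 * p[1]),
                     (1.0, 0.0), np.pi, 2000)
```

Note that no surface map or fine mesh of u is ever built: only the
gradient along the trajectory is evaluated, which is the source of the
cost savings the abstract describes.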
-----------------

Patrick Gaffney
Bergen Software Services International A/S

Optimization from a Java Interface: Numerical Methods Framework

-----------------

Greg Henry
Intel

-----------------

Xintian Wu
Intel

Fast beam search in speech recognition using Intel(R) IPP

Beam search is one of the core algorithms in statistical speech
recognition. The algorithm searches all possible word sequences for the
most likely word sequence that matches the given speech observations. The
match involves calculating likelihood probabilities of acoustics and
syntax (language model). The talk analyzes the computation of the beam
search algorithm and presents a cache strategy to reduce the likelihood
calculation. The cache algorithm effectively uses the Intel(R) Integrated
Performance Primitives (IPP) to speed up its calculation. Intel(R) IPP is
a library designed for signal processing, speech/audio processing, and
much more. Assembly code is used in this library to squeeze computation
power out of the latest Intel processors.

-----------------

Clemens Roothaan (2)
Hewlett-Packard

A Vector Transcendental Math Library for the Itanium Architecture

The Vector Math Library (VML) presented here was conceived in 1990, in
parallel with the development of the Wide Word Architecture by an HP Labs
team. By early 1994, the Wide Word project became the joint
responsibility of HP and Intel; the resulting design is now known by the
name Itanium. The functions addressed in the VML project are: Reciprocal,
Division, Square Root and Reciprocal Square Root; Exponential, Logarithm
and Power Function; Trigonometric and Inverse Trigonometric Functions;
and Hyperbolic and Inverse Hyperbolic Functions. Single and double
precision codes for these functions will be released soon in the public
domain. In the asymptotic range of long vectors, the execution time per
function evaluation ranges from 3.5 machine cycles for a single precision
reciprocal to 19.5 cycles for the double precision power function.
The precision achieved is better than 0.5005 ulps for single precision
and 0.501 ulps for double precision. In order to exploit the IA-64
architecture for vector functions, the algorithms for the functions at
hand must be universal algorithms, i.e., the computational procedure must
be identical for all arguments. For some functions, the traditional
algorithms in the textbooks are indeed universal, but for quite a few
they are not. In every case where a new universal algorithm was needed, a
satisfactory solution was obtained. The codes are designed so that no
interrupts occur to handle special cases; all exception and error
handling is accomplished inside the loop by housekeeping instructions
which are hidden in the floating-point pipeline.

-----------------

Rencang Li
Hewlett-Packard

The ABC Conjecture and Correctly Rounded Reciprocal Square Root

(Joint work with Ernie Croot and Hui June Zhu, University of California,
Berkeley)

Existing results suggest that to get the correctly rounded reciprocal
square root $y=1/\sqrt x$ in a floating-point number system with $p$
significant bits, one may have to compute up to $3p+1$ leading bits of
$y$. However, numerical evidence indicates that the actual number may be
as small as $2p$ plus a few more bits. In this talk we will show that
this is indeed true, assuming the ABC conjecture holds. The conjecture is
widely considered the next biggest one in number theory after Fermat's
Last Theorem. Our technique extends to other algebraic functions.

-----------------

Dragan Mirkovic
University of Houston

Automatic Code Generation for FFT Algorithms

The discrete Fourier transform (DFT) is an essential computation in a
wide variety of scientific and engineering fields. The fast method for
computing the DFT, the fast Fourier transform (FFT), is one of the most
important computational developments of this century.
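As background for the code-generation discussion that follows, the
divide-and-conquer structure that FFT libraries factor into composable
blocks can be sketched with the textbook radix-2 recursion. This
illustrative Python is unrelated to the talk's actual generator, which
selects among many such factorizations at run time.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of
    two. Each level splits the DFT into two half-size DFTs (even and
    odd samples) combined by twiddle factors -- the kind of sparse
    factorization of the DFT matrix that code generators compose."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Twiddle factors exp(-2*pi*i*k/n) applied to the odd half.
    t = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + t[k] for k in range(n // 2)] +
            [even[k] - t[k] for k in range(n // 2)])
```

A real library replaces this one fixed recursion with a search over
split sizes and specialized straight-line blocks, which is where the
automatic tuning described below comes in.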
The applications of the FFT include digital signal processing for radar,
sonar and radio astronomy, image analysis in X-ray crystallography,
electron microscopy and various modalities of medical imaging, data
compression, and the numerical solution of partial differential
equations. The importance of the FFT in these applications has provided a
strong motivation for the development of highly optimized
implementations. On the other hand, the growing complexity of modern
microprocessor architectures, with multi-level memory hierarchies and
instruction-level parallelism, has made performance tuning increasingly
difficult. In this presentation we will discuss an efficient automatic
performance tuning method for fast Fourier transforms. The method is
based on a dynamic construction of the FFT algorithm: it uses a library
of composable blocks of code, each computing a part of the transform, and
selects the optimal combination of these blocks at run time. The blocks
of code in the library are automatically generated and optimized using a
special-purpose compiler. We will describe the basic structure of the
code generator and the use of different splitting methods needed for
sparse factorization of the DFT matrix.

-----------------

John Harrison
Intel

Formal Verification of Mathematical Algorithms

Algorithms for computing floating-point approximations to standard
mathematical functions (e.g. basic operations like division and
transcendentals like "exp" and "sin") are an important part of most
computing environments. The algorithms may be realized in hardware,
microcode, software, or some mixture thereof, but are usually considered
among the most fundamental components of a computer system. These
algorithms, especially those that are aggressively designed to optimize
performance, can be quite complicated, relying on various results from
pure mathematics and subtly exploiting the special features of
floating-point operations.
This means that they are relatively difficult to design without error,
while the cost to a company like Intel of a failure, especially at the
hardware level, can be huge. Traditional approaches to ensuring the
correctness of hardware and software usually rely on extensive testing.
However, for many functions this is not realistically feasible, even if a
reliable source of comparison is available. An alternative is to prove by
detailed mathematical analysis that the algorithm obeys a formal
specification (e.g. correct rounding or an error bound). Since such
proofs can be tedious and error-prone, it seems wise to check them, or
even partly generate them, with a reliable mechanical proof-checker. We
report on our experience of using the HOL Light theorem prover to
formally prove correct some key floating-point algorithms used in the
Intel Itanium architecture. We will sketch how some of the formal proofs
are conducted, and draw some general conclusions concerning the promise
of this approach.

-----------------

Peter Tang
Intel

A Novel Approach to Tight Bounds and Statistical Information on Rounding
Errors

Tight rounding-error analysis of small-scale computations is still too
difficult or time-consuming for practicing software engineers to carry
out routinely in their production work. This is especially the case in
fixed-point calculation, which, compared with floating-point computation,
can have data widths and rounding methods that vary drastically. To
promote quality fixed-point computational software, we have developed an
approach that is capable of automatically deriving tight error bounds as
well as statistical distributions of errors for linear transformations.
These include the important cases of the FFT, DCT, filtering, and
windowing.
The main idea is to model each computed quantity as its corresponding
ideal value contaminated by two additive error terms: the first is
correlated with the input data, while the second is statistically
independent of the input data. We carry numerically a bound on the first
term as a function of the norm of the input data, and, also numerically,
the probability density function of the second term. The propagation of
these terms can be computed numerically, and hence this information for
the final result can be obtained automatically through operator
overloading. The error model, the propagation of the error terms, and
some numerical examples will be presented in this talk.
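A toy version of such operator-overloading error propagation might look
as follows in Python. It tracks only a data-independent additive term (a
worst-case bound plus a variance under the uniform rounding-error
model), whereas the talk's model also carries the data-correlated term;
all names here are hypothetical illustrations, not the talk's actual
framework.

```python
class ErrTerm:
    """A value carried together with a worst-case error bound and an
    error variance, propagated automatically through arithmetic by
    operator overloading."""
    def __init__(self, value, bound=0.0, var=0.0):
        self.value, self.bound, self.var = value, bound, var

    def __add__(self, other):
        # Worst-case bounds add; variances add assuming independence.
        return ErrTerm(self.value + other.value,
                       self.bound + other.bound,
                       self.var + other.var)

    def scale(self, c):
        # Exact scaling by a constant: bound scales by |c|, variance
        # by c^2 (as needed for linear transformations).
        return ErrTerm(c * self.value, abs(c) * self.bound,
                       c * c * self.var)

def quantize(x, q):
    """Round x to the nearest multiple of the quantum q, attaching the
    rounding error's worst-case bound (q/2) and its variance (q^2/12,
    the uniform-error model)."""
    return ErrTerm(round(x / q) * q, q / 2.0, q * q / 12.0)

# Summing four quantized inputs: the bound and variance of the final
# result accumulate automatically through the overloaded "+".
q = 1.0 / 256.0
xs = [0.1, 0.2, 0.3, 0.4]
s = quantize(xs[0], q)
for x in xs[1:]:
    s = s + quantize(x, q)
```

The worst-case bound (here 4 * q/2) is typically pessimistic; the
variance supports the kind of statistical error statement the talk
advocates alongside the bound.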