If Java^{TM} is to become the environment of choice for highperformance scientific applications, then it must provide performance comparable to what is achieved in currently used programming languages (C or Fortran). In addition, it must have language features and core libraries that enable the convenient expression of mathematical algorithms. The goal of the Numerics Working Group (JGNWG) of the Java Grande Forum is to assess the suitability of Java for numerical computation, and to work towards community consensus on actions which can be taken to overcome deficiencies of the language and its runtime environment.
The proposals we put forth operate under a number of constraints.
We begin by outlining critical requirements for numerical computing that are not currently served well by Java's design and implementation. Unless these requirements are met it is unlikely that Java will have much impact on the numerical computation community. This would be detrimental to the entire Java enterprise by slowing the dissemination of high quality components for solving commonly occurring mathematical and statistical problems.
Issue  Requirement  
1  Complex arithmetic 
Complex numbers are essential in the analysis and
solution of mathematical problems in many areas of science and engineering.
Thus, it is essential that the use of complex numbers be as convenient and
efficient as the use of float and double numbers.

2  Lightweight classes  Implementation of alternative arithmetic systems, such as complex, interval, and multiple precision requires the support of new objects with value semantics. Compilers should be able to inline methods that operate on such objects and avoid the overheads of additional dereferencing. In particular, lightweight classes are critical for the implementation of complex arithmetic as described in issue 1. 
3  Operator overloading 
Usable implementation of complex arithmetic, as well as
other alternative arithmetics such as interval and multiprecision, requires
that code be as readable as those based only on float and double .

4  Use of floatingpoint hardware  The high efficiency necessary for largescale numerical applications requires aggressive exploitation of the unique facilities of floatingpoint hardware. At the other extreme, some computations require very high predictability, even at the expense of performance. The majority of users lie somewhere in between: they want reasonably fast floatingpoint performance but do not want to be surprised when computed results unpredictably misbehave. Each of these constituencies must be addressed. 
5  Multidimensional arrays  Multidimensional arrays are the most common data structure in scientific computing. Thus, operations on multidimensional arrays of elementary numerical types must be easily optimized. In addition, the layout of such arrays must be known to the algorithm developer in order to process array data in the most efficient way. 
Elaborations of each underlying issue, along with proposed solutions are presented in the following section. In suggesting solutions, the working group has been careful to balance the needs of the numerical community with those of Java's wider audience. Although the proposed solutions require some additions to the current Java and JVM design, we have tried to avoid change, relying on compiler technology whenever feasible. This minimizes the changes that affect all Java platforms, and enables implementers to optimize for high numerical performance only in those environments where such an effort is warranted.
Using complex numbers conveniently means that expressions on complex numbers must look and behave like expressions on float or double values. This is critical for code understanding and reuse. Efficient complex arithmetic operations are only a few times slower than their real counterparts; ideally, the speed of a complex computation should be limited by the speed of the underlying floatingpoint arithmetic and not the speed of memory allocation or object copying.
Providing a straightforward complex class using existing Java object mechanisms fails to provide an acceptable solution.
float
and double
. For example, the = and ==
operators manipulate references rather than values. Such differences lead
to many errors.
The second and third items also mean that code reuse is severely limited. In the LAPACK project, much of complex code is identical to its real counterparts and this greatly eased the generation and maintenance of the library. Such economies are much more difficult to obtain if the syntax and semantics of complex is significantly different than that of the primitive types.
An alternative that solves both the convenience and speed issues would be to add complex as a new primitive type in Java. However, this approach also has a number of drawbacks. While complex could be added naturally to the language, adding support for a complex type in the JVM is more problematic. Either the JVM would have to directly support complex, or complex expressions and variables in a Java program would have to be mapped into existing JVM instructions in a predetermined way. Fundamental changes to the JVM should be avoided if the existing functionality is sufficient to implement a given feature. However, translating the complex type into a pair of double values (a real and an imaginary component) in the JVM presents a number of difficulties.
Even if complex numbers can be accommodated with some variation of the above approach, this would only solve the problem for complex. Many other numeric types such as decimal, interval, and arbitrary precision have similar requirements. Thus, instead of merely providing a specialized solution for complex, a better approach is to make Java extensible enough to add new numeric types that can be operated on conveniently and efficiently. To meet this goal, Java needs two capabilities: lightweight classes and operator overloading.
Lightweight classes allow the creation of a new kind of Java object.
The goal of lightweight classes is to allow the runtime use of what
appears to be a C struct
, that is:
=
operator performs a deep copy (instead of changing a
pointer) and the ==
operator performs a deep comparison
(instead of comparing pointer values).
null
;
such variables always have a value.
final
. (This
also removes the necessity to store a dispatch pointer for lightweight
objects.)
struct
at
the JVM level (a very large change), or
Since requiring large JVM changes reduces the likelihood of acceptance, and due to its greater backward compatibility, Java Grande recommends the second approach to implementing lightweight classes.
Implementation implications.
Complex numbers can be implemented as a lightweight class. By
implementing complex
as a lightweight class, type
checking is preserved at both the Java and JVM level. Lightweight
classes reuse Java's existing facility to create new types; the
compiler is responsible for the tedious job of disguising what appears
(to extant JVMs) to be a normal Java object as a lightweight object.
To the programmer, lightweight classes appear to be outside of the
Java class hierarchy rooted at Object
. However,
lightweight classes can be implemented in a way backward compatible
with existing virtual machines. Additional class attributes could be
included in a class
file to indicate to new VMs that
optimizations tailored to lightweight classes can be used. When
compiling programs using lightweight classes, the Java compiler is
responsible for enforcing the following restrictions:
null
value. This implies the following.
null
or
compared to null
. All such expressions are caught as
compiletime errors.
null
for reference
types). The compiler inserts calls to the default constructor before
any code that can access the object. These default constructors are
not dependent on any external program state. For local variables,
each call of the default constructor for a lightweight class is
wrapped with a catch
block that catches
OutOfMemoryError
and rethrows the exception as
StackOverflowError
. Although user code may try to
recover from an OutOfMemoryError
an attempt to recover
from a StackOverflowError
is unlikely. Failure to
allocate memory for a local lightweight object variable corresponds to
running out of stack space.
finalizer
method. In Java, finalizer
methods are run when an
object is garbage collected. Lightweight objects are intended to have
semantics similar to the semantics of primitive types. Therefore,
lightweight classes do not need finalizer
methods.
super
. In a constructor, calling
super
invokes the constructor of the superclass. Since
lightweight objects are defined to be outside of the
Object
class hierarchy, it is not meaningful for a
lightweight class constructor to call super
.
Object
or any
other reference type. Other types cannot be cast to the type of a
lightweight class. Casts between primitive types construct a new
value whereas casts between reference types reinterpret a pointer; no
new value is constructed. However, userdefined conversions between
lightweight classes are other types are permissible.
instanceof
operator to an expression having the type of a lightweight class and
it is a compiletime error to use instanceof
to test for
the type of a lightweight class.
Class
objects to represent
lightweight classes.
=
)
and equality (==
) operators (see Operator Overloading).
Virtual machines are encouraged to inline the methods of lightweight classes where appropriate.
To behave like primitive types, lightweight classes should be passed
by value, that is, when given as an argument to a method or when
returned from a method, a lightweight object is copied. C++'s copy
constructor performs this task. However, references were added to C++
to avoid the overhead of copying small objects like
complex
[Str]. Objects in Java
are already passed by reference. Therefore, for performance reasons
it may be acceptable (but somewhat contradictory semantically) to pass
lightweight objects by reference.
Storage issues.
Since lightweight classes are final
and since references
to lightweight objects are not null
, there is no need to
store a dispatch pointer at runtime.
Heap allocation is potentially more expensive than stack allocation. Additionally, stackallocated objects may be cheaper to garbage collect than heap allocated ones. Therefore, it is preferable to allocate lightweight objects on the stack instead of the heap. Replacing heap allocation with stack allocation is an oftdiscussed optimization. An object can be allocated on the stack if it can be determined that the object cannot be accessed outside of the lifetime of the scope that allocated it. Escape analysis [ParG] makes this determination. Recent work suggests escape analysis may be useful in Java [Deu, Bla]. The related problem of compiletime garbage collection is addressed by region inference [Tol] and its extensions [Aik].
Operator overloading is necessary to allow userdefined numeric types, such as
complex
, to be used reasonably. Without it, many numeric codes
would be extremely difficult to develop, understand and maintain. For example,
codes using complex arithmetic class would look very different than similar
code using real arithmetic, burdening library developers and users alike. A
simple statement such as
a = b+c*d;
might be expressed as
a.assign(Complex.sum(b, Complex.product(c,d))or
a.assign(b.plus(c.times(d)))
Faced with coding like this, a large portion of the scientific computing community would choose to avoid Java as being too unfriendly.
At a minimum, a useful subset of the existing operators must be overloadable. It is useful, but not a requirement of the working group, to allow novel operators to be overloaded as well. (Allowing novel operators to be overloaded does not have to introduce the language complications and programmer difficulties found in ML and Haskell, see [Dar].)
What operators can be overloaded.
The arithmetic, comparison, assignment, and subscripting operators can
be overloaded. Neither the instanceof
, new
,
field access, nor method call operators can be overloaded. The
&&
and 
operators cannot be overloaded
because they have different expression evaluation rules than all other
operators.
If normal classes are also allowed to use operator overloading, it may
be convenient to have Pascal's :=
as a supplemental
assignment operator. Java's =
can be used to indicate
the current semantics of moving a pointer while :=
can be
used for a deep assignment (deep copy, by convention the current
semantics of a class' clone
method). If :=
can be overloaded, it can be used for a copy operator appropriate for
a given class. For example, even when performing a deep copy, an
arbitrary precision arithmetic class may want to use a copyonwrite
policy for the bits of the number to avoiding copying the (possibly
large) data structure unnecessarily.
If :=
is introduced for classes, :=
should
designate normal assignment on the existing primitive types. That way
code using primitive types can more easily be converted to using a
userdefined type instead. For example, if roundoff problems on
double
numbers were suspected of causing loss of accuracy
problems, it would be convenient to replace double
with a
floatingpoint type with more precision to see if the roundoff
problems abate. With sufficiently powerful operator overloading,
potentially only the variable declarations would need to be changed.
How to name methods corresponding to overloaded operators.
The strings making up operators, "+
",
"+=
", etc., are outside of the set of strings
Java allows to form an Identifier. Therefore, to add operator
overloading to Java, some technique must be used to allow operator
methods to be declared. Either the text of the operator can be
included in the method declaration (as with C++'s
operator+
syntax for an addition operator) or there can
be an implicit mapping from textual names to operators (as with
Sather's [StoO] mapping of
"plus
" to +
).
If the latter implicit mapping is used, it is important to have a
separate name for each target operator. This avoids the Sather
problem where a < b
is defined as !(a >=
b)
. Sather's scheme is problematical for IEEE style numbers
since (NaN < b)
and (NaN >= b)
are both
false. To overload operators such as +=
, the
corresponding name should be plusAssign
instead of
plusEquals
. This names the operator according to what it
does instead of what characters constitute the text of operator. In
general, allowing each operator to be separately overloaded provides
the flexibility to model mathematical entities other than traditional
fields.
How to resolve overloaded methods.
For normal Java classes, operator overloading can easily be mapped
into method dispatch. For example, the compiler can translate
a + b
into
a.plus(b)or
a.op+(b)depending on how operator methods are named.
However, this level of support is not sufficient for more general uses
of operator overloading. For example, this technique does not work if
the type of a
is a primitive type like float
or double
since these types do not have corresponding
classes. Clearly, from an orthogonality and usability perspective,
the programmer wants to be able to write double +
complex
as well as complex + double
. Therefore,
in addition to letting operators be instance methods (methods
dispatched from an object), operators must also be static
methods (methods not dispatched from an object). However, using
static
operator methods for regular Java classes presents
name resolution problems.
In regular Java code, if a static
method is used outside
the defining class the class name (and possibly the package name)
must be prefixed to the method call. For example, instead of writing
sin(x)
even if an import
statement is used the Java programmer
must write
Math.sin(x)
to allow the Java compiler to properly determine the location of the
sin
method. Using class name prefixes for operators
would ruin the readability of operator overloading. However, since
lightweight classes are final
, the problems of using
static
methods can be circumvented.
Since lightweight classes are final
, all method calls can
be resolved at compile time; there need not be any dynamic dispatch at
runtime. Therefore, even if used outside of the defining class,
static
operators on lightweight objects do not need to be
prefixed with the class name; the compiler can use a rule of looking
for operator methods defined in the lightweight class of the lefthand
operand.
Additionally, using static
operators allows for better
software engineering. If static
operator methods can be
located from either the class of the left hand operand or the class of
the right hand operand, new lightweight classes can be made to
interact with existing lightweight classes. Otherwise, for symmetric
treatment of a + b
the classes of a
and
b
must be written concurrently.
Recently, Sun released for public comment a Proposal for Extension of Java^{TM} Floating Point Semantics, Revision 1 [PEJ] (abbreviated in this document as PEJFPS). PEJFPS is primarily targeted at improving Java's floatingpoint performance on the x86 line of processors. (No explicit license is granted (yet) to use the fused mac (multiplyaccumulate) instruction, which would benefit users of the PowerPC, among other architectures.)
Assiduously implementing Java's current strict floatingpoint
semantics on the x86 using previously published techniques is very
expensive, potentially more than an order of magnitude slower than
slightly different semantics [Gol]. A less
expensive technique developed recently [Gol98] will be discussed later. PEJFPS grants
partial access to the double extended
floatingpoint
format found on the x86 in order to overcome the speed problem.
However, the reckless license to use or not to use double extended
granted by PEJFPS destroys Java's predictability (see recent
submissions to the numericinterest mailing list, http://www.validgh.com/java/).
Java has been billed as providing "write once, run anywhere" program
development. For both theoretical and practical reasons, Java
programs are not nearly so portable nor reproducible as programmers
would naively expect. However, by exercising discipline (using single
threaded code, using default finalize
methods), it is far
easier to produce a Java program whose behavior can be predicted than
to produce an analogous C or C++ program with the same property.
Dropping Java's predictability would be a significant loss. Therefore,
the JGNWG recommends that PEJFPS not be incorporated into Java.
Instead, JGNWG presents a counterproposal that works within similar
constraints as PEJFPS but maintains the predictability of the language
and addresses additional numeric programming needs omitted from
PEJFPS.
What is the problem on the x86?
x86 processors most naturally operate on 80bit double extended
floatingpoint
values. A precision control word can be set to make the processor round to
single or double precision. However, even when rounding to a reduced
precision, the floatingpoint registers still use the full 15 exponent bits of
the double extended
format (instead of the 11 exponent bits for true
double
and 8 bits for true float
). A store to memory is
required to narrow the exponent as well. Since the register is not
changed by the store, for further computation the stored value must be loaded
back from memory. This memory traffic degrades Java's floatingpoint
performance on the x86. Moreover, this technique suffers from a small
discrepancy between operating on true double
values and
double
values with increased exponent range. Values that would be
subnormal double
s are not subnormal in double
with extended exponent
range. When such a number is stored to true double
, it can be rounded
twice, leading to a difference in the last bit, about 10^{324}.
Published techniques to remove this remaining minor discrepancy can lead to an
order of magnitude slowdown, so Java VMs on the x86 generally set the
precision control to double
precision and allow double
rounding on underflow, at variance with Java's specification [Gol].
The 10fold potential performance degradation for exact floatingpoint
conformance on the x86 is largely a hypothetical concern since in
practice VMs on the x86 use the storereload technique. PEJFPS aims
to eliminate the smaller 2fold to 4fold penalty from storereload. PEJFPS
would remove this speed penalty by allowing the x86's double
extended
registers to be used at full precision and range.
However, PEJFPS would put too few constraints on when, where, and
whether extended precision is used, leading to unpredictability.
There are two issues for exact reproducibility stemming from the x86's wider exponent range: maintaining the proper overflow threshold and preserving the proper gradual underflow behavior. The storereload technique solves the former problem but not the latter. Since additions and subtractions resulting in subnormal values are exact, the underflow threshold is properly preserved. Using the storereload technique, double rounding on underflow can only occur for multiplication and division.
Recently, a refinement of the storereload technique that eliminates
the double rounding problem has been developed [Gol98]. To avoid double rounding during
multiplication, the new technique scales down one of the operands by
2^{(Emax double extended 
Emaxdouble)} where
E_{max} is the largest exponent for a given floating
point format. After this scaling, all operations that would result in
subnormals in true double
also result in subnormals in
double
with extended exponent range. This result is then
rescaled back up by the same quantity; normal results are unaffected and
subnormals are properly rounded once. A store of the product after being
scaled enforces the overflow threshold.
The procedure for division is analogous; the dividend can be scaled down or the divisor can be scaled up. In both cases, the resulting quotient is rescaled up to the proper magnitude.
This new technique has many advantages over previous approaches to
making the x86 exactly round to true double
:
double
only entails a factor of 2 to 4 slowdown
instead of a factor of 10.
±0.0
, infinities,
and NaN.
What capabilities are needed? Different numeric applications have different needs. Some, like certain implementations of higher precision arithmetic using standard floating point formats, depend on strict floatingpoint semantics and could easily break if "optimized." Other calculations, such as dot product and matrix multiply, are relatively insensitive to aggressive optimization; meaningful answers result even when blocking and other answerchanging optimizations are applied. The vendorsupplied BLAS are heavily optimized for a given architecture; vendors would not spend considerable resources creating optimized BLAS, sometimes included handcoded assembly, if there were not demand for such faster programs. The needs of the typical Java programmer fall somewhere between these extremes; there is a desire for floatingpoint computation that is not unnecessarily slow, but the programmer doesn't want to be surprised when his computed results misbehave unpredictably.
Since Java is aimed at a wide audience, the JGNWG proposal changes Java's default floatingpoint semantics to allow somewhat better performance on the x86 and PowerPC. However, for most programs on most inputs the change in numerical results will not be noticed. Like PEJFPS, the JGNWG proposal adds a "strict floatingpoint" declaration to indicate current Java semantics must be used. JGNWG also includes a second declaration to allow optimizers to rearrange certain floatingpoint operations as if they were associative. Associativity enables many useful optimizations, including aggressive code scheduling and blocking.
PEJFPS would mingle increased speed and increased precision.
In PEJFPS "widefp
", which allows use of float extended
and double extended formats, presumably yields code that runs faster and may
incidentally use increased precision and range. Although it my have
been intended only for the sake of fast register spilling, PEJFPS
would allow almost arbitrary truncation of results from extended to
base formats. In any case, the programmer is faced with an
unpredictable program, leading to the resurrection of bugs from
earlier systems, like the Sun III (see the recent comp.compilers
thread "inlining + optimization = nuisance bugs" for a contemporary
example). JGNWG's proposal does not mix speed and precision, rather,
as a concession to the x86, JGNWG allows extended exponent range in
some circumstances.
Some architectures, such as the PowerPC, include a fused mac instruction that multiplies two numbers exactly and then adds a third number, with a single rounding error at the end. Machines with this instruction run faster when it is used. Current Java semantics prohibit fused macs from being used.
There are three degrees of fused mac usage to support:
Fused mac must be forbidden in some sensitive calculations. For example, using fused mac recklessly can also lead to inappropriately negative discriminants in quadratic formula calculations [Kah]. Using a fused mac if available would give more accurate and faster dot products and matrix multiplies. Some algorithms require a fused mac to work properly. Mandating a fused mac is necessary to simulate a fused mac capable machine on one that isn't.
Java Grande Counter Proposal. The JGNWG proposal is based upon an analysis that finds very few of the diverse floatingpoint semantics allowed by PEJFPS to be both useful and necessary. On the other hand, each of the few semantics of the JGNWG proposal is necessary because dropping one would either
The semantic contexts proposed by the JGNWG are as follows.
float
and true double
values.
double
values on the JVM operand stack.)
There are three kinds of floatingpoint semantics in JGNWG, strict, default, and "associative." The following table illustrates what effect these have on current processors.
Architecture  
Modifier  x86  PowerPC  SPARC 
strictfp 
Java 1.0 (no double rounding on underflow) 
Java 1.0 (no fused mac) 
Java 1.0 
default (no modifier)  larger exponent range allowed  fused mac can be used  Java 1.0 
associativefp 
many optimizations allowed on all platforms 
Strict semantics are indicated by the new strictfp
class
and method modifier. Strict semantics are the current Java floating
point semantics; fused mac is not permitted in strict code (unless the
compiler can prove the same answer results). On the x86, using stores
to restrict the exponent range can readily give exactly
Javaconforming results for the float
format. Using the
new technique described above, exactly Javaconforming results for the
double
format can also be implemented at a tolerable
cost.
Like PEJFPS, JGNWG proposes to modify the default Java floatingpoint
semantics, i.e. those used in the absence of a strictfp
or associativefp
declaration.
The
strictfp
and associativefp
declarations are
mutually exclusive. In general, default semantics allow increased
exponent range of anonymous expressions (on the x86) or use of fused
mac (on the PowerPC). In default mode on the x86, anonymous
float
and double
values created during
expression evaluation are allowed to use the larger exponent range of
double extended
. If the extended exponent range is used
in code with default semantics, it must be consistently used for all
anonymous values (this implies that anonymous values spilled to memory
must be spilled as 80 bit quantities to preserve the exponent values).
All explicit stores must be respected and only true
double
and true float
can be stored into
programmerdeclared variables.
System properties indicate whether fused mac and extended exponent range are used by a particular VM. It is always permissible to implement the default semantics as the "strict" Java 1.0 semantics. Therefore, on a SPARC there is no difference between the two kinds of floatingpoint semantics.
It is possible that a program intended for strict semantics will fail under the nonstrict JGNWG default semantics. To ease detection of such cases, JVMs should provide a runtime flag to force default semantics to be implemented as strict.
Finally, JGNWG also includes an associativefp
declaration to allow optimizers to rearrange floatingpoint operations
as if they were associative if the operations would be associative in
the absence of over/underflow and roundoff.
Associativity is an important property for optimization and parallelization
of numerical codes that may change the numerical outcome of a
computation.
On machines with fused mac instructions, chained multiply and add/subtract operations in the source code can be fused at runtime in default mode. Expressions are fused preferring fusing a product that is the right hand argument to an addition over the left hand argument; for example, the expression
a*b + c*d
is treated as fmac(a, b, c*d)
and not fmac(c,
d, a*b)
. Such fusing operations must occur if fused mac is in
use; this prevents programmer surprises. Otherwise, optimizers could
potentially fuse unexpected expressions or prevent an expression from
being fused (e.g., common subexpression elimination reusing a product
that prevents a fused mac).
The arguments to fused mac must be evaluated in left to right order.
Programmers can require fused mac to be used by an explicit
method call to a new method Math.fmac
.
When fused mac is being used, the programmer can explicitly store each intermediate result to locally implement strict semantics.
Unresolved issues regarding additional optimizations.
There is agreement within the working group that, at a minimum, allowing the
use of associativity should be permitted as described above. Disagreement
remains as to whether compilers should be allowed to employ additional types of
optimizations that can, in some cases, cause incorrect results. Under current
Java semantics, for example, common transformations such as 0*x
to
0
(wrong if x
is NaN) or x*4x*3
to
x
(requires distributivity) are disallowed.
Some have argued strongly in favor of allowing wide latitude for optimizations,
provided that the software developer and user concur on their use. From their
point of view, as long as both parties have the option to disable potentially
harmful optimizations, then compiler writers should not be barred from
providing them. Gosling has proposed an idealizedNumerics
[Gos] mode that would allow any transformation that would
preserve results on the real number field, for example. Others argue that the
predictability of the Java language is its key feature and that users are not
served well if potentially harmful optimizations are admitted.
A related issue is whether the strict exception model of Java should be relaxed
in associativefp
mode to give additional freedom to the
optimizer. NullPointerException
and
IndexOutOfBoundsExceptions
would still be thrown and caught by the
same handlers but the values in variables might not be in the same state as in
an unoptimized version of the code. Others have argued that the strict
exception model should be preserved so that programming styles that rely on
exceptions would behave correctly. An example is the following somewhat
unusual code for computing the dot product of two vectors.
double s = 0; try { for (int i=0;true;i++) { s += a[i]*b[i]; } } catch (ArrayIndexOutOfBounds bounds) { }
Finally, some issues regarding the interplay between associativefp
and the fused multiplyadd must be resolved.
These issues will require further discussion, and public comment is encouraged.
Code Examples for the x86. The dot product loop
static double dot(double a[], double b[]) { double s; for(i = 0; i < a.length; i++) s += a[i] * b[i]; return s; }
can be compiled differently under strict and default semantics. In the examples that follow, loop tests and other integer calculations are omitted and are assumed to overlap with the floatingpoint calculations. Under strict Java 1.0 semantics on the x86, one compilation of the dot product loop is:
// x86 code for strict dot product loop
// rounding precision set to double
// assume scaling factors are in the register stack
push scaling factor
load a[i] and multiply with scaling factor
load b[i] and multiply with a[i]^{scaled down}
multiply to rescale product
store product to restrict exponent
reload back restricted product
add product to s
store s to restrict exponent
load back restricted s
increment i
loop
As shown above, the product (a[i] * b[i]
) and the sum (s +
a[i] * b[i]
) both need to be stored to memory and loaded back in. As
shown below, under default semantics, only the sum needs to be written out
to memory, halving the excess memory traffic and removing two
multiplications.
// x86 code for default dot product loop
// rounding precision set to double
load a[i]
load b[i] and multiply with a[i]
// product does not need to be stored/reloaded and scaled
add product to s
store s to restrict exponent
reload s
increment i
loop
A larger number of anonymous values in an expression results in a greater reduction in the number of excess stores. A VM might also be able to use trap handlers to achieve faster average execution. For example, trapping on overflow could remove a load from the loop above, yielding
// x86 code for default dot product loop
// rounding precision set to double
// VM is using an overflow trap handler to remove a load
load a[i]
load b[i] and multiply with a[i]
add product to s
store s to restrict exponent // dummy store, reload if store overflows
// reload of s elided
increment i
loop
This trap handler is used by the compiler and not visible to the applications programmer. The functionality of this trap handler is simple; the trap handler just has to reload the stored value.
On the x86, if an expression (and all its subexpressions) neither
overflows nor underflows for a given input, executing the default
compilation of the expression will give the same result as the
executing the strict compilation. As when using fused mac, explicitly
storing each intermediate result can be used to implement strict
semantics in a limited area. In default mode, both float
and double
expressions can use the extended exponent
range. Method arguments and return values from methods must be strict
double
or float
values to prevent
programmers from being ambushed by greater exponent range they neither
intended nor anticipated.
Cost of strictness Using the new scaling technique, a matrix multiply with strict semantics can be a little more than twice as slow as a matrix multiply with default semantics. In both the loops below, an overflow trap handler is used to remove excess loads in the common case. The sum variables are already in the stack. For better instruction scheduling, two elements of the matrix product are calculated simultaneously:
// x86 code for fast matrix multiply using default semantics
// rounding precision set to double
// VM is using an overflow trap handler
// the loop has approximately an 8 or 9 cycle latency on a Pentium
load b[i]
dup b[i]
load a_{k}[i] and multiply with b[i]
swap top two stack elements
load a_{k+1}[i] and multiply with b[i]
swap top two stack elements
add with pop a_{k}[i] * b[i] to sum_{k}
add with pop a_{k+1}[i] * b[i] to sum_{k+1}
loop
The strict code is similar:
// x86 code for fast matrix multiply using strict semantics
// rounding precision set to double
// VM is using an overflow trap handler
// the loop has approximately a 19 cycle latency on a Pentium
// assume scaling constants are already in the register stack
put scaling constant on the top of the stack
load b[i] and multiply with scaling factor
dup b[i]^{scaled down}
load a_{k}[i] and multiply with b[i]^{scaled down}
swap top two stack elements
load a_{k+1}[i] and multiply with b[i]^{scaled down}
swap top two stack elements
rescale a_{k}[i] * b[i]^{scaled down}
swap
rescale a_{k+1}[i] * b[i]^{scaled down}
dummy store of a_{k+1}[i] * b[i]
swap
dummy store of a_{k}[i] * b[i]
add with pop a_{k}[i] * b[i] to sum_{k}
add with pop a_{k+1}[i] * b[i] to sum_{k+1}
store sum_{k}
swap
store sum_{k+1}
swap
loop
Scoping. Whatever new floatingpoint semantics are expressible in Java need to be expressible in the JVM too. PEJFPS uses spare bits in a method descriptor to indicate which kind of floatingpoint semantics a methods has; JGNWG can use the same approach. This provides methodlevel control of floatingpoint semantics. This would be the coarsest level acceptable to the JGNWG.
It may also be convenient to have a finer granularity blocklevel control.
While this is not critical to the JGNWG proposal, it should be considered.
Such a declaration is easily added to Java, but it is not immediate apparent
how to encode such information in the JVM. Java compilers can include extra
attributes in a class
file ([JVM] § 4.7.1). These
attributes can be used to support things such as improved debugging
facilities. JVMs are required to ignore unknown attributes. Therefore, JGNWG
could represent the different floatingpoint semantics of different portions of
code using a table emitted as an extra class
file attribute. Strict semantics
is always a permissible policy under the JGNWG proposal; so, in this respect a
JGNWGstyle class file would be backward compatible with existing VMs.
Discussion. The JGNWG proposal allows improved hardware utilization over standard Java while preserving program predictability. Programmers can test system properties to determine the VM's behavior.
A reasonable question is that if Java Grande is opposed to PEJFPS due to its unpredictability, why does Java Grande's proposal also allow some unpredictability by default? JGNWG permits much less unpredictability than PEJFPS and JGNWG has fewer differences between strict and default semantics. For example, in JGNWG a floatingpoint feature must be used consistently; fused mac or extended exponent range cannot be used on one call to a method and not used on the next (something allowed by PEJFPS). On the x86, between allowing extended exponent range and allowing extra precision, allowing extended exponent range results in many fewer visible differences between strict code and default code.
The differences arising from extended exponent range on the x86 are visible only if the calculation would over/underflow on a machine like the SPARC. Over/underflow is comparatively rare in practice; therefore the Java Grande differences would be observed at most rarely. PEJFPS allows extended precision for intermediate results. Differences stemming from extended precision would almost always be visible. For example,
// implicit widefp under PEJFPS
// default semantics under Java Grande
static foo(){
double one = 1.0;
double three = 3.0;
double a;
double b[] = new double[1];
a = one/three;
b[0] = a;
}
// Results under different proposals
// JGNWG
PEJFPS
if(a == b[0]){...}
// always true
true or false?
if(a == (one/three)){...}
// always
true
true or false?
If (one/three)
is calculated to extended precision and a
is treated
as an extended precision value, then a == b[0]
will be false under
PEJFPS since arrays are always stored in the base format (32 bit
float
or 64 bit double
). If a
is stored as
double
precision, a == (one/three)
will be false if
(one/three)
is calculated to double extended
precision. The Java
Grande proposal would always return true for these cases. In short, on
the x86 the
cases where the JGNWG proposal allows differences between default and strict
semantics are where overflow or underflow would occur; the cases where
PEJFPS allows differences between default and strict semantics are
(approximately) where an operation is inexact, as most are.
Additional Floatingpoint Types.
The JGNWG proposal thus far does not provide any access to the double
extended
format found on the x86. Consistent access to this format is
important to allow good hardware utilization and to ease writing
numerical programs; having access to several more bits of precision than
the input data often allows simpler (and sometimes faster) algorithms
to be used. To access double extended
, JGNWG proposes that Java
include a third primitive floatingpoint type called
"indigenous
." The indigenous
floatingpoint type corresponds to the widest IEEE 754 floatingpoint
format that directly executes on the underlying processor. On the
x86, indigenous
corresponds to the double extended
format; on most other processors, indigenous
corresponds
to the double
format. (The indigenous
type
must be at least as wide as double
.) For a particular
VM, class variables indicate whether indigenous
is
implemented as double
or double extended
.
The float
and double
floatingpoint types
have hardware support. Adding a double extended
type would require
costly simulation on most architectures other than the x86. Having an
indigenous
type preserves the performance predictability
of a program and keeps a close correspondence between floatingpoint
types in a language and floatingpoint formats on a processor.
Implementing indigenous
at the JVM level can be done
either by adding new JVM instructions or (as an interim measure) using
the operator overloading and lightweight classes described earlier.
Additional Floatingpoint Expression Evaluation Policies.
To better utilize certain processors and to lessen the impact of
rounding errors, it is useful for the programmer to conveniently be
able to evaluate a floatingpoint expression in a wider format. For
example, a programmer may want to evaluate an expression consisting of
float
variables in double
precision.
PreANSI C used this floatingpoint expression evaluation policy
exclusively.
JGNWG adds a new declaration, anonymous
FloatingPointType, to control the expression evaluation policy:
anonymous double
gives the original C expression
evaluation; all float
values are promoted to
double
.
anonymous indigenous
promotes float
and
double
values to indigenous
. Using
anonymous indigenous
makes best use of the x86's double
extended registers.
anonymous float
specifies to use the existing Java
expression evaluation policy.
anonymous double
.
Our goal is to provide Java with the functionality and performance associated with Fortran arrays.
Multidimensional arrays are ndimensional rectangular collections of elements. An array is characterized by its rank (number of dimensions or axes), its elemental data type (all elements of an array are of the same type), and its shape (the extents along its axes).
Elements of an array are identified by their indices along each axis.
Let a kdimensional array A of elemental type
T have extent n_{j} along its jth axis,
i = 0,...,k1.
Then, a valid index i_{j} along the jth axis must
be greater than or equal to zero and less than n_{j}.
An attempt to reference an element
A[i_{0},i_{1},...,i_{k1}]
with any invalid index i_{j} causes an
ArrayIndexOutOfBoundsException
to be thrown.
Rank, type, and shape are immutable properties of an array. Immutable rank and type are important properties for effective code optimization using techniques developed for Fortran and C. Immutable shape is an important property for the optimization of runtime bounds checking according to recent techniques developed for Java [MiMS98,MoMG98].
We can understand the differences between multidimensional arrays and Java arrays of arrays through an example. Consider the following declaration and definition of a Java array of arrays.
double[][] A = new double[m][n];At a first glance, this seems equivalent to a twodimensional
m
× n
array of doubles.
However, nothing prevents this array from becoming jagged at
a later point in the code:
for (int i=0; i<A.length; i++) { A[i] = new double[i+1]; }(The array could also have been created jagged in the first place.) Also, two or more rows of the array
A
can be aliased
to the same data:
for (int i=0; i<A.length1; i+=2) { A[i] = A[i+1]; }In general, given a reference of type
double[][]
the
compiler has no information about shape and aliasing.
While arrays of arrays are important data structures for their flexibility, this flexibility comes at a performance cost. The potential aliasing between rows of an array of arrays forces compilers to generate additional stores and loads. The potential shape changing in arrays of arrays complicates bounds checking optimization, by requiring array privatization [MiMS98,MoMG98]. True rectangular, multidimensional arrays solve both these problems.
Proposal.
We propose the development of standard Java classes that
implement multidimensional rectangular arrays.
These classes can be included as a subpackage in
java.lang.Math
or in their own package
java.lang.Array
.
Standardizing the classes as part of Java is important for
maximum compiler optimization.
(In particular, it enables semantic inlining
techniques [WMMG98].)
The rank and type of an array are defined by its class.
That is, for each rank and type there is a different class.
(This is necessary for traditional compiler optimizations,
since Java does not support templates.)
Supported types must include all of Java primitive types
(boolean
, byte
, char
, short
,
int
, long
, float
,
and double
), one or more complex types (at least the
Complex
class in this proposal), and Object
.
Supported ranks must include 0 (scalar), 1, 2, 3, and possible 4
to 7dimensional arrays.
(Rank 7 is the current standard limit for Fortran.)
The shape of an array is specified at its creation, and it is immutable. An example might be
doubleArray2D A = new doubleArray2D(m,n);which creates an
m
× n
twodimensional
array of doubles.
(Note that A
itself can be changed to refer to another
array, unless it is declared with the final
modifier.)
The shape parameters m
and n
must be
int
expressions.
The value of any array shape parameter must be a nonnegative
int
.
(That is, it must be greater than or equal to 0 and less than
or equal to 2147483647.)
If any of the shape parameters is negative, the constructor
must throw a NegativeArraySizeException
exception.
If an array of the specified shape cannot be created because of
memory limitations, then an OutOfMemoryError
must
be thrown.
The array classes should support the concept of regular array sections. A regular array section corresponds to a subarray of another array, which we call the master array. Each element of an array section corresponds to a unique element of its master array. Referencing one element of an array section (for reading or writing) has the effect of referencing the corresponding element of the master array. Regular array sections have the same type as, and rank less than or equal to, their master arrays. Regular array sections behave exactly like regular arrays for all operations, including sectioning. (In fact, there are no separate classes for array sections.)
A regular array section is defined by specifying a subset of the
elements of its master array.
For each axis of the master array, the specification can be
(i) a single int
index, or
(ii) a regular range of int
indices.
A regular range is an arithmetic progression defined by a first
element, a last element, and a stride.
The rank of an array section is equal to the number of
regular ranges in its definition.
The shape is defined by the number of elements in those ranges.
Note that indices for an axis of an array section must be
greater than or equal to 0 and less than the extent of that
axis.
Array sections might be referenced as follows, for example.
doubleArray3D A = new doubleArray3D(m,n,p); doubleArray2D B = A.section(new Range(0,m1),new Range(0,n1),k); doubleArray1D C = B.section(new Range(0,m1),j);The first creates an
m
× n
×
p
threedimensional array A
.
It then extracts an m
× n
twodimensional section B
corresponding to the
kth plane of A
.
It also extracts the jth column of B
into
C
, which also corresponds to the jth
column of the kth plane of A
.
The array classes should also support the concept of irregular array sections, although in a more limited fashion. An irregular array section is defined by specifying, for at least one of the axis of a master array, a generic set of indices. Operations on an irregular array section are limited to extracting and setting the values of its elements. (These correspond to the gather and scatter operations typical in vector codes.) For example, these might be specified as follows.
doubleArray2D A = new doubleArray2D(20,n); int idx[] = { 1, 3, 5, 7, 11, 13, 17, 19 }; Index index = new Index(idx); A.set(new Range(0,19),new Range(0,n1),0.0); A.set(Index,new Range(0,n1),1.0);This sets the value of the rows of
A
with prime numbers indices
to 1 and the value of all other rows to 0.
The array classes provide methods that implement Fortranlike functionality for arrays. In particular, the following operations must be provided:
REAL
and AIMAG
, that
convert complex arrays into double
arrays).
Discussion. The array classes can be implemented with no changes in Java or JVM. However, it is essential that the get and set methods be implemented as efficiently as array indexing operations are in Fortran or in C. We expect that inlining will be used for this purpose, and that garbage collectors will treat rectangular arrays efficiently. Multidimensional arrays are extremely common in numerical computing, and hence we expect that efficient multidimensional array classes will be heavily used.
The inclusion of standard array classes in java.lang.Math
or java.lang.Array
does not
require any change to the Java language.
However, the use of explicit method
invocation to effect all array operations will significantly decrease the
readability of Java code and incur the wrath of users. The introduction of a
simple notation for multidimensional arrays which maps to the standard array
classes would make the use of such arrays much more natural. A multiindex
notation, like a[i,j]
to refer to such array elements would be
ideal. This would allow statements like
a.set(i,j,b.get(i,j)+s*c.get(k,l));
to be more naturally expressed as
a[i,j] = b[i,j] + s*c[k,l];
The frontend compiler could disambiguate the expression according to the type
of a
. This requires changes in the Java language or
fancier operator overloading mechanisms.
Additional facilities that would be very helpful, but are not strictly necessary are the following.
a[f:l:s]
referring to the section of the onedimensional
array a
from element f
to element l
in steps of s
. This requires new syntax, or fancy
overloading of the indexing.
Unresolved issues. While most working group members see benefit from the provision of multidimensional array classes, disagreement remains regarding a number of critical implementation issues.
The first issue regards the internal implementation of the array class. Library developers want the simplicity of Fortran arrays. They would like to see the requirement that multidimensional arrays be implemented using a onedimensional native Java array with elements filled in, say, rowmajor order. The simplicity of such a scheme would benefit both optimizers and library developers, leading to better overall performance of codes. Library developers, for example, would know exactly how to traverse the array to maximize locality of reference. Some developers even want the ability to extract the internal 1D array, manipulate it, and then put it back.
Others feel that it is a mistake to expose the internal storage scheme. Compiler developers want the flexibility to decide on the layout based on their own analysis. They argue, for example, that packing everything in a 1D array would artificially limit the size of multidimensional arrays based on indexing considerations. One proposal would provide an additional array attribute called the preferred access mode. This could have values of (i) row major, for Cstyle coding, (ii) column major, for Fortranstyle coding, and (iii) block major, for blockoriented recursive algorithms. The access mode would provide a hint to the compiler regarding the most likely access order for the array. The default would be rowmajor. Compilers could ignore the access mode attribute.
Library developers counter that it is unreasonable to expect them to provide optimized routines for each type of access mode (and for all combinations in case of arrayarray operations), and that the alternative of casting to and from differing access modes adds unacceptable overhead.
Another issue that must be considered further is the semantics of array sections. Array sections can be either aliases to the master array or copies extracted from the master array. In either case, the effect on the master array of writes to the section, and vice versa, must be carefully spelled out.
Finally, many array operations will result in an array of output. The best way of providing the output array is not clear. Providing a new array object as output is sometimes convenient, but may be inefficient. Providing a separate parameter to contain the output may be better.
The following additional problems were addressed by the Working Group.
java.lang.Math
library of
transcendental
functions. The current operational definition is imprecise and
suboptimal. (The functions are defined in terms of compatibility with a
particular implementation, C's fdlibm source, interpreted using
Java's strict semantics). Alternative definitions are
As a compromise, we propose that fdlibm be translated to Java and that this particular implementation be mandated when Java's strict semantics are being enforced. Otherwise, a looser, implementationdependent alternative which conform to the requirements of C9X (as fdlibm does) should be allowed.
Similarly, reliability concerns, as well as the ability to efficiently implement interval arithmetic, requires the ability to set rounding modes for floatingpoint operations. It is sufficient to provide methods to set (and get) the global rounding mode to accomplish these goals.
In order for such features to be used reliably, compilers and JVMs must respect the semantics of the special methods used to implement these operations. In particular, the floatingpoint state must be saved and restored across thread context switches, and compiler and JVM optimizations must be modestly limited.
java.lang.Math
. Equivalents of two of
the ten IEEE 754 recommended functions are already available in
the Java API (IsInfinite and isNaN).
The following six should also be added.
copySign(x,y)
 returns x with the sign of y .

scalb(y,n)
 returns y * 2^{n} for integers
n without explicitly computing 2^{n} .

nextAfter(x,y)
 returns the next representable neighbor
of x in the direction towards y .

unordered(x,y)
 Returns true is one of its arguments is unordered with respect to the other. This occurs when at least one is a NaN. 
fpClass(x)
 Returns an integer that indicates which of the nine "kinds"
of IEEE floatingpoint numbers x is.

logb(x)
 Returns the unbiased exponent of x .

The description of the functions given above is quite terse and ignores some subtleties for extreme arguments. (The same is true of the IEEE 754 document itself.) For a detailed discussion of how these functions can be implemented in Java, see [Dar98].
In addition, several elementary functions which are provided in C should also be included in the Java API, the following, for example.
hypot(x,y)
 returns sqrt(x^{2} + y^{2}) without
overflow whenever the result does not overflow.

Note that it is important that functions for float
and
double
be put in the same class (e.g., the Math
class). Currently, separate isNaN
and
isInfinite
methods are found in the Float
and
Double
classes. Because of this, the type of the argument has
to be part of the function call, e.g. Float.logb(f)
and
Double.logb(f)
. If both were in the same class, then the
function names could be overloaded, allowing for easier maintenance of
codes that are provided in both float
and
double
versions.
The Numerics Working Group has agreed to begin the development of a variety
of core numerical classes and interfaces to support the development of
substantial Java applications in the sciences and engineering. The main
purpose of this work is to standardize the interfaces to common mathematical
operations. A reference implementation will be developed in each case. The
purpose of the implementation will be to clearly document the class and its
methods. Although we expect these to be reasonably efficient, we expect that
highly tuned implementations or those relying on native methods will be
developed by others. Also, the simple methods, such as get or set, will not
provide reasonable performance unless they are inlined, because the method
invocation overhead will be amortized over very few machine instructions.
Unless otherwise specified, we will initially only define classes based on
double
s, since computations with Java float
s are less useful in numerical
computing.
The classes identified for first consideration are the following.
This implements a complex data type for Java. It includes methods for complex arithmetic, assignment, as well as the elementary functions. A strawman proposal has already been developed and released for comment.
Current working proposal:  http://www.vni.com/corner/garage/grande/index.html 
Contacts:  John Brophy, Visual Numerics Marc Snir, IBM 
This implements one, two and threedimensional arrays for Java as described above. A strawman proposal has already been developed and released for comment.
Current working proposal:  http://math.nist.gov/javanumerics/#proposals 
Contacts:  Jose Moreira, IBM Marc Snir, IBM Roldan Pozo, NIST 
This implements matrices (in the linear algebraic sense) and operations on matrices such as the computation of norms, standard decompositions, the solution of linear systems, and eigenvalue problems. A strawman proposal has already been developed and released for comment.
Current working proposal:  http://math.nist.gov/javanumerics/jama 
Contacts:  Cleve Moler, The MathWorks Roldan Pozo, NIST Ron Boisvert, NIST 
These implement elementary operations on vectors and matrices of use to developers of linear algebra software (rather than to average users). This work will be done in conjunction with the BLAS Technical Forum. Some working notes on this effort can be found at http://math.nist.gov/javanumerics/blas.html
Contacts:  Roldan Pozo, NIST Steve Hague, NAG Keith Seymour, University of Tennessee 
This includes functions such as the hyperbolics, erf, gamma, Bessel functions, etc. A strawman proposal has already been developed and released for comment.
Current working proposal:  http://www.vni.com/corner/garage/grande/index.html 
Contacts:  John Brophy, Visual Numerics Ron Boisvert, NIST 
This includes not only a general complex transform, but specialized real, sine and cosine transforms.
Contacts:  Lennart Johnsson, University of Houston 
This implements an interval real data type for Java. It includes methods for interval arithmetic, assignment, as well as elementary functions. An API is actively under development.
Contacts:  Dmitri Chiriaev, Sun 
This implements a multiprecision real data type for Java. It includes methods for arithmetic, assignment, as well as elementary functions.
Contacts:  Sid Chatterjee, University of North Carolina 
The working group will review these proposals and open them up for public comment. It will also set standards for testing and documentation for numeric classes. It will work with Sun and others to have such classes widely distributed.
The following individuals contributed to the development of this document at the Java Grande Forum meetings on May 910 and August 67 in Palo Alto, California.
The working group also acknowledges helpful discussions with and input from the following individuals.
[Aik]  Alexander Aiken and Manuel Fahndrich and Raph Levien, Better static memory management: improving regionbased analysis of higherorder languages, in Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995. 
[Bla]  Bruno Blanchet, Escape Analysis: Correctness Proof, Implementation and Experimental Results, in Proceedings of the 25th ACM SIGPLANSIGACT Symposium on Principles of Programming Languages, January 1921, 1998. 
[Dar]  Joseph D. Darcy, Borneo 1.0: Adding IEEE 754 floatingpoint support to Java, M.S. thesis, University of California, Berkeley, May 1998, http://www.cs.berkeley.edu/~darcy/Borneo. 
[Dar98]  Joseph D. Darcy, Writing Robust IEEE Recommended Functions in "100% Pure Java"^{TM}, UCB//CSD981009, University of California, Berkeley, 1998, http://www.cs.berkeley.edu/~darcy/Research/ieeerecd.ps.gz. 
[Dem]  James W. Demmel and Xiaoye Li, Faster Numerical Algorithms via Exception Handling, IEEE Transactions on Computers, vol. 42, no. 8, August 1994, pp. 983992. 
[Deu]  Alain Deutsch, On the Complexity of Escape Analysis, in Proceedings of the 24th ACM SIGPLANSIGACT Symposium on Principles of Programming Languages, January 1517, 1997. 
[Gol]  Roger A. Golliver, Firstimplementation artifacts in Java^{TM}. 
[Gol98]  Roger A. Golliver, Personal communication, August 1998. 
[JLS]  James Gosling, The Evolution of Numerical Computing in Java, http://java.sun.com/people/jag/FP.html 
[JLS]  James Gosling, Bill Joy, and Guy Steele, The Java^{TM} Language Specification, AddisonWesley, 1996. See http://java.sun.com/docs/books/jls/. 
[JVM]  Tim Lindholm and Frank Yellin, The Java^{TM} Virtual Machine Specification, AddisonWesley, 1997. See http://java.sun.com/docs/books/vmspec/. 
[Kah]  William Kahan, Lecture Notes on the Status of IEEE Standard 754 for Binary FloatingPoint Arithmetic, http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps. 
[MiMS98]  Sam P. Midkiff, Jose E. Moreira, and Marc Snir, Optimizing bounds checking in Java programs. IBM Systems Journal, vol. 37 (1998), no. 3, pp. 409453. 
[MoMG98]  Jose E. Moreira, Sam P. Midkiff, and Manish Gupta, From flop to megaflops: Java for technical computing. Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing, LCPC'98. IBM Research Report 21166, 1998. 
[ParG]  Y. G. Park and B. Goldberg, Escape Analysis on Lists, in Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, July 1992. 
[StoO]  David Stoutamire and Stephen Omohundro, Sather 1.1, August 18, 1996, http://www.icsi.berkeley.edu/~sather/Documentation/Specification/Sather1.1/. 
[Str]  Bjarne Stroustrup, The Design and Evolution of C++, AddisonWesley Publishing Company, 1994. 
[PEJ]  Sun Microsystems, Proposal for Extension of Java Floating Point in JDK 1.2, http://java.sun.com/feedback/fp.html. 
[Tol]  Mads Tofte and JeanPierre Talpin, RegionBased Memory Management, in Information and Computation, vol. 132, no. 2, pp. 109176, February 1997. 
[WMMG98]  Peng Wu, Sam P. Midkiff, Jose E. Moreira, and Manish Gupta. Improving Java performance through semantic inlining. IBM Research Report 21313, 1998. Submitted for publication. 