James L. Blue, ACMD
Charles L. Wilson, Information Access & User Interfaces Division
Omid Omidvar, University of the District of Columbia

Neural networks can be used successfully for pattern recognition and classification on data sets of realistic size. Commercially important examples are classification of fingerprints and handprinted characters.

Determining the weights, or ``training'' the network, is essentially done by
minimizing some function **E**. A common choice for **E** is a sum (over all the
output nodes and all the examples in the training data set) of squares of the
differences between actual and desired output node values. Other terms, such as a
sum of squares of the weights themselves, are often added. The choice of
**E** is made partly because it is a desirable objective function and
partly because minimizing **E** is tractable. Even supposing that we have an
optimal **E**, the training rarely succeeds in reaching the ``best'' minimum.
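The objective just described can be written down directly. The following is a minimal sketch, not the authors' implementation; the function name, the weight-decay coefficient `lam`, and the toy data are assumptions introduced for illustration:

```python
import numpy as np

def sum_squared_error(outputs, targets, weights=None, lam=0.0):
    """Sum, over all examples and all output nodes, of squared
    differences between actual and desired output node values,
    plus an optional weight-decay term lam * sum(weights**2)."""
    err = np.sum((outputs - targets) ** 2)
    if weights is not None:
        err += lam * np.sum(weights ** 2)
    return float(err)

# toy batch: 2 training examples, 3 output nodes
outputs = np.array([[0.9, 0.1, 0.0],
                    [0.2, 0.8, 0.1]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
print(sum_squared_error(outputs, targets))  # 0.11
```

The weight-decay term is one example of the ``other terms'' mentioned above: it penalizes large weights and so regularizes the fit.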

Although MLPs have no feedback, the training process does have feedback: the output node values are used to change the weights and, through the weights, all of the nodal values. The network undergoing training can be viewed as a nonlinear, recurrent, dynamic network, although the trained network is feedforward and static.

Since the system is trying to learn a complex surface via a recurrent dynamic process, we conjectured that the complexity of the training process could provide insight into the learning problem. To obtain possible insights we studied the dynamics of a simple weakly nonlinear recurrent network model. Mathematical results on such systems indicate several interesting ways in which the dynamics of the feedback signals influence network behavior.

From the dynamic system analogy, we developed four modifications to our standard training method:

- For neuron activation functions, we use sines instead of sigmoids to reduce the probability of singular Jacobians.
- We use successive regularization to constrain the volume of the weight space.
- We use Boltzmann pruning to constrain the dimension of the weight space.
- We use prior class probabilities to normalize all error calculations so that statistically significant samples of rare but important classes can be included without distorting the error surface.
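Two of these modifications can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the rescaling of the class-prior weights to average 1, and the toy data are all assumptions:

```python
import numpy as np

def sine_activation(x):
    # sinusoidal neuron activation used in place of the usual sigmoid
    return np.sin(x)

def prior_weighted_error(outputs, targets, labels, priors):
    """Sum-of-squares error in which each example's contribution is
    scaled by the prior probability of its class, so an oversampled
    rare-but-important class does not distort the error surface.
    `labels` gives each example's class index; `priors` maps class
    index to prior probability (assumed names)."""
    per_example = np.sum((outputs - targets) ** 2, axis=1)
    w = np.array([priors[c] for c in labels], dtype=float)
    w = w / w.mean()  # rescale so weights average to 1 (an assumption)
    return float(np.sum(w * per_example))

# toy batch: one example from a common class (0), one from a rare class (1)
outs = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
tgts = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
print(prior_weighted_error(outs, tgts, labels=[0, 1],
                           priors={0: 0.9, 1: 0.1}))
```

With uniform priors the weighted error reduces to the plain sum of squares; with skewed priors, errors on deliberately oversampled rare classes are scaled down in proportion to how rare they actually are.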

On handprinted digits and fingerprint classification problems, these modifications improve error-reject performance by factors between 2 and 4, and reduce network size by 40% to 60%.

Generated by boisvert@nist.gov on Mon Aug 19 10:08:42 EDT 1996