James L. Blue, ACMD
Charles L. Wilson, Information Access & User Interfaces Division
Omid Omidvar, University of the District of Columbia
Neural networks can be used successfully for pattern recognition and classification on data sets of realistic size. Commercially important examples are classification of fingerprints and handprinted characters.
Determining the weights, or ``training'' the network, is essentially done by minimizing some function E. A common choice for E is a sum (over all the output nodes and all the examples in the training data set) of squares of the differences of actual and desired output node values. Other terms, such as a sum of squares of the weights themselves, are often added. The choice of E is made partly because it is a desirable objective function and partly because minimizing it is tractable. Even supposing that we have an optimal E, the training rarely succeeds in getting to the ``best'' minimum.
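The objective described above can be sketched in a few lines; this is a minimal illustration of the sum-of-squares error with an optional weight-decay term, not the authors' implementation, and the penalty coefficient alpha is our own notation.

```python
import numpy as np

def objective(outputs, targets, weights, alpha=0.0):
    """Sum-of-squares training objective E.

    outputs, targets: arrays of shape (n_examples, n_output_nodes).
    weights: flat array of all network weights.
    alpha: coefficient of the optional weight-decay term
           (alpha is an illustrative parameter, not from the text).
    """
    # Sum over all examples and all output nodes of the squared
    # difference between actual and desired output values.
    err = np.sum((outputs - targets) ** 2)
    # Optional additional term: sum of squares of the weights.
    decay = alpha * np.sum(weights ** 2)
    return err + decay
```

Minimizing this E with respect to the weights is what ``training'' means here.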
Although MLPs have no feedback, the training process does have feedback; the output node values are used to change the weights and, through the weights, all the nodal values. The network undergoing training can be viewed as a nonlinear, recurrent, dynamic network, although the trained network is feedforward and static.
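This feedback loop can be made concrete with a toy example; the following sketch (our illustration, not the authors' network) runs gradient descent on a one-weight model y = tanh(w*x), viewed as the discrete dynamical system w_{t+1} = w_t - eta * dE/dw, in which the output value feeds back into the weight at every step.

```python
import numpy as np

def train(x, target, w=0.0, eta=0.5, steps=100):
    """Gradient descent on E = (tanh(w*x) - target)**2, written as a
    recurrent iteration on w. Parameter names (eta, steps) are
    illustrative choices, not from the text."""
    for _ in range(steps):
        y = np.tanh(w * x)                 # feedforward: compute the output
        # Feedback: the output value y drives the weight update.
        dE_dw = 2.0 * (y - target) * (1.0 - y ** 2) * x
        w = w - eta * dE_dw                # one step of the training dynamics
    return w

w = train(x=1.0, target=0.5)  # tanh(w) approaches the target 0.5
```

The trained model is static, but the trajectory of w during training is governed by this recurrent map, which is the viewpoint the text takes.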
Since the system is trying to learn a complex surface via a recurrent dynamic process, we conjectured that the complexity of the training process could provide insight into the learning problem. To obtain possible insights we studied the dynamics of a simple weakly nonlinear recurrent network model. Mathematical results on such systems indicate several interesting ways in which the dynamics of the feedback signals influence network behavior.
From the dynamic system analogy, we developed four modifications to our standard training method:
On handprinted digits and fingerprint classification problems, these modifications improve error-reject performance by factors between 2 and 4, and reduce network size by 40% to 60%.