Here at NIST, we have developed a genetic programming system called GPP (Genetic Programming - Procedural). On this web page we describe regression problems to which we have applied GPP. The regression test problems are constructed both with and without noise.
The purpose of this effort is not merely to use GPP to derive functions that approximate the desired functions. Our objective is to derive functions of the same form as the originating function. So if the original function from which we derive the test data is a fifth-degree polynomial, we consider GPP to be successful only if it evolves a fifth-degree polynomial.
All of the test cases that we present here are polynomials; they range in degree from 2 to 10. The polynomial coefficients and the location of test points were generated randomly by the program makeRandPoly.c.
Both the coefficients and the test points were chosen randomly, using a uniform distribution, from the range [-2.0, 2.0]. When noise was added, it was generated randomly using a normal distribution with standard deviation as indicated below for each case.
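The sampling scheme described above can be sketched in a few lines of Python. This is only an illustration and is not the source of makeRandPoly.c: the function name make_rand_poly, its parameters, and the use of a zero-mean normal distribution for the noise are assumptions made for this sketch (the text above specifies only the standard deviation).

```python
import random


def make_rand_poly(degree, num_points, noise_sd, lo=-2.0, hi=2.0, seed=None):
    """Return (coeffs, points) for a random polynomial test case.

    coeffs are ordered from degree 0 upward; each point is (x, y, noise),
    where y already includes the noise term.
    """
    rng = random.Random(seed)

    # Coefficients drawn uniformly from [lo, hi], lowest degree first.
    coeffs = [rng.uniform(lo, hi) for _ in range(degree + 1)]

    points = []
    for _ in range(num_points):
        # Test point locations are drawn from the same uniform range.
        x = rng.uniform(lo, hi)
        # Assumption: noise is zero-mean; noise_sd == 0 means a noise-free case.
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        exact = sum(c * x**k for k, c in enumerate(coeffs))
        points.append((x, exact + noise, noise))
    return coeffs, points
```

Calling make_rand_poly(2, 5, 0.01) would produce a second-degree case with five noisy points, comparable in shape to the test cases described here, though not identical to any of them.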
The test case files are in plain text (ASCII). All were generated with the program makeRandPoly.c. The files have the following format:
    num_coeffs: < n >
    coeffs_low_to_high:
    < coefficient >   for degree 0
    < coefficient >   for degree 1
    . . .
    < coefficient >   for degree n-1
    noise_standard_deviation: < standard deviation >
    num_points: < m >
    points:
    < x y noise >   for point 1
    < x y noise >   for point 2
    . . .
    < x y noise >   for point m
For the test points, x is the independent variable, and y is the value of the polynomial evaluated at x plus the noise. Note that the y value includes the noise. The noise value is specified separately just for convenience.
So for example:
    num_coeffs: 3
    coeffs_low_to_high:
    1.763543010984e+00
    -5.413456536936e-01
    2.300428768290e-01
    noise_standard_deviation: 1.000000000000e-02
    num_points: 5
    points:
    -1.717255447306e+00 3.356281952522e+00 -1.527851586467e-02
    -1.159210298531e+00 2.693714216237e+00 -6.486626837695e-03
    -6.846658082231e-01 2.241207824495e+00 -8.126171154368e-04
    1.190237793577e-01 1.695343815300e+00 -7.025129275700e-03
    1.203979188659e+00 1.421243536778e+00 -2.399288002308e-02
This file specifies the second degree polynomial:
f(X) = 1.763543010984 - 0.5413456536936 X + 0.230042876829 X^2
In this example, five data points have been provided with added noise selected from a distribution with standard deviation 0.01. The first point says that 3.356281952522 = f(-1.717255447306) + noise, where the noise component is equal to -1.527851586467e-02.
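As a check on this reading of the format, the example file can be parsed and each point compared against the polynomial. The Python sketch below is illustrative only: the names parse_test_case and horner are hypothetical, and the parser assumes that the five field labels shown in the example are the only labels in a file. It splits on whitespace, so it does not depend on how the values are broken across lines.

```python
EXAMPLE = """\
num_coeffs: 3
coeffs_low_to_high:
1.763543010984e+00
-5.413456536936e-01
2.300428768290e-01
noise_standard_deviation: 1.000000000000e-02
num_points: 5
points:
-1.717255447306e+00 3.356281952522e+00 -1.527851586467e-02
-1.159210298531e+00 2.693714216237e+00 -6.486626837695e-03
-6.846658082231e-01 2.241207824495e+00 -8.126171154368e-04
1.190237793577e-01 1.695343815300e+00 -7.025129275700e-03
1.203979188659e+00 1.421243536778e+00 -2.399288002308e-02
"""

# Assumption: these are the only field labels that occur in a test case file.
LABELS = {
    "num_coeffs:",
    "coeffs_low_to_high:",
    "noise_standard_deviation:",
    "num_points:",
    "points:",
}


def parse_test_case(text):
    """Return (coeffs, noise_sd, points) parsed from a test case file's text."""
    fields = {}
    current = None
    for token in text.split():
        if token in LABELS:
            current = token[:-1]  # drop the trailing colon
            fields[current] = []
        elif current is not None:
            fields[current].append(float(token))

    coeffs = fields["coeffs_low_to_high"]  # lowest degree first
    noise_sd = fields["noise_standard_deviation"][0]
    flat = fields["points"]
    # Each point occupies three consecutive values: x, y, noise.
    points = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return coeffs, noise_sd, points


def horner(coeffs, x):
    """Evaluate a polynomial with low-to-high coefficients using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


coeffs, noise_sd, points = parse_test_case(EXAMPLE)
for x, y, noise in points:
    # Subtracting the noise from y should recover f(x) up to rounding.
    print(f"x={x: .6f}  y-noise={y - noise:.9f}  f(x)={horner(coeffs, x):.9f}")
```

For each point, the two printed values should agree to within the precision of the stored digits, which confirms that y is f(x) plus the listed noise rather than the noise-free value.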
In the following list, we present for each test case the following information:
The sequence number is used to group and to distinguish test cases of the same degree. Test cases of the same degree and sequence number represent the same polynomial and test points; they differ only in the level of noise added. By the same token, two test cases of the same degree but different sequence numbers represent different polynomials.
|File name||Graph||Degree||Seq #||Noise Std Dev||GPP Success Rate|