




Figure 16.5 Equity curves over the validation period for each of the networks. The percentages at the right identify the portion of raw data used to train the network.

Source: Aspen Technology, Inc. Used by permission.

Table 16.5 Performance of networks with differing sizes of working set selected uniformly from the training data

Network    Nodes    Train/Test Set            Validation Set
                    12/31/90 to 12/31/92      1/1/93 to 6/18/93
                    Lose Ratio                Lose Ratio
Perfect             1.00                       1.00
100%                0.52                       0.09
                    0.49                       0.21
                    0.49                       0.11
                    0.49                       0.25
                    0.40                      -0.10
                    0.44                       0.20

The portion of the data selected for the working set is shown under the Network heading. The number of hidden units in the final network is shown under Nodes. The range 1/1/93 through 6/18/93 was set aside as a validation set; it was not used at all in developing the model.



A Few Good Inputs

The prior discussions on preprocessing described how to create additional candidate inputs to a model. Many of those inputs are colinear; that is, they have a high degree of correlation with one another. In general, networks trained on this type of data often find idiosyncratic solutions, as shown in Figure 16.2. The solution to this problem is to find a way to select a synergistic subset of candidate inputs that best solves the problem.

Several approaches to this problem of variable selection have been developed. Most work well on clean data where relationships are strong. I have found only one that works well on noisy data of the type found in financial modeling problems: genetic variable selection.

As part of developing a stock-picking system for managing an equity portfolio, several techniques for selecting variables were tested, ranging from the simplistic to the esoteric. None but genetic algorithms proved effective. The methods tested were:

• Candidate inputs with the highest correlation to the target output (a minimal version of this baseline is sketched after this list).

• Selecting the inputs with the largest coefficients from a stepwise linear regression.

• Selecting the most predictive variables from a particular level of Hierarchical Cluster Analysis applied to the inputs.

• Principal component extraction.

• Training a network and picking the variables with the highest sensitivity.

• Training a network and measuring the effect of leaving out one variable.

• Iteratively training and pruning network inputs with the lowest sensitivity.

• Causal data analysis to identify fundamental causal inputs.

• Genetic variable selection.
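
The first approach in the list is the simplest and makes a useful baseline; a minimal version is sketched below. The function name and the choice of how many inputs to keep are illustrative only, and the genetic approach itself is sketched later in this section.

    import numpy as np

    def rank_by_correlation(X, y, k=5):
        """Return the indices of the k candidate inputs most correlated with y."""
        corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(X.shape[1])])
        return np.argsort(corrs)[::-1][:k]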

Though many of these approaches work well on clean data, such as that found in physical processes, even there they often prove suboptimal. As an example, an analytic formula for a chemical reaction was constructed with two inputs and one output. Each of the independent inputs was transformed into four new inputs, for a total of eight inputs. These transformations were selected based on prior analysis. The input space was rotated (detrended) to remove the principal components that were modeled as a linear trend. A network was trained on the residual and added to the trend. The result was an 8-5-1 network with a linear correlation of 0.93 and a normalized RMS error of 0.1.* When a network was trained on the data without rotating the input space, the result was an 8-2-1 network with a linear correlation of 0.99 and a normalized RMS error of 0.02. In both instances, dozens of networks were trained, and the results represent the best network in each test. It is not intuitively obvious why rotating the combined input and output space should make the problem so much harder for a neural network to solve. This is an area that requires more research. A lesson to be learned from this is that you cannot rely on "theoretical" arguments to predict the best way to preprocess data. The only effective method is to take the time to compare a new approach to other proven methods on real data sets.

*RMS (root mean square) error is the square root of the average squared error. (Editor)
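
For concreteness, the two figures of merit quoted above can be computed as in the short sketch below. Dividing the RMS error by the standard deviation of the target is one common normalization and is an assumption here, since the text does not say which normalizer was used.

    import numpy as np

    def linear_correlation(pred, target):
        """Pearson correlation between predictions and actual values."""
        return float(np.corrcoef(pred, target)[0, 1])

    def normalized_rms_error(pred, target):
        """RMS error scaled by the spread of the target (assumed normalizer)."""
        rms = np.sqrt(np.mean((pred - target) ** 2))
        return float(rms / np.std(target))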

Genetic variable selection is the best overall approach that was tested. It was implemented as a binary chromosome in which each bit represented the presence (1) or absence (0) of a candidate input. The subset of inputs represented by a chromosome is used to train a network. The performance of the network on the test set is its measure of fitness. Standard bit-mutation and uniform crossover operators are used.
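
The mechanics described above translate fairly directly into code. The sketch below follows that description, with a binary chromosome per candidate subset, fitness scored on the test set, and uniform crossover plus bit mutation, but it is only an illustration: the names are invented, the half-population elitism and the mutation rate are assumed values, and a small ridge regression stands in for training a neural network so that the example stays self-contained and runnable.

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(bits, X_train, y_train, X_test, y_test, ridge=1e-3):
        """Test-set correlation of a stand-in model trained on the chosen inputs."""
        idx = np.flatnonzero(bits)
        if idx.size == 0:
            return -1.0                      # an empty subset is worthless
        A, B = X_train[:, idx], X_test[:, idx]
        # Ridge regression stands in here for training a network on the subset.
        w = np.linalg.solve(A.T @ A + ridge * np.eye(idx.size), A.T @ y_train)
        return float(np.corrcoef(B @ w, y_test)[0, 1])

    def uniform_crossover(a, b):
        """Each bit of the child comes from either parent with equal probability."""
        return np.where(rng.random(a.size) < 0.5, a, b)

    def mutate(bits, rate=0.02):
        """Flip each bit independently with a small (assumed) probability."""
        return np.where(rng.random(bits.size) < rate, 1 - bits, bits)

    def genetic_select(X_train, y_train, X_test, y_test,
                       pop_size=40, generations=30):
        """Evolve bit strings that mark which candidate inputs to use."""
        n = X_train.shape[1]
        pop = rng.integers(0, 2, size=(pop_size, n))
        for _ in range(generations):
            scores = np.array([fitness(c, X_train, y_train, X_test, y_test)
                               for c in pop])
            elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]
            children = [mutate(uniform_crossover(elite[rng.integers(len(elite))],
                                                 elite[rng.integers(len(elite))]))
                        for _ in range(pop_size - len(elite))]
            pop = np.vstack([elite, np.asarray(children)])
        scores = np.array([fitness(c, X_train, y_train, X_test, y_test)
                           for c in pop])
        return pop, scores                   # final population and its fitness

Note that the validation period never enters the search: fitness is measured only on the test set, which is consistent with holding the validation data completely out of model development.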

In initial tests, the final population of the genetic algorithm contained several sets of candidate variables all with similar rankings. When a network was trained on the union of the top five sets of variables, the resulting network performed worse than networks trained on any of the subsets. These subsets were definitely synergistic. The situation appears to be that certain variables mask the more subtle and at times more powerful predictive nature of other variables when they are all together. Only the genetic variable selection approach was able to identify these subsets.

When an application starts with 50 or more candidate variables, even genetic variable selection may have problems. A solution to this is to cascade the variable selection process. In cascade variable selection, genetic selection is repeatedly performed. Each selection is limited to a small number of generations so that the resulting populations are still quite diverse. The frequency of occurrence of each variable in the top 10 percent of each population is computed. Those with a low frequency of occurrence are dropped from future consideration. The remaining variables are retained and the process repeated until a stable set of candidate variables remains. Cascade variable selection, a feature in NeuralWorks Predict™, has been successfully used on a problem with over 3,500 candidate input variables.
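
A sketch of that cascade, reusing the genetic_select routine from the previous example, might look as follows. The 10 percent cut matches the text, while the frequency threshold and the number of rounds are assumed parameters.

    import numpy as np

    def cascade_select(X_train, y_train, X_test, y_test,
                       generations=5, min_freq=0.25, max_rounds=10):
        """Repeated short genetic selections; drop variables that rarely
        appear in the top 10 percent of each population."""
        active = np.arange(X_train.shape[1])          # indices still in play
        for _ in range(max_rounds):
            pop, scores = genetic_select(X_train[:, active], y_train,
                                         X_test[:, active], y_test,
                                         generations=generations)
            top = pop[np.argsort(scores)[::-1][: max(1, len(pop) // 10)]]
            freq = top.mean(axis=0)                   # how often each variable appears
            keep = freq >= min_freq
            if keep.all():                            # candidate set has stabilized
                break
            active = active[keep]
        return active                                 # surviving candidate inputs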

The primary benefits of effective variable selection are better performance and better generalization. Table 16.6 shows the impact of variable selection on the performance of networks trained to predict the S&P 500. Reducing the number of inputs from 65 to 5 resulted in substantially better generalization from the training to test to validation sets. Notice that the network was able to find an almost perfect solution on the training set using 65 inputs. However, performance degraded rapidly on the test

Table 16.6 The effect of variable selection on generalization

Data Set          Run 1    Run 2
Training set      0.976    0.573
Test set          0.329    0.433
Validation set    0.182    0.499

Run 1 used 65 inputs to predict the five-day forward change in the S&P 500. Run 2 used 5 inputs selected by a genetic algorithm to make the same prediction. All other conditions were held constant. The performance measure is the linear correlation coefficient.


