
Another approach to achieving level independence is to use a broad market measure as the base and transform all prices as a ratio to the current market measure or a moving average of it.
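A minimal sketch of this idea, assuming daily price and index series held in NumPy arrays; the 20-day moving-average window and the function name are illustrative choices, not prescribed by the text:

```python
import numpy as np

def level_independent(prices, market, window=20):
    """Express each price as a ratio to a trailing moving average of a broad
    market measure, removing dependence on the absolute price level.

    prices : 1-D array of instrument prices
    market : 1-D array of the broad market measure (e.g., an index)
    window : length of the moving average (an arbitrary illustrative choice)
    """
    prices = np.asarray(prices, dtype=float)
    market = np.asarray(market, dtype=float)
    # trailing moving average of the market measure
    kernel = np.ones(window) / window
    market_ma = np.convolve(market, kernel, mode="valid")
    # align prices with the moving average (drop the warm-up period)
    return prices[window - 1:] / market_ma
```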

In certain cases, magnitude may be important. For example, some levels of sales are strongly correlated to bond rating. In such instances, the problem may be subdivided into three or four broad subproblems based on sales level. Alternatively, the sales level may be encoded into a series of fuzzy membership sets as input to the network.
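As a rough illustration of the fuzzy-encoding alternative, the sketch below maps a sales level into three overlapping membership sets; the breakpoints and the log scale are hypothetical choices for illustration only, not values from the text:

```python
import numpy as np

def fuzzy_encode_sales(sales, low=1e6, mid=1e7, high=1e8):
    """Encode a sales level as degrees of membership in overlapping
    'low', 'medium', and 'high' fuzzy sets.  The breakpoints are
    hypothetical; in practice they would be chosen from the data.
    Returns three values in [0, 1] to be used as network inputs."""
    x, lo, md, hi = (np.log10(v) for v in (sales, low, mid, high))  # work on a log scale
    mu_low  = np.clip((md - x) / (md - lo), 0.0, 1.0)    # 1 below 'low', fades to 0 at 'mid'
    mu_med  = np.clip(np.minimum((x - lo) / (md - lo),
                                 (hi - x) / (hi - md)), 0.0, 1.0)   # triangular, peak at 'mid'
    mu_high = np.clip((x - md) / (hi - md), 0.0, 1.0)    # 0 below 'mid', reaches 1 at 'high'
    return np.array([mu_low, mu_med, mu_high])
```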

In general, transforming input data into a form that is level independent results in models that generalize better and will perform well in trading as well as trending markets.

Statistical Preprocessing

Most neural networks are constructed to minimize a sum-of-squared-errors or similar regression criterion. As a practical matter, the net result is that they work best when all the data is normally distributed, with zero mean and unit variance.

An example of this type of transformation, commonly used to express price changes, is log(p(t+1)/p(t)). Though the raw distribution of price changes p(t+1)/p(t) is leptokurtic (has fat tails), log(p(t+1)/p(t)) is more nearly normally distributed. This is what is meant by the expression, "returns are lognormal."
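For concreteness, the log-return computation is a one-liner on a price series (a minimal sketch, assuming the prices are stored in a NumPy array):

```python
import numpy as np

def log_returns(prices):
    """Compute log(p(t+1) / p(t)) for a price series.  The log transform
    pulls in the fat tails of the raw price ratios, giving a distribution
    closer to normal."""
    p = np.asarray(prices, dtype=float)
    return np.log(p[1:] / p[:-1])
```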

The log transformation is one example of a variety of transformations that do distribution shaping. (A reasonably readable treatment of this is found in Chatterjee.6) The basic idea is as follows. While examining one input variable, select a univariate transformation f(x) that maintains monotonicity (i.e., if x1 < x2, then f(x1) < f(x2)) and whose output distribution is most similar to the desired distribution.

One strategy for accomplishing this has been implemented in Aspen Technology's Neural Simulator™. A series of transformations of the form g(a0 + a1 x) are tested. The transforms tested include log(x), log(log(x)), exp(x), exp(exp(x)), 1/x, 1/x^2, 1/x^4, x^2, x^4, 1/x^0.5, 1/x^0.25, x^0.5, x^0.25, tanh(x), and ln(x/(1 - x)). The values a0 and a1 are determined using a directed random search procedure, and include both negative and positive values for a1. The distribution of the transformed data is compared with a normal (or any arbitrary) distribution using the Kolmogorov-Smirnov statistic.7 The transformation and the values of a0 and a1 that produce the best fit to the target distribution are used to transform the data. Even when transforming the data with a technique like this, it is useful to retain the raw nontransformed data as a candidate input.
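The sketch below captures the spirit of this procedure with a plain random search over (a0, a1) and a reduced set of candidate transforms; it is not Aspen Technology's implementation, and the search ranges, trial count, and transform subset are assumptions made for illustration:

```python
import numpy as np
from scipy import stats

# A reduced family of candidate transforms g(.).  The text lists a larger set
# (log, log(log), exp, reciprocal and fractional powers, tanh, logit).
TRANSFORMS = {
    "identity": lambda z: z,
    "log":      lambda z: np.log(z),
    "sqrt":     lambda z: np.sqrt(z),
    "recip":    lambda z: 1.0 / z,
    "tanh":     lambda z: np.tanh(z),
}

def shape_to_normal(x, n_trials=500, seed=0):
    """Search for g(a0 + a1*x) whose output looks most normal, judged by the
    Kolmogorov-Smirnov statistic.  A plain random search over (a0, a1) stands
    in for the directed random search mentioned in the text."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    best_ks, best_spec = np.inf, None
    for _ in range(n_trials):
        a0 = rng.uniform(-1.0, 1.0)              # offset; both signs are tried
        a1 = rng.uniform(-2.0, 2.0)              # scale; both signs are tried
        z = a0 + a1 * x
        for name, g in TRANSFORMS.items():
            with np.errstate(all="ignore"):
                y = g(z)
            if not np.all(np.isfinite(y)) or np.std(y) == 0:
                continue                         # transform undefined for this data
            y_std = (y - y.mean()) / y.std()     # compare shape only, not location/scale
            ks = stats.kstest(y_std, "norm").statistic
            if ks < best_ks:
                best_ks, best_spec = ks, (name, a0, a1)
    return best_spec, best_ks
```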

Scaling Data

Candidate inputs should all be scaled into the range -1 to +1 with zero mean. This is what is expected for any statistical technique, including linear and neural regression. Use the following formulas to compute the linear scale (a1) and offset (a0):



a1 = 1 / max(x_max - x_avg, x_avg - x_min)

a0 = -a1 × x_avg

where

x_max is the maximum value of x

x_min is the minimum value of x

x_avg is the average value of x

Data is transformed using the formula x_new = a0 + a1 × x_old. The new values have zero mean and lie in the range [-1, +1].
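A short sketch of this scaling step, following the coefficient formulas above; it also returns the coefficients so that out-of-sample data can be scaled identically:

```python
import numpy as np

def scale_to_unit_range(x):
    """Scale a variable to zero mean and the range [-1, +1] using
        a1 = 1 / max(x_max - x_avg, x_avg - x_min)
        a0 = -a1 * x_avg
    Returns the scaled data and the (a0, a1) pair for reuse on new data."""
    x = np.asarray(x, dtype=float)
    x_avg = x.mean()
    a1 = 1.0 / max(x.max() - x_avg, x_avg - x.min())
    a0 = -a1 * x_avg
    return a0 + a1 * x, (a0, a1)
```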

Selecting Data For Training

How you select training data depends on what kind of network you are planning to use.

If you have carefully selected a handful (fewer than five to eight) of candidate inputs, a kernel-based network such as the Generalized Regression Neural Network (GRNN) can be trained almost instantly and will often produce good results. In this instance, the output of the GRNN is the expected value of whatever you are predicting. Fuzzy rule bases are equally effective in this application.
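A GRNN prediction is essentially a kernel-weighted average of the training outcomes. The sketch below is a minimal illustration of that idea, not the exact network described in the text; the Gaussian kernel and the smoothing width sigma are assumptions and would normally be tuned on a test set:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.1):
    """Minimal Generalized Regression Neural Network (GRNN) sketch.
    Each prediction is a kernel-weighted average of the training targets,
    i.e., an estimate of the expected value of the output given the inputs."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    preds = []
    for q in X_query:
        d2 = np.sum((X_train - q) ** 2, axis=1)   # squared distances to all stored patterns
        w = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian kernel weights
        preds.append(np.dot(w, y_train) / (np.sum(w) + 1e-12))
    return np.array(preds)
```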

When there are more than a handful of inputs, or a backpropagation-trained perceptron network is used, subselecting training examples is important. The reason for this is that the backpropagation algorithm minimizes the sum of the squared errors between the estimated output and the observed output. This objective function is data-distribution dependent. Figure 16.4 illustrates this issue. When substantial amounts of data are concentrated in one portion of the space, they tend to dominate the solution to the exclusion of the sparser examples. Though the regression line shown is optimal in terms of minimizing the sum of the squared errors, it fails to capture infrequent events. In financial applications, it is often the infrequent events that are the most important in terms of profit.

The solution to this is to find a way to uniformly select data so that it covers the input space. Unfortunately, with many candidate inputs, this is not possible. However, under certain assumptions, we can create an approximation of a uniform covering of the input space by selecting a uniform subset of examples based on their outcomes.

Here is how the procedure works. Divide the output range into a series of equal width non-overlapping bins. Assign each example in the training set to a bin based on its outcome. Begin with the first bin and select an example from it at random. Put the example in the output training set. Proceed to the next bin and continue this process in a round-robin fashion until the desired number of examples have been selected. Use the new data set to do variable selection and train the network.
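A sketch of this selection procedure, assuming the examples are held in NumPy arrays; the bin count and the number of selected examples are placeholders to be chosen for the problem at hand:

```python
import numpy as np

def select_uniform_by_outcome(X, y, n_bins=10, n_select=500, seed=0):
    """Select training examples so their outcomes cover the output range
    roughly uniformly: equal-width bins over the output, then round-robin
    random draws from the bins, as described in the text."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)

    # assign every example to an equal-width, non-overlapping bin
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    bin_of = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    pools = [list(rng.permutation(np.where(bin_of == b)[0])) for b in range(n_bins)]

    # round-robin: draw one random example per non-empty bin until done
    chosen = []
    while len(chosen) < n_select and any(pools):
        for pool in pools:
            if pool and len(chosen) < n_select:
                chosen.append(pool.pop())
    idx = np.array(chosen)
    return X[idx], y[idx]
```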




Figure 16.4 Linear regression. This example illustrates the effects of data distribution. The line shown minimizes the sum of the squared errors between the inputs (x-axis) and observed values (y-axis) for the 100 points.


Source: Aspen Technology, Inc. Used by permission.

Does it work? Figure 16.5 shows the cumulative equity curves on a validation set for trading the S&P 500. Each curve represents a network trained on a different fraction of the available data. Each network was trained to predict the five-day forward change in price using nine inputs selected by prior experiments. If the output of the network was greater than 0.1, take or maintain a long position. If the output of the network was less than -0.1, take or maintain a short position. Of the data available for training, 40 percent was set aside as a test set in a train/test methodology. All networks were trained using Aspen Technology's Neural Simulator™.* Table 16.5 shows summary data for both the train/test data and the validation set. Note that even with perfect information the trading strategy resulted in a 94 percent win ratio on the train/test set and an 89 percent win ratio on the validation set. This is due to a mismatch between the prediction horizon and the trading strategy. This problem can be eliminated by using a more complex trading strategy.

In general, less data resulted in better generalization (more profit) up to the point at which the data was insufficient to fill the space. At that point (30 percent), performance decreased rapidly because the network, trained on very few examples, overfit the data.

*Predict contains many of the preprocessing features described in this chapter. (Editor)


