
Model Output: What Parameters Should We Predict?

Another question that arises when building a model is, "What is the output?" The answer depends on what we are trying to accomplish with the model. Sometimes the answer may be the value of an index or a stock price forecast. Other times, it may be best for the output to be the rate of change of the price. This has the advantage of de-trending the data, an important consideration because a strong trend can otherwise dominate what the model learns.
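To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical index values) of the second output choice: converting a price series into its rate of change, which removes the trend.

# Hypothetical index closes
prices = [10000.0, 10150.0, 10080.0, 10210.0, 10275.0]

# Rate of change between consecutive periods: (p[t] - p[t-1]) / p[t-1]
rate_of_change = [
    (prices[t] - prices[t - 1]) / prices[t - 1]
    for t in range(1, len(prices))
]

print(rate_of_change)  # small, roughly trend-free fractions such as 0.0148, -0.0069, ...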

Obviously, we want to predict the future, but how far into the future? There are basically two answers to this question. Since we know by intuition that a prediction for tomorrow is more accurate than one for a month from now, a short prediction horizon seems best. However, the daily movements of the DJIA contain a large degree of noise, which can make the daily trend of the market difficult to discern.

What, then, are our choices? We can develop models that predict a short time into the future. These predictions can then be fed back as input to the same networks to predict the next time step. This process, known as iterated prediction, can continue until a prediction is made sufficiently far in the future. The problem is that any error made in any of the predictions will be magnified many times in this process. If the prediction point is far enough away, the prediction may bear little resemblance to reality.
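As an illustration, here is a minimal sketch of iterated prediction in Python. The one-step model is a hypothetical stand-in for any trained network that maps the last few observations to the next value; the point is only to show how each prediction is fed back as input, which is where errors compound.

def iterate_predictions(one_step_model, history, steps, window=5):
    """Feed each one-step prediction back into the input to reach `steps` ahead."""
    series = list(history)
    forecasts = []
    for _ in range(steps):
        next_value = one_step_model(series[-window:])  # predict one step ahead
        forecasts.append(next_value)
        series.append(next_value)  # the prediction becomes an input; errors compound here
    return forecasts

# Toy usage with a stand-in "model" (the mean of the window) and made-up data:
toy_model = lambda window_values: sum(window_values) / len(window_values)
print(iterate_predictions(toy_model, [1.00, 1.10, 1.20, 1.15, 1.25], steps=3))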

Another approach is simply to develop a model that predicts as far into the future as you desire. This avoids the problem of magnifying small errors, but it is likely to be less accurate because the target itself is more variable. The only way to really determine which is best is to build both, compare the results, and then choose.
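A minimal sketch of this direct alternative, under the assumption that training pairs are built from a single series: the target is simply the value the desired number of steps ahead, so one model predicts the chosen horizon in a single shot.

def make_direct_pairs(series, window=5, horizon=10):
    """Return (input_window, value_at_horizon) pairs for direct multi-step training."""
    pairs = []
    for t in range(window, len(series) - horizon + 1):
        x = series[t - window:t]        # the last `window` observations
        y = series[t + horizon - 1]     # the value `horizon` steps after the window
        pairs.append((x, y))
    return pairs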

Neural Networks: A Semiautomated Modeling Approach

Many years ago, a new technology burst on the scene: neural networks. It was originally thought that this technology would solve all the world's problems. It could learn, and it was believed that within just a few years, given the speed of computers and their large, perfect memory, computer scientists would be able to emulate the human brain and its thought processes. In fact, it was thought that they might be able to do even better. What was soon discovered, and reported by Minsky and Papert,2 was that single-layered neural networks based on the perceptron could not solve even the simple XOR problem.* Werbos, in his dissertation, showed that the XOR problem could be solved by multiple-layered networks of perceptrons connected together and trained through the use of backpropagation. In fact, it was found that neural networks could handle the separation of highly nonlinear spaces through the creation of hyperplanes in a multidimensional space.3

* This tested whether or not the neural net could detect if two binary inputs were the same or different.
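For readers who have not seen it, here is a minimal sketch of the XOR point in Python: a single perceptron cannot separate the XOR cases, but two hidden units feeding one output unit can. The weights below are hand-chosen for illustration, not trained.

def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)       # hidden unit 1: fires if either input is on
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires only if both inputs are on
    return step(h_or - h_and - 0.5)  # output: "or but not and," which is XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))       # prints 0, 1, 1, 0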



Determining whether to buy or sell stocks is akin to the military target recognition problem. Is predicting the markets a target recognition problem? Yes. We are trying to recognize whether the time is right to buy, sell, or hold. These three are our targets, and the problem is posed as a discrete space with three decisions.
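One simple way to pose those three targets, sketched here as an illustration: map a predicted rate of change into one of three classes. The threshold below is a hypothetical choice, not a value taken from the text.

def label_action(expected_return, threshold=0.01):
    """Map a predicted rate of change to one of three discrete targets."""
    if expected_return > threshold:
        return "buy"
    if expected_return < -threshold:
        return "sell"
    return "hold"

print([label_action(r) for r in [0.02, -0.015, 0.003]])  # ['buy', 'sell', 'hold']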

Early neural net users thought that they could simply throw all possible data at the network and let it figure out what is important and what is not. Important inputs, they assumed, would end up with large associated weights, and unimportant inputs would end up with weights close to zero. What was not considered is that as the number of inputs increases, so too do the complexity of the network and the required training time. This approach also does not consider who will gather and enter all of this data.

With traditional neural networks, once the input and its transformations have been chosen and normalized, and the proper output chosen for the appropriate prediction period, the network topology must be selected. The topology consists of the number of nodes per layer, the number of layers, and the transfer functions (squashing functions) to be used. Kolmogorov4 and Cybenko5 have proven that any function can be represented by a neural network consisting of three layers (input, hidden, and output) using any transfer function, provided it is not an even polynomial. In essence, a neural network is nothing more than a complicated, nonlinear equation with many adjustable coefficients. In fact, a sufficiently complex equation with all the input variables represented, a copious number of coefficients, and an appropriate algorithm to adjust the coefficients could perform the same job as a neural network.
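To see the "nonlinear equation with adjustable coefficients" point directly, here is a minimal sketch of a three-layer network written out in Python with NumPy. The layer sizes, the tanh squashing function, and the random coefficients are illustrative choices, not a prescription.

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 8

W1 = rng.normal(size=(n_hidden, n_inputs))  # input-to-hidden coefficients
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(1, n_hidden))         # hidden-to-output coefficients
b2 = np.zeros(1)

def forward(x):
    hidden = np.tanh(W1 @ x + b1)  # squashing (transfer) function at the hidden layer
    return W2 @ hidden + b2        # linear output node for the forecast

print(forward(np.array([0.01, -0.02, 0.005, 0.0])))  # one big adjustable-coefficient equation, evaluated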

Conventional wisdom suggests the following approach for applying traditional neural networks: use the sigmoid as the transfer function (since this can represent any function), one hidden layer (since this is sufficient to represent any function), and twice as many hidden-layer nodes as input nodes. This may or may not work. We must remember several points when selecting the topology. If there are too few hidden nodes, there may not be sufficient degrees of freedom for the network to model the underlying process adequately. If there are too many hidden nodes, the network may simply "memorize" the training data and be unable to generalize. During memorization, the network models all the noise and inaccuracies in the input data. It will do a poor job of extrapolating (predicting outside the range of the input data) and may also perform poorly at interpolating (predicting within that range). In addition, selecting too many hidden nodes will require vastly more training time. Why should we consider more than one hidden layer? Kolmogorov and Cybenko both showed that one is sufficient (although neither tells us how to construct the network, only that one exists). However, training time may be shorter and there may be fewer problems with local minima if more layers are used.
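The conventional-wisdom starting point, together with the usual guard against memorization, might look like the following sketch. The train_network and error helpers are hypothetical placeholders for whatever training routine is in use; the idea is only that candidate topologies are judged on held-out data rather than on the data they may have memorized.

n_inputs = 6
candidate_hidden_sizes = [n_inputs // 2, n_inputs, 2 * n_inputs]  # last entry is the rule of thumb

def choose_topology(train_data, validation_data, train_network, error):
    """Pick the hidden-layer size whose trained network generalizes best."""
    best = None
    for n_hidden in candidate_hidden_sizes:
        net = train_network(train_data, n_hidden=n_hidden)  # hypothetical trainer
        val_err = error(net, validation_data)               # generalization, not memorization
        if best is None or val_err < best[1]:
            best = (n_hidden, val_err)
    return best  # (best hidden size, its validation error)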

When training a neural network, the starting values of the node weights (normally selected at random) and the learning rates must be properly chosen. Frequently, the neural network will become "stuck" in a local minimum and will not perform the desired prediction task. When this happens, reinitializing and retraining will generally solve the problem.
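A minimal sketch of that remedy, assuming hypothetical init_weights, train, and error helpers for whatever network library is in use: keep restarting from fresh random weights until training escapes the local minimum or the retry budget is exhausted.

def train_with_restarts(data, init_weights, train, error,
                        max_restarts=5, target_error=0.05):
    """Reinitialize and retrain until the network reaches an acceptable error."""
    for attempt in range(max_restarts):
        weights = init_weights(seed=attempt)  # fresh random starting values
        weights = train(weights, data)
        if error(weights, data) <= target_error:
            break  # good enough: the network is not stuck this time
    return weights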



What have we determined so far? The topology must be selected to promote fast training while having sufficient complexity to model the underlying process. Many topologies must be examined to select the right one. In practice, these decisions are made using a trial-and-error approach (along with intuition) to develop the model. Considering all the possibilities, it would probably take forever to train and select the final network model, even on the most powerful computers available. Further, the network needs to be retrained frequently to model the changing dynamics of the market. Is there any hope? Is there an easier way? There is; the technology is known as ontogenic statistical networks: models that can generate their own topology during training.

Ontogenic Statistical Networks: A Truly Automated and Intelligent Modeling Approach

To address the problems that backpropagation neural networks present, a new class of neural networks has been developed, called ontogenic statistical networks. The name comes from the word ontogenesis, meaning the developmental history of an organism. An ontogenic network develops its own topology during training. No longer does the developer need to decide, a priori, how many layers to use, which transfer functions, or how many nodes. Depending on the specific ontogenic network technology chosen, unimportant inputs may also be eliminated. A particular class of ontogenic networks, statistical networks, is presented here along with modeling results.

Some researchers would class algorithms that start with many nodes in a hidden layer and then prune (or eliminate) nodes that have little influence as ontogenic neural networks. We specifically exclude these methods from this class. The reason is that the large network must be trained to determine which nodes to prune, and this can require a lot of time. Further, the pruned network must also be retrained, requiring even more time. It makes much more sense to start small and grow to the correct size.

Another approach taken in some algorithms is to eliminate the concept of backpropagation of the error; in other words, to develop a very fast method for training a fixed network. Using this method, many network topologies can be evaluated very quickly. However, an evaluation criterion must be used that is sensitive both to how well the training data is modeled and to the complexity of the model itself.
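One plausible form of such a criterion, sketched under the assumption of a squared-error fit term: add to the training error a penalty that grows with the number of adjustable coefficients. The particular penalty below is an illustrative, predicted-squared-error-style choice; the text does not prescribe a specific formula.

def penalized_score(residuals, n_coefficients, n_samples, noise_variance):
    """Score a candidate model by training fit plus a complexity penalty (smaller is better)."""
    fit = sum(r * r for r in residuals) / n_samples                  # how well the training data is modeled
    complexity = 2.0 * n_coefficients * noise_variance / n_samples   # cost of added coefficients
    return fit + complexity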

Cascade Correlation

One of the best-known ontogenic networks is Cascade Correlation, developed by Fahlman and Lebiere.6 They discovered that the reason many networks require long training times was that two or more nodes in the hidden layer were "fighting" to recognize a certain input pattern. Often this would take many training cycles to resolve. They reasoned that this process could be hastened if the network is built one node at a time. This would also result in a network that could not get stuck in a local minimum. When a new node is added, so too is an additional dimension that allows


