
second-layer nodes have a transfer function that can be one of several different functions. The choice is left to the user. The most typical function used is

exp[(Zᵢ − 1)/σ²]

where

Zᵢ is the result of matching pattern i to the input, Zᵢ ≤ 1
σ is a user-defined smoothing parameter

The σ parameter has the effect of smoothing or lumping the training examples. Small values of σ tend to make each training point distinct, whereas larger values will force a greater degree of interpolation between the training points. Small values mean that the probability density function is close to that of the training examples. Large values of σ would force the probability density function to be Gaussian even if the underlying function is not.

The third layer in Figure 17.4 has one node for each class of output. Each cell sums only the output of the pattern nodes in the second layer that correspond to its class. This result is then passed to the fourth layer, consisting of output nodes. Output nodes take into consideration a priori probabilities as well as a bias factor representing losses associated with wrong predictions. These nodes then use a hard limiter to produce a binary output.
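The four layers just described can be condensed into a short sketch. This is an illustrative reconstruction, not code from the text: the function name and data layout are invented, inputs and stored patterns are assumed normalized to unit length (so each match score Zᵢ ≤ 1), and the loss-bias factor is folded into the per-class prior weights.

```python
import math

def pnn_classify(x, patterns, labels, sigma, priors=None):
    """Sketch of a PNN forward pass (hypothetical interface).

    Assumes x and every stored pattern are unit-length vectors,
    so each dot product Z_i satisfies Z_i <= 1.
    """
    classes = sorted(set(labels))
    # Third layer: one summation node per class.
    sums = {c: 0.0 for c in classes}
    for pattern, label in zip(patterns, labels):
        # Second layer: pattern node with transfer function exp[(Z_i - 1)/sigma^2].
        z = sum(a * b for a, b in zip(x, pattern))
        sums[label] += math.exp((z - 1.0) / sigma ** 2)
    # Fourth layer: weight by a priori probabilities (the loss bias is folded
    # into these weights here), then hard-limit to a single winning class.
    priors = priors or {c: 1.0 for c in classes}
    return max(classes, key=lambda c: priors[c] * sums[c])
```

A small σ makes the nearest stored pattern dominate the sums; a large σ blends contributions from all patterns, mirroring the smoothing behavior described above.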

Reports have shown PNNs to be as good as or better than conventional backpropagation neural networks. While requiring more memory than a backpropagation network, they train many orders of magnitude faster. The drawbacks concern the use of the biasing factor in the third layer and the smoothing factor, σ, in the pattern layer; these must be chosen through trial and error. Because the networks train so quickly, however, this search is not a serious problem.

These networks have been used successfully by other researchers for short-term stock prediction. We recommend that you read Specht and other references available on the Web site before embarking on the use of these networks. Versions of the PNN written for MATLAB are available free on the World Wide Web at http://cheml.nrl.navy.mil/~shaffer/pnn.html. At this same Web site is a good list of references on PNNs.

STATISTICAL NETWORK™ DATA MINING TECHNOLOGY

Here, we discuss an ontogenic network algorithm that has been successfully applied to stock market data mining. We first discuss the Statistical Network data mining algorithm, and then present a specific approach and corresponding results for a particular stock market modeling application.

As presented earlier, analytical development of trading algorithms is an extremely difficult task. For many fundamental and practical reasons, an in-depth understanding of the complex set of interactions among the plethora of prices, indicators, and other variables (including human psychology) required for precise mathematical models typically does not exist. Further complicating this task is the high degree of noise (i.e., price movements that cannot be explained by existing variables and indicators) that typically exists in these applications.

As presented, modeling of complex dynamic systems (i.e., financial markets) from examples of behavior, rather than from a fundamental understanding of the system, is often a more successful strategy. Statistical Networks offer a practical ontogenic network modeling approach that can learn complex relationships from example data.

Statistical Network Algorithm

Statistical Networks produce relational models (those relating a set of inputs or observations to a desired parameter estimate) that learn inductively from empirical evidence. Relationships that potentially represent a complex process or environment are hypothesized and "scored" according to some criterion that minimizes error. On the basis of the performance of the hypothesized relational model, several refinements and adjustments are made. Traditional statistical regression and neural network approaches offer some utility, but suffer from practical limitations.

Statistical Networks process information with complex mathematical functions.9 Functions are attractive because they capture a large number of complex relationships in a compact and rapidly executable form. The Statistical Network learning algorithm produces a network of functional nodes, each containing a multiple-term polynomial relationship. Polynomial nodes are an extremely powerful method for performing complex reasoning tasks; they are the basis of traditional neural networks and other modeling techniques. They process one, two, or three inputs to compute an output value, and contain a bias or constant term (w₀), along with linear, quadratic, cubic, and cross terms. A linear node processes several inputs and contains only the linear and bias terms. The equations for each node type are:

Single = w₀ + w₁x₁ + w₂x₁² + w₃x₁³

Double = w₀ + w₁x₁ + w₂x₁² + w₃x₁³ + w₄x₂ + w₅x₂² + w₆x₂³ + w₇x₁x₂ + w₈x₁²x₂ + w₉x₁x₂²

Triple = w₀ + w₁x₁ + w₂x₂ + w₃x₃ + w₄x₁² + w₅x₂² + w₆x₃² + w₇x₁³ + w₈x₂³ + w₉x₃³ + w₁₀x₁x₂ + w₁₁x₁x₃ + w₁₂x₂x₃ + w₁₃x₁x₂x₃ + w₁₄x₁x₂² + w₁₅x₁²x₂ + w₁₆x₁x₃² + w₁₇x₁²x₃ + w₁₈x₂x₃² + w₁₉x₂²x₃

Linear = w₀ + w₁x₁ + w₂x₂ + … + wₙxₙ
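Written out in code, the node types are straightforward weighted sums. A minimal sketch follows; the function names and weight-list layout are my own, but the weight indices follow the equations above.

```python
def single_node(x1, w):
    """One-input node: bias plus linear, quadratic, and cubic terms (w[0]..w[3])."""
    return w[0] + w[1] * x1 + w[2] * x1 ** 2 + w[3] * x1 ** 3

def double_node(x1, x2, w):
    """Two-input node: full third-order polynomial in x1 and x2 (w[0]..w[9])."""
    return (w[0] + w[1] * x1 + w[2] * x1 ** 2 + w[3] * x1 ** 3
            + w[4] * x2 + w[5] * x2 ** 2 + w[6] * x2 ** 3
            + w[7] * x1 * x2 + w[8] * x1 ** 2 * x2 + w[9] * x1 * x2 ** 2)

def linear_node(xs, w):
    """n-input node: bias w[0] plus linear terms only."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], xs))
```

Training a node amounts to fitting its weight vector to the example data; evaluating it, as shown here, is just one pass through the polynomial.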

An example Statistical Network is shown in Figure 17.5. It is a feed-forward network of polynomial nodes processing information from left to right. Each node produces intermediate information that is used as inputs for subsequent nodes. This networking strategy segments the overall relationship being modeled into more manageable components, and simplifies the learning process. Functional networks are



Figure 17.5 Example statistical network. Input values propagate through a series of functional nodes that are trained automatically by the data mining algorithm.


synthesized automatically from a flat file database where each column is an input or output parameter (i.e., a variable), and each row contains an example set of the parameters. A hypothesize and test strategy finds the network that best represents the relationships contained in the database.

While individual nodes only allow up to three inputs and are limited to third order terms, using them in the networking strategy shown in Figure 17.5 allows the overall network to accept any number of inputs. In addition, because a specific node can contain a third order term, a two-layer network can model a ninth-order relationship. An additional layer allows the modeling of up to 27th order relationships, and so on. Therefore, networking relatively simple node types creates a powerful knowledge representation.
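The order-multiplication claim can be checked directly: feeding one cubic node's output into another yields a ninth-order polynomial. Below is a small demonstration using coefficient lists; the helper names are mine, and a single-input cubic stands in for a full node.

```python
def poly_mul(a, b):
    """Multiply two polynomials stored as coefficient lists (constant term first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_compose(outer, inner):
    """Return the coefficients of outer(inner(x)), built via Horner's scheme."""
    result = [outer[-1]]
    for c in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += c
    return result

cubic = [0.0, 1.0, 1.0, 1.0]         # x + x^2 + x^3: one third-order node
nested = poly_compose(cubic, cubic)  # that node feeding an identical node
# len(nested) is 10, i.e. the composition is a ninth-order polynomial
```

A third layer would compose a cubic with this ninth-order result, giving order 27, exactly the growth described above.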

The Statistical Network learning process produces networks of functional elements that more effectively "learn" complex relationships among features than is often practical with other methods. The key to any machine learning strategy is the learning algorithm itself. It must be able to generalize from, and not memorize, numerical examples of a problem domain. It must be able to automatically discover relationships to produce a model that performs well for not only training data but also independent (i.e., real-world) data. The driving reason for this crucial requirement is that all data contain uncertainty. Noisy, missing, conflicting, and erroneous data are all manifestations of uncertainty in numerical examples.

An effective machine learning algorithm must learn relationships and avoid memorizing noise in an automated manner. Statistical Networks achieve this through the use of intelligent search heuristics to find the optimal network architecture and a modeling criterion to ensure generalization. What follows is a top-level summary of the Statistical Network learning algorithm (outlined in Figure 17.6).

Step 1

Several statistical measures, such as the mean and standard deviation, are computed for each database variable. The values for each variable are normalized so that they


