




Figure 17.6 Statistical network learning algorithm. This automated process finds the best network architecture and node equations while guarding against overfitting the training data.

[Figure 17.6 flowchart: Database Analysis → Network Hypothesis → Learn Network Equations → Determine Network Performance → Compare and Select Best Networks → Report Best Network]

exhibit a mean of zero and a standard deviation of unity, greatly enhancing the node regression performed in Step 3.
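A minimal sketch of this normalization step, assuming simple per-variable z-scoring (the chapter does not give the exact transform):

```python
import numpy as np

def zscore_columns(X):
    """Step 1 preprocessing sketch: scale each database variable to
    mean 0 and standard deviation 1, returning the statistics so the
    same transform can be applied to new observations."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)   # guard constant columns
    return (X - mu) / sigma, mu, sigma
```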

Step 2

Candidate network architectures are hypothesized using graph-tree network search heuristics. The heuristics employ a survival-of-the-fittest strategy, similar to the underlying concept of genetic algorithms, by hypothesizing more refined versions of networks that have already exhibited promise. Initially, very simple network models are hypothesized (i.e., those that contain only one node). The best of these simple models (as scored by the modeling criterion in Step 4) are then used with the original input parameters as building blocks to hypothesize more complex networks. Search heuristics determine the best manner of combining simpler networks to form more complex ones. This process is repeated automatically several times, with each repetition adding another network layer.
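The following sketch illustrates this layered, survival-of-the-fittest search. The two-input quadratic node form, the in-sample MSE used as a stand-in for the Step 4 score, and all parameter names are illustrative assumptions, not the chapter's exact method:

```python
import itertools
import numpy as np

def grow_network(inputs, y, max_layers=5, survivors=8):
    """Sketch of the Step 2 heuristic: fit all two-input nodes over the
    current candidate pool, keep the fittest few, and seed the next,
    deeper layer with their outputs plus the original inputs."""
    candidates = list(inputs)                      # layer 0: raw variables
    best = np.inf
    for _ in range(max_layers):                    # each pass adds a layer
        scored = []
        for x1, x2 in itertools.combinations(candidates, 2):
            A = np.column_stack([np.ones_like(x1), x1, x2,
                                 x1 * x2, x1**2, x2**2])
            w, *_ = np.linalg.lstsq(A, y, rcond=None)
            out = A @ w                            # node output
            scored.append((np.mean((y - out) ** 2), out))
        scored.sort(key=lambda t: t[0])
        if scored[0][0] >= best:                   # no better network found
            break
        best = scored[0][0]
        # fittest node outputs plus original inputs seed the next layer
        candidates = [o for _, o in scored[:survivors]] + list(inputs)
    return best
```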

Step 3

For each hypothesized network, each node's coefficients and their values are determined using advanced regression algorithms. In each node, the coefficients are

w1, w2, . . . , wn.
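A hedged sketch of the regression inside one node, again assuming a two-input quadratic node form; `node_coefficients` is a hypothetical helper name:

```python
import numpy as np

def node_coefficients(x1, x2, y):
    """Least-squares estimate of one node's coefficients (Step 3).
    The two-input quadratic node form is an illustrative assumption;
    w[0] is the constant term and w[1:] are w1 ... wn."""
    A = np.column_stack([np.ones_like(x1), x1, x2,
                         x1 * x2, x1**2, x2**2])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w
```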

Step 4

Each network is "scored" with the Predicted Squared Error (PSE) modeling criterion, shown in Figure 17.7. The PSE was developed at Stanford University in the early 1980s specifically as a modeling criterion for statistical learning.9 The network



Figure 17.7 Predicted squared error modeling criterion. The PSE produces robust models that work well with noisy data.

with the best (i.e., least) score is selected as the best for a particular database. The PSE performs a trade-off between network complexity and accuracy to find the simplest network that best models both training and independent data. It gives an analytic estimate of the network's performance on independent data. The PSE is:

PSE = FSE + KP = FSE + CPM [(2K/N) σp²]

where

FSE is the fitting squared error of the network on the training data

KP penalizes more complex networks, as they are more likely to overfit training data and therefore not perform well on independent data

CPM is a Complexity Penalty Multiplier, used to vary the emphasis of the KP term

K is the total number of coefficients in the network model

N is the number of training observations

σp² is an a priori estimate of the optimal model's error variance
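A direct transcription of the formula above into code; the default σp² estimate is an assumption, since the chapter treats it as supplied a priori:

```python
import numpy as np

def pse(y_true, y_pred, K, cpm=1.0, sigma_p2=None):
    """PSE = FSE + KP, with KP = CPM * (2K/N) * sigma_p^2, exactly as
    defined above. The default prior error-variance estimate is a
    crude placeholder assumption."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    N = y_true.size                        # number of training observations
    fse = np.mean((y_true - y_pred) ** 2)  # fitting squared error
    if sigma_p2 is None:
        sigma_p2 = 0.5 * np.var(y_true)    # placeholder prior estimate
    return fse + cpm * (2.0 * K / N) * sigma_p2
```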

The PSE produces networks that avoid modeling noise and overfitting the training data. The network synthesis process begins at the left of the PSE curve shown in Figure 17.7. As the complexity of hypothesized networks increases, the PSE of those networks decreases until the network with the minimum PSE is found. The learning process ends when certain "stopping criteria" are met (see Figure 17.7). These criteria include heuristics that recognize when the learning process is taking place on the upward slope of the PSE curve, indicating that the best network has already been found.
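The chapter does not spell out the stopping rules, but a heuristic of the kind it describes might look like this sketch:

```python
def on_upward_slope(pse_history, patience=2):
    """One plausible stopping heuristic (an assumption, not the
    chapter's exact rules): report True once the layer-by-layer PSE
    has risen for `patience` consecutive layers, i.e., the search has
    passed the minimum of the PSE curve in Figure 17.7."""
    if len(pse_history) <= patience:
        return False
    tail = pse_history[-(patience + 1):]
    return all(b > a for a, b in zip(tail, tail[1:]))
```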

Statistical Network Advantages

While Statistical Networks are parametric at the node level, the hypothesis heuristics and modeling criterion at the network level create an automated nonparametric process. Therefore, the modeler is not required to be an integral part of the learning algorithm, as is required by other approaches. This allows the modeler to focus limited resources on other issues, such as data collection, problem analysis, approach design, model evaluation, and trading system development. Compared with traditional neural network technology, Statistical Networks excel at estimating continuous parameters and are much more practical to develop. Because the process is nonparametric, resulting models generally outperform those developed with linear regression.

Example of a Statistical Network Trading Application

To demonstrate the application of Statistical Networks data mining to market modeling, we chose to model daily price and volume data from the Dow Jones 30 Industrials (DJIA) as of January 31, 1997. We used High, Low, Open, Close, and Volume data from the period January 1, 1987, to January 31, 1997, which includes 2,548 trading days. Our goal was to create a model that accurately produces Buy and Sell indicators.
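A hypothetical sketch of assembling such a dataset with pandas; the file name, the five-day horizon, and the Buy/Sell labeling are all assumptions, as the chapter does not say how its indicator output was defined:

```python
import pandas as pd

# Hypothetical re-creation of the chapter's dataset: daily DJIA
# High/Low/Open/Close/Volume bars, 1987-01-01 through 1997-01-31.
bars = pd.read_csv("djia_1987_1997.csv",
                   parse_dates=["Date"], index_col="Date")
fwd = bars["Close"].shift(-5) / bars["Close"] - 1.0   # 5-day forward return
target = (fwd > 0).astype(int)                        # 1 = Buy, 0 = Sell
```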

Technical Indicator Descriptions

One of the distinct advantages of Statistical Networks is that inputs that do not provide useful information for modeling the output variables will not be used in the final network produced by the Statistical Network learning algorithm. This ontogenic characteristic allows considerable freedom in selecting input variables; it allows one to include any and all variables that may contain useful information. The Statistical Network learning algorithm will automatically determine which variables should be used, and in what way.

For this problem, we chose a diverse set of technical indicators.10 Each is a function of Open, High, Low, Close, and/or Volume data. We did not use indicators that are functions of broad market indicators such as indices, new Highs/Lows, Put/Call ratios, and Up/Down Volume.
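For illustration, two widely used indicators that are functions of OHLCV data alone might be computed as follows (the chapter's actual indicator list is not reproduced here):

```python
import pandas as pd

def sample_indicators(bars: pd.DataFrame) -> pd.DataFrame:
    """Two common OHLCV-based indicators of the kind described here,
    serving only as illustrative candidate inputs."""
    out = pd.DataFrame(index=bars.index)
    out["roc_10"] = bars["Close"].pct_change(10)   # 10-day rate of change
    hh = bars["High"].rolling(14).max()            # 14-day highest high
    ll = bars["Low"].rolling(14).min()             # 14-day lowest low
    out["williams_r"] = -100 * (hh - bars["Close"]) / (hh - ll)
    return out
```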

Each technical indicator captures certain characteristics of the market. The challenge is determining how to combine the many different characterizations


