




Figure 13.4 Neural network architecture: (a) simple three-layer perceptron; (b) weights, biases and transfer functions.

13.2.1 Architecture

The architecture of a simple three-layer neural network is shown in Figure 13.4a. A neural network with this type of layered architecture is called a multi-layer perceptron. It operates by matching the network outputs with targets that capture the desired response. Such neural networks have obvious applications to price forecasting, where the targets for the network outputs would be the prices that are being forecast. Alternatively, they may be applied to estimate the parameters of a distribution, in which case the network outputs would be those parameters. The target response of the network is obtained by matching observed with predicted returns within the training data set.



In a multi-layer perceptron each set of input data is first processed into input nodes, and then passed through a series of hidden layers via the various connections of the network, to one or more output nodes. The nodes in the hidden layers contain transfer functions that give the non-linear twists that are characteristic of all neural networks. If there is more than one hidden layer, these layers are usually structured like sandwiches with alternating layers of non-linear and linear transfer functions.

The structure of a neural network depends on the following:

> The number of layers and nodes per layer. Working through the network, there is the input layer, then one or more hidden layers, and finally an output layer. One hidden layer is often sufficient for most purposes, as in Figure 13.4a.

> Any restrictions on the connections between the nodes. Connections are often, but not always, made only between adjacent layers, again as in Figure 13.4a. The more connections that have non-zero weights, the more complex the neural network. Most neural networks place a penalty on over-complexity for reasons that are explained below.

> The form of transfer functions. The transfer functions, which determine the non-linear twists in the hidden layers, must be differentiable. Often they are also monotonic and S-shaped, like the hyperbolic tangent or the sigmoid function, so that they squash the input from an infinite range to a finite interval of the real line. Each node could have a different transfer function, not necessarily because they have different functional forms, but because the parameters can differ between nodes, as illustrated in the sketch after this list.
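As a minimal illustration of such S-shaped transfer functions (in Python, assuming only NumPy; the input values are purely illustrative), the sketch below evaluates the hyperbolic tangent and the logistic sigmoid and shows how they squash an unbounded input into a finite interval:

```python
import numpy as np

def tanh_transfer(y):
    # Hyperbolic tangent: monotonic, S-shaped, squashes the real line into (-1, 1)
    return np.tanh(y)

def sigmoid_transfer(y):
    # Logistic sigmoid: monotonic, S-shaped, squashes the real line into (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

y = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(tanh_transfer(y))     # close to -1 and +1 at the extremes
print(sigmoid_transfer(y))  # close to 0 and 1 at the extremes
```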

Figure 13.4b illustrates how the network operates. The outputs x_j (j = 1, . . ., n) from the n lower nodes are multiplied by the connection weights w_j (j = 1, . . ., n) and summed, a process that is called attenuation. The sum is added to the attenuation bias w_0 of the node in question and the result y = w_0 + w_1x_1 + . . . + w_nx_n is then passed through a transfer function f, so that the output of the node is f(y).
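The following is a minimal sketch of this node computation (Python with NumPy; the weights, bias and input values are purely illustrative):

```python
import numpy as np

def node_output(x, w, w0, transfer=np.tanh):
    # Attenuation: weighted sum of the lower-node outputs plus the bias w0
    y = w0 + np.dot(w, x)
    # Non-linear twist: pass the attenuated input through the transfer function
    return transfer(y)

# Three lower nodes feeding one hidden node
x = np.array([0.2, -0.5, 0.8])     # outputs x1, x2, x3 from the lower nodes
w = np.array([0.1, 0.4, -0.3])     # connection weights w1, w2, w3
print(node_output(x, w, w0=0.05))  # f(y) with y = w0 + w1*x1 + w2*x2 + w3*x3
```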

13.2.2 Data Processing

Preprocessing of the data involves normalization and data compression. Normalization of the different time series that are used as input vectors facilitates the process of tuning the parameters: they would have totally different orders of magnitude unless all inputs are normalized to lie in a similar range. This range is usually [0, 1] or [-1, 1], depending on the form of transfer function. For example, if the transfer functions are hyperbolic tangents, which have domain (−∞, +∞), all inputs should be normalized to the range [-1, 1]. This could be achieved by putting:


x_t* = 2(x_t − min)/(max − min) − 1,



where min and max are the minimum and maximum values of the time series input {x_t}.

Note that the targets that will be matched with the network output also require normalization, independently of the input normalization. For example, if the output transfer function is a sigmoid with range [0, 1], it must be matched with targets that lie in a similar range. In fact the sigmoid attains the boundary values 0 and 1 only for infinite inputs, so it is usually advisable to normalize the targets to lie well within the interior of the interval.
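A minimal sketch of both normalizations (assuming NumPy; the data are illustrative, and the target range [0.1, 0.9] is just one reasonable choice for keeping targets inside the sigmoid's interval):

```python
import numpy as np

def normalise_inputs(x):
    # x* = 2(x_t - min)/(max - min) - 1, mapping the series onto [-1, 1]
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def normalise_targets(t, a=0.1, b=0.9):
    # Map targets into the interior of [0, 1], away from the sigmoid's
    # unattainable boundary values 0 and 1
    lo, hi = t.min(), t.max()
    return a + (b - a) * (t - lo) / (hi - lo)

prices = np.array([101.2, 99.8, 103.5, 102.1])
print(normalise_inputs(prices))    # lies in [-1, 1]
print(normalise_targets(prices))   # lies in [0.1, 0.9]
```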


Data compression is necessary for the network to cope with long historic time series. Often there is much redundant information, so the optimization procedure is facilitated when the redundancies are removed. For example, moving averages of different lengths, or discrete cosine transformations, can be applied to smooth out noisy data. Or if the data are highly collinear just the first few principal components of the vector time series can be taken as inputs (§6.1). Principal components analysis for data preprocessing is a standard option in many neural network packages. Another popular form of data compression is to use wavelet transformations (Press et al., 1992). Unlike principal components, wavelets are not necessarily orthogonal, but they perform a similar function to principal components in that dimensions are reduced while the most important information in the input vectors is retained. Wavelet transforms are available as in-built procedures in many statistical packages.
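As a sketch of principal component compression of collinear inputs (assuming NumPy; the centring, component count and simulated data are illustrative rather than a prescription from the text):

```python
import numpy as np

def pca_compress(X, k):
    # Keep the first k principal components of the (T x n) input matrix X
    Xc = X - X.mean(axis=0)                        # centre each input series
    cov = np.cov(Xc, rowvar=False)                 # n x n covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = eigvec[:, np.argsort(eigval)[::-1][:k]]  # k largest-variance directions
    return Xc @ top                                # T x k compressed inputs

# Five highly collinear series compressed into two components
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 5))
X[:, 1:] += X[:, [0]]                              # induce collinearity
print(pca_compress(X, k=2).shape)                  # (250, 2)
```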

13.2.3 Backpropagation

A training set of data is used to estimate the parameters of the network, viz. the weights on each of the connections and the attenuation biases of each node. Initial values of these parameters are set, and then the network is trained by comparing the results from the output nodes with observed targets using some type of performance measure. During training the weights and biases are adjusted iteratively so as to improve this performance measure.
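As an illustration of training by iterating on a performance measure, the sketch below fits a single tanh node to simulated targets by gradient descent on a sum-of-squares measure (assuming NumPy; the learning rate, iteration count and data are hypothetical, not taken from the text):

```python
import numpy as np

def sse(outputs, targets):
    # Sum-of-squares performance measure over the training set
    return 0.5 * np.sum((outputs - targets) ** 2)

# Toy "network" with a single tanh node, trained by gradient descent
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                    # training inputs
targets = np.tanh(X @ np.array([0.5, -0.2, 0.3]) + 0.1)

w, w0, rate = np.zeros(3), 0.0, 0.1              # initial parameter values
for _ in range(500):                             # iterate on the performance measure
    out = np.tanh(X @ w + w0)                    # network output
    grad_y = (out - targets) * (1.0 - out ** 2)  # dE/dy via the tanh derivative
    w -= rate * X.T @ grad_y / len(X)            # update weights
    w0 -= rate * grad_y.mean()                   # update bias

print(sse(np.tanh(X @ w + w0), targets))         # performance measure after training
```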

The process of iteration between initial and final values of network parameters is called backpropagation (Azoff, 1994; Ripley, 1996). The backpropagation algorithm is illustrated in Figure 13.5. It operates by first calculating a δ value for each output node, and then propagating these deltas back through the layers of the network. The delta values of the output nodes that are used in the backpropagation algorithm are calculated as −∂E/∂o, where o is the output from the output node and E is the error function (§13.2.4).

To propagate the output deltas back through the network, suppose that node j has output f(y), where f is the transfer function and y is the attenuated input to that node. If the nodes above node j have deltas δ_1, δ_2, . . ., δ_k and the connection weights from node j to these nodes are w_1, w_2, . . ., w_k, then the delta value for node j is

δ_j = f′(y)(w_1δ_1 + w_2δ_2 + . . . + w_kδ_k).
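A minimal sketch of this delta recursion (Python with NumPy; it assumes a sum-of-squares error function, for which the output-node delta −∂E/∂o reduces to target minus output, and the numerical values are purely illustrative):

```python
import numpy as np

def output_delta(output, target):
    # Output-node delta for a sum-of-squares error: -dE/d(output) = target - output
    return target - output

def hidden_delta(y, w_up, deltas_up, transfer_deriv):
    # delta_j = f'(y) * (w1*delta1 + w2*delta2 + ... + wk*deltak)
    return transfer_deriv(y) * np.dot(w_up, deltas_up)

tanh_deriv = lambda y: 1.0 - np.tanh(y) ** 2

# One hidden node j connected upwards to two output nodes
y_j = 0.4                             # attenuated input to node j
w_up = np.array([0.3, -0.7])          # weights from node j to the nodes above
deltas_up = np.array([output_delta(0.6, 1.0),
                      output_delta(0.2, 0.0)])
print(hidden_delta(y_j, w_up, deltas_up, tanh_deriv))
```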


