back start next
[start] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] [100] [101] [102] [103] [104] [105] [106] [107] [108] [109] [110] [111] [ 112 ] [113] [114] [115] [116] [117] [118] [119] [120] [121] [122] [123] [124] [125] [126] [127] [128] [129] [130] [131] [132] [133] [134]
112 where each time series contains 9 years of hourly data from January, 1, 1987, to December, 31, 1995. The data are divided into alternate training (insample) and test (outofsample) periods of 1 \ year each. We have used the four arithmetic functions (+,,*,/), the basic comparison (<, >), and the logical operators (AND,OR,IF), which were defined in Bhattacharyya et al (1998). Both comparison and logical operators are defined for a ternary logic {1,0,+1}, which corresponds to the signal returned by a trading model. As in the case of the arithmetic functions, these operators are used with syntactic restrictions to preserve the overall symmetry properties of the generated GP trees. Indicators reported in earlier studies (Oussaidene etal, 1997; Bhattacharyya etal, 1998) are also used for this application. The terminals used are as follows: Antisymmetric indicators: M„ that represent the momentum of price x over n days. Here we consider three different ranges: Ms and Mi6, M32. Symmetric indicators: V„ that are volatility indicators over n days: V« and Vi6 Constant values in the range [2, +2J. As a last step, the output value of the GP tree is mapped to a gearing value in the range [1, +1] to obtain a gearing signal. For the purpose of reducing overfitting, each trading model is tested on many exchange rate time series. The fitness measure, Xeg, of a GPtree is then defined as the average fitness over each exchange rate, decreased by a penalty proportional to the standard deviation of these values. Five independent optimization runs are performed. In each run, we evolve four subpopulations of 100 individuals each. All subpopulations are created randomly using the ramped halfhalf approach (Koza, 1992) with a maximum depth of 4. In the reproduction phase, we use the tournament selection. The mutation and crossover operators arc used with corresponding probabilities of 20% and 80% and the maximum depth allowed for generated trees is fixed to 6. Each subpopulation sends periodically 5% of its best individuals to another, randomly selected, subpopulation. A given subpopulation includes its local buffer of received migrants when the difference between the best and the average scores of the current population is smaller than half the standard deviation of the scores. The new individuals replace an equal number of lowfitness ones in the receiving population. The evolution is stopped when a maximum number of 10,000 individuals have been evaluated. The selected solution is the best solution found by the four subpopulations. Table 11.2 presents the performance results of the different runs. The first entry gives the average quality of the basic antisymmetric momentum indicators M8, Ml 6, and M32. The results of the optimization runs are given in decreasing order of their outofsample performance. The results of this table indicate that on average the performance of the solutions provided by the genetic algorithm are significantly higher than the performance of the basic momentum indicators.
TABLE 11.2 Trading model results versus tree complexity. Tree complexity, yearly return R, and fitness value Xeg (in percent) corresponding to the insample and outofsample periods. The results are given for preoptimized indicators and for the best solution of each optimization run.  Tree  Insampli   Outofsample   complexity   Xeff   Xeff  Indicators   4.23%  1.84%  3.61%   .11%  Run 1   8.12%  3.86%  6.29%  1.09%  Run 2   9.49%  4.16%  5.80%  0.47%  Run 3   7.45%  2.82%  5.06%  2.34%  Run 4   7.71%  3.55%  4.87%  3.28%  Run 5   6.84%  2.63%  3.10%  5.85% 
For instance, the solution selected in the second run that provides the best insample result is given by the GPtree: (IF (> V8 0.027) (IF (> V16 0.108) (+ M16 M32) (* M32 1.957) ) (* M8 1.542) The use of syntatic restrictions allows the discovery trees of lower complexity on average, compared with the previous study of Oussaidene et al. (1997), and all the generated GP trees are valid. However, there may exist some solutions that do not generate any trades, because some conditions always evaluate the same value. These solutions are quickly eliminated in the selection process. On average the solutions seem to be more robust and to provide higher outofsample performance. One limiting problem in these optimizations was implied by the use of "hard" logical and basic comparison operators that give rise to undesirable discontinuities due to the jumps of the Boolean variables that can occur for tiny changes of the basic indicators. One better way of implementing these operators would be through the use of fuzzylogic. A smoother transition between different logical states can probably provide better performances. All these optimization runs are done using hourly data, for obvious efficiency reasons. Then, when such optimization is completed, the final selected solution) must be tested in the complete realtime trading model environment with tickbytick data to check its behavior and to do some finetuning needed for the understanding of the final users.
11.5 OPTIMIZATION AND TESTING PROCEDURES When the main trading strategy has been selected, one of the most difficult tasks is to optimize the parameters present in the model and to test the different solutions in order to select the most robust trading model to be used in the real time data. The goal is to select robust solutions that have desirable generalization properties to provide satisfactory performance in the future. In the optimization process we expect the selection of a trading model that realizes large profits from the price moves present in the time series. We assume that these profits will be maximized when the trading model catches the dynamics of the pricegeneration process. Unfortunately, during the optimization phase the trading model repeatedly sees the same data set and discovers how to profit from some specific price moves that could be due to some random fluctuation of the prices. This will lead a trading model to provide poor results on realtime data. Such a model is called an "overfitted" model. To minimize overfitting during optimization, a few important elements are to be present: A good measure of the trading model performance Indicator evaluation for different time series Large data samples A robust optimization technique Strict testing procedures An optimization algorithm will always try to find the best solution in the parameter space. In the case of trading models, optimization of such properties is not suitable, because a solution corresponding to the best possible parameters generally corresponds to an overfitted solution. As we argued earlier, such solutions will often generate poor generalizations in a realtime trading setting. Tn the next section, we shall concentrate on one robust optimization technique based on the genetic algorithm approach. This method allows the selection of a group of solutions that correspond to broad regions of the parameter space where the trading performance is higher on average, rather than the highest. 11.5.1 Robust Optimization with Genetic Algorithms The new element we want to present in this section is a way to automatize the search for improved trading models. Genetic algorithms offer a promising approach for addressing such problems (Allen and Karjalainen, 1999). Genetic algorithms consider a population of possible solutions to a given problem and evolve it according to mechanisms borrowed from natural genetic evolution: reproduction and selection. The criterion for selecting an individual is based on its fitness to the environment, or more precisely to the quality of the solution it bears. A possible solution is coded as a chromosome (gene), which is formally the data structure containing the values of the quantities characterizing the solutions. In the framework of the present optimization, a gene will contain the indicator parameters, a time horizon, a weighting function of the past, and the type of
[start] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] [100] [101] [102] [103] [104] [105] [106] [107] [108] [109] [110] [111] [ 112 ] [113] [114] [115] [116] [117] [118] [119] [120] [121] [122] [123] [124] [125] [126] [127] [128] [129] [130] [131] [132] [133] [134]
