where $d_{jc}$ is the distance of the gene $j$ to the centroid of the corresponding cluster $c$. Here the square root is used to avoid too large a correction for an average concentration of genes, as is often the case. To keep the cluster space as large as possible, we also have to minimize the overlap between different clusters. To reduce this overlap, the clustering parameter $D_{\min}$ must be quite large, and here we use $D_{\min} = D_{\max}$. In order to have a reasonable clustering parameter when the dimensionality of the parameter space is large, the values of the two clustering parameters $D_{\min}$ and $D_{\max}$ are multiplied by $\sqrt{n}$, where $n$ is the number of parameters to be optimized.
With this new sharing scheme, the selection pressure is no longer specific to each individual, as in a standard GA, but is the same for all genes present in a given cluster. This yields a selection mechanism that looks for subpopulations of solutions with a high average quality instead of the best individual solution. Of course, the overall convergence speed is slightly reduced. The selection pressure toward robust solutions is still present through the adaptive cluster methodology, which tends to create clusters around a group of good individuals, and through the reproduction technique, which uses elitism and mating restriction inside each cluster. Moreover, to keep a larger variety in the population, all the individuals that do not really belong to any cluster (i.e., that are farther than the maximum distance $D_{\max}$ from all existing cluster centroids) keep an unmodified fitness value. During the reproduction phase these individuals have no mating restriction and generally a slightly higher selection probability.
To speed up the full process, the result of each different gene is stored and not recomputed when this gene appears again in later generations. Moreover, the information from all the previously computed solutions can be used at the end to assess the reasonableness of the optimum solution.
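The cluster bookkeeping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: genes are assumed to be tuples of already normalized parameters, distances are assumed to be Euclidean, and the names `scaled_threshold`, `nearest_cluster`, `CachedFitness`, and `select_final` are hypothetical. The last function anticipates the final selection rule stated in the paragraph that follows, reading the "average fitness corrected by the variance" as the cluster mean minus the cluster standard deviation, which is also an assumption.

```python
import math
import statistics
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Gene = Tuple[float, ...]
Member = Tuple[Gene, float]  # (gene, fitness)


def scaled_threshold(d: float, n_params: int) -> float:
    """Multiply a clustering distance by sqrt(n), n = number of parameters."""
    return d * math.sqrt(n_params)


def nearest_cluster(gene: Gene, centroids: Sequence[Gene], d_max: float) -> Optional[int]:
    """Index of the closest centroid, or None when the gene is farther than
    the scaled D_max from every centroid (its fitness then stays unmodified)."""
    limit = scaled_threshold(d_max, len(gene))
    best_index, best_dist = None, math.inf
    for index, centroid in enumerate(centroids):
        dist = math.dist(gene, centroid)  # Euclidean distance (assumption)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index if best_dist <= limit else None


class CachedFitness:
    """Store the result of each distinct gene so it is never recomputed."""

    def __init__(self, fitness: Callable[[Gene], float]) -> None:
        self._fitness = fitness
        self.history: Dict[Gene, float] = {}  # every solution computed so far
        self.evaluations = 0

    def __call__(self, gene: Gene) -> float:
        if gene not in self.history:
            self.history[gene] = self._fitness(gene)
            self.evaluations += 1
        return self.history[gene]


def select_final(clusters: List[Tuple[Gene, List[Member]]], d_max: float) -> Optional[Member]:
    """Pick the cluster with the largest (mean - std) of fitness and return its
    best member lying within half the scaled D_max of the centroid."""
    best_score, best_solution = -math.inf, None
    for centroid, members in clusters:
        if not members:
            continue
        fitnesses = [f for _, f in members]
        score = statistics.fmean(fitnesses) - statistics.pstdev(fitnesses)
        limit = scaled_threshold(d_max, len(centroid)) / 2
        near = [m for m in members if math.dist(m[0], centroid) <= limit]
        if near and score > best_score:
            best_score = score
            best_solution = max(near, key=lambda m: m[1])
    return best_solution
```

With such a rule, a cluster of uniformly good solutions can be preferred over a cluster that contains the single best individual but has a widely scattered fitness, which is the robustness property the text aims for.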
Eventually, the algorithm selects, for each cluster, the best solution that is not farther than the distance $D_{\max}/2$ from the cluster centroid. The final solution is the solution selected for the cluster that has the maximum average fitness corrected by the variance, that is, for the maximum value of $\bar{f}_c - \sigma(f_c)$. The success of this type of genetic algorithm is still quite sensitive to the quality of the fitness measure but also to the normalization of the parameter space (i.e., to the quality of the metric used in the cluster construction). If the parameters do not all have the same sensitivity, this should also be reflected in the clustering algorithm. That is why we introduce the possibility of modifying the normalization of the parameter space, but in many applications this is not enough and some parameter mapping functions are needed. These functions depend on the specific problem to solve.
11.5.2 Testing Procedures
Strict optimization and testing procedures are a necessary condition to obtain robust trading strategies. The three main phases in the development of new trading models are as follows:
1. The development and optimization of new trading strategies
2. The historical performance tests to select the strategies from data that were not used to optimize the models
3. Real-time tests to confirm the performance of the selected models
The amount of historical data available for both the development (optimization) and the testing of a new trading model is always finite. On the one hand, meaningful and robust optimization results require a data sample that is as large as possible. On the other hand, the same is true for the statistical tests of the performance of a new model. Of course, the same data cannot be used for both the optimization and the test of a trading strategy. The available historical data must be split into a minimum of two different sample periods. One period, named the in-sample, is used for the optimization and the other one, named the out-of-sample, for the performance tests. This split must never be modified during the optimization or the testing phase; otherwise the risk of overfitting the historical data becomes very large and the statistical tests on the model performance are unreliable. Another problem to take into account with financial data is the long-term heteroskedasticity (i.e., the presence of clusters that correspond to periods where the average price volatility is higher and others where it is lower). As many trading models can react quite differently according to the average volatility of the market prices, it is undesirable for the two data sets selected for optimization and testing to present significant differences in their statistical properties. A rule that provides reasonable results is to use two-thirds of the historical data for the optimization and one-third for the tests. The first part of the optimization data must be kept for the indicator initialization. The size of this initialization period, also named the build-up period, depends on the type of the indicators.
In the case of exponential moving averages, the size of the initialization must be approximately 12 times larger than the range of the slower moving average. At the end of the optimization process, the performance tests are executed once. If these performance tests do not provide good results, then the new trading model must be rejected. It is strongly recommended to avoid making tiny modifications to the initial model until good performance tests are obtained, because such a procedure implies that the out-of-sample data period is, in fact, used indirectly for the optimization process itself, and again opens the door to overfitting problems. When a new model is selected and passes the historical performance tests, the final phase is to check it in real time for a few months. These last tests, named ex ante tests, are useful to confirm the historical performance of the model and to check its reaction to the real-time data flow. At Olsen & Associates (O&A), only the models that successfully pass both the historical test and the real-time ex ante period are used for real trading.
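The data split described in this section can be sketched as follows. This is only an illustration under stated assumptions: the history is assumed to be a regularly spaced series of `n_obs` observations, the range of the slower moving average is assumed to be expressed in the same observation units, and the names `SampleSplit` and `split_history` are hypothetical. The two-thirds rule and the factor of 12 are the values quoted in the text.

```python
from typing import NamedTuple


class SampleSplit(NamedTuple):
    build_up: range       # indicator initialization, not used for scoring
    in_sample: range      # optimization period
    out_of_sample: range  # performance tests, executed once


def split_history(n_obs: int, slow_ema_range: int, build_up_factor: int = 12) -> SampleSplit:
    """Split n_obs observations: the first two-thirds go to optimization
    (its first part reserved for build-up), the last third to testing."""
    build_up_len = build_up_factor * slow_ema_range
    in_sample_end = (2 * n_obs) // 3  # integer arithmetic avoids rounding surprises
    if build_up_len >= in_sample_end:
        raise ValueError("not enough optimization data left after the build-up period")
    return SampleSplit(
        build_up=range(0, build_up_len),
        in_sample=range(build_up_len, in_sample_end),
        out_of_sample=range(in_sample_end, n_obs),
    )
```

Computing the split once and passing the resulting ranges around unchanged is one simple way to respect the requirement that the split never be modified during optimization or testing.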
11.6 STATISTICAL STUDY OF A TRADING MODEL
11.6.1 Heterogeneous Real-Time Trading Strategies
The idea of this section is to use some trading models developed at Olsen & Associates as a tool to study the market structure (work presented in Dacorogna et al., 1995). These models act like filters that concentrate on typical price movements and give us information about the market itself. The hypothesis of a heterogeneous market leads to three conjectures:
1. In a heterogeneous market, no particular trading strategy is systematically better than all the others. Excess return can be gained for different trading profiles, so various ways of assessing the risk and return of trading models are needed.
2. The different geographical components of the FX market have different business hours according to different time zones and, on the assumption of the heterogeneous market hypothesis, different strategies. Therefore, there are disruptions in the market behaviors from one geographical component to the next. Trading models that do not explicitly analyze the geographical components can avoid these disruptions only by restricting their active hours to the normal business hours of one geographical market. For such models, trading 24 hr a day does not pay.
3. The most profitable models actively trade when many agents are active in the market (liquid periods) and do not trade at other times of the day and on weekends. The heterogeneous market hypothesis attributes the profitability of trading models to the simultaneous presence of heterogeneous agents, whereas the classical efficient market hypothesis relates this profitability to inefficiencies. (This would imply that the illiquid periods of the market are the most favorable for excess returns.) If our conjecture is right, the optimal daily trading time interval should depend on the traded FX rate rather than the model type. Trading will be most profitable when the main markets for a particular rate are active.
Two trading models based on different algorithms are used in this study. The performance of these models is analyzed against changing market conditions, trading intervals, opening and closing times, and market holidays. The first trading model (RTT) is the one described in Section 11.4.1. Whereas the RTT model relies on one indicator with one time horizon, the second type of trading model (named here RTM) uses three different time horizons simultaneously to incorporate the views of three different market components. Like the RTT model, the RTM models have a profit objective of 3%, but the stop-loss value and profit objective are much smaller. The dealing frequencies of the RTM models are often higher than those of the RTT models, and they are also neutral more often. The study presented here does not try to optimize the models in any way, so the distinction between in-sample and out-of-sample is of little relevance. All the tests were conducted in a 7-year period from March 1986 to March 1993 for
