




Rank = Profit per trade * (1 - 1/sqrt(number of trades)) * (gross profits / gross losses)

If the profit per trade is larger, the rank is higher; if there are more trades, the rank is higher; and if the ratio of gross profits to gross losses is greater, the rank is higher. These factors are not weighted, but they serve as a simple criterion for comparing or measuring the fitness of a chromosome. For example, a trading method that returned $500 per trade for 10 trades, with a gross profit/loss ratio of .5, would have a rank of 170.94. Another system that returned only $250 per trade over 100 trades, with a profit/loss ratio of 1.4, would rank 315.00. Therefore, the system with the smaller returns has a much more agreeable trading profile according to the fitness criteria. Each analyst must create a criterion that allows those strategies with a personally desirable profile to survive.
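As a quick check of the arithmetic, the following is a minimal sketch of the ranking formula in Python; the function name and argument layout are illustrative, not from the source.

import math

def rank(profit_per_trade, n_trades, gross_profit_loss_ratio):
    # Higher profit per trade, more trades, and a larger gross
    # profit/loss ratio all raise the rank.
    return (profit_per_trade
            * (1 - 1 / math.sqrt(n_trades))
            * gross_profit_loss_ratio)

print(rank(500, 10, 0.5))   # about 170.94
print(rank(250, 100, 1.4))  # 315.00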

Mutation and Crossover

Remembering that a genetic algorithm is a sophisticated search method, it needs a way to introduce new rules (genes), combine genes into individuals that have passed the fitness criteria, and discard genes and individuals that are less promising. This is done through mutation and crossover. Mutation is the process of introducing new genes, or combinations of genes, from a gene pool, which we have defined as a set of rules and relational operators, similar to those in Table 20-6. For example, the first gene, which represents a trend-following calculation, could be a moving average, exponential smoothing, linear regression, or breakout. In mutating the gene, one of these four techniques is chosen randomly. Similarly, the calculation period and the way in which rules are combined are selected randomly. Typically, only one of the individuals in the chromosome is mutated in each step; therefore, in Chromosome 1,

Chromosome 1: MA, 10, <, C, [0], &, Stoch, 5, >, 50, 1

the genes in the second individual (Stoch, 5, >, 50, 1) might be mutated to (RSI, 10, >, 40, 1).

Crossover is a way of combining two chromosomes by exchanging their individuals. The result is called an offspring. Using Chromosomes 1 and 2, we can switch the first individuals to get

Offspring 1: Exp, 20, <, L, [1], &, Stoch, 5, >, 50, 1
Offspring 2: MA, 10, <, C, [0], &, RSI, 10, <, 50, 1

The two methods of mutation and crossover provide the only tools needed to introduce new features to the optimization and to combine features in all ways. If you could continue this process indefinitely, you could study every possible mutation and combination.
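A minimal sketch of these two operations on the chromosome encoding above follows; the gene pool contents, tuple layout, and function names are assumptions made for illustration.

import random

# Each individual is one rule: (indicator, period, operator, target).
TREND_POOL = ["MA", "Exp", "LinReg", "Breakout"]
INDICATOR_POOL = ["Stoch", "RSI"]

chromosome1 = [("MA", 10, "<", "C[0]"), ("Stoch", 5, ">", 50)]
chromosome2 = [("Exp", 20, "<", "L[1]"), ("RSI", 10, "<", 50)]

def mutate(chromosome):
    # Replace one randomly chosen individual with a new random rule.
    new = list(chromosome)
    i = random.randrange(len(new))
    if i == 0:
        # Trend gene: trend calculation compared with a price reference.
        new[i] = (random.choice(TREND_POOL), random.choice([5, 10, 20]),
                  random.choice(["<", ">"]), "C[0]")
    else:
        # Indicator gene: oscillator compared with a threshold.
        new[i] = (random.choice(INDICATOR_POOL), random.choice([5, 10, 20]),
                  random.choice(["<", ">"]), random.choice([30, 40, 50]))
    return new

def crossover(a, b):
    # Swap the first individuals of two chromosomes, producing
    # the two offspring shown in the text.
    return [b[0], a[1]], [a[0], b[1]]

offspring1, offspring2 = crossover(chromosome1, chromosome2)
# offspring1 == [("Exp", 20, "<", "L[1]"), ("Stoch", 5, ">", 50)]
# offspring2 == [("MA", 10, "<", "C[0]"), ("RSI", 10, "<", 50)]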

The process of natural selection is to allow only the best individuals to survive. A strong selection criterion chooses only those individuals with the highest ranking, as determined by the fitness test. A weak criterion allows lower rankings to survive. The implementation of the selection process is also drawn from evolution. When an individual has a high fitness score, it becomes a larger part of the population; when it has a low score, it is removed or reduced from the population. This can be done by modifying the list of conditions, which is used to create new genes and individuals randomly. For example, in the first gene we have the possibility of four trend criteria:

1. Moving average

2. Exponential smoothing

3. Linear regression

4. Breakout

Using random selection, each one has an equal chance of being picked. Let us say that we perform the first 10 tests without removing any of the possibilities, but only measure the average ranking of the results. On the 11th test, we select linear regression and compare the results of its fitness with the average fitness of the first 10 tests. If the fitness is greater, we make a number of copies of this linear regression gene and add it to the list of trending methods.



One popular way to calculate the number of copies is to divide the current fitness by the average fitness of previous tests. Therefore,

Number of copies = current fitness / average fitness

This method works best when there are a large number of choices rather than only four possibilities. However, if the fitness of the linear regression was 4, and the average fitness of previous tests was 2, we would make two copies, replacing the original entry. The new list would then have

1. Moving average

2. Exponential smoothing

3. Linear regression

4. Breakout

5. Linear regression

When a trend criterion is mutated, that is, when another criterion is selected randomly from the list, there is now a 40% chance that a linear regression will be selected, and a 20% chance for each of the other methods. Natural selection has resulted in the linear regression being more desirable because its fitness ranks higher. In the initial tests that formed the average fitness, there were conditions other than the trend criteria that were mutated. It could be that those conditions were the reason for the higher score, and not the trend method. At the same time the first gene is copied, other genes that were part of the same chromosome are also copied, so that all of these genes have a better chance of appearing. Because we have not eliminated any of the other trend criteria, each one should appear from time to time. If the breakout method is actually better than the linear regression, it should have a higher fitness when it is mutated; it will then be copied and, theoretically, displace the linear regression as the survivor. Eliminating a choice may speed up the genetic algorithm, but it introduces the possibility of removing a good technique on the grounds that it only appeared in combination with other genes that were not strong.
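The following is a minimal sketch of this copy-based selection, assuming the same four-entry trend list; the function name and the rounding rule are illustrative.

import random

trend_pool = ["MA", "Exp", "LinReg", "Breakout"]

def reward(pool, gene, fitness, avg_fitness):
    # Replace the gene's single entry with round(fitness / avg_fitness)
    # copies, so random mutation picks it proportionally more often.
    copies = round(fitness / avg_fitness)
    if copies > 1 and gene in pool:
        pool.remove(gene)
        pool.extend([gene] * copies)

reward(trend_pool, "LinReg", 4.0, 2.0)
# trend_pool is now ["MA", "Exp", "Breakout", "LinReg", "LinReg"]:
# a 40% chance of LinReg, 20% for each of the other methods.
print(random.choice(trend_pool))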

Putting It into Practice: Training, Tuning, and Testing

The technique for a successful search using a genetic algorithm requires that the data be divided into three subsets (training, tuning, and testing) in a manner similar to a neural network, using a procedure of step-forward testing. Beginning with the first 1,000 data points, the system is trained; it is then tuned on the next 500 data points, and finally the performance using a test set of only 10 points is recorded. The process is then continued by shifting the data forward by 10 points. In this way, the performance is recorded for disjoint tests of 10 points. This process implies that new parameters must be found every 10 data points for trading to duplicate the training method.
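A minimal sketch of this step-forward scheme follows; the window sizes come from the text, while the generator name and index layout are my own.

def step_forward_windows(n_points, train=1000, tune=500, test=10):
    # Yield (train, tune, test) index ranges, shifting forward by the
    # length of the test set after each step.
    start = 0
    while start + train + tune + test <= n_points:
        yield (range(start, start + train),
               range(start + train, start + train + tune),
               range(start + train + tune, start + train + tune + test))
        start += test

for tr, tu, te in step_forward_windows(1530):
    print(tr, tu, te)  # 1,000 training, 500 tuning, 10 testing points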

CONSIDERING GENETIC ALGORITHMS, NEURAL NETWORKS, AND FEEDBACK

Stumbling onto the perfect solution, or even a better solution, is always acceptable. Recognizing a good solution, by hard work or by chance, still requires talent; many discoveries that seem to have occurred by chance are really the product of hard work that provides the opportunities for discovery. A genetic algorithm is a way of recognizing a situation that may be an improvement or a brilliant new method, but not always. Unlike real-life genetic mutations, in which you can relate the change to the environmental need, this genetic algorithm may simply apply an isolated rule because of a single situation that may never occur again in the same way. To follow the path of a mutation, we need to confirm that this new rule is statistically and logically sound, that there are enough cases to make it appear to be the right choice. Don't accept the results without careful thought and validation.

The large number of possible tests that make the genetic algorithm valuable also requires that you understand the concept of local versus global solutions. When you start with a random set of rules and parameters, then vary some and combine others, it is possible that you will find a local maximum, that is, a good solution but not the best. It is possible that a combination of trends and indicators that would have produced the perfect system was never seeded by the random process, nor was it found by changing some of the values during the process. One solution is to run the genetic algorithm a number of times to see if it arrives at the same solution starting with different random values. While it is not a guarantee that you will cover all major combinations, especially when there is a massive number of variations, it is the simplest way to avoid serious oversights.

Feedback is the basis for successful solutions using advanced searching methods, but it also creates uncertainty. For example, a neural network may be trained on 70% of the data and tested on 20%. When the test data is used to evaluate the weighting factors found using the training data, it provides feedback, which biases the selection of weighting factors in the training data. Although these data sets are separated, the constant feedback between the two provides a connecting tunnel that requires them to be considered as one data set. The testing data is no longer out-of-sample; therefore, another 10% of the data is withheld for scientific out-of-sample testing when the final solution is found. If that 10% of the data does not produce results consistent with the training and testing, then the inputs must be changed and the process begins again. There is now a dilemma as to whether there is any out-of-sample data at all, because the 10% held aside has been used as feedback for the entire process.
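For reference, a minimal sketch of the 70/20/10 division described above; the fractions come from the text, and the function itself is illustrative.

def split(data, train_frac=0.70, test_frac=0.20):
    # Divide the data into training, testing, and a withheld
    # out-of-sample set, in that chronological order.
    n = len(data)
    i = int(n * train_frac)
    j = int(n * (train_frac + test_frac))
    return data[:i], data[i:j], data[j:]

train, test, out_of_sample = split(list(range(1000)))
# len(train) == 700, len(test) == 200, len(out_of_sample) == 100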

This problem is not unique to neural networks and genetic algorithms, but applies to all testing processes, beginning with basic serial optimization. How do you test your results on current data without being guilty of feedback? There is really no answer. Statisticians claim that the best solutions use the most data; you can't overfit a solution when you test a large number of situations. The results are forced into being generalized. Holding a small amount of data aside for out-of-sample testing may not prove anything unless the results are exceptionally bad. If the out-of-sample results are poor, they may still be representative of one or two small periods during a long historic test. The final decision rests with the trader and not the computer. If the results seem reasonable, there was sufficient test data, the inputs were relevant, and the performance was monitored before trading begins, then the solution is likely to be good.


