
will smooth the performance. The best parameter set is the highest value of PL, once all averages have been substituted for the raw test results.

It is more likely that a spike or erratic results will appear in areas that test shorter periods than in those that test longer trends. The difference between the results of a 2- and 3-day moving average is greater than that of a 30- and 31-day average or even a 30- and 40-day test. If the optimization was scaled properly, leaving more space between tests of higher value in the manner of a percent difference, the simple averaging technique will be a good substitute for testing every value.
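As a minimal sketch of the two ideas above, the following Python spaces the test values by a percent difference and then replaces each raw result with a 3-point average. The function names, the 25% spacing, and the use of a 2-point average at the two ends of the series are assumptions made for illustration, not the author's specification.

```python
import math


def percent_spaced(start, stop, pct):
    """Integer test values from start up to stop, each roughly pct larger than the last."""
    values = [start]
    while True:
        nxt = math.ceil(values[-1] * (1 + pct))
        if nxt > stop:
            break
        values.append(nxt)
    return values


def three_point_average(results):
    """Replace each result with the mean of itself and its immediate neighbors.

    The end points have only one neighbor, so they use a 2-point mean
    (an assumption; other edge treatments are possible).
    """
    smoothed = []
    for i in range(len(results)):
        window = results[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Moving-average periods from 2 to 40 days, spaced about 25% apart.
periods = percent_spaced(2, 40, 0.25)

# Hypothetical raw profit/loss per tested period, with one isolated spike.
raw_pl = [1.0, 2.0, 9.0, 2.0, 1.0]
smoothed_pl = three_point_average(raw_pl)
best_index = max(range(len(smoothed_pl)), key=lambda i: smoothed_pl[i])
```

Selecting the parameter from `smoothed_pl` rather than `raw_pl` favors a value whose neighbors also performed well, which is the purpose of substituting averages for the raw results.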

Two-Parameter Averaging.

The results of a 2-parameter test may also be averaged. Using the form discussed earlier (Figure 219), the results of each test will be denoted as PLij, the profit/loss associated with row i and column j of a 2-dimensional display. The object is to replace each PLij with the average value of its surroundings. This is done, as seen in Figure 21-9a, by taking an average of the eight test results adjacent to the ijth value as well as the center value. There are special cases when the ijth result is not fully surrounded but lies on the perimeter of the test map. Figure 21-9b-d shows the averaging technique used when the ijth box is a top, side, or corner value. The 9-box average shown here is comparable to the 3-point average in the 1-parameter test. If the test has relatively small increments between calculation values and more test cases, an average of a larger area would be appropriate.
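The box averaging described above can be sketched in a few lines of Python. The function name `box_average` and the sample map are hypothetical; the code simply averages every cell with whichever of its eight neighbors exist, which yields the 9-box, 6-box, and 4-box cases for interior, edge, and corner cells.

```python
def box_average(pl):
    """Return a map in which each cell is the mean of itself and its adjacent cells.

    pl is a list of equal-length rows of profit/loss values. Interior cells
    average 9 values, edge cells 6, and corner cells 4.
    """
    rows, cols = len(pl), len(pl[0])
    averaged = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Collect the cell and its neighbors, clipped to the map boundary.
            cells = [
                pl[r][c]
                for r in range(max(0, i - 1), min(rows, i + 2))
                for c in range(max(0, j - 1), min(cols, j + 2))
            ]
            averaged[i][j] = sum(cells) / len(cells)
    return averaged


# Hypothetical 3 x 3 test map of profit/loss results.
pl_map = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
smoothed_map = box_average(pl_map)
```

Averaging a larger area, as suggested for finely spaced tests, would only require widening the two `range` bounds.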

STEP-FORWARD TESTING AND OUT-OF-SAMPLE DATA

The correct procedure for testing always includes reserving some data to be used afterward to validate the results. The data used during testing is called in-sample data, and the reserved portion for validation is called out-of-sample data. This concept should be familiar to everyone involved in the design or development of a system.

Traditionally, analysts use as much data as possible to determine whether their proposed strategy is sound. The use of long test periods assures a good sample of price patterns, including long periods of sideways movement, bull and bear markets, and a good number of price shocks of various sizes. When more data are used, it is most likely that the results will show larger profits and larger losses.

FIGURE 21-9 Averaging of map output results. (a) Center average (9-box). (b) Top-edge average (6-box). (c) Left-edge average (6-box). (d) Upper left corner average (4-box).

When there are, for example, 10 years of data available for testing, a standard approach would be to test the oldest 9 years, find the best set of parameters, then test the system on the most recent year using the selected values. If the results of the out-of-sample period are similar to the in-sample performance, the system is considered validated. One serious problem occurs when the out-of-sample test fails to perform as expected. The results are inspected closely to find how this period differed from the past and, if that is successful, a rule change is incorporated into the program and the in-sample data retested. This repeated use of out-of-sample data is called a feedback process. Unfortunately, there is no longer any out-of-sample data to be used for validation. The improved results are the product of fitting the data, and you are left uncertain about the ability to produce profits in the future. This is a problem that can only be resolved by a broad view of the robust testing procedures discussed throughout this chapter.
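The 9-year and 1-year split in the example above might be expressed as follows. This is only a sketch: the `split_by_year` helper and the `(year, value)` record layout are assumptions chosen to keep the example self-contained.

```python
def split_by_year(records, holdout_years=1):
    """Split chronological (year, value) records into in-sample and out-of-sample sets.

    The most recent holdout_years calendar years are reserved for validation.
    """
    years = sorted({year for year, _ in records})
    cutoff = years[-holdout_years]  # first year of the reserved period
    in_sample = [r for r in records if r[0] < cutoff]
    out_of_sample = [r for r in records if r[0] >= cutoff]
    return in_sample, out_of_sample


# Hypothetical data: three placeholder observations per year, 1988 through 1997.
records = [(year, 0.0) for year in range(1988, 1998) for _ in range(3)]
in_sample, out_of_sample = split_by_year(records)
```

The split protects the validation only if the reserved records are evaluated once; rerunning the test after a rule change turns them into in-sample data, as the paragraph above explains.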

Step-Forward Testing Procedure

The concept of choosing a parameter set from a test of in-sample data, then applying the results to out-of-sample data, has been developed into a test process called step-forward testing (also walk-forward testing). This process goes as follows:

1. Select the total test period, for example 10 years of daily data from 1988 through 1997.

2. Select the size of the individual test intervals, for example 2 years.

3. Begin testing with 1988 and 1989 data.

4. Select the best parameters from those results.

5. Find the performance of the next 6 months of data (the first half of 1990) using the parameters selected in step 4. Accumulate this out-of-sample performance throughout the test process.

6. If there is more data to test, move the test period forward by 6 months (the second test will be from the second half of 1988 through the first half of 1990) and test the data; otherwise, go to step 8.

7. Go to step 4.

8. The results of the step-forward test are the accumulated results of the individual out-of-sample tests in step 5.
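The steps above can be sketched as a rolling-window loop. In this illustration the data are indexed in half-year units, so the 10-year example becomes 20 periods with a 4-period in-sample window and a 1-period out-of-sample window. The function names and the `optimize` and `evaluate` callables are hypothetical placeholders for whatever parameter search and performance measure are actually used.

```python
def step_forward_windows(n_periods, in_sample_len, out_sample_len):
    """Yield (in_sample, out_of_sample) index ranges, advancing by out_sample_len."""
    start = 0
    while start + in_sample_len + out_sample_len <= n_periods:
        split = start + in_sample_len
        yield range(start, split), range(split, split + out_sample_len)
        start += out_sample_len


def step_forward_test(n_periods, in_sample_len, out_sample_len, optimize, evaluate):
    """Accumulate the out-of-sample results of every window (steps 3 through 8).

    optimize(in_sample) returns the best parameters for that window and
    evaluate(params, out_of_sample) returns its profit/loss. Both are
    hypothetical callables supplied by the caller.
    """
    total = 0.0
    for in_sample, out_of_sample in step_forward_windows(
        n_periods, in_sample_len, out_sample_len
    ):
        params = optimize(in_sample)
        total += evaluate(params, out_of_sample)
    return total


def label(i):
    """Name half-year period i, counting from the first half of 1988."""
    return f"{1988 + i // 2}H{i % 2 + 1}"


# 10 years = 20 half-years; 2-year test interval; 6-month out-of-sample step.
windows = list(step_forward_windows(20, 4, 1))
first_in, first_out = windows[0]
second_in, second_out = windows[1]
```

Here `first_in` covers 1988 and 1989 and `first_out` is the first half of 1990, while `second_in` begins with the second half of 1988, matching steps 3, 5, and 6.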

This process clearly simulates the traditional approach to test design, using in-sample and out-of-sample data in its proper order. Unfortunately, it does not correct the problems of reusing the out-of-sample data, which transforms that step into another piece of in-sample data.

Step-forward testing can introduce a bias that favors faster trading models. Because the in-sample test has only 2 years of data in our example, a long-term trading model may only post a few trades during this period, and one of the longer trades may be interrupted by the end of the data window and discarded. This makes the results of the longer-term models unreliable. In addition, these slower trading approaches will often start a trade at the beginning of the test data window at a point that is abnormal, that is, one that would not occur if the data were continuous.

The problem of short-term bias can be identified by the optimum parameters switching from short-term to long-term in successive test periods. For example, if you test moving averages from 5 to 50 days, you will find one period shows 10 days as the best, followed by a period in which 50 days is best, followed again by a period in which 15 days is best. This erratic behavior is a sign that the testing specifications are too limited. It can be corrected by extending the test intervals from 2 years to 5 years and making the out-of-sample period 1 year. None of these solutions, however, corrects for the use of out-of-sample data more than once. More on this problem can be found in the following section, "Retesting Procedure."

CHANGING RULES

As testing progresses, it is inevitable that the analyst will want to modify a rule in the trading strategy. This is usually the result of inspecting the results of tests and noting that a specific pattern was not treated properly (or profitably). After some work the analyst introduces a special rule, which turns a previously losing situation into a profit. Figure 21-10 shows two possible patterns in the test performance based on this change.

If the rule change improves one price pattern at the cost of others, the complete test results will appear as shown in Figure 21-10a. Higher profits at the previous peak and greater losses at the ends result in the same cumulative test profitability. Because the new

A thorough presentation of both step-forward testing and optimum search techniques can be found in Robert Pardo, Design, Testing and Optimization of Trading Systems (John Wiley & Sons, 1992).

FIGURE 21-10 Patterns resulting from changing rules. (a) A rule change that improves one situation at the cost of others. (b) A rule change resulting in general improvement.


