
21 Testing

Testing a trading system was once a very simple process. Not so long ago it was difficult to get data, you needed to apply your rules manually to each price, you kept a handwritten record of your trades, and it took a long time. You were careful not to begin unless you were reasonably sure that the method had a good chance of success; this was based on an understanding of the fundamentals or market experience.

During the past 20 years that process has changed, not always for the better. Those who have applied the same diligence and selectivity to the automation of systems are likely to have gotten better results and more useful information than ever before. Others, who have become careless because of the power and convenience of trading strategy-development software, are not having much success. Bigger and faster computers do not mean better results.

Computers give back as much as you put in. The tools alone do not result in a successful trading program. Thought and creativity are needed to be competitive in today's market, although thoroughness and focus are qualities that can never be underestimated. Certainly, computers allow us to solve problems of a magnitude and complexity that we could never have considered before, and to do it in a way that is easy to understand. We can manipulate incredible amounts of data and find out not only whether our ideas are sound, but also the risks associated with various events.

One of the most interesting techniques to develop from computerization is optimization. It is not the mathematical process of finding a local maximum in a field of continuous contours; instead, it is the iterative process of repeatedly testing data to find the single best moving average speed, point-and-figure box size, stop-loss size, or other parameter. That is not to imply that it could not be more. Optimization is an irresistible method when a computer is available and, in an indiscriminate way, it has replaced logical selection. It is frequently the means of creating strategies referred to as black box methods, and it often results in fine-tuning, which produces systems that invariably generate high expectations but poor actual performance.

Computers present a serious, often futile, dilemma for the analyst. If a trend-following system is the objective and some form of smoothing technique is selected (a simple or step-weighted moving average, or exponential smoothing), how do you select the right speed? Twenty years ago, the 5-, 10-, and 20-day moving averages were most popular because they represented 1-week, 2-week, and 1-month time intervals. They were also easy to calculate. In fact, the 10-day moving average was the most common because no division was necessary to arrive at the average. Unfortunately, that simple approach no longer works.

Consider what is necessary to create a trading strategy. For a trend system, once the trend period has been logically selected (e.g., 10-day) and the trading rules determined (such as a stop-loss), you will want to know the performance of that choice over past markets. If it proves successful, you may trade using that method; however, if it does not meet expectations using the test data, what do you do? With a computer, the answer is easy: try another trend speed and see if the results are better. This follows a perfectly natural progression; after all, who would use a system today that has failed to be profitable in the past?

It is obvious that a trader with a talent for computers, who has purchased TradeStation, MetaStock, or another software package for testing, will take an organized approach to experimenting with a broad selection of trend speeds, box sizes, stop-loss points, and trading rules to arrive at a system with an excellent trading profile. When no satisfactory combination can be found for a specific market, that market will not be traded. This method of optimization has become a dominant factor in the development of trading systems; it is entirely dependent on having a computer.

The following sections will discuss various aspects of testing and optimization. Performed properly, testing can teach a great deal; done incorrectly, it can be misleading and completely illogical. It is not always easy to know which case has just been satisfied.

EXPECTATIONS

Expectations are an underrated part of system development. Setting them forces you to define your ideas in advance and put substance into those plans. You need to decide the rules, the time period over which the system should work, the relative risk, the proportion of profitable trades, and other characteristics. The importance of this becomes clear when you see actual test results.

Testing should be the process of validating your ideas. That first requires that you define those ideas in a clear plan and decide whether it should work for hourly, daily, or weekly intervals. If the test results confirm your ideas, then you can have confidence in the strategy. If they differ, you know that something is wrong, either with your plan or with the way it was entered into the computer. Without expectations, there can be no validation or confirmation; therefore, if you indiscriminately test all indicators combined together, you have no idea whether you have found a good idea or simply overfit the data.

Setting Your Objective

There are a number of steps that must be carefully performed before testing can begin. These steps are more important than the actual testing and will decide what you are looking for and how you will use the results. Correctly defined, the final system will have realistic goals and predictive qualities. Incorrectly done, it will look successful but fail in actual trading.

First, define the test objective. What results should the test present so that you can determine its success? Is it the highest profits possible from the test strategy, the frequency of profitable trades, or the average-profit to average-loss ratio? Testing software gives you a choice but doesn't allow a combination of these items; most likely, it will be a combination that you want. For example, if maximum profit is used as the performance criterion, which is most often the default case, the resulting system may have one or two large profits (as with Iraq's invasion of Kuwait in 1990 and the U.S. retaliation in January 1991) and an overwhelming number of small losses before and after them. The same performance would also show a very impressive average-profit to average-loss ratio. The highest frequency of profitable trades can also be inadequate if there are very small profits and large losses.

A popular way to view results is to look for a test profile with high profits and low risk. If we simply choose the largest equity drop as a measure of risk and net profits as a reward, a measurement function for best results could be

Test function = net profits x (100 - percentage equity drop)

which will reduce profits according to the size of the equity drop. The larger the losses, the smaller the test function value. A higher test function value should yield a smoother performance.

IDENTIFYING THE PARAMETERS

Once the test strategy is known, the parameters to be tested must be identified. A parameter is a value within the strategy that can be changed to vary the timing of the system. For example, parameters include:

1. Moving average speed (in hours, days, weeks)

2. Exponential smoothing constant (in percentage)

3. Bandwidth around a moving average (number of standard deviations)

4. Stop-loss values (in points or percentages)

5. Size of a point-and-figure box (in points)

The simplest optimization would be a test of the number of days in a moving average system, where all other values are fixed. The test would then simulate the results of the trading strategy by stepping through the possible range of moving average days: 1, 2, 3, and so forth, until a satisfactory range has been covered. This is called a 1-dimensional optimization.

Most systems have more than one important parameter. In the moving average system that is being used as an example, both the moving average period and the stop-loss value are important. One test procedure selects the first moving average value, then tests all of the stop-loss values; the second moving average value is then set and all of the stop-loss values are retested. The process is repeated until all of the moving average values have been tested. This is a 2-dimensional optimization.
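The nested test procedure described here can be sketched as two loops. The trading rule in `simulate` is a deliberately simple stand-in invented for this example (long only, compared against the average of the preceding prices); it is not the system discussed in the text, and the price series is synthetic.

```python
import math


def simulate(prices, ma_days, stop_loss):
    """Toy long-only rule (an assumption for illustration): enter when the price
    is above the average of the preceding ma_days prices; exit when it falls
    below that average or the loss from entry reaches stop_loss points."""
    profit, entry = 0.0, None
    for i in range(ma_days, len(prices)):
        avg = sum(prices[i - ma_days:i]) / ma_days
        price = prices[i]
        if entry is None:
            if price > avg:
                entry = price
        elif price < avg or entry - price >= stop_loss:
            profit += price - entry
            entry = None
    if entry is not None:  # close any open trade at the last price
        profit += prices[-1] - entry
    return profit


def optimize_2d(prices, ma_values, stop_values):
    """Test every stop-loss value for each moving average value in turn."""
    results = {}
    for ma_days in ma_values:          # outer loop: moving average period
        for stop_loss in stop_values:  # inner loop: all stop-loss values
            results[(ma_days, stop_loss)] = simulate(prices, ma_days, stop_loss)
    return results


# Synthetic prices: a slow upward drift with a cycle on top (not market data).
prices = [100 + 0.1 * i + 10 * math.sin(i / 5) for i in range(200)]
results = optimize_2d(prices, ma_values=[3, 5, 10], stop_values=[1.0, 2.0])
for (ma_days, stop_loss), profit in results.items():
    print(ma_days, stop_loss, round(profit, 2))
```

Fixing the stop-loss list to a single value reduces this to the 1-dimensional case; each added parameter adds another nested loop and multiplies the number of tests.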

Distribution of Values to Be Tested

When more than two parameters are tested, the test time for the optimization may increase dramatically. For example, a test of 100 moving average values, 20 combinations of entry and exit bands, and 20 stop-loss values gives 100 x 20 x 20 = 40,000 tests. Even with today's faster computers, that's going to take more than 50 hours if each test takes 5 seconds. It may be necessary to select fewer values to test for each parameter and choose them more carefully. For a moving average trend, it is best to use all of the smaller values for the number of days and then fewer values as the numbers become larger. For example,

1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50

would be a better set of values than using all numbers between 1 and 50. If you observe that the difference between a 49- and 50-day period is only 2%, while the difference between the 2- and 3-period test is 50%, you realize that the sets of tests at the two ends of the test series represent very different situations. If you expect to judge overall performance by the average of all tests, then using equal increments will weight this set of tests heavily toward the long end. By reducing the tests using percentage increments, you not only improve the test time but you improve the results. The moving average example shows that it is not possible to equalize the percentages due to the restriction of integer values at the low end, but by replacing this method with a smoothing constant used in an exponential smoothing, the problem can be fixed. The need to distribute results evenly over a set of tests will be important for comparing two systems and finding robust results, topics discussed later in this chapter.
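The arithmetic in this section is easy to check. The sketch below counts the tests, estimates the run time at 5 seconds per test, and compares the percentage step between neighboring values for the full 1-to-50 series and for the reduced set. The helper name `percent_step` is an assumption made for the example.

```python
def percent_step(values):
    """Percentage change from each test value to the next one in the series."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]


tests = 100 * 20 * 20      # moving averages x band combinations x stop-losses
hours = tests * 5 / 3600   # at 5 seconds per test
print(tests, round(hours, 1))  # 40000 55.6

every_day = list(range(1, 51))
selected = [1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50]

all_steps = percent_step(every_day)
sel_steps = percent_step(selected)

print(round(all_steps[1], 1), round(all_steps[-1], 1))  # 50.0 2.0  (2 to 3, 49 to 50)
print(len(every_day), len(selected))                    # 50 15
print(round(min(sel_steps), 1))                         # 20.0
```

With every integer tested, the step between neighbors shrinks from 100% at the short end to about 2% at the long end. In the reduced set, every step from 3 days onward stays between 20% and about 33%, so the tests are spread far more evenly in percentage terms while the number of values drops from 50 to 15.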

Types of Test Variables

Test variables can be of three forms: continuous, discrete, or coded (alphabetic). To interpret and display test results properly, it is important to identify these forms in advance.

Continuous parameters refer to values, such as percentages, which can take on any fractional number within a well-defined range. If a stop-loss level is defined as a percentage, it may be tested beginning with the fractional value .02% and increasing in steps of .005% until 2.0% is reached.

Discrete parameters are whole numbers, or integer values, such as the number of days in a moving average. Coded parameters represent a category of operations, also called a regime. For example, when the parameter value is A, a single moving average is used; when the value is B, a double moving average system is tested; and when the value is C, a moving average and a breakout are used.

It is important to distinguish between the first two parameter types, continuous and discrete, and the last one. The analyst can expect some pattern in the results when testing parameters that take on progressively larger or smaller values. The coded parameter, however, usually causes rule changes. There is no reason why a change of rules, which causes the system to switch from one regime to another, would result in any performance pattern that makes sense across these regimes. The first rule may be profitable, the second losing, and the third profitable. The display of results, discussed later, is only valid for continuous and discrete parameters tested in an incrementally ascending or descending manner.
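A minimal sketch of how the three kinds of test variable might be set up before a run is shown below. The helper `continuous_range` and the variable names are assumptions made for the example; the stop-loss range follows the percentage example given earlier in this section, and exact fractions are used so that repeated steps do not accumulate rounding error.

```python
from fractions import Fraction


def continuous_range(start, stop, step):
    """Evenly stepped values from start to stop inclusive, computed exactly."""
    start, stop, step = Fraction(start), Fraction(stop), Fraction(step)
    count = int((stop - start) / step) + 1
    return [float(start + i * step) for i in range(count)]


# Continuous: stop-loss in percent, from .02 to 2.0 in steps of .005.
stop_loss_pct = continuous_range("0.02", "2.0", "0.005")

# Discrete: whole numbers of days in a moving average.
ma_days = list(range(1, 11))

# Coded: each letter selects a different set of rules (a regime).
regimes = {
    "A": "single moving average",
    "B": "double moving average",
    "C": "moving average and breakout",
}

print(len(stop_loss_pct), stop_loss_pct[0], stop_loss_pct[-1])  # 397 0.02 2.0
print(ma_days[0], ma_days[-1], sorted(regimes))
```

The first two lists are ordered, so results can sensibly be examined for a pattern from one value to the next. The regime codes carry no such ordering: sorting them as A, B, C is only a labeling convenience.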

SELECTING THE TEST DATA

Testing a trading strategy on a computer is different from verifying a charting technique or taking prices from the Wall Street Journal to check results manually. If a computer is to be used for testing, there must be a complete database of price history and, for more ambitious recent work, of economic statistics as well. This is readily available from numerous vendors and normally includes daily price data on U.S. and major European markets. When you purchase software to test your strategies, a database of daily prices is usually included. Intraday (tick) data is limited to only a few vendors, the best known being CQG and Tick Data, Inc. A complete database is an important asset. It should be kept current by automatic downloading at the end of each day.

Some computer systems that are designed for trading strategies have created special continuous test series specifically for optimization. While stock data or cash market prices are always continuous (except, of course, for stock splits), futures and options data present problems resulting from their limited contract life. To make this easier for testing, some vendors provide special features for combining shorter data periods into a single series. This gives the user the following choices:

1. Individual full contracts. The entire futures contract, or more than one contract, is tested. Using full contracts is a tedious process and can result in overlapping test periods and duplications of results. Individual tables of results must be combined in a spreadsheet to be evaluated. Figure 21-1 gives an example of the test bias resulting from carelessly using all contracts within the range December 1992 through December 1996. Once the number of tests being performed at any one time is seen, it is easy to construct a distribution (Figure 21-1b) by counting the horizontal bars at specific points. This distribution will show the testing bias due to duplication. Because of the 18-month life of the contracts, there was only 1 delivery month tested in the last half of 1991,


