





rule caused the fitting of a specific pattern at the cost of added losses in other patterns, it should not be considered an improvement.

In Figure 21-10b, the new rule improves performance in all cases. This type of pattern is desirable in optimizations. It is possible, of course, that the improvement was caused by a rule that corrects one case only in a way that is so specific that it does not affect any other trade. That type of rule fitting, on a sample of one pattern, is likely to harm results in the long run.

ARRIVING AT VALID TEST RESULTS

It is not unusual for the results of an optimization, especially one with many parameters, to appear perfect. All the trades could be profitable, the equity drops might be small, and the strategy might perform in changing markets, and yet the final system might have no real expectation of profit. To create a system with predictive qualities, rather than one that is historically fitted, requires preparation in advance of testing.

You Can't Prove a System (or Indicator) Works by One Case

You can always select a moving average or indicator that gave a sell signal in the S&P just before the 1987 crash. That's not difficult when there are thousands (or millions) of combinations of trend speeds, filters, momentum indicators, divergence rules, and so on. However, what works in a specific case, and often an unusual one, is not likely to work in general. Manipulating the rules is only a waste of time. Showing that a combination of technical indicators would have profited from the last market crash has no bearing on the next drop.

The adage that "there are lies, half lies, and statistics" is too true. Statistics can prove a point or prove just the opposite, depending on how the numbers are presented. A market can be in an uptrend and a downtrend at the same time, based on the time interval over which you view it. You must be constantly aware of those problems that you cannot see: the omissions.
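The point about time intervals can be shown with a few lines of code. The sketch below is only an illustration: the function name `trend_direction`, the simple rule of comparing the latest price with the price a fixed number of bars earlier, and the price series itself are all assumptions made for the example, not a method taken from the text.

```python
def trend_direction(prices, lookback):
    """Classify the trend by comparing the last price with the price
    `lookback` bars earlier (a deliberately simple, assumed rule)."""
    if lookback >= len(prices):
        raise ValueError("lookback must be shorter than the price series")
    change = prices[-1] - prices[-1 - lookback]
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


# Hypothetical closing prices: a long rally followed by a short pullback.
closes = [100, 102, 104, 106, 108, 110, 112, 114, 113, 111, 109]

short_term = trend_direction(closes, lookback=3)   # compares 109 with 114
long_term = trend_direction(closes, lookback=10)   # compares 109 with 100
print(short_term, long_term)
```

The same series is classified differently by the 3-bar and the 10-bar view, which is why a statement about "the trend" means little until the interval is stated.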

It will be necessary for you to draw your own picture of system performance. What is the chance of a big profit this year or this week? What is the worst-case scenario? Will the method survive a sharp change in market volatility? Can I choose a long-term or short-term horizon with more confidence? Is an 8-day moving average really better than a 10-day breakout? The following guidelines will get you started by giving you one way to look at test results. From this you will be able to develop, over time, your own method of evaluation. As Jack Schwager has stated, "The mistake is extrapolating probable future performance on the basis of an isolated and well-chosen example from the past."

Searching for Robustness

A robust system is one that performs consistently in a wide variety of situations, including patterns that are still unforeseen. From a practical view, this translates into a method that has the fewest parameters and is tested over the most data. In a test by Futures Truth, the best performing systems commonly have four or fewer optimized variables. The characteristics of a system with forecasting ability are:

1. It must be logical. Each rule and formula must be designed to capitalize on a real fundamental or price phenomenon. Discovering a price pattern or cycle through optimization may seem to be a revelation, but it is more likely to be an illusion. By testing enough patterns, it is statistically probable that one of them will seem to fit. Without a fundamental reason for the existence of that pattern, it is not safe to use it.

2. It must adjust to changing market conditions. A system that assumes that markets don't change, or that everything has been seen in past data, will suffer large equity swings. Self-adjusting features might include an inflator or deflator, stop-loss values that change with volatility, and rule shifts based on seasonal or nonseasonal years.

3. It must be tested properly. The principles of statistics state that the best tests use the most data. More data includes bull, bear, and sideways markets; large and small price shocks; and periods of instability and doldrums. There is no substitute for more data. Proper test procedures also suggest that you reserve some unseen data until the very end, so that you can validate your work on an out-of-sample period. If that is not available, you can always paper trade until you can compare the accumulated trading profile with the expectations defined by your tests.
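The warning that testing enough patterns will produce one that seems to fit, and the value of reserving unseen data, can be illustrated with a small simulation. This is a sketch under stated assumptions: the price changes are pure random noise, each "rule" is nothing more than a random long or short position per day, and the function name `simulate_rule_search` and all parameter values are hypothetical.

```python
import random


def simulate_rule_search(n_rules=500, n_in=250, n_out=250, seed=1):
    """Search many meaningless rules on noise and report the best in-sample
    rule together with its result on the reserved out-of-sample data."""
    rng = random.Random(seed)
    # Pure-noise daily price changes: no rule can have a real edge here.
    changes = [rng.gauss(0.0, 1.0) for _ in range(n_in + n_out)]
    results = []
    for _ in range(n_rules):
        # A "rule" is just a random long (+1) or short (-1) position each day.
        positions = [rng.choice((1, -1)) for _ in range(n_in + n_out)]
        pnl = [p * c for p, c in zip(positions, changes)]
        results.append((sum(pnl[:n_in]), sum(pnl[n_in:])))
    # Select the rule the way an optimizer would: best in-sample profit.
    best_in, best_out = max(results, key=lambda r: r[0])
    return results, best_in, best_out


results, best_in, best_out = simulate_rule_search()
print(f"best in-sample profit:        {best_in:8.2f}")
print(f"same rule, out-of-sample:     {best_out:8.2f}")
```

Because the best rule was selected from hundreds of candidates, its in-sample profit looks attractive even though no rule here has any edge. Its out-of-sample figure is simply one more random draw, which is what the reserved data is meant to reveal.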

Some analysts prefer a technique called blind simulation, commonly known as step-forward testing. In this procedure, discussed in a previous section, a series of fixed-length tests are defined and the parameters found to be optimal are used to trade using data in the next period forward. The process then moves forward by dropping off old data and adding an equal amount of new data. The results of this second test are again traded on forward data. The method is continued until the out-of-sample results include all of the available test data. This is an important concept and is also discussed in the section "Reoptimization" later in this chapter.
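The windowing in a step-forward test can be sketched as follows. The function name `step_forward_windows`, the use of bar indices, and the window lengths in the example are assumptions for illustration; the optimization and trading steps themselves are left out.

```python
def step_forward_windows(n_bars, test_len, trade_len):
    """Yield (test_start, test_end, trade_start, trade_end) bar indices.

    Each end index is exclusive. Parameters would be optimized on the test
    window and then traded, unchanged, on the window that follows it.
    """
    if test_len <= 0 or trade_len <= 0:
        raise ValueError("window lengths must be positive")
    start = 0
    while start + test_len < n_bars:
        test_end = start + test_len
        trade_end = min(test_end + trade_len, n_bars)
        yield (start, test_end, test_end, trade_end)
        # Drop the oldest data and add an equal amount of new data.
        start += trade_len


# Hypothetical example: 1,000 bars, 500-bar tests, 100-bar forward periods.
windows = list(step_forward_windows(n_bars=1000, test_len=500, trade_len=100))
for w in windows:
    print(w)
```

In this example the forward periods join end to end, so the out-of-sample record covers every bar after the first test window, and no bar is traded with parameters that were fitted to it.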

From time to time each of the three characteristics above will seem to be the most important. In the long run they must all be satisfied to have a robust program.

Jack D. Schwager, Schwager on Futures: Technical Analysis, John Wiley & Sons, 1995, p. [illegible].
Hill, "Simple vs complex," Futures, March 1996, p. 57.

Performance Criteria

Assuming that the proper principles have been followed, are the test results good enough? Although the strategy shows steady profits with a low equity drawdown, could a naive system have done better? Did the famous skeptic on television throw darts at buy and sell signals on a board and show a 75% return while your system only netted 25%?
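One way to ask whether a naive system could have done better is to compare the tested profit with the profits of many random "dart-throwing" systems on the same data. The sketch below assumes daily price changes as input and a random long or short position each day; the function names `random_baseline` and `fraction_beating`, the price changes, and the system profit are all hypothetical.

```python
import random


def random_baseline(changes, n_trials=1000, seed=7):
    """Net profit of each of `n_trials` systems that go long (+1) or
    short (-1) at random every day over the given price changes."""
    rng = random.Random(seed)
    return [sum(rng.choice((1, -1)) * c for c in changes)
            for _ in range(n_trials)]


def fraction_beating(system_profit, baseline_profits):
    """Fraction of random systems whose profit exceeds the tested system's."""
    beat = sum(1 for p in baseline_profits if p > system_profit)
    return beat / len(baseline_profits)


# Hypothetical daily price changes and a hypothetical tested net profit.
changes = [0.8, -1.2, 0.5, 1.5, -0.7, 0.3, -0.4, 1.1, -0.9, 0.6]
system_profit = 4.0

darts = random_baseline(changes)
print(f"random systems beating the test: {fraction_beating(system_profit, darts):.1%}")
```

If a large share of the random systems beats the tested result, the result is not evidence of forecasting ability on this data.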

Benchmarks

It is necessary to have a benchmark that provides a way of measuring success. It is best if this is a well-documented indicator, such as the S&P Index, the Lehman Brothers Treasury Index, a class of fund managers, or the Commodity Trading Advisor Index. Newspaper articles that highlight the spectacular profits or losses of one manager are not a good benchmark, but simply ex post selection (hindsight).

It may seem desirable to have achieved recognition for the largest gain during a single month, but you should focus on having a very good gain over each year. The largest gain can only come with high risk. You should not be surprised if those investment advisors posting remarkably high returns in one month have had very erratic performance overall.

Measuring Test Results



A number of performance criteria are needed to evaluate any trading strategy during the test phase and to compare results under actual market conditions.

1. Net profits or losses. Although not the most interesting statistic, the net profit is the motivation for trading. While you would not select a system that produces a net loss, it is the other statistics that will tell which of the better results, if any, are realistic.

2. Number of trades. This simple value indicates whether your test was long enough to depend on the results. A few trades can appear very profitable, but the trading profile is not yet clear.

3. Percentage of profitable trades. Also called reliability, a high value of 60% tells you that the method captures profits regularly. A trend system will be working correctly if its reliability is near 40%.

4. Average net return per trade. With or without commissions and slippage costs, the average return per trade gives you an indication of how difficult it will be to realize the system returns. A theoretical average of $50 per trade for a currency is likely to net less than one-half after slippage in normal markets, and when a U.S. economic report is released, slippage alone could be $500 on a single trade.

5. Maximum drawdown. The largest equity swing from peak to valley, this measurement can be very erratic and is not likely to be the largest drawdown seen in the future; however, it gives you some idea of the minimum capital needed to trade this market. If the value is very small, it is likely to be based on a small amount of data, too many specific rules, or a narrow range of test parameters.

6. Annualized rate of return. The rate of return is the profit for a predetermined investment. The investment can be calculated in reverse by knowing the funds-at-risk to equity ratio, the maximum drawdown, or standard deviation of expected returns. These returns should be annualized to compare one test, market, or benchmark with another.

7. Total profits to total losses. This simple ratio gives a reasonable measure of the smoothness of the equity curve. As the ratio increases, the proportion of losses declines and the equity curve becomes smoother. Any value over 2.0 is very good.

8. Time to recovery. A large drawdown may be inevitable in a realistic system, but a shorter time to recovery is most desirable. A larger drawdown with a much faster recovery seems to be a better trade-off for most investors.

9. Time in the market. All else being equal, a trading system that is in the market less than another system is preferable. If two systems have approximately the same returns and risk, then the one that has more time out of the market is actually exposed to less risk.

10. Smoothness of returns. In addition to the individual measurements described above, some form of traditional equity smoothness is desirable. This can be the standard deviation of the residuals when the linear regression of the returns is calculated, or it can be some weighted combination of the previous measurements. This measurement should reflect the type of equity profile you seek.
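Several of the measurements above can be computed directly from a list of per-trade net profits. The following is a minimal sketch, not a complete evaluation: the function names and the trade list are hypothetical, time to recovery is counted in trades rather than calendar days, and smoothness is taken as the standard deviation of residuals around a least-squares line fitted to the cumulative equity, which is one reading of the regression-based measure in item 10.

```python
from statistics import mean, pstdev


def equity_curve(trade_pnl, start=0.0):
    """Cumulative equity after each trade, beginning at `start`."""
    equity, total = [start], start
    for p in trade_pnl:
        total += p
        equity.append(total)
    return equity


def max_drawdown(equity):
    """Largest decline from a running peak to a later valley."""
    peak, worst = equity[0], 0.0
    for e in equity:
        peak = max(peak, e)
        worst = max(worst, peak - e)
    return worst


def longest_recovery(equity):
    """Longest run of consecutive points spent below a prior equity peak."""
    peak, current, longest = equity[0], 0, 0
    for e in equity[1:]:
        if e >= peak:
            peak, current = e, 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def residual_std(equity):
    """Std. deviation of residuals around a least-squares line through equity."""
    xs = list(range(len(equity)))
    x_bar, y_bar = mean(xs), mean(equity)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, equity)) / sxx
    intercept = y_bar - slope * x_bar
    return pstdev([y - (intercept + slope * x) for x, y in zip(xs, equity)])


def summarize(trade_pnl):
    """Collect the trade-based statistics into one dictionary."""
    wins = [p for p in trade_pnl if p > 0]
    losses = [-p for p in trade_pnl if p < 0]
    equity = equity_curve(trade_pnl)
    return {
        "net_profit": sum(trade_pnl),
        "num_trades": len(trade_pnl),
        "pct_profitable": 100.0 * len(wins) / len(trade_pnl),
        "avg_per_trade": mean(trade_pnl),
        "max_drawdown": max_drawdown(equity),
        "profit_to_loss": sum(wins) / sum(losses) if losses else float("inf"),
        "longest_recovery": longest_recovery(equity),
        "residual_std": residual_std(equity),
    }


# Hypothetical per-trade net profits, in dollars.
trades = [200, -100, 300, -150, -50, 400, -100, 250]
stats = summarize(trades)
for name, value in stats.items():
    print(f"{name:>17}: {value}")
```

The annualized rate of return and time in the market are omitted because they need trade dates and an assumed investment size, which this trade list does not carry.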

Having decided the measurements needed to evaluate the results of the testing, some other procedural issues must be resolved before testing begins. This will help to avoid bending the rules to fit the results.

Defining the test range. Each optimization run should represent a broad sampling of tests over a wide range of parameter values. This range should be established in advance based on expectations of what is reasonable. If the returns in this range are not profitable, you must first understand why your ideas have not worked before looking for other periods in which the system might show profits. A thorough review of your trading rules may show that you have defined them wrong, not limited them to the best situations, or simply programmed them incorrectly.

Use the average of all test results. Most tests will show both profits and losses, with some areas of very attractive profits. If you tested 100 cases and 30 showed profits of about 30% per year, 30 showed break-even results, and the last 40 showed various losses, you might say that the 30 profits represented a successful system. That assumes that the market will continue to perform in a way that allows those 30 parameter combinations to generate profits during the


