
Figure 8.3 Dividing the data set into three groups: in-sample, tuning, and out-of-sample.

In-sample data set: 1982-1986
Tuning data set: 1987-1991
Out-of-sample data set: 1992-1997

You can think of the tuning set as a psychological crutch. If, after having spent considerable intellectual, emotional, and psychological energy developing a trading strategy, you discover that it has a simple flaw when you run it on the out-of-sample data, then your options are limited. If you choose to retool the trading strategy, then your estimates of future performance will be inaccurate because the out-of-sample data set has been tainted. You will have to live with the uncertainty that you may have over-fitted, overoptimized, and "shrink-wrapped" your trading strategy around the out-of-sample data. The benefit of using a tuning set is that it gives you a second chance.

If you decide to use a tuning set in your backtesting, then the standard method is to select a period of time between the in-sample data set and the out-of-sample data set as the tuning set. For example, if you are developing a strategy for the S&P 500 futures, then your in-sample data set might range from 1982 to 1986, your tuning set might range from 1987 to 1991, and your out-of-sample data set might range from 1992 to 1997, as shown in Figure 8.3.
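The partition described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the year boundaries come from the S&P 500 futures example, while the names `PARTITIONS`, `partition_for`, and `split_history` are hypothetical and chosen only for this sketch.

```python
from datetime import date

# Year boundaries taken from the S&P 500 futures example in the text.
# The dictionary and function names are illustrative assumptions.
PARTITIONS = {
    "in_sample": (1982, 1986),
    "tuning": (1987, 1991),
    "out_of_sample": (1992, 1997),
}


def partition_for(day: date) -> str:
    """Return the name of the data set that a trading day belongs to."""
    for name, (first_year, last_year) in PARTITIONS.items():
        if first_year <= day.year <= last_year:
            return name
    raise ValueError(f"{day} falls outside the 1982-1997 history")


def split_history(days):
    """Group an iterable of dates into the three data sets."""
    groups = {name: [] for name in PARTITIONS}
    for day in days:
        groups[partition_for(day)].append(day)
    return groups
```

Keeping the boundaries in one place makes it harder to let out-of-sample dates leak into development runs by accident.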

Leave-One-Out Testing

For the data partitioning strategy previously discussed, the three data sets are chosen in advance and are held fixed through the creation and testing of the trading strategy. In contrast, leave-one-out testing forces the out-of-sample data set to shift throughout the entire data set. With this method, the historical data set is first divided into many groups. One of these groups is temporarily assigned to be the out-of-sample set and the remaining groups become the in-sample set. Next, a trading strategy is partially developed for this particular assignment of the data groups. The cycle repeats as another group is chosen to be the out-of-sample data set, the rest assigned as in-sample, and partial system development continues some more. The process repeats until all the data groups have been used as the out-of-sample data set.

An example serves to illustrate the point. Suppose you divide the historical data into five groups. Then the in-sample and out-of-sample sets are shown in Table 8.1.

Table 8.1 Data group allocation in leave-one-out system testing

Data Groups Assigned to Be In-Sample    Data Groups Assigned to Be Out-of-Sample
1 2 3 4                                 5
1 2 3 5                                 4
1 2 4 5                                 3
1 3 4 5                                 2
2 3 4 5                                 1
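The rotation of the out-of-sample group can be sketched as a short Python generator. The function name `leave_one_out_assignments` is hypothetical, and holding out the last group first is an assumption made only so that the order matches Table 8.1.

```python
def leave_one_out_assignments(groups):
    """Yield (in_sample, out_of_sample) pairs, holding out the last group first."""
    for held_out in reversed(groups):
        # Every group except the held-out one is used for development.
        in_sample = [g for g in groups if g != held_out]
        yield in_sample, held_out


for in_sample, out_of_sample in leave_one_out_assignments([1, 2, 3, 4, 5]):
    print(in_sample, out_of_sample)
```

With five groups the loop produces five assignments, one per row of Table 8.1, and each group serves as the out-of-sample set exactly once.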

The Markowitz/Xu Data Mining Correction Formula

The Markowitz/Xu data mining correction formula is a little-known but extremely powerful formula that corrects for overoptimization. You should use it whenever the out-of-sample data set has been tainted by running more than one trading strategy through it. The formula gives an estimate of the actual return of the top-performing strategy. The intuitive idea is that the return of the top-performing strategy should be adjusted in the direction of the average return of all the trading strategies that were run through the data set. The form of this adjustment is given by the following formula:

$$\hat{H} = R + \beta (H - R)$$

where

$\hat{H}$ is the new estimate of the return of the best strategy,
$R$ is the average return of all of the strategies,
$H$ is the return of the best-performing strategy,
$\beta$ is a number between 0 and 1.

Notice that $\hat{H}$ will always be between the average return of all of the trading strategies and the return of the top-performing strategy. When $\beta = 0$, then $\hat{H} = R + 0 \cdot (H - R) = R$. When $\beta = 1$, then $\hat{H} = R + 1 \cdot (H - R) = R + H - R = H$. The critical question is: What is the formula for $\beta$? Intuitively, $\beta$ should grow smaller as the number of trading strategies grows larger. The smaller $\beta$ is, the closer $\hat{H}$ will move to $R$, the average return of all of the trading strategies. In addition, $\beta$ should grow smaller as the variance of the returns increases. The formula for $\beta$ is:

$$\beta = \frac{N}{D}$$

where

$$N = C\left(1 + \frac{1}{n-1} + \frac{1}{T-1}\right) - \frac{A}{T-1}, \qquad D = C\left(1 + \frac{1}{n-1}\right)$$

where

$A$ is the average squared difference of the daily returns of each strategy and the average daily return of all strategies,
$C$ is the average squared difference of the average daily return of each strategy and the average daily return of all strategies,
$n$ is the number of trading strategies,
$T$ is the number of days for which you have computed daily returns.
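As a rough illustration, the correction can be written as a small Python function. This is a sketch under stated assumptions rather than a definitive implementation: the function name `markowitz_xu_adjusted_return` is hypothetical, every strategy is assumed to have the same number of daily returns, and each "average" is taken as a plain arithmetic mean (dividing by the number of items).

```python
def markowitz_xu_adjusted_return(returns):
    """Return (beta, adjusted daily return of the best strategy).

    returns: one list of daily returns per strategy, all the same length.
    """
    n = len(returns)        # number of trading strategies
    T = len(returns[0])     # number of days of daily returns
    if n < 2 or T < 2:
        raise ValueError("need at least two strategies and two days")
    if any(len(r) != T for r in returns):
        raise ValueError("all strategies must cover the same days")

    strategy_means = [sum(r) / T for r in returns]
    R = sum(strategy_means) / n   # average daily return of all strategies
    H = max(strategy_means)       # return of the best-performing strategy

    # A: average squared difference between each daily return and R.
    A = sum((x - R) ** 2 for r in returns for x in r) / (n * T)
    # C: average squared difference between each strategy's mean and R.
    C = sum((m - R) ** 2 for m in strategy_means) / n

    N = C * (1 + 1 / (n - 1) + 1 / (T - 1)) - A / (T - 1)
    D = C * (1 + 1 / (n - 1))
    if D == 0:
        raise ValueError("all strategies have the same average return")

    beta = N / D
    return beta, R + beta * (H - R)
```

For instance, calling the function with two hypothetical strategies, `[[0.01, -0.01, 0.01, -0.01], [0.02, 0.0, 0.02, 0.0]]`, pulls the best strategy's average daily return of 0.01 back toward the overall average of 0.005. The function does not clamp the shrinkage factor, so it is worth checking that the result lies between 0 and 1 before relying on the adjusted return.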

Spreadsheet Example

Figure 8.4 shows a spreadsheet that contains an example of the Markowitz/Xu data mining correction formula in action. The first 18 columns of the spreadsheet are organized as follows:

• Column A. Date.

• Column B. The daily returns of Strategy 1. For example, on 10/7/97, the daily return of Strategy 1 is -0.005, or -0.5%.

Figure 8.4 An example spreadsheet for calculating the Markowitz/Xu correction.

Date   Strategy 1  Strategy 2  Strategy 3  log str1    log str2    log str3    str1 - avg  str2 - avg  str3 - avg  (str1-avg)^2  (str2-avg)^2  (str3-avg)^2
10/7   -0.0050     -0.0040      0.0037     -0.0050     -0.0040      0.0037     -0.0050     -0.0040      0.0036     2.5E-05       1.6E-05       1.3E-05
10/8    0.0004      0.0049      0.0010      0.0004      0.0048      0.0010      0.0004      0.0048      0.0010     1.8E-07       2.3E-05       9.9E-07
10/9   -0.0011      0.0002      0.0017     -0.0011      0.0002      0.0017     -0.0012      0.0001      0.0018     1.4E-06       1.6E-06       2.8E-06
10/10  -0.0047     -0.0026      0.0041     -0.0047     -0.0026      0.0041     -0.0048     -0.0026      0.0040     2.3E-05       6.8E-06       1.8E-05
10/11  -0.0037      0.0027      0.0041     -0.0037      0.0027      0.0041     -0.0037      0.0027      0.0041     1.4E-05       7.2E-06       1.8E-05
10/15  -0.0021     -0.0036     -0.0026     -0.0021     -0.0038     -0.0028     -0.0021     -0.0039     -0.0029     4.8E-06       1.5E-05       8.3E-06
10/16   0.0009      0.0050      0.0030      0.0009      0.0050      0.0030      0.0009      0.0049      0.0029     8.0E-07       2.4E-05       8.8E-06
10/17   0.0007     -0.0050     -0.0016      0.0007     -0.0050     -0.0018      0.0006     -0.0051     -0.0017     3.9E-07       2.6E-05       2.8E-06
10/18  -0.0021      0.0006      0.0016     -0.0021      0.0006      0.0018     -0.0022      0.0005      0.0015     4.8E-06       2.8E-07       2.3E-06
10/21  -0.0044     -0.0036      0.0030     -0.0044     -0.0036      0.0030     -0.0045     -0.0037      0.0030     2.0E-05       1.3E-05       8.9E-06
10/22   0.0010      0.0009     -0.0041      0.0010      0.0009     -0.0041      0.0010      0.0008     -0.0042     9.8E-07       6.8E-07       1.7E-05
10/23   0.0020      0.0022      0.0044      0.0020      0.0022      0.0044      0.0020      0.0022      0.0043     3.9E-06       4.6E-06       1.9E-05
10/24   0.0030     -0.0010     -0.0009      0.0030     -0.0010     -0.0009      0.0029     -0.0011     -0.0009     8.7E-06       1.2E-06       8.3E-07
10/25  -0.0030      0.0022      0.0040     -0.0030      0.0022      0.0040     -0.0031      0.0022      0.0040     9.5E-06       4.7E-06       1.8E-05
10/28  -0.0049     -0.0029      0.0025     -0.0049     -0.0029      0.0025     -0.0050     -0.0029      0.0024     2.5E-05       8.5E-06       5.8E-06
10/29  -0.0013      0.0046     -0.0010     -0.0013      0.0048     -0.0010     -0.0013      0.0045     -0.0011     1.7E-06       2.0E-05       1.2E-06
10/30   0.0012      0.0000      0.0028      0.0012      0.0000      0.0028      0.0011      0.0000      0.0027     1.3E-06       9.0E-??       7.3E-06

Average daily return of individual strategies:    -0.0014       0.00002      0.0015
Individual average - global average:              -0.0014      -0.00003      0.0014
(Individual average - global average)^2:          [illegible]   8.629E-10    2.057E-06

Number of strategies (n):                                  3
Number of days (T):                                        17
Average daily return of all strategies (global average):   5.13E-05
A:                                                         9.11E-06
C:                                                         1.35E-06
D:                                                         2.03E-06
N:                                                         1.54E-06
Beta:                                                      0.7607
Adjusted return (daily):                                   0.00115    1.00115
Adjusted return (annual):                                  1.33144


