




Figure 5.2 Volatility and the likelihood.

Different volatility forecasting models may be ranked by the value of the out-of-sample likelihood, but the effectiveness of this method relies on correct specification of the return distribution. Generally we assume that returns are normally distributed, but if they are not, the results of out-of-sample normal likelihood tests will not be reliable. If likelihood criteria are to be used, it is advisable to accompany the results with a test of the assumed distribution of returns (§10.1).

Much of the literature on volatility forecasting uses a root mean square error (RMSE) criterion instead of a likelihood (§A.5.3). While an RMSE may be fine for assessing price forecasts, or any forecasts of a mean parameter, there are problems with using the RMSE criterion for volatility forecasting (Makridakis, 1993). In fact, minimizing the RMSE is equivalent to maximizing the likelihood when the likelihood function is normal with a constant volatility.7 Hence RMSEs are applicable to mean predictions, such as those from a regression model, rather than to variance or covariance predictions.8

Not only is the RMSE criterion applicable to means rather than variances; one statistical performance measure that has, unfortunately, slipped into common

7To see this, suppose returns are normal, so that (from §A.6.3) the likelihood L is most easily expressed as: −2 ln L = n ln(2π) + n ln σ² + Σ(x_t − μ)²/σ².

Now maximizing L is equivalent to minimizing −2 ln L, and when volatility is constant this is equivalent to minimizing Σ(x_t − μ)². That is the same as minimizing √(Σ(x_t − μ)²), the root of the sum of the squared errors between the observations x_t and the mean forecast μ.
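The equivalence claimed in footnote 7 is easy to check numerically. The following sketch (the data, grid and parameter values are invented for illustration) holds the volatility constant and confirms that the value of the mean that minimizes −2 ln L on a grid is the same value that minimizes the RMSE:

```python
# Illustrative check: with constant volatility, the mu that maximises the
# normal likelihood also minimises the RMSE of the mean forecast.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=500)   # simulated returns, mean 0.5

mu_grid = np.linspace(-1.0, 2.0, 3001)         # candidate mean forecasts
sigma = 2.0                                    # volatility held constant

# -2 ln L = n ln(2*pi) + n ln sigma^2 + sum((x_t - mu)^2) / sigma^2
neg2_loglik = (len(x) * np.log(2 * np.pi * sigma**2)
               + ((x[:, None] - mu_grid) ** 2).sum(axis=0) / sigma**2)
rmse = np.sqrt(((x[:, None] - mu_grid) ** 2).mean(axis=0))

# Both criteria are minimised at the same grid point (near the sample mean)
assert mu_grid[neg2_loglik.argmin()] == mu_grid[rmse.argmin()]
```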

8Of course, a variance is a mean, but it is the mean of the squared random variable, which is chi-squared distributed rather than normally distributed, so the likelihood function is entirely different and does not involve any sum of squared errors. Many thanks to Peter Williams for explaining these issues during enlightening discussions when we were colleagues at Sussex University.



use is an RMSE between a volatility forecast and the realized volatility, which is just one observation on the process volatility. As a statistical criterion this makes no sense at all, because the correct test is an F-test, not an RMSE.9 In fact the only justification for using the RMSE between a forecast and the ex-post realized volatility is that it is a simple distance metric.
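As a sketch of the F-test that footnote 9 recommends, the following compares two sample variances with the standard variance-ratio statistic. All the data here are simulated for illustration, and scipy is assumed to be available for the F distribution:

```python
# Illustrative F-test for the equality of two variances, as footnote 9
# suggests, using the variance-ratio statistic and scipy's F distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, 250)    # e.g. returns consistent with the forecast
b = rng.normal(0.0, 1.2, 250)    # e.g. the realized returns

f = np.var(a, ddof=1) / np.var(b, ddof=1)   # variance-ratio test statistic
dfa, dfb = len(a) - 1, len(b) - 1

# Two-sided p-value from the F(dfa, dfb) distribution
p = 2 * min(stats.f.cdf(f, dfa, dfb), stats.f.sf(f, dfa, dfb))
print(f, p)
```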

Notwithstanding these comments, a popular approach to assessing volatility forecasting accuracy is to use the RMSE to compare the forecast of variance with the appropriate squared return. The difference between the variance forecast and the squared return is taken as the forecast error. These errors are squared and summed over a long post-sample period, and then square-rooted to give the post-sample RMSE between the variance forecast and the squared returns. However, these RMSE tests will normally give poor results because, although the expectation of the squared return is the variance, there is a very large standard error around this expectation. That is, the squared returns will jump about excessively while the variance forecasts remain more stable. The reason is that the return r_t equals σ_t z_t, where z_t is a standard normal variate, so the squared return r_t² = σ_t²z_t² yields very noisy measurements because of the excessive variation in z_t².
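The noisiness of squared returns is easy to demonstrate by simulation. In this sketch the conditional variance is held at an invented constant value; the squared return is unbiased for that variance, but its standard deviation is the variance times √2, since z_t² has variance 2:

```python
# Sketch: squared returns are an unbiased but very noisy proxy for the
# conditional variance, because r_t = sigma_t * z_t implies
# r_t^2 = sigma_t^2 * z_t^2, and z_t^2 (chi-squared, 1 df) has variance 2.
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.5                        # true conditional variance, fixed here
z = rng.standard_normal(100_000)
r2 = sigma2 * z**2                  # the squared returns

print(r2.mean())                    # close to sigma2 = 1.5 (unbiased)
print(r2.std())                     # close to sigma2 * sqrt(2), about 2.12
```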

Another popular statistical procedure is to perform a regression of the squared returns on the variance forecast. If the variance is correctly specified the constant from this regression should be zero and the slope coefficient should be one. But since the values for the explanatory variable are only estimates, the standard errors-in-variables problem of regression described in §A.4.2 produces a downward bias on the estimate of the slope coefficient.
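The attenuation bias can also be illustrated by simulation. In this sketch (all the numbers are invented) the true slope of squared returns on the true variance is one, but adding measurement error to the regressor pulls the OLS slope well below one:

```python
# Sketch of the errors-in-variables bias: when the regressor (the variance
# forecast) is itself a noisy estimate of the true variance, the OLS slope
# estimate is biased towards zero.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
h = rng.uniform(0.5, 2.0, n)            # true conditional variances
r2 = h * rng.standard_normal(n)**2      # squared returns: E[r2 | h] = h
h_hat = h + rng.normal(0.0, 0.5, n)     # forecast = truth + estimation error

def ols_slope(x, y):
    # OLS slope of y on x: cov(x, y) / var(x)
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(ols_slope(h, r2))     # close to 1: no measurement error in regressor
print(ols_slope(h_hat, r2)) # well below 1: attenuation bias
```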

The R² from this regression assesses the amount of variation in squared returns that is explained by the successive forecasts of σ_t². However, the excessive variation in squared returns that was mentioned above also presents problems for the R² metric. In fact this R² is bounded above, and the bound depends on the data generation process for returns. For example, Andersen and Bollerslev (1998) show that if returns are generated by the symmetric GARCH(1, 1) model (4.2), then the true R² from a regression of the squared returns on the variance forecast will be

R² = α²/(1 − β² − 2αβ).    (5.1)

Relation (5.1) provides an upper bound for the R² of GARCH(1, 1) forecasts, and similar upper bounds apply to other standard forecasting models. Table 5.1 shows how the true R² varies with some common values for the estimates of α and β. Most of the R² values are extremely small, and the largest value in the table is around 1/3, nothing like the maximum value of 1 that one normally expects of an R². Therefore it is not surprising that most of the R² values reported in
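As a quick check, relation (5.1) can be evaluated directly. This sketch reproduces a few of the (α, β) entries in Table 5.1:

```python
# Evaluating the Andersen-Bollerslev upper bound (5.1),
#   R^2 = alpha^2 / (1 - beta^2 - 2*alpha*beta),
# for some of the (alpha, beta) pairs in Table 5.1.
def r2_bound(alpha, beta):
    return alpha**2 / (1.0 - beta**2 - 2.0 * alpha * beta)

for alpha, beta in [(0.05, 0.85), (0.075, 0.92), (0.1, 0.89)]:
    print(alpha, beta, round(r2_bound(alpha, beta), 4))
# -> 0.0130, 0.3606 and 0.3344, matching the table
```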

9Hypothesis tests of the form H0: σ_A = σ_B would be relevant; that is, a test of whether the process volatility underlying the forecast is the same as the process volatility that generated the realization we have observed ex post. Therefore an F-test for the equality of two variances, based on the ratio of the two variance estimates, would apply.




Alpha   Beta    R²          Alpha   Beta    R²          Alpha   Beta    R²
0.05    0.85    0.0130      0.075   0.83    0.0301      0.1     0.80    0.0500
0.05    0.86    0.0143      0.075   0.84    0.0334      0.1     0.81    0.0550
0.05    0.87    0.0160      0.075   0.85    0.0375      0.1     0.82    0.0611
0.05    0.88    0.0182      0.075   0.86    0.0428      0.1     0.83    0.0689
0.05    0.89    0.0210      0.075   0.87    0.0500      0.1     0.84    0.0791
0.05    0.90    0.0250      0.075   0.88    0.0601      0.1     0.85    0.0930
0.05    0.91    0.0309      0.075   0.89    0.0756      0.1     0.86    0.1131
0.05    0.92    0.0406      0.075   0.90    0.1023      0.1     0.87    0.1447
0.05    0.93    0.0594      0.075   0.91    0.1589      0.1     0.88    0.2016
0.05    0.94    0.1116      0.075   0.92    0.3606      0.1     0.89    0.3344

the literature are less than 0.05. Earlier conclusions from this literature, that standard volatility models have very poor forecasting properties, should be reviewed in the light of this finding. The fact that the R² from a regression of squared returns on the forecasts of the variance is low does not mean that the model is misspecified.

5.1.2 Operational Criteria

An operational evaluation of volatility and correlation forecasts will focus on the particular application of the forecast. Thus any conclusions that may be drawn from an operational evaluation will be much more subjective than those drawn from the statistical methods just described. The advantage of using an operational criterion is that the volatility forecast is being assessed in the actual context in which it will be used. The disadvantage of operational evaluation is that the results might imply the use of a different type of forecast for every different purpose.

Some operational evaluation methods are based on the P&L generated by a trading strategy. A measurement of trading performance is described in §A.5.3 that is relevant for price forecasting, where an underlying asset is bought or sold depending on the level of the price forecast. A performance criterion for volatility or correlation forecasts should be based on hedging performance (Engle and Rosenberg, 1995) or on trading a volatility- or correlation-dependent product.
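A P&L-based evaluation of a volatility forecast can be sketched very simply. In the following illustration every number and the trading rule itself are invented: the straddle P&L is proxied by the realized-minus-implied volatility spread, and the rule is to buy the straddle when the forecast exceeds implied volatility and to sell it otherwise:

```python
# Illustrative sketch (all numbers hypothetical): score a volatility forecast
# by the P&L of a simple straddle rule.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
implied = 0.20 + 0.02 * rng.standard_normal(n)       # implied vols quoted today
realized = 0.20 + 0.02 * rng.standard_normal(n)      # vols subsequently realized
forecast = realized + 0.01 * rng.standard_normal(n)  # an informative forecast

# Buy the straddle (long volatility) when the forecast exceeds implied vol,
# sell it otherwise; P&L is proxied by the realized-minus-implied spread.
position = np.where(forecast > implied, 1.0, -1.0)
pnl = position * (realized - implied)
print(pnl.mean())   # positive on average when the forecast has predictive value
```

An uninformative forecast (one uncorrelated with realized volatility) would give an average P&L near zero under this rule, which is the point of the operational criterion.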


For example, the metric for assessing a forecast of implied volatility could involve buying or selling straddles (a put and a call of the same strike) depending on the level of the volatility that is forecast. Straddles have a V-shaped pay-off and so will be in-the-money if the market is volatile, that is, for a large upward or downward movement in the underlying. The forecast of

Table 5.1: R² from a regression of squared returns on the GARCH(1, 1) variance forecast


