




where φ(·) is the normal density function. Choosing μ and σ² to maximize L, or equivalently and more simply to minimize −2 ln L, yields the maximum likelihood estimates of these parameters, as described in Appendix 6.

In GARCH models the likelihood functions are more complex because the variance is time-varying (Engle, 1982; Bollerslev, 1986). For a normal symmetric GARCH model the log-likelihood of a single observation r_t is, ignoring the term in ln(2π) because it does not affect the estimates:

l_t = -\tfrac{1}{2}\left[\ln \sigma_t^2 + \left(\varepsilon_t^2/\sigma_t^2\right)\right]

and Σ_t l_t should be maximized with respect to the variance parameters. Denote the variance parameters by θ, so in the case of GARCH(1,1) the variance parameters are θ = (ω, α, β). Then the first derivatives may be written


\partial l_t/\partial\theta = \left(1/\left(2\sigma_t^2\right)\right)\left[\left(\varepsilon_t^2/\sigma_t^2\right) - 1\right] g_t \qquad (4.14)

where the gradient vector is¹⁵

g_t = \partial\sigma_t^2/\partial\theta.

These derivatives may be calculated recursively, taking the ordinary least squares estimate of the unconditional variance as the pre-sample estimates of ε² and σ² in (4.14) and calculating the gradient vectors by the recursion

g_t = z_t + \beta\, g_{t-1},

where z_t = (1, ε²_{t−1}, σ²_{t−1})′. Solving the first-order conditions ∂Σ_t l_t/∂θ = 0 yields a set of non-linear equations in the parameters that may be solved using a quasi-Newton variable metric algorithm such as the Davidon–Fletcher–Powell (DFP) algorithm or the Berndt–Hall–Hall–Hausman (BHHH) algorithm that is recommended by Bollerslev (1986). The BHHH iteration is

\theta_{i+1} = \theta_i + \lambda_i H_i^{-1} g_i \qquad (4.15)

where λ_i is a variable step length chosen to maximize the likelihood in the appropriate direction, H_i is the Hessian matrix Σ_t(g_t g_t′) and g_i = Σ_t g_t, both evaluated at θ_i. The iteration is deemed to have converged when the gradient vector g is zero.
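To make the estimation concrete, here is a minimal sketch in Python of maximum likelihood estimation for a normal GARCH(1,1) model. The function names, starting values and simulated data are illustrative assumptions, and a generic derivative-free optimizer from scipy is used in place of the BHHH algorithm described above (which most econometrics packages implement internally).

```python
import numpy as np
from scipy.optimize import minimize

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance recursion sigma2_t = omega + alpha*eps2_{t-1} + beta*sigma2_{t-1}.
    The sample variance is used as the pre-sample estimate of eps2 and sigma2, as in the text."""
    eps2 = returns ** 2
    sigma2 = np.empty_like(eps2)
    sigma2[0] = eps2.mean()
    for t in range(1, len(eps2)):
        sigma2[t] = omega + alpha * eps2[t - 1] + beta * sigma2[t - 1]
    return sigma2

def neg_log_likelihood(theta, returns):
    """-2 ln L up to a constant: the sum over t of ln(sigma2_t) + eps2_t / sigma2_t."""
    omega, alpha, beta = theta
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf                          # keep the search inside the admissible parameter region
    sigma2 = garch11_variance(returns, omega, alpha, beta)
    return np.sum(np.log(sigma2) + returns ** 2 / sigma2)

# Illustrative data only; in practice use mean-adjusted returns from the market of interest
returns = 0.01 * np.random.default_rng(0).standard_normal(2000)
result = minimize(neg_log_likelihood, x0=np.array([1e-6, 0.05, 0.90]),
                  args=(returns,), method="Nelder-Mead")
omega_hat, alpha_hat, beta_hat = result.x
```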

¹⁵The algorithm may take a long time unless analytic derivatives are used to calculate the gradient. This problem has limited the usefulness of t-distributed GARCH models for very leptokurtic data, since they require numerical derivatives to be calculated at each iteration.
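The point made in the footnote can be illustrated by computing the score analytically. The sketch below, again for the illustrative normal GARCH(1,1) model with θ = (ω, α, β), builds the gradient vectors by the recursion given above, evaluates (4.14) for each observation, and forms one BHHH step as in (4.15); no numerical differentiation is needed.

```python
import numpy as np

def garch11_score(returns, omega, alpha, beta):
    """Per-observation score d l_t / d theta of equation (4.14). The gradient of the conditional
    variance is built by the recursion g_t = z_t + beta*g_{t-1}, z_t = (1, eps2_{t-1}, sigma2_{t-1})'."""
    eps2 = returns ** 2
    T = len(eps2)
    sigma2 = np.empty(T)
    g = np.zeros((T, 3))          # g_t = d sigma2_t / d(omega, alpha, beta); zero for the fixed pre-sample value
    score = np.zeros((T, 3))
    sigma2[0] = eps2.mean()       # pre-sample estimate, as in the text
    for t in range(1, T):
        sigma2[t] = omega + alpha * eps2[t - 1] + beta * sigma2[t - 1]
        z = np.array([1.0, eps2[t - 1], sigma2[t - 1]])
        g[t] = z + beta * g[t - 1]
        score[t] = (1.0 / (2.0 * sigma2[t])) * (eps2[t] / sigma2[t] - 1.0) * g[t]
    return score

def bhhh_step(score, step_length=1.0):
    """One BHHH update, cf. (4.15): increment = step * [sum_t s_t s_t']^{-1} [sum_t s_t]."""
    H = score.T @ score           # outer-product approximation to the Hessian
    grad = score.sum(axis=0)      # total gradient of the log-likelihood
    return step_length * np.linalg.solve(H, grad)
```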



4.3.3 Estimation Problems


Sometimes convergence problems arise because the more parameters there are in the GARCH model, the flatter the likelihood function becomes and therefore the more difficult it is to maximize. The likelihood function becomes like the surface of the moon (in many dimensions), so it may be that only a local optimum is achieved. In that case a different set of estimates may be obtained when the starting values for the iteration are changed (§4.3.2). To ensure that the estimates correspond to a global optimum of the likelihood function one would have to run the model with many starting values and each time record the likelihood of the optimum. If this type of convergence problem is encountered, a more parsimonious parameterization of the GARCH model should be used if possible.
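One simple precaution, sketched below on the assumption that the neg_log_likelihood function and returns array from the earlier sketch are available, is to launch the optimizer from a grid of starting values and keep the optimum with the highest likelihood (i.e. the lowest −2 ln L).

```python
import itertools
import numpy as np
from scipy.optimize import minimize

# Illustrative grid of starting values for (omega, alpha, beta)
starts = [np.array(x0) for x0 in itertools.product([1e-6, 1e-5],
                                                   [0.02, 0.05, 0.10],
                                                   [0.80, 0.90, 0.95])]
best = None
for x0 in starts:
    res = minimize(neg_log_likelihood, x0, args=(returns,), method="Nelder-Mead")
    if res.success and (best is None or res.fun < best.fun):
        best = res                # best optimum found so far across starting values
```

If different starting values lead to optima with materially different likelihoods, that is itself a warning that the parameterization is too rich for the data.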


Convergence problems with GARCH models can also arise because the gradient algorithm used to maximize the likelihood function has hit a boundary. If there are obvious outliers in the data then it is very likely that the iteration will return the value 0 or 1 for either the alpha or the beta parameter (or both). It may be safe to remove a single outlier (as in the Ford example given above) if the circumstances that produced the outlier are thought to be unlikely to happen in future. Alternatively, the boundary problem might be mitigated by changing the starting values of the parameters, or changing the data set so that the likelihood function has a different gradient at the beginning of the search. Otherwise the model specification will have to be changed. A sure sign of using the wrong GARCH model is when the iteration refuses to converge at all, even after you have checked the data for outliers, changed the starting values or chosen a different data period.

Most univariate GARCH models should encounter few convergence problems if the model is well specified and the data are well behaved. Changes in the data will induce some changes in the coefficient estimates, as was evident in the rolling estimates of GARCH parameters that were shown in Figure 4.8. However, if the model is well tuned the parameter estimates should not change greatly as new data arrive, except when there are structural breaks in the data generation process.

4.3.4 Choosing the Best GARCH Model

Which is the best GARCH model to use, and when? The vanilla GARCH model already offers many advantages, even without asymmetric effects, and is widely used. But does it capture the right type of volatility clustering in the market? When should an asymmetric, non-linear or otherwise more complex GARCH model be used?

The first question to answer for a chosen GARCH model is how well it models the conditional volatility of the process. If a GARCH model is capturing volatility clustering adequately, the returns should have no significant autoregressive conditional heteroscedasticity once they have been standardized by their conditional volatility. An indication that GARCH models really do capture volatility clustering is that, even in very high-frequency exchange rate data where GARCH effects are strong and complex (§4.2.5), returns are nearly normally distributed when divided by their conditional volatility (Andersen et al., 1999a, 1999b).

In §4.1.1 we saw that standard tests for autoregressive conditional heteroscedasticity are based on autocorrelation in squared returns. Returns themselves may not be autocorrelated, but if volatility clustering is present in the data they will not be independent, because the squared returns will be autocorrelated. Therefore a simple test of a GARCH model is that the squared standardized returns, r_t*² = r_t²/σ̂_t², where σ̂_t² is the estimate of the GARCH conditional variance, should have no autocorrelation.

Such tests may be based on an autocorrelation test statistic such as the Box–Pierce statistic described in §11.3.2. For large sample sizes T, the Box–Pierce test statistic Q ∼ χ²_N is:


Q = T \sum_{n=1}^{N} \varphi(n)^2,

where φ(n) is the nth-order autocorrelation coefficient of the squared standardized returns, estimated from the sample products \sum_{t=n+1}^{T} r_t^{*2}\, r_{t-n}^{*2}.
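As an illustration, this statistic can be computed directly from a vector of returns and the fitted conditional variances; the function name and the choice of twelve lags below are assumptions made for the example.

```python
import numpy as np

def box_pierce_squared_std_returns(returns, sigma2, n_lags=12):
    """Box-Pierce Q statistic for the squared standardized returns r*_t^2 = r_t^2 / sigma2_t.
    Under the null of no remaining ARCH effects, Q is asymptotically chi-squared with n_lags d.f."""
    z2 = returns ** 2 / sigma2                     # squared standardized returns
    z2 = z2 - z2.mean()                            # deviations from the sample mean
    T = len(z2)
    denom = np.sum(z2 ** 2)
    phi = np.array([np.sum(z2[n:] * z2[:-n]) / denom for n in range(1, n_lags + 1)])
    Q = T * np.sum(phi ** 2)
    return Q, phi
```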

If there is no autocorrelation in the squared standardized returns the GARCH model is well specified. But what if several GARCH models account equally well for GARCH effects? In that case choose the GARCH model which gives the highest likelihood in post-sample predictive tests, as explained in §5.5.1 (see also Appendix 5).

4.4 Applications of GARCH Models

Whilst the square root of time rule might be a useful approximation to reality over very short-term horizons, the clustering of volatility means that there will be a large approximation error if this rule were to be applied over the longer term (§3.3). One needs a model that generates volatility term structures that converge to the long-term average volatility level as maturity increases, rather than the constant term structures that result from moving average volatility models. This is one of the main advantages of GARCH models.
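The mean reversion in GARCH volatility term structures can be made explicit with the standard GARCH(1,1) forward variance formula. The sketch below is illustrative: sigma2_next denotes the one-step-ahead conditional variance and the parameter values are assumed to come from a fitted daily model.

```python
import numpy as np

def garch11_term_structure(omega, alpha, beta, sigma2_next, horizons):
    """Average volatility over each horizon implied by GARCH(1,1) variance forecasts.
    The h-step-ahead forward variance mean-reverts to the long-run level omega/(1 - alpha - beta):
        E[sigma2_{t+h}] = long_run + (alpha + beta)**(h - 1) * (sigma2_next - long_run)."""
    long_run = omega / (1.0 - alpha - beta)
    steps = np.arange(1, max(horizons) + 1)
    forward_var = long_run + (alpha + beta) ** (steps - 1) * (sigma2_next - long_run)
    avg_var = np.cumsum(forward_var) / steps       # average variance over the first h steps
    return {h: float(np.sqrt(avg_var[h - 1])) for h in horizons}

# e.g. garch11_term_structure(1e-6, 0.05, 0.90, 2e-4, horizons=[1, 10, 30, 250])
```

As the horizon increases the average forecast variance converges to the long-term level, so the term structure flattens out, in contrast to the constant term structures produced by moving average models.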



