
example, longer-term traders. Fourth, at extremely high frequencies, FX rates exhibit distinct microstructure effects due to the price formation process as studied in Chapter 5.

In this section, we investigate the importance of this heterogeneity for the modeling of the foreign exchange (FX) markets using the GARCH setting. More specifically, we show that estimates of a GARCH process with data in physical time are likely to be spurious, even though estimates for one particular frequency seem to be reasonable. Estimates are only consistent when the seasonal patterns are taken into account. However, even when these seasonal patterns are accounted for, the aggregation properties of the GARCH model break down at the intradaily frequencies, revealing the presence of traders with different risk profiles. In addition to the presence of different trader categories, we observe microstructure effects when analyzing returns over time intervals shorter than about 90 min. At the other extreme, the instability of coefficient estimates over different subperiods of 6 months suggests the presence of seemingly random long-term fluctuations. Finally, these misspecifications of the GARCH process result in its quite poor out-of-sample predictive power for the volatility as compared to realized volatility.

8.2.1 Parameter Estimation of GARCH Models

The GARCH(1,1) process is defined as follows:

\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 \qquad (8.2)

where \sigma_t^2 is the conditional variance and \varepsilon_{t-1}^2 is the squared innovation.
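To make the recursion of Equation 8.2 concrete, here is a minimal simulation sketch in Python/NumPy; the parameter values are purely illustrative and are not estimates reported in this book.

    import numpy as np

    def simulate_garch11(n, alpha0, alpha1, beta1, seed=0):
        """Simulate innovations eps_t whose conditional variance follows Eq. 8.2."""
        rng = np.random.default_rng(seed)
        z = rng.standard_normal(n)                     # i.i.d. N(0,1) shocks
        sigma2 = np.empty(n)
        eps = np.empty(n)
        sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)    # start at the unconditional variance
        eps[0] = np.sqrt(sigma2[0]) * z[0]
        for t in range(1, n):
            sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
            eps[t] = np.sqrt(sigma2[t]) * z[t]
        return eps, sigma2

    # Illustrative parameters only; alpha1 + beta1 < 1 ensures a finite unconditional variance.
    eps, sigma2 = simulate_garch11(10_000, alpha0=1e-6, alpha1=0.05, beta1=0.90)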

To test the effects of the temporal heterogeneity of the markets, this GARCH(1,1) process is estimated for several frequencies. The lowest analyzed frequency is daily and the highest frequency is defined by a homogeneous time series with 10-min intervals. At the higher frequencies (intervals shorter than 2 hr), we include a fourth-order autoregressive (AR(4)) term \mu_t = \sum_{i=1}^{4} \phi_i\, r_{t-i} in Equation 8.1 to account for the statistically significant (negative) autocorrelation of the returns at these frequencies (see Section 5.2.1). The regression equation for the return process is

r_t = \mu_t + \varepsilon_t \qquad (8.3)

At lower frequencies such a term is not needed, and we use the process of Equation 8.1.
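As a rough illustration of this estimation setup (not the original code used for the study), the Python arch package can estimate an AR(4) mean equation with GARCH(1,1) errors in a single call; the array returns_10min below is a hypothetical stand-in for a homogeneous 10-min return series.

    import numpy as np
    from arch import arch_model        # third-party package: pip install arch

    # Hypothetical placeholder for demeaned 10-min FX returns, rescaled to percent
    # (rescaling typically helps the optimizer converge).
    returns_10min = 100 * 0.0005 * np.random.default_rng(1).standard_normal(5000)

    # AR(4) mean equation (Eq. 8.3) with a GARCH(1,1) conditional variance (Eq. 8.2).
    model = arch_model(returns_10min, mean="AR", lags=4, vol="GARCH", p=1, q=1)
    result = model.fit(disp="off")
    print(result.params)               # AR coefficients and (alpha0, alpha1, beta1)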

The parameters of the process are estimated as follows. Let \theta denote the set of parameters characterizing the process. Assuming that the innovations \varepsilon_t are normally distributed, the log-likelihood function is

\mathcal{L}(\theta) = -\frac{n}{2}\,\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{n}\left[\ln\sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2}\right] \qquad (8.4)

where n is the number of observations used for the estimation. An initial fraction of the data must be reserved for the build-up of \sigma_t^2, because of the memory of the volatility process. An estimate \hat{\theta} of the parameters is given by the solution of the maximization problem

\max_{\theta}\, \mathcal{L}(\theta)
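A minimal sketch of this maximization, assuming NumPy/SciPy: it filters the conditional variance of Equation 8.2, evaluates the Gaussian log-likelihood of Equation 8.4 while discarding an initial build-up fraction, and maximizes it with a generic optimizer (a simplification of the genetic-algorithm/BHHH procedure described below; starting values and data are illustrative only).

    import numpy as np
    from scipy.optimize import minimize

    def garch11_sigma2(eps, alpha0, alpha1, beta1):
        """Filter the conditional variance of Eq. 8.2 from the innovations."""
        sigma2 = np.empty_like(eps)
        sigma2[0] = np.var(eps)                        # crude initialization for the build-up
        for t in range(1, len(eps)):
            sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
        return sigma2

    def neg_loglik(params, eps, burn_in=0.1):
        """Negative Gaussian log-likelihood (Eq. 8.4), skipping a build-up fraction."""
        alpha0, alpha1, beta1 = params
        if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
            return np.inf                              # crude penalty to enforce the constraints
        sigma2 = garch11_sigma2(eps, alpha0, alpha1, beta1)
        start = int(burn_in * len(eps))                # reserve initial data for the build-up
        s2, e2 = sigma2[start:], eps[start:] ** 2
        return 0.5 * (len(s2) * np.log(2 * np.pi) + np.sum(np.log(s2) + e2 / s2))

    # Hypothetical placeholder data standing in for demeaned returns.
    eps = 0.0005 * np.random.default_rng(0).standard_normal(5000)
    res = minimize(neg_loglik, x0=[1e-8, 0.05, 0.90], args=(eps,), method="Nelder-Mead")
    print(res.x)                                       # estimated (alpha0, alpha1, beta1)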

The log-likelihood procedure has many desirable properties.2 The solution is independent of the coordinate system in which the parameters are defined, such that the estimation can be done in any parametrization and the results will be identical, up to the chosen parameter transformation. This property is true for finite samples and any data set, assuming a non-degenerate maximum. Even if the process is misspecified (i.e., the data were not generated by the estimated process), the maximum is identical in any coordinate system. Estimating GARCH processes by maximum likelihood is difficult because of the presence of a one-dimensional manifold in the parameter space where the likelihood function is large and almost constant (for a discussion of this point and a good practical solution using the property mentioned above, see Zumbach, 2000).
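The coordinate-independence of the maximum can be checked on a toy example (an exponential likelihood rather than a GARCH one, purely for brevity): maximizing the same log-likelihood in the rate parameter and in its logarithm yields the same maximum value, with the two optimizers related by the transformation.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=1000)          # toy data with true rate 0.5

    def loglik_rate(lam):
        """Exponential log-likelihood as a function of the rate lambda."""
        return len(x) * np.log(lam) - lam * np.sum(x)

    # Maximize once in the rate coordinate, once in the log-rate coordinate.
    res_rate = minimize_scalar(lambda lam: -loglik_rate(lam), bounds=(1e-6, 10.0), method="bounded")
    res_log = minimize_scalar(lambda u: -loglik_rate(np.exp(u)), bounds=(-10.0, 3.0), method="bounded")

    print(-res_rate.fun, -res_log.fun)                 # identical maximum log-likelihood
    print(res_rate.x, np.exp(res_log.x))               # optimizers related by exp()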

The assumption of conditional normality can be relaxed by assuming a Student-t distribution for \varepsilon_t (Baillie and Bollerslev, 1989) or the generalized exponential distribution (Nelson, 1991). Both of these distributions have fat tails. In the case of the Student-t distribution, the log-likelihood function takes the following form:

\mathcal{L}(\theta) = -\frac{n}{2}\left[\ln(\nu-2) + 2\ln\Gamma\!\left(\tfrac{\nu}{2}\right) - 2\ln\Gamma\!\left(\tfrac{\nu+1}{2}\right) + \ln\pi\right] - \frac{1}{2}\sum_{t=1}^{n}\left[\ln\sigma_t^2 + (\nu+1)\ln\!\left(1 + \frac{\varepsilon_t^2}{\sigma_t^2(\nu-2)}\right)\right] \qquad (8.5)

where \nu is the number of degrees of freedom of the Student-t distribution and \Gamma is the usual gamma function. Both forms of the log-likelihood function are valid for any process following Equation 8.1, not only GARCH but also the process we shall study in Section 8.3.1.
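For reference, Equation 8.5 translates directly into a short function (a sketch assuming NumPy/SciPy; sigma2 would come from the same variance filter as in the Gaussian case and nu is estimated jointly with the GARCH parameters):

    import numpy as np
    from scipy.special import gammaln                  # log of the gamma function

    def student_t_loglik(eps, sigma2, nu):
        """Student-t log-likelihood of Eq. 8.5, valid for nu > 2."""
        n = len(eps)
        const = -0.5 * n * (np.log(nu - 2.0) + 2.0 * gammaln(nu / 2.0)
                            - 2.0 * gammaln((nu + 1.0) / 2.0) + np.log(np.pi))
        body = -0.5 * np.sum(np.log(sigma2)
                             + (nu + 1.0) * np.log(1.0 + eps ** 2 / (sigma2 * (nu - 2.0))))
        return const + body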

The maximum of the likelihood function is found by an iterative procedure that combines two methods: a genetic algorithm (GA) (Goldberg, 1989; Pictet et al., 1995) and the Berndt, Hall, Hall, and Hausman (BHHH) algorithm (Berndt et al., 1974), which is a variant of the gradient descent method. The initial solutions are chosen randomly to avoid any a priori bias in the estimation and are stored in "genes," which form an initial population. Starting from this population, the genetic algorithm constructs a new population using its selection and reproduction method (Pictet et al., 1995). The solutions with the highest log-likelihood found by the genetic algorithm are used as starting points of the BHHH algorithm, which leads to a further improvement. Once convergence of the BHHH is achieved, the next generation of the GA is computed on the basis of the previous solutions obtained with the BHHH algorithm and a set of solutions from the previous generation. This iterative procedure continues until no improvement of the solution is found. The BHHH algorithm alone can be trapped in local maxima of the log-likelihood instead of finding the global maximum; the chosen combination with a genetic algorithm has the advantage of avoiding local maxima. The method is rather fast, notwithstanding the very large number of observations (368,000 data points for the 10-min frequency). Robust standard errors are computed using the variance-covariance matrix estimation of White (1980).

2 See Davidson and MacKinnon (1993) for a general reference.
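The sketch below mimics the spirit of this hybrid search with a plain random multi-start followed by local refinement; it is not the genetic algorithm of Pictet et al. (1995), and SciPy's L-BFGS-B stands in for the BHHH step. The objective neg_loglik_fn is assumed to be a negative log-likelihood such as the one sketched in Section 8.2.1.

    import numpy as np
    from scipy.optimize import minimize

    def multistart_maximize(neg_loglik_fn, bounds, n_starts=20, seed=0):
        """Random 'population' of starting points plus local gradient refinement,
        a simplified stand-in for the GA + BHHH combination described in the text."""
        rng = np.random.default_rng(seed)
        lo = np.array([b[0] for b in bounds])
        hi = np.array([b[1] for b in bounds])
        best = None
        for _ in range(n_starts):
            x0 = lo + rng.random(len(bounds)) * (hi - lo)        # random initial solution
            res = minimize(neg_loglik_fn, x0, method="L-BFGS-B", bounds=bounds)
            if best is None or res.fun < best.fun:
                best = res
        return best

    # Hypothetical usage with GARCH(1,1) bounds for (alpha0, alpha1, beta1):
    # best = multistart_maximize(lambda p: neg_loglik(p, eps),
    #                            bounds=[(1e-10, 1e-4), (0.0, 0.5), (0.5, 0.999)])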

8.2.2 Temporal Aggregation of GARCH Models

If the empirical data can be described as generated by one GARCH(1,1) process at one particular data frequency, the behavior of the data sampled at any other frequency is theoretically determined by temporal aggregation (or disaggregation) of the original process. These theoretically derived processes at different frequencies can be compared to the empirically estimated processes at the same frequencies. Significant deviations between empirical and theoretical results lead to the rejection of the hypothesis of only one GARCH process. We can then show that there is more than one relevant frequency in the volatility generation, and the market can be called temporally heterogeneous, as already found in Section 7.4.

There are two approaches for the theoretical aggregation of GARCH models. The GARCH model can be viewed as either a jump process (Drost and Nijman, 1993) or a diffusion process (Nelson and Foster, 1994). Both approaches lead to very similar results, so we only report results based on Drost and Nijman (1993). In both approaches, the sum of \alpha_1 and \beta_1 (of Equation 8.2) tends to 1 as the frequency increases. The autoregressive parameter \beta_1 tends to 1, whereas the moving average parameter \alpha_1 tends to 0. In other words, the higher the frequency, the longer the clusters of volatility as measured in numbers of time series intervals.

Because previous results confirmed the adequacy of these theoretical results at the daily and weekly frequencies (Drost and Nijman, 1993), we use the daily estimations as a starting point to compute the results for the higher frequencies. High frequencies also have the advantage of high statistical significance.

Drost and Nijman (1993) show that symmetric weak GARCH(1,1) processes are closed under temporal aggregation. A process is symmetric if the marginal distribution of returns is symmetric. The term "weak GARCH(1,1)" is exactly defined by Drost and Nijman (1993). It encompasses all processes that essentially follow Equation 8.2 with some weak, nonlinear deviations that are not visible in the autocorrelation of volatility. More precisely, if \varepsilon_t is a symmetric weak GARCH(1,1) following the equation \sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2, then the high-frequency parameters \alpha_0, \alpha_1, and \beta_1 and the kurtosis \kappa = \mathrm{E}[\varepsilon_t^4]/(\mathrm{E}[\varepsilon_t^2])^2 determine the corresponding low-frequency parameters. We obtain the symmetric weak GARCH(1,1) process \varepsilon_{(m)tm}, with

\sigma_{(m)tm}^2 = \alpha_{0(m)} + \alpha_{1(m)}\,\varepsilon_{(m)tm-m}^2 + \beta_{1(m)}\,\sigma_{(m)tm-m}^2 \qquad (8.6)
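The full Drost–Nijman mapping from (\alpha_0, \alpha_1, \beta_1, \kappa) to the aggregated parameters is lengthy and is not reproduced here. The short sketch below only illustrates the persistence relation \alpha_{1(m)} + \beta_{1(m)} = (\alpha_1 + \beta_1)^m implied by that mapping, using a purely illustrative daily estimate and assuming, for illustration, a 24-hour market with 144 ten-minute intervals per day.

    # Illustrative daily GARCH(1,1) persistence (not an estimate from this book).
    persistence_daily = 0.97                       # alpha1 + beta1 at the daily frequency

    # Under the aggregation rule alpha1(m) + beta1(m) = (alpha1 + beta1)^m, the implied
    # persistence at a finer frequency with m intervals per day is the m-th root.
    for label, m in [("2 hr", 12), ("30 min", 48), ("10 min", 144)]:
        implied = persistence_daily ** (1.0 / m)
        print(f"{label:>7}: implied alpha1 + beta1 = {implied:.5f}")

    # As m grows, the implied sum approaches 1: volatility clusters extend over more
    # and more intervals, in line with the aggregation results quoted above.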


