
either with the assumption of being constant or as a type of average value if value changes are recognized. It is generally accepted that correlations in financial time series vary over time (Longin and Solnik, 1995) and are even subject to correlation "breakdown," that is, large changes in correlation during critical periods. In the discussion that follows, we probe the stability of correlation as a function of time for a number of financial instruments, in order to determine the relevance of using high-frequency data. We go on to investigate the manner in which present correlation values are in turn correlated with their past values (autocorrelation of correlations). A model of the self-memory of correlation is proposed as the basis for the formulation of a long-term correlation forecast.

The impact of time series data frequency on correlations should also be clearly established. This is especially relevant as higher-frequency data become more widely available and are more often used in order to improve statistics. Previous authors have demonstrated a dramatic decrease in correlation as data frequency enters the intra-hour level, for both stock returns (Epps, 1979) and foreign exchange returns (see Guillaume et al., 1997; Low et al., 1996). This discussion attempts to characterize the Epps effect in a number of financial time series and to investigate it more deeply through the examination of seven years of high-frequency data.

10.3 Covolatility Weighting

The calculation of correlation coefficients is straightforward, but some inconvenience is introduced by the simple definition: the correlation calculation requires two equally spaced (i.e., homogeneous) time series as input. This requirement is easily satisfied for low-frequency (less than one tick per week) data. At higher data frequencies, however, where one cannot dictate the time or number of observations, the problem requires more careful treatment. One typically faces two main problems when estimating correlation between two high-frequency time series. The first involves correlating two time series of inherently different frequencies. If the two time series are both regular with respect to data arrival intervals but of different frequencies, one might create from them two equally spaced, homogeneous time series whose common frequency equals the lower of the two. This easy situation does not occur very often, though. It is more common to be faced with time series such as foreign exchange (FX) rates, where data frequency can vary from very few quotes to hundreds of quotes per hour. What is the best way to measure the dependence between one FX rate and another that is perhaps less active or has activity peaks and valleys at completely different times of day? Ideally, one would prefer the correlation calculation to be updated more often when more information exists and less often when little does. One way to do this is to introduce a time scale that compresses physical time when there is no information and expands it when there is, as sketched below. This is similar to the idea presented in Chapter 6, where ϑ-time was introduced to model volatility patterns. This method has been found useful for a number of applications, but is time-consuming to implement in practice. Moreover, we face a multivariate problem: a common time scale would be needed for the two time series.
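As a toy illustration of such an activity-based clock (this is not the ϑ-time construction of Chapter 6, merely a sketch of the compression idea), the cumulative number of ticks observed up to each physical time can serve as a new time scale, so that inactive periods are compressed and active periods expanded. A minimal sketch, assuming NumPy arrays of tick arrival times; the function name is illustrative:

```python
import numpy as np

def tick_time(tick_times, t_grid):
    """Map physical times t_grid onto an activity-based clock: the
    cumulative tick count up to each time. Periods with no ticks are
    compressed (the clock stands still); busy periods are expanded."""
    tick_times = np.asarray(tick_times)
    return np.searchsorted(tick_times, t_grid, side="right")
```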

A second problem arising when estimating correlation between two high-frequency financial time series is that of missing values or data gaps. Large data gaps are actually an extreme case of the first problem (varying and nonmatching data arrival frequencies), but there is no harm in discussing the two problems separately. Despite one's best efforts, data gaps sometimes occur due to failures in the data acquisition chain. One can only make an educated guess about the correlation between two time series when such a gap occurs; it cannot be measured. More commonly, there are financial instruments whose time series have regular and large data gaps as part of their inherent character. Consider, for example, attempting to correlate a stock index (e.g., the Dow Jones Industrial Average), which is active for 8 hours per day, 5 days per week (except holidays), with another stock index that is active for a similar amount of time each day but with a relatively large time shift (e.g., the Financial Times 100 index). There are a number of different schools of thought regarding the correlation between two financial instruments when one or both are not actually active. Some of these consider derivatives of the instruments rather than the underlying instruments themselves. Other arguments confuse time-lagged correlation with direct correlation, but these are entirely different issues. When faced with varying activity rates and data gaps, it would be convenient to use some form of data interpolation to solve these problems. Unfortunately, the experience of many practitioners has not been reassuring (see Press et al., 1992).

Some methods for approximating a homogeneous time series from unevenly spaced, tick-by-tick data involve some form of data imputation. Methods of imputing data vary in complexity and effectiveness, and most have been found to be beneficial under at least some set of conditions and assumptions. However, all forms of imputation rely on a model, and a standard supposition is that critical characteristics of the data do not change between in-sample and out-of-sample periods. There is always the possibility that imputation will introduce a false bias into variance and covariance calculations, but it is nevertheless difficult to avoid some form of it in cases where data are not of an infinitely high frequency. Some useful attempts have been made to circumvent imputation altogether. One interesting and recent example is described in de Jong and Nijman (1997). This work builds on efforts described in Cohen et al. (1983) and Lo and MacKinlay (1990a,b). The authors develop a covariance estimator that uses irregularly spaced data whenever and wherever it exists in either of two time series. However, methods such as this one rest on the assumption that the processes generating transaction times and the prices themselves are independent. This assumption may be quite reasonable, depending on the instruments involved, but proving so is rarely trivial, and we prefer to avoid it altogether.

In this discussion, we propose and illustrate a simple measure of correlation that avoids imputation based on data models or assumptions about distributional characteristics. Although the inputs for this alternative measure are homogeneous time series derived through simple linear interpolation, the method filters out any underestimation of variances and covariances caused by a lack of sampling variation. In addition, rather than making the strong assumption that price and transaction time are independent, this method makes use of the arrival time variable in order to compensate for the sometimes large differences that can exist in financial time series frequencies. Data gaps of varying size are common, and we forgo any discussion of whether correlation actually exists during such gaps, because in any case we cannot measure it directly. Our goal is rather to develop a measure of correlation where information exists and to avoid updating the measure where data do not exist, a fact that should be recalled when results are interpreted. This implies that a lower data frequency or data gaps in one time series may limit the use of the other, and the unavoidable price to pay is a certain loss of statistical significance. However, the method is specifically meant to measure correlations at high data frequencies, where statistical significance is high by nature.

10.3.1 Formulation of an Adjusted Correlation Measure

The standard linear correlation coefficient is a measure of correlation between two time series $\Delta x_i$ and $\Delta y_i$ and is defined as follows:

$$\rho(\Delta x, \Delta y) \;=\; \frac{\sum_{i=1}^{n} \left(\Delta x_i - \langle\Delta x\rangle\right)\left(\Delta y_i - \langle\Delta y\rangle\right)}{\sqrt{\sum_{i=1}^{n} \left(\Delta x_i - \langle\Delta x\rangle\right)^2 \;\sum_{i=1}^{n} \left(\Delta y_i - \langle\Delta y\rangle\right)^2}} \qquad (10.1)$$

with the sample means

$$\langle\Delta x\rangle = \frac{1}{n}\sum_{i=1}^{n} \Delta x_i \qquad \text{and} \qquad \langle\Delta y\rangle = \frac{1}{n}\sum_{i=1}^{n} \Delta y_i \qquad (10.2)$$

The sample is of size $T$ with $n = T/\Delta t$ homogeneously spaced observations. Correlation values are unitless and may range from $-1$ (completely anticorrelated) to $1$ (completely correlated). A value of zero indicates two uncorrelated series.

The two variables $\Delta x_i$ and $\Delta y_i$ are usually returns of two financial assets. In risk assessment (but not in portfolio allocation), the deviation of returns from the zero level is often considered instead of the deviation from the sample means $\langle\Delta x\rangle$ and $\langle\Delta y\rangle$. In this special case, we can insert $\langle\Delta x\rangle = \langle\Delta y\rangle = 0$ in Equation 10.1.
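As a concrete illustration, the coefficient of Equation 10.1, including the zero-mean special case just mentioned, can be computed directly. The following is a minimal Python sketch, assuming two NumPy arrays of equal length holding the homogeneous return series; the function name and the zero_mean flag are illustrative, not from the original text:

```python
import numpy as np

def linear_correlation(dx, dy, zero_mean=False):
    """Linear correlation coefficient of Equation 10.1.

    dx, dy    : homogeneous return series of equal length n
    zero_mean : if True, measure deviations from zero instead of the
                sample means (the risk-assessment special case)
    """
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    if dx.shape != dy.shape:
        raise ValueError("the two return series must have equal length")
    mx = 0.0 if zero_mean else dx.mean()   # <dx>, Equation 10.2
    my = 0.0 if zero_mean else dy.mean()   # <dy>, Equation 10.2
    cov = np.sum((dx - mx) * (dy - my))
    norm = np.sqrt(np.sum((dx - mx) ** 2) * np.sum((dy - my) ** 2))
    return cov / norm
```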

An estimate of the local covolatility for each of these observations is defined by further dividing each time span ($\Delta t$) over which $\Delta x_i$ and $\Delta y_i$ are calculated into $m$ equal subintervals, from which subreturn values, $\Delta x'_j$ and $\Delta y'_j$, can be obtained. This redefined time series consists of $\tilde{n} = T/\Delta t'$ equally spaced return observations, where $\Delta t = m\,\Delta t'$. The return definitions conform to Equation 3.7, based on logarithmic middle prices as in Equation 3.6. To obtain a homogeneous series, we need linear interpolation as introduced in Equation 3.2. The choice of the linear interpolation method is essential.
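The mechanics of this construction can be sketched as follows: interpolate logarithmic middle prices linearly onto a homogeneous grid of spacing $\Delta t' = \Delta t / m$, difference the result to obtain the subreturns, and sum blocks of $m$ subreturns to recover the coarse returns. The helper names below are illustrative; this is a minimal sketch of Equations 3.2, 3.6, and 3.7, not the original implementation:

```python
import numpy as np

def homogeneous_log_prices(tick_times, bid, ask, grid_times):
    """Logarithmic middle prices (Equation 3.6) linearly interpolated
    onto a homogeneous time grid (Equation 3.2)."""
    log_mid = 0.5 * (np.log(bid) + np.log(ask))
    return np.interp(grid_times, tick_times, log_mid)

def sub_and_coarse_returns(log_prices, m):
    """Subreturns at spacing dt' (Equation 3.7) and coarse returns at
    dt = m * dt', obtained by summing blocks of m subreturns."""
    sub = np.diff(log_prices)                        # Delta x'_j
    n = len(sub) // m                                # number of coarse returns
    coarse = sub[: n * m].reshape(n, m).sum(axis=1)  # Delta x_i
    return sub, coarse
```

Because the returns are defined on logarithmic prices, summing $m$ consecutive subreturns reproduces the coarse return over $\Delta t$ exactly.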

For each of the coarse returns $\Delta x_i$ (as for $\Delta y_i$), there exists a corresponding estimation of covolatility between the two homogeneous time series of


