




data. Heavy-tailed distributions, serial dependence, and heteroskedasticity as well as the choice of p may affect the behavior of the stochastic error.

The statistical significance can alternatively be increased by choosing return intervals shorter than $m\,\Delta t$. However, these short-term returns would be a different object of study. The technique of overlapping has the advantage of leaving the object of study unchanged while increasing the precision.
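For a homogeneous series of logarithmic prices sampled every $\Delta t$, the difference between overlapping and non-overlapping returns over $m\,\Delta t$ can be sketched as follows. The function name and the use of NumPy arrays are our own choices, not part of the original text. Overlapping returns start at every sample, so they give roughly $m$ times as many (strongly correlated) observations of the same $m\,\Delta t$ return.

```python
import numpy as np


def m_step_returns(x, m, overlapping=True):
    """Returns over m consecutive intervals of a homogeneous series x.

    x: 1-D sequence of (log) prices sampled every delta t (an assumption).
    overlapping=True  -> one return per start point: x[i+m] - x[i].
    overlapping=False -> only every m-th start point, so intervals do not overlap.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    x = np.asarray(x, dtype=float)
    r = x[m:] - x[:-m]          # all overlapping m * delta t returns
    return r if overlapping else r[::m]
```

For 101 samples and $m = 10$, the overlapping version yields 91 returns and the non-overlapping version yields 10, all describing the same $10\,\Delta t$ interval.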

3.3 CONVOLUTION OPERATORS

The original inhomogeneous data can be processed by convolution operators to build new inhomogeneous time series. This approach, developed by Zumbach and Müller (2001), is fundamentally different from the construction of homogeneous time series as discussed in Section 3.2. A set of basic convolution operators is defined that can be combined to compute more sophisticated quantities, for example, different kinds of volatility or correlation. A few stylized properties of these operators are explored, but the main emphasis is to build a sufficient vocabulary of operators well suited to high-frequency data analysis.

In this process, we should keep in mind a few important considerations:

The computations must be efficient. Even though powerful computers keep getting cheaper, typical tick-by-tick data in finance are 100 or even 10,000 times denser than daily data. Clearly, we cannot afford to compute a full convolution for every tick. For this reason, our basic workhorse is the exponential moving average (EMA) operator, which can be computed very efficiently through an iteration formula. A wealth of complex but still efficient operators can be constructed by combining and iterating the basic operators.
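The iteration formula is not spelled out in this part of the text. The sketch below uses a commonly cited form of the EMA recursion for inhomogeneous series. It follows from integrating the kernel $e^{-t/\tau}/\tau$ over one tick interval, under the assumption that the signal is linearly interpolated between ticks. The function names, the initialization with the first value, and the way `iterated_ema` re-treats each intermediate output as a new tick series are our own simplifying assumptions.

```python
import math
from typing import List, Sequence


def ema_inhomogeneous(times: Sequence[float], values: Sequence[float],
                      tau: float) -> List[float]:
    """EMA[tau; z] evaluated at every tick of an inhomogeneous series.

    Assumes the kernel exp(-t/tau)/tau and linear interpolation of z between
    ticks, which gives a constant-cost update per tick:
        EMA_n = mu*EMA_{n-1} + (nu - mu)*z_{n-1} + (1 - nu)*z_n
    with alpha = (t_n - t_{n-1})/tau, mu = exp(-alpha), nu = (1 - mu)/alpha.
    The EMA is initialized with the first value (a build-up assumption).
    """
    out = [float(values[0])]
    for n in range(1, len(values)):
        alpha = (times[n] - times[n - 1]) / tau
        mu = math.exp(-alpha)
        # nu -> 1 as alpha -> 0: ticks with equal time stamps leave the EMA unchanged.
        nu = -math.expm1(-alpha) / alpha if alpha > 0 else 1.0
        out.append(mu * out[-1] + (nu - mu) * values[n - 1] + (1.0 - nu) * values[n])
    return out


def iterated_ema(times: Sequence[float], values: Sequence[float],
                 tau: float, n: int) -> List[float]:
    """Apply the EMA n times; each intermediate result is treated as a new
    tick series at the same times (a simplifying assumption)."""
    series = [float(v) for v in values]
    for _ in range(n):
        series = ema_inhomogeneous(times, series, tau)
    return series
```

Each new tick costs a fixed number of operations regardless of $\tau$, which is what makes the EMA practical on dense tick data. For a linear ramp the interpolation is exact, so the recursion reproduces the continuous-time result at every tick.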

Stochastic behavior is the dominant characteristic of financial processes. For tick-by-tick data, not only the values but also the time points of the series are stochastic. In this random world, pointwise values are of little significance, and we are more interested in average values inside intervals. Thus the usual notion of return also has to be changed. With daily data, a daily return is computed by Equation 3.7 as a pointwise difference between the price today and the price yesterday. With high-frequency data, a better definition of the daily return may be the difference between the average price of the last few hours and an average price from one day ago. In this way, it is possible to build smooth variables well suited to random processes. The calculus has to be revisited in order to replace pointwise values by averages over some time intervals.
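The averaged return described above can be sketched as follows. This is an illustration under stated assumptions, not the operator defined in the text. `x` is assumed to hold logarithmic prices at increasing tick times. The series is linearly interpolated between ticks, and each "average over the last few hours" is approximated with a plain rectangular window evaluated on a uniform grid. The window width, the lag, the grid size, and the function names are all hypothetical choices. A rectangular window is used here only for simplicity; smooth kernels are generally preferable.

```python
import numpy as np


def window_average(times, x, t_end, width, n_grid=1001):
    """Average of the linearly interpolated series x over [t_end - width, t_end].

    Uses a uniform grid, i.e. a rectangular kernel. np.interp requires
    increasing `times` and holds x constant outside the data range.
    """
    grid = np.linspace(t_end - width, t_end, n_grid)
    return float(np.interp(grid, times, x).mean())


def smoothed_return(times, x, t, width, lag):
    """Average of x over the last `width` before t minus the average over an
    equally wide window ending `lag` earlier (e.g. lag = one day)."""
    return window_average(times, x, t, width) - window_average(times, x, t - lag, width)
```

With times in hours, `smoothed_return(times, x, t, 4.0, 24.0)` compares the last four hours with the same four hours one day earlier. For a price path that is exactly linear in time, it returns the slope times 24.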

Analyzing data typically involves a characteristic time range; a return $r[\tau]$, for example, is computed on a given time interval $\tau$. With high-frequency data, this characteristic time interval can vary from a few minutes to several weeks. We take care of this by making all of these time range dependencies explicit in the formulation of the operators.



We usually want smooth operators with smooth kernels (the weighting functions of moving averages). A simple example of a discontinuous operator is an average with a rectangular weighting function, say of range $\tau$. Its second discontinuity, at $\mathrm{now} - \tau$, corresponds to forgetting events and creates unnecessary noise. Instead, we prefer kernels that decay smoothly to zero. Only at t = now do we often prefer a jump in the kernel. This jump gives a positive weight to the latest piece of information and thus a rapid response in real time. For a kernel that jumps at t = now, the weight at t = now is inversely proportional to the range of the operator. Therefore, there is a trade-off between a fast reaction, which comes with more noise, and a smooth average behavior with a slow reaction time. Apart from this fundamental noise created by the arrival of new events, it is better to have continuous and smooth operators.
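As an illustration, take the exponential kernel of the EMA as the smooth alternative. This choice of comparison is ours. With $t \ge 0$ measured backward from now, the two normalized kernels of range $\tau$ are:

```latex
\omega_{\mathrm{rect}}(t) =
  \begin{cases} 1/\tau, & 0 \le t < \tau,\\ 0, & \text{otherwise,} \end{cases}
\qquad
\omega_{\mathrm{EMA}}(t) = \frac{e^{-t/\tau}}{\tau}, \quad t \ge 0,

\int_0^\infty \omega_{\mathrm{rect}}(t)\,dt = \int_0^\infty \omega_{\mathrm{EMA}}(t)\,dt = 1,
\qquad
\omega_{\mathrm{rect}}(0) = \omega_{\mathrm{EMA}}(0) = \frac{1}{\tau}.
```

Both kernels jump to $1/\tau$ at t = now, so halving $\tau$ doubles the weight of the latest tick. This is the trade-off between fast reaction and noise. Only the rectangular kernel has a second jump, from $1/\tau$ to $0$ at $t = \tau$, where each event is abruptly forgotten.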

The generalization to inhomogeneous time series introduces a number of technical peculiarities. In this Section 3.3, only macroscopic operators are treated; because of their time-translation invariance, they can be represented by convolutions. A convolution is defined as an integral, so the series must have a representation in continuous time. Actual data are known only at discrete sampling times, so some interpolation is needed to properly define the convolution integral. The same problem is present when constructing an artificial homogeneous time series from inhomogeneous data as in Section 3.2.1. Another technical peculiarity originates from the fact that our macroscopic operators are ultimately composed of iterated moving averages. All such EMA operators have kernels with noncompact support: the kernels decay exponentially but, strictly speaking, remain positive for all past times. This implies an infinite memory, so a build-up over an initialization period is required before the error of an operator value becomes negligible.
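For a simple EMA, the size of the build-up period can be made concrete. This sketch assumes the EMA recursion with linear interpolation between ticks. The function names are hypothetical. Two EMAs fed the same ticks but started from different values differ only through the term $\mu\,\mathrm{EMA}_{n-1}$, so their gap shrinks by $e^{-\Delta t/\tau}$ per tick and by $e^{-T/\tau}$ after a build-up time $T$. Iterated EMAs have longer kernel tails and need a correspondingly longer build-up; the bound below applies only to the simple EMA.

```python
import math


def ema_step(prev_ema, z_prev, z_now, dt, tau):
    """One EMA update (kernel exp(-t/tau)/tau, linear interpolation between ticks)."""
    alpha = dt / tau
    mu = math.exp(-alpha)
    nu = -math.expm1(-alpha) / alpha if alpha > 0 else 1.0
    return mu * prev_ema + (nu - mu) * z_prev + (1.0 - nu) * z_now


def initialization_gap(times, values, tau, start_a, start_b):
    """Run two EMAs on the same ticks from different initial values and
    return their final difference. Only the mu*prev_ema term differs, so
    the gap shrinks by exp(-dt/tau) per step, i.e. exp(-T/tau) overall."""
    ema_a, ema_b = start_a, start_b
    for n in range(1, len(values)):
        dt = times[n] - times[n - 1]
        ema_a = ema_step(ema_a, values[n - 1], values[n], dt, tau)
        ema_b = ema_step(ema_b, values[n - 1], values[n], dt, tau)
    return ema_a - ema_b


def buildup_time(tau, tolerance):
    """Build-up period T such that exp(-T/tau) equals `tolerance`."""
    return tau * math.log(1.0 / tolerance)
```

For example, reducing an initialization error to one thousandth of its starting size takes `buildup_time(tau, 1e-3)`, roughly $6.9\,\tau$.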

The examples of Sections 3.3 and 3.4 are from the foreign exchange market. The data set is USD-CHF for the week of Sunday, October 26, to Sunday, November 2, 1997. This week has been selected because on Tuesday, October 28, some Asian stock markets crashed, causing turbulence in many markets around the world, including the FX market. Yet the relation between a stock market crash originating in Asia and the USD-CHF foreign exchange rate is quite indirect, making this example interesting. The prices of USD-CHF for the example week are plotted in Figure 3.4. Unless specified otherwise, all figures from Figure 3.4 to 3.17 display quantities for the same example week. All of these figures have been computed using high-frequency data. The results have been sampled every hour using linear interpolation. The computations have been done in physical time and therefore exhibit the full daily and weekly seasonalities contained in the data.
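The hourly sampling can be sketched as follows. The caption of Figure 3.4 specifies the geometric middle price of bid and ask quotes. This sketch assumes that tick times are given in hours of physical time and that the middle price itself, rather than bid and ask separately, is linearly interpolated between ticks. The function name is hypothetical.

```python
import numpy as np


def hourly_geometric_mid(tick_hours, bid, ask, start_hour, end_hour):
    """Sample the geometric middle price sqrt(bid*ask) every full hour.

    tick_hours: increasing tick times in hours of physical time.
    start_hour, end_hour: integer hours; both ends are included.
    Between ticks, the middle price is linearly interpolated.
    """
    mid = np.sqrt(np.asarray(bid, dtype=float) * np.asarray(ask, dtype=float))
    grid = np.arange(start_hour, end_hour + 1, dtype=float)   # every full hour
    return grid, np.interp(grid, tick_hours, mid)
```

Quotes at hours 0 and 2 with bid/ask 1/4 and 2/8 give middle prices 2 and 4; the hourly samples are then 2, 3, and 4.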

Finally, we want to emphasize that the techniques presented in this section are suitable for application to a wide range of statistical computations in finance such as in risk management. An early application can be found in Pictet et al. (1992) and a recent application is in Zumbach et al. (2000).



[Figure 3.4: USD-CHF price against date, 26.10 to 2.11.1997]

FIGURE 3.4 The FX rate USD-CHF for the week of Sunday, October 26, to Sunday, November 2, 1997. The high-frequency data are sampled hourly, using linear interpolation with geometric middle price $\sqrt{p_{\mathrm{bid}}\,p_{\mathrm{ask}}}$.

3.3.1 Notation Used for Time Series Operators

For time series operators in Sections 3.3 and 3.4, we use a suitable notation that sometimes differs from the conventions used for homogeneous time series. The letter $z$ is used to represent a generic time series. The elements or ticks, $(t_j, z_j)$, of a time series $z$ consist of a time $t_j$ and a scalar value $z_j$. As everywhere in Chapter 3, $t$ may stand for any (business) time scale, not only physical time. The generalization to multivariate inhomogeneous time series is fairly straightforward (except for the business time scale aspect) and will not be discussed. The value $z_j = z(t_j)$ and the time point $t_j$ constitute the $j$-th element of the time series $z$. The sequence of sampling (or arrival) times is required to be growing, $t_j > t_{j-1}$. The strict inequality is required in a true univariate time series and is theoretically always true if the information arrives through one channel. In practice, the arrival time is known only with finite precision, say to one second, and two ticks may well have the same arrival time. Yet for most of the formulae that follow, the strict monotonicity of the time process is not required. In the special case where the time series is homogeneous, the sampling times are regularly spaced, $t_j - t_{j-1} = \delta t$. If a time series depends on some parameters $\theta$, these are made explicit between square brackets, $z[\theta]$.
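The conventions above can be mirrored in a small data structure. `TickSeries` is a hypothetical container, not something defined in the text. It stores ticks $(t_j, z_j)$ and rejects decreasing times. It tolerates equal time stamps, since arrival times are known only to finite precision, and it reports whether the series is strictly increasing or homogeneous with a constant spacing $\delta t$.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TickSeries:
    """Hypothetical container for an inhomogeneous series z = {(t_j, z_j)}."""
    times: Tuple[float, ...]
    values: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.times) != len(self.values):
            raise ValueError("times and values must have the same length")
        # Times may repeat (finite time-stamp precision) but must never decrease.
        if any(b < a for a, b in zip(self.times, self.times[1:])):
            raise ValueError("arrival times must be non-decreasing")

    def is_strictly_increasing(self) -> bool:
        """True if t_j > t_{j-1} for every j (a true univariate series)."""
        return all(b > a for a, b in zip(self.times, self.times[1:]))

    def is_homogeneous(self, tol: float = 1e-12) -> bool:
        """True if every spacing t_j - t_{j-1} equals one positive delta t."""
        gaps = [b - a for a, b in zip(self.times, self.times[1:])]
        return bool(gaps) and gaps[0] > 0 and all(abs(g - gaps[0]) <= tol for g in gaps)
```

A series with two ticks at the same second is accepted but is neither strictly increasing nor homogeneous, which matches the remark that most formulae do not need strict monotonicity.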

An operator $\Omega$, from the space of time series into itself, is denoted by $\Omega[z]$, as already illustrated by Figure 3.1 (b). The operator may depend on some parameters, $\Omega[\theta; z]$. The value of $\Omega[z]$ at time $t$ is $\Omega[z](t)$. For linear operators, a product notation $\Omega z$ is also used. The average over a whole time series of length $T$ is


