
Recent interest in the possibility that financial markets are governed by low-dimensional but deterministic chaotic systems has precipitated a new generation of models for forecasting high-frequency data. Some of the methods that have been applied to the detection of chaos in high-frequency financial data are described in §13.3, and the chapter finishes by describing nearest neighbour prediction algorithms that are based on the assumption of chaotic dynamics.

13.1 High-Frequency Data

13.1.1 Data and Information Sources

Much of the research into high-frequency financial returns has been initiated by Olsen and Associates (O&A) who have made some very useful data available to the public. One of the O&A data sets provides tic data on some major US dollar exchange rates and interest rates. They have also provided several years of tic data on the USD-DEM exchange rate from 1 January 1987 to 31 December 1993. Other information in the first O&A data set includes the identification of the institution that has given each quote, and time-stamped headline news items from Money Market Headline News. More recently, O&A have provided half-hourly bid-ask quote data from 1 January 1996 to 31 December 1996 on 25 exchange rates and 4 precious metals, and half-hourly transaction price data on 12 Euromarket futures, the Dow Jones Industrial Average and the S&P 500. These data sets are available for research purposes, at a nominal cost, from www.olsen.ch. O&A have also organized two highly successful international conferences on high-frequency data in finance (HFDF).1

Real-time tic data covering many markets are available from a number of commercial data vendors, such as Bloomberg (www.bloomberg.com), Reuters (www.reuters.com) and LIFFE (www.liffe.com). Some also provide the essential software that allows the latest information to be downloaded and formatted on a regular basis. But even with this software there are enormous data management issues surrounding the storage and filtering of tic data, and of course these must be addressed before any results from a historical analysis of high-frequency time series can be put into practice.

The Computer Society and the Neural Network Council of the Institute of Electrical and Electronics Engineers (www.ieee.com) produce several publications that are aimed at practitioners in financial markets and have organized a number of international conferences on computational intelligence for financial engineering. These have encompassed a number of areas in the time series analysis of high-frequency data on financial markets such as data filtering, foreign exchange forecasting and the prediction of volatility and prices.

1 Proceedings of the HFDF conferences in 1995 and 1996 are available from www.olsen.ch.



13.1.2 Data Filters

A large amount of information can be filtered out of tic data on bid and ask quotes. If equally spaced time series are to be extracted, the data must be sorted into fixed time buckets. Within each n-minute interval the open, close, high and low of the bid and ask quotes may be recorded, together with the volume of quote activity. A closing price series is then normally obtained by taking an average of the latest bid and ask quotes during the interval.2
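The bucketing step can be sketched as follows. This is a minimal illustration in Python rather than the procedure used by any data vendor: the function name `bucket_quotes`, the `(timestamp, bid, ask)` tuple layout, the alignment origin and the use of the geometric average for the closing price are all assumptions made for the example, and the quotes are invented.

```python
import math
from datetime import datetime, timedelta

def bucket_quotes(quotes, minutes):
    """Sort (timestamp, bid, ask) quotes into fixed buckets of `minutes` length.

    Returns one summary dict per non-empty bucket, in time order.
    """
    width = timedelta(minutes=minutes)
    origin = datetime(1970, 1, 1)  # arbitrary alignment point (assumption)
    buckets = {}
    # A stable sort on the timestamp keeps the input order of simultaneous quotes.
    for ts, bid, ask in sorted(quotes, key=lambda q: q[0]):
        start = origin + ((ts - origin) // width) * width
        b = buckets.get(start)
        if b is None:
            b = buckets[start] = {
                "start": start,
                "bid_open": bid, "bid_high": bid, "bid_low": bid,
                "ask_open": ask, "ask_high": ask, "ask_low": ask,
                "n_quotes": 0,
            }
        b["bid_high"] = max(b["bid_high"], bid)
        b["bid_low"] = min(b["bid_low"], bid)
        b["ask_high"] = max(b["ask_high"], ask)
        b["ask_low"] = min(b["ask_low"], ask)
        b["bid_close"], b["ask_close"] = bid, ask  # latest quote seen so far
        b["n_quotes"] += 1  # volume of quote activity
    for b in buckets.values():
        # Closing price: geometric average of the latest bid and ask quotes.
        b["close"] = math.sqrt(b["bid_close"] * b["ask_close"])
    return [buckets[k] for k in sorted(buckets)]

# Invented quotes, for illustration only.
quotes = [
    (datetime(1993, 1, 4, 9, 1), 1.60, 1.61),
    (datetime(1993, 1, 4, 9, 20), 1.62, 1.63),
    (datetime(1993, 1, 4, 9, 40), 1.59, 1.60),
]
bars = bucket_quotes(quotes, 30)
```

Intervals with no quotes are simply omitted in this sketch; a production filter would need an explicit rule for them, for example carrying the previous close forward.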

Since these data are only quotes and not the actual transaction prices, some preliminary filters should be applied in order that they be analysed as if they were price series. Error filters may be applied as described in Guillaume et al. (1997), so that price data are not recorded from impossible or erroneous quotes. Of course it is not possible to remove all quotes that are made by players simply attempting to bid the market up or down, but rules may be applied to filter out obvious bad quotes.

For example, a cleaned price series $\{p^*_t\}$ can be constructed from a price series $\{p_t\}$ by defining a suitably large price increment $c$. If the price changes by more than this amount, but the next price does not verify this change, then the price is ignored. So $p^*_t$ is defined recursively by setting $p^*_1 = p_1$ and then

$$
p^*_t =
\begin{cases}
p^*_{t-1} & \text{if } |p_t - p^*_{t-1}| > c \text{ but } |p_{t+1} - p^*_{t-1}| < c, \\
p_t & \text{otherwise.}
\end{cases}
$$
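A minimal Python sketch of this kind of filter is given below. It assumes the rule is: start from the first price, and replace a price by the previous cleaned value whenever it jumps by more than c from that value while the following price is back within c of it. The function name `clean_prices` and the treatment of the final observation (kept as it is, since there is no later price to verify it) are assumptions of the example, and the prices are invented.

```python
def clean_prices(prices, c):
    """Return a cleaned copy of `prices` with unverified jumps larger than `c` ignored."""
    if not prices:
        return []
    cleaned = [prices[0]]  # the first cleaned price is the first raw price
    for t in range(1, len(prices)):
        prev = cleaned[-1]
        jumped = abs(prices[t] - prev) > c
        # The jump is "not verified" if the next price is back near the old level.
        has_next = t + 1 < len(prices)
        reverted = has_next and abs(prices[t + 1] - prev) < c
        cleaned.append(prev if (jumped and reverted) else prices[t])
    return cleaned

# An isolated spike is ignored ...
spike = clean_prices([1.60, 1.61, 1.75, 1.61, 1.62], c=0.05)
# ... but a change of level that the next price confirms is kept.
shift = clean_prices([1.60, 1.75, 1.76], c=0.05)
```

Here `spike` is `[1.60, 1.61, 1.61, 1.61, 1.62]` and `shift` is returned unchanged. The choice of c matters: too small a value discards genuine moves in volatile periods, and too large a value lets bad quotes through.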

Continuous tic data will also cover periods such as weekends and bank holidays where little or no activity is recorded, and it is normal to remove these periods before examining the data. Long series of zero returns will distort the statistical properties of prices and returns and make volatility and correlation modelling extremely difficult. Figure 13.1 shows the DEM-USD rate in the O&A data, where the closing price is taken as the geometric average of the latest bid and ask quotes in a 1-hour time interval and weekends have been removed.
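Removing inactive periods from an equally spaced series might look like the sketch below. It is illustrative only: the function name `drop_inactive_periods` is hypothetical, weekends are taken to be calendar Saturdays and Sundays (the source does not say which market-hours convention was used), the holiday set must be supplied by the user, and the prices are invented.

```python
from datetime import date, datetime

def drop_inactive_periods(series, holidays=frozenset()):
    """Drop (timestamp, price) observations on Saturdays, Sundays and listed holidays."""
    return [
        (ts, price)
        for ts, price in series
        if ts.weekday() < 5 and ts.date() not in holidays
    ]

hourly = [
    (datetime(1993, 1, 1, 12), 1.62),  # Friday, listed as a holiday below
    (datetime(1993, 1, 2, 12), 1.62),  # Saturday
    (datetime(1993, 1, 3, 12), 1.62),  # Sunday
    (datetime(1993, 1, 4, 12), 1.63),  # Monday
]
active = drop_inactive_periods(hourly, holidays={date(1993, 1, 1)})
```

Only the Monday observation survives in this example. Returns should be computed after the filter is applied, so that the long runs of zero returns never enter the sample.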


13.1.3 Autocorrelation Properties

The autocorrelation properties of high-frequency returns have been established by a large empirical literature:

- There is little autocorrelation in returns, except perhaps some negative autocorrelation at very high frequency (Goodhart and Figliuoli, 1991; Bollerslev and Domowitz, 1993; Zhou, 1992).

- There is, however, a lot of autocorrelation in squared returns, and this conditional heteroscedasticity becomes more pronounced as the sampling frequency increases (Andersen and Bollerslev, 1996; Baillie and Bollerslev, 1990; Zhou, 1996; Drost and Nijman, 1993; Ghose and Kroner, 1995; Taylor and Xu, 1997).

2 Prices are obtained either from the arithmetic average, price = (bid + ask)/2, or from the geometric average, price = $(\text{bid} \times \text{ask})^{1/2}$. The latter is preferable since it is the log prices that are normally applied to compute returns; however, the difference between these two averages is negligible (of the order of $10^{-5}$ for a bid-ask spread of 1%).

Figure 13.1 Hourly price data on the German mark-US dollar exchange rate, from 1 October 1992 to 30 September 1993.

Table 13.1: Box-Pierce autocorrelation statistics for the return on DEM-USD rates

                     1-hr    6-hr   12-hr   1-day  1-week
Returns
  Q(1)               1.08    0.02    0.75    1.50    0.70
  Q(2)               7.73    0.06    0.77    2.12    0.71
  Q(20)              44.9    27.6    28.1    22.1    26.4
Squared returns
  Q(1)              117.0   -0.08    0.01    1.19    0.69
  Q(2)              196.0    1.24    1.89    2.68    1.09
  Q(20)             246.0    70.0    52.8    29.7    11.1

These stylized facts will be supported with an example based on part of the O&A 1993 data set. Olsen and Associates have collected bid and ask quotes on the exchange rates for the US dollar with sterling, the Deutsche Mark and the Japanese yen in real time during the period from 1 October 1992 to 30 September 1993. Each quote is time-stamped to the nearest minute and obvious errors or outliers are flagged so that they can be removed.

Using the data shown in Figure 13.1, analysis of the non-overlapping returns of different frequencies from 1 hour to 1 week is shown in Table 13.1. The Box-Pierce Q(n)-statistics for nth-order autocorrelation that are described in §11.3.2 are computed for n = 1, 2 and 20 on both returns and squared returns. They are asymptotically distributed as $\chi^2(n)$, so the relevant 1% critical values are 6.63 for Q(1), 9.21 for Q(2) and 37.6 for Q(20).
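For readers who want to reproduce this kind of test, the sketch below computes a Box-Pierce statistic in plain Python. It assumes the usual textbook form, Q(n) = T times the sum of the first n squared sample autocorrelations, which may differ in detail from the definition in §11.3.2. The function names and the toy return series are assumptions of the example; the critical values are those quoted in the text.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of the sequence `x` at lag `k`."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

def box_pierce(x, n_lags):
    """Box-Pierce statistic: T times the sum of the first n_lags squared autocorrelations."""
    return len(x) * sum(autocorrelation(x, k) ** 2 for k in range(1, n_lags + 1))

# 1% critical values of chi-squared(n), as quoted in the text.
CRITICAL_1PCT = {1: 6.63, 2: 9.21, 20: 37.6}

# Toy series with strong negative first-order autocorrelation (illustration only).
returns = [1.0, -1.0] * 50
q1 = box_pierce(returns, 1)
q2 = box_pierce(returns, 2)
reject_no_autocorrelation = q1 > CRITICAL_1PCT[1]
```

The same function applied to `[r * r for r in returns]` gives the statistic for squared returns, provided that series is not constant (for this toy series it is, so the call would raise `ValueError`).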

The first three rows of Table 13.1 indicate very little autocorrelation in returns, except at the highest frequency. The 1-hr data do exhibit some negative


