
TABLE 4.5 List of filter parameters.

| Description of parameter | Symbol | Equation number |
|---|---|---|
| Range of mean x | | 4.3, 4.4 |
| Parameters of Δx_min used in the level filter | | (after Equation 4.7) |
| Critical deviation from mean x | | |
| Critical size of value change | | 4.11 |
| Interaction range in change filter (normal value, special value for bid-ask spread) | | 4.13 |
| Range of quote density | | 4.15 |
| Weight of new quote in quote density (normal value, special value for repeated quotes) | | 4.15 |
| Range of short-term, standard and long-term volatility (v_fast, v, v_slow) | Δϑ_r | 4.16 |
| Relative time interval offset for volatility | | 4.17 |
| Absolute time interval offset for volatility | | 4.17 |
| Relative limits of quote interval Δϑ (upper, lower) | | 4.19 |
| Weight of squared granule in volatility offset | | 4.21 |
| Parameters used for volatility offset ε_0 for bid-ask spreads | | (after Equation 4.21) |
| Range (memory) of the quote diversity analysis | | 4.22 |
| All parameters of the impact of quote diversity | | 4.24 |
| Activity of active periods, for ϑ | | 4.25 |
| Activity of inactive periods, for ϑ | | 4.25 |
| Range of short-term volatility used for ϑ | Δϑ_smooth | 4.28 |
| Range of the variance of volatility fluctuations used for ϑ | Δϑ_r | 4.29 |
| Weight of the level filter | c_level | 4.32 |
| Trust capital dilution factor (normal value, special value at initialization from scratch) | | 4.34-4.36 |
| Window size parameter | | 4.43 |
| Critical credibility for statistics update (normal value, special value at initialization from scratch) | C_crit | (Section 4.5.5) |
| Lower limit of allowed domain (prices, FX forwards, interest rates) | p_min | 4.44 (and Section 4.6.2) |
| Factor in transformation of bid-ask spreads | | 4.45 |
| Standard credibility threshold for accepting a quote | | (Section 4.7.1) |



of quotes is low. When a new quote of a sparse series comes in, there are only a few quotes to compare it with, and these quotes can be quite old and thus not ideal for filtering. This is where additional information from the covariance matrix becomes useful. Technically, this can be done in several ways.

The only method outlined here is the artificial quote method. If the sparse rate (e.g., in the form of a middle price) is included in a covariance matrix that also covers some denser rates, we can generate artificial quotes of the sparse series by exploiting the most recent quotes of the denser series and the covariance matrix. The expectation maximization (EM) algorithm of Morgan Guaranty (1996) is one method to produce such artificial quotes; there are also alternative methods. Results are good if all the series included in the generation of artificial quotes are highly correlated or anticorrelated with the sparse series.
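As a rough illustration of how a covariance matrix can yield an artificial quote, the sketch below takes the conditional expectation of the sparse series' return given the most recent returns of the denser series, assuming zero-mean, jointly Gaussian returns. This is a simple stand-in and not the EM algorithm cited above; the function name `artificial_return`, the index convention, and all numbers are assumptions made for illustration only.

```python
import numpy as np

def artificial_return(cov, dense_returns, sparse_idx=0):
    """Conditional mean and variance of the sparse return given dense returns.

    Assumption: returns are zero-mean and jointly Gaussian, and `cov` is the
    full covariance matrix with the sparse series at position `sparse_idx`.
    """
    cov = np.asarray(cov, dtype=float)
    dense_idx = [i for i in range(cov.shape[0]) if i != sparse_idx]
    s_sd = cov[sparse_idx, dense_idx]            # cov(sparse, dense)
    s_dd = cov[np.ix_(dense_idx, dense_idx)]     # cov(dense, dense)
    weights = np.linalg.solve(s_dd, s_sd)        # regression coefficients
    mean = float(weights @ np.asarray(dense_returns, dtype=float))
    var = float(cov[sparse_idx, sparse_idx] - s_sd @ weights)
    return mean, var

# Illustrative numbers (assumptions): one sparse and one dense series,
# correlation 0.9, return variance 1e-6 for both.
cov = np.array([[1.0, 0.9], [0.9, 1.0]]) * 1e-6
mean, var = artificial_return(cov, dense_returns=[0.002])

# Turn the estimated return into an artificial middle price.
last_log_mid = np.log(1.2500)        # last good middle price of the sparse series
artificial_mid = float(np.exp(last_log_mid + mean))
```

The residual variance `var` shrinks as the correlation approaches plus or minus one, which matches the remark that results are good only for highly correlated or anticorrelated series.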

Artificial quotes may suffer from three uncertainties: (1) they have a stochastic error in the value because they are estimated, (2) there is an uncertainty in time due to asynchronicities in the quotes of the different financial instruments (Low et al., 1996), and (3) only a part of the full quote is estimated from the covariance matrix (e.g., the middle price, whereas the bid-ask spread has to be coarsely estimated as an average of past values). Therefore, an additional rule may be helpful: use artificial quotes only if they are not too close to good quotes of the sparse series.

In some cases, we can simply use arbitrage conditions to construct an artificial quote, such as the triangular arbitrage of FX cross rates explained in Section 2.2.2.

The artificial quote method consists of the following algorithmic steps:

1. Define a basket of high-frequency time series that are fairly well correlated or anticorrelated with the sparse series.

2. Generate artificial quotes from the correlation matrix and mix them with the normal quotes of the sparse series, thus reinforcing the power of the univariate filtering algorithm.

3. Eliminate the artificial quotes from the final output of the filter (because a filter is not a gap-filler).

This algorithm has the advantage of leaving the univariate filtering algorithm almost unchanged. The multivariate element enters only in the technical form of additional quotes, and quotes are the usual input of univariate filtering.
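The steps of the artificial quote method can be sketched as follows, here with artificial quotes built from triangular arbitrage rather than from a covariance matrix. Everything in this block is hypothetical: the `Quote` class, `cross_quote`, `filter_with_artificial`, the prices, and in particular `toy_filter`, which is only a crude placeholder for the univariate filter and not the filter described in this chapter.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    time: float
    bid: float
    ask: float
    artificial: bool = False

def mid(q):
    return 0.5 * (q.bid + q.ask)

def cross_quote(time, leg_ab, leg_bc):
    """Artificial A-C quote from A-B and B-C quotes (triangular arbitrage)."""
    return Quote(time, leg_ab.bid * leg_bc.bid, leg_ab.ask * leg_bc.ask,
                 artificial=True)

def toy_filter(quotes, max_rel_change=0.01):
    """Placeholder univariate filter: reject jumps above 1% (assumption)."""
    accepted = []
    for q in quotes:
        if accepted and abs(mid(q) / mid(accepted[-1]) - 1.0) > max_rel_change:
            continue  # rejected as an apparent outlier
        accepted.append(q)
    return accepted

def filter_with_artificial(real, artificial, univariate_filter):
    # Step 2: mix artificial quotes with the normal quotes of the sparse series.
    merged = sorted(real + artificial, key=lambda q: q.time)
    accepted = univariate_filter(merged)
    # Step 3: eliminate artificial quotes from the final output.
    return [q for q in accepted if not q.artificial]

# Sparse cross rate with two real quotes one hour apart (made-up prices).
real = [Quote(0, 130.00, 130.10), Quote(3600, 133.00, 133.10)]

# Step 1: the basket consists of the two dense legs of the cross rate.
leg_bc = Quote(0, 100.00, 100.05)
artificial = [
    cross_quote(1200, Quote(1200, 1.3100, 1.3102), leg_bc),
    cross_quote(2400, Quote(2400, 1.3200, 1.3202), leg_bc),
    cross_quote(3500, Quote(3500, 1.3290, 1.3292), leg_bc),
]

without_help = toy_filter(real)
with_help = filter_with_artificial(real, artificial, toy_filter)
```

In this made-up sample, the second real quote reflects a gradual market move. The placeholder filter rejects it when it sees only the two sparse quotes, but accepts it once the artificial quotes document the intermediate path, and the output still contains real quotes only.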

4.9 BEHAVIOR AND EFFECTS OF THE DATA FILTER

Data cleaning is a necessity because unfiltered outliers would spoil almost any data application. However, there is a legitimate concern about unwanted side effects caused by data cleaning. Are too many ticks rejected? Does filtering open a door to arbitrary data manipulation?

The rejection rates are low, as shown by the typical examples presented in Table 4.6. The investigated data filter is a standard filter developed and used by Olsen & Associates (O&A), following the guidelines of Chapter 4. A proper build-up time is essential for such an adaptive filter, as explained in Section 4.3.1. In all examples, the build-up period was the 3 months preceding the analyzed period. All the examined raw data were collected from the Reuters real-time data feed. Two rejection rates are indicated: (1) the rejection rate of "classical" outliers only, and (2) the rate of all rejected ticks, including monotonically drifting or excessively repeated ticks identified by special parts of the cleaning algorithm. These "nonclassical" data errors are explained in Section 4.2.2 and can directly or indirectly lead to bad data quality, just as normal outliers do. Therefore, a good data filter eliminates them.

TABLE 4.6 Data cleaning: Rejection rates.

Percentage of ticks rejected by a standard data cleaning filter of Olsen & Associates, for different financial markets. The analyzed test samples always consist of irregularly spaced high-frequency data over a period of one year. The reported rejection rates originate from the filter working in real-time mode.

| Market | Financial instrument | Analyzed time period | Number of all ticks in period | Rejected outlier ticks | All rejected ticks |
|---|---|---|---|---|---|
| Major FX rates | EUR-USD | Mar 99-Feb 00 | 3,457,116 | 0.07% | 0.30% |
| | USD-JPY | Jan 89-Dec 89 | 683,555 | 0.24% | 0.49% |
| | USD-JPY | Jan 99-Dec 99 | 1,324,421 | 0.06% | 0.48% |
| Minor FX rates | USD-MYR | Jan 99-Dec 99 | 1,950 | 7.59% | 8.41% |
| | USD-MXP | Jan 99-Dec 99 | 55,227 | 1.14% | 1.66% |
| Spot interest rates | GBP (3 months) | Jan 99-Dec 99 | 10,471 | 0.08% | 50.27% |
| Short-term interest rate futures | (Mar 00, LIFFE) | Jan 99-Dec 99 | 34,561 | 8.54% | 8.54% |
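The two rejection rates of Table 4.6 can be computed from per-tick filter decisions, as in the minimal sketch below. The function `rejection_rates`, the rejection labels ("outlier", "repeated", "drift"), and the 1000-tick sample are assumptions for illustration; only the USD-JPY percentages are taken from Table 4.6.

```python
from collections import Counter

def rejection_rates(reasons):
    """Return (outlier rate, total rejection rate) in percent.

    `reasons` holds one entry per tick: None for an accepted tick,
    otherwise a label naming why the tick was rejected (hypothetical labels).
    """
    n_ticks = len(reasons)
    if n_ticks == 0:
        raise ValueError("need at least one tick")
    counts = Counter(reasons)
    n_outliers = counts["outlier"]          # "classical" outliers only
    n_rejected = n_ticks - counts[None]     # all rejected ticks
    return 100.0 * n_outliers / n_ticks, 100.0 * n_rejected / n_ticks

# Hypothetical sample of 1000 ticks.
reasons = [None] * 994 + ["outlier"] * 2 + ["repeated"] * 3 + ["drift"] * 1
outlier_rate, total_rate = rejection_rates(reasons)

# The difference between the two rates is the share of nonclassical errors.
# USD-JPY rows of Table 4.6: (rejected outlier ticks, all rejected ticks) in %.
usd_jpy = {"Jan 89-Dec 89": (0.24, 0.49), "Jan 99-Dec 99": (0.06, 0.48)}
nonclassical = {period: round(total - outliers, 2)
                for period, (outliers, total) in usd_jpy.items()}
```

For the two USD-JPY samples the nonclassical share rises from 0.25% to 0.42% of all ticks, while the outlier share falls from 0.24% to 0.06%.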

For frequently quoted, major financial instruments, fewer than 0.5% of the ticks are rejected, as indicated by the examples of major FX rates (EUR-USD and USD-JPY) in Table 4.6. The two analyzed USD-JPY samples are separated in time by 10 years. The percentage of outliers clearly decreased over these 10 years, from 0.24% to 0.06%, which suggests that data quality has improved in this respect. However, the percentage of all rejected ticks remained almost stable (0.49% versus 0.48%), due to an increase in monotonically drifting and excessively repeated ticks. These bad ticks are generated by improper computerized quoting, which has evidently become more widespread over the years. Minor FX rates such as USD-MYR and USD-MXP in Table 4.6 typically have higher rejection rates, which may exceed 5%. In less liquid markets, the competitive pressure to publish high-quality data seems to be lower. The spot interest rate of GBP with a maturity of 3 months in Table 4.6 has a high rejection rate of about 50%, but only about 0.1% of the ticks are true outliers. The high figure of 50% is solely due to the quoting habit of one single bank that excessively repeated a few quotes at high frequency over long periods. This behavior is also found for other, similar financial instruments. Market data from exchanges are often more reliable because of the centralized data generation. The percentage of outliers is


