
same quote twice does not make this quote more reliable. Two nonidentical quotes from the same contributor may imply that the second quote has been produced to correct a bad first one. Another interpretation might be that an automated quoting system has a random generator to send a sequence of slightly varying quotes to mark presence on the information system. Different quotes from entirely different contributors are the most reliable case for pair filtering.

The basic tool is a function to compare the origins of the two quotes, considering the main source (the information provider), the contributor ID (bank name), and the location information. This implies that available information on contributors has a value in data cleaning and should be collected rather than ignored. An "unknown" origin is treated just like another origin name. The resulting independence measure $I'_{ij}$ is confined between 0 for identical origins and 1 for clearly different origins. In some cases (e.g., same bank but different subsidiary), a value between 0 and 1 can be chosen.
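The origin comparison can be illustrated with a minimal Python sketch. The `Origin` record, the function name `raw_independence`, and the intermediate value 0.5 for "same bank, different location" are assumptions made for illustration; the text only fixes the end points 0 and 1 and allows an intermediate value in such cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    """Hypothetical origin record; "unknown" is treated like any other name."""
    source: str = "unknown"       # information provider
    contributor: str = "unknown"  # contributor ID, e.g. bank name
    location: str = "unknown"     # location information


def raw_independence(a: Origin, b: Origin, same_bank_value: float = 0.5) -> float:
    """Return a raw independence measure in [0, 1] for two quote origins."""
    if a == b:
        # Identical origins (including two "unknown" origins): no independence.
        return 0.0
    if a.source == b.source and a.contributor == b.contributor:
        # Same bank, different location: intermediate value (assumed 0.5).
        return same_bank_value
    # Clearly different origins.
    return 1.0
```

For example, two quotes from the same bank in Zurich and London would receive the intermediate value, while quotes from two different banks would receive 1.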

$I'_{ij}$ is not yet the final formulation but has to be put in relation with the general origin diversity of the time series. An analysis of data from only one or very few origins must be different from that of data with a rich variety of origins. The general diversity $D$ can be defined as a moving average of the $I'_{i,i-1}$ of valid neighbor quotes,

$D = \mathrm{EMA}[\text{tick-time}, R;\ I'_{i,i-1}]$ (4.22)

where R is the range (center of gravity) of the kernel. The "tick-time" is a time scale that is incremented by one at each new quote. The "next point" interpolation is again appropriate in the EMA computation. Only "valid" quotes are used; this is possible on a higher level of the algorithm (see Section 4.5.5). By doing so, we prevent D from being lowered by bad mass quotes from a single computerized source. Thus we are protected against a difficult filtering problem. The high number of bad mass quotes from a single contributor will not force the filter to accept the bad level.
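The diversity update can be sketched as follows. On tick-time every step has length one, so an exponential moving average with next-point interpolation reduces to a simple recursion with the decay factor exp(-1/R). The class name `DiversityEMA`, the initial value 0, and the choice that invalid quotes leave the state untouched (no decay) are assumptions of this sketch, not statements from the text.

```python
import math


class DiversityEMA:
    """Tick-time EMA of the raw independence of valid neighbor quotes."""

    def __init__(self, kernel_range: float, initial: float = 0.0):
        # One tick-time step per valid quote: decay factor mu = exp(-1 / R).
        self.mu = math.exp(-1.0 / kernel_range)
        self.value = initial  # assumed starting diversity

    def update(self, raw_independence: float, valid: bool = True) -> float:
        """Feed the raw independence of a quote and its valid predecessor."""
        if valid:
            # Next-point interpolation: the new value gets the weight 1 - mu.
            self.value = self.mu * self.value + (1.0 - self.mu) * raw_independence
        # Invalid quotes are skipped entirely (assumption of this sketch).
        return self.value
```

Because invalid quotes are skipped, a burst of bad quotes from one computerized source does not pull the diversity estimate down in this sketch.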

The use of $D$ makes the independence variable $I_{ij}$ adaptive through the following formula:

$I_{ij} = I'_{ij} + f(D)\,(1 - I'_{ij})$ (4.23)

with

$f(D) = [\ldots]$ (4.24)

If the diversity is very low (e.g., in a single-contributor source), this formula (reluctantly) raises the independence estimate $I_{ij}$ to allow for some positive trust capital to build up. For a strictly uniform source ($I'_{ij} = D = 0$), $I_{ij}$ will reach 0.5, which is one half of the value of 1 reached by truly independent quotes in a multicontributor series.
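The effect of Equation 4.23 can be checked with a few lines of Python. The function name `adaptive_independence` is hypothetical, and the value of f(D) is passed in by the caller rather than computed; the only value taken from the text is f(0) = 0.5, which follows from the strictly uniform case. The range check on the inputs is an assumption that keeps the result between 0 and 1.

```python
def adaptive_independence(raw_independence: float, f_of_d: float) -> float:
    """Equation 4.23: raise the raw independence toward 1 by the fraction f(D)."""
    if not (0.0 <= raw_independence <= 1.0 and 0.0 <= f_of_d <= 1.0):
        # Assumed input ranges; they guarantee an output between 0 and 1.
        raise ValueError("raw_independence and f_of_d must lie in [0, 1]")
    return raw_independence + f_of_d * (1.0 - raw_independence)


# Strictly uniform source: raw independence 0 and f(0) = 0.5 give 0.5.
uniform_case = adaptive_independence(0.0, 0.5)
# Clearly different origins stay at 1 whatever the value of f(D).
independent_case = adaptive_independence(1.0, 0.5)
```

The formula only ever raises the raw value: with f(D) = 0 the raw independence is returned unchanged.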



The output variable $I_{ij}$ resulting from Equation 4.23 is always confined between 0 and 1 and is generally used in Equation 4.14. Some special cases need a special discussion:

Repeated quotes. Rarely, the raw data contains long series of repeated quotes from the same contributor, and the obtained value of $I_{ij}$ may still be too high. A solution would be a special filtering element focused on repeated ticks.

High-quality data. The collected data may be mixed with old, historical, commercially available daily data that were of distinctly higher quality than the data from a single, average-quality contributor. When comparing two quotes from this historical daily data, we may force $I_{ij} = 1$ although these quotes come from the same "contributor." This special filtering element is necessary only if there are huge, proven quality differences between contributors.

In multivariate filtering (see Section 4.8.1), artificial quotes that might be injected by a multivariate covariance analysis should have $I_{ij} = 1$ when compared to each other or to any other quote.

4.4.6 A Time Scale for Filtering

Time plays a role in the adaptive elements of the level filter as well as in almost all parts of the change filter. Value changes are tolerated more easily when separated by a large time interval between the time stamps. When using the term "time interval," we need to specify the time scale to be used.

The algorithm works with any time scale, but some are more suitable than others. If our tolerance for quote level changes is as large over weekends as over working hours, we have to accept almost any bad quote from the few weekend contributors. These weekend quotes are sometimes test quotes or other outliers in the absence of a liquid market. Our solution is a time scale that compresses the weekends and other inactive periods and thus leads to a lower tolerance.

Accounting for the low weekend activity is vital, but the exact treatment of typical volatility patterns during working days is less important. Therefore, we cannot accept using only physical time (= calendar/clock time), but the following solutions are possible:

1. A very simple business time with two states: active (working days) and inactive (weekend from Friday 21:00:00 GMT to Sunday 21:00:00 GMT, plus the most important and general holidays). The speed of this business time as compared to physical time would be either 1.4 (in active state) or 0.01 (in inactive state).

2. An adaptively weighted mean of three simple, generic business time scales $\vartheta$ with smoothly varying weights according to built-in statistics. This solution suits those filter developers who prefer to avoid the complex $\vartheta$ technology of Chapter 6.



TABLE 4.4 Active periods of the three generic markets.

Daytimes limiting the active periods of three generic, continent-wide markets; in Greenwich Mean Time (GMT). The scheme is coarse, modeling just the main structure of worldwide financial markets. The active periods differ according to local time zones and business hours. The Asian market starts on the day before from the viewpoint of the GMT time zone.

Market        $t_{\mathrm{start},k}$    $t_{\mathrm{end},k}$
East Asia     21:00                     7:00
Europe        6:00                      16:00
America       11:00                     21:00

3. An adaptively weighted mean of three generic business time scales $\vartheta$ as defined by Chapter 6 or Dacorogna et al. (1993).

The second solution differs from the third one only in the definition of the basic $\vartheta$-time scales. The adaptivity mechanism is the same for both solutions.
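The first solution of the list, the two-state business time, can be sketched in Python as follows. The function names, the representation of holidays as a set of GMT calendar dates, and the one-minute midpoint integration are assumptions of this sketch; only the speeds 1.4 and 0.01 and the weekend window come from the text.

```python
from datetime import datetime, timedelta, timezone

ACTIVE_SPEED = 1.4     # speed relative to physical time on working days
INACTIVE_SPEED = 0.01  # speed on weekends and holidays


def is_weekend_gmt(t: datetime) -> bool:
    """True from Friday 21:00:00 GMT to Sunday 21:00:00 GMT (t already in GMT)."""
    wd = t.weekday()  # Monday = 0 ... Sunday = 6
    return wd == 5 or (wd == 4 and t.hour >= 21) or (wd == 6 and t.hour < 21)


def business_time_speed(t: datetime, holidays=frozenset()) -> float:
    """Speed of the two-state business time at the timezone-aware instant t."""
    if t.tzinfo is None:
        raise ValueError("t must be timezone-aware")
    t = t.astimezone(timezone.utc)
    # Holidays are given as GMT calendar dates (assumption of this sketch).
    if is_weekend_gmt(t) or t.date() in holidays:
        return INACTIVE_SPEED
    return ACTIVE_SPEED


def business_time_elapsed(start, end, step=timedelta(minutes=1), holidays=frozenset()):
    """Elapsed business time in hours, integrated with the midpoint rule."""
    total, t = 0.0, start
    while t < end:
        dt = min(step, end - t)
        total += business_time_speed(t + dt / 2, holidays) * dt.total_seconds() / 3600
        t += dt
    return total
```

Over one full week without holidays this gives 120 active hours at speed 1.4 plus 48 inactive hours at speed 0.01, that is 168.48 business hours against 168 physical hours, so the average speed stays close to that of physical time.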

Three generic $\vartheta$-times are used, based on typical volatility patterns of three main markets: Asia, Europe, and America. In the second solution, these $\vartheta$-times are defined as follows:

$\frac{d\vartheta_k}{dt} = \begin{cases} 3.4 & \text{if } t_{\mathrm{start},k} < t_d < t_{\mathrm{end},k} \text{ on a working day} \\ 0.01 & \text{otherwise (inactive times, weekends, holidays)} \end{cases}$ (4.25)

where $t_d$ is the daytime in Greenwich Mean Time (GMT) and the generic start and end times of the daily activity periods on working days are given by Table 4.4. They correspond to typical observations in several markets. The active periods of exchange-traded instruments are subsets of the active periods of Table 4.4. The time scales $\vartheta_k$ are time integrals of $d\vartheta_k/dt$ from Equation 4.25. Thus the time $\vartheta_k$ flows either rapidly in active market periods or very slowly in inactive periods. Its long-term average speed is similar to that of physical time. The implementation of Equation 4.25 requires some knowledge about holidays. The database of holidays to be applied may be rudimentary (e.g., Christmas holidays) or more elaborate to cover all main holidays of the financial centers on the three continents. The effect of daylight saving time is neglected here as the market activity model is coarse.
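A minimal Python sketch of Equation 4.25 and its time integral follows. Several choices here are assumptions rather than statements from the text: the function names, the "working day" test (everything outside the window from Friday 21:00 GMT to Sunday 21:00 GMT, minus holidays given as GMT dates), and the one-minute midpoint integration. The East Asian period of Table 4.4 wraps past midnight GMT, which the period check handles explicitly.

```python
from datetime import datetime, timedelta, timezone

# Table 4.4: (start, end) of the active period in GMT hours.
ACTIVE_PERIODS = {"East Asia": (21, 7), "Europe": (6, 16), "America": (11, 21)}
ACTIVE_SPEED, INACTIVE_SPEED = 3.4, 0.01  # values of Equation 4.25


def is_weekend_gmt(t: datetime) -> bool:
    """Assumed weekend window: Friday 21:00 GMT to Sunday 21:00 GMT."""
    wd = t.weekday()  # Monday = 0 ... Sunday = 6
    return wd == 5 or (wd == 4 and t.hour >= 21) or (wd == 6 and t.hour < 21)


def in_active_period(hour: float, start: float, end: float) -> bool:
    """Check start < hour < end, allowing a period that wraps past midnight."""
    if start < end:
        return start < hour < end
    return hour > start or hour < end


def theta_speed(market: str, t: datetime, holidays=frozenset()) -> float:
    """Speed of the generic market time scale at the timezone-aware instant t."""
    if t.tzinfo is None:
        raise ValueError("t must be timezone-aware")
    t = t.astimezone(timezone.utc)
    hour = t.hour + t.minute / 60 + t.second / 3600
    working = not is_weekend_gmt(t) and t.date() not in holidays
    start, end = ACTIVE_PERIODS[market]
    if working and in_active_period(hour, start, end):
        return ACTIVE_SPEED
    return INACTIVE_SPEED


def theta_elapsed(market, start, end, step=timedelta(minutes=1), holidays=frozenset()):
    """Integrate the speed from start to end (midpoint rule); result in hours."""
    total, t = 0.0, start
    while t < end:
        dt = min(step, end - t)
        total += theta_speed(market, t + dt / 2, holidays) * dt.total_seconds() / 3600
        t += dt
    return total
```

Under these assumptions each market is active for 50 hours per week, so one week of physical time (168 hours) corresponds to 50 × 3.4 + 118 × 0.01 = 171.18 hours on each generic scale, an average speed of roughly 1.02.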

If the three $\vartheta_k$-times are chosen as defined by Chapter 6 (the third solution of the list), effects like daylight saving time and local holidays (i.e., characteristic for one continent) are also covered. The activity in the morning of the geographical markets is higher than in the afternoon, a typical behavior of FX rates and, even more so, interest rates, interest rate futures, and other exchange-traded markets.

Once the three scales $\vartheta_k$ are defined (by the integrals of Equation 4.25 in our suggestion), their adaptively weighted mean is constructed and used as the time scale $\vartheta$ for filtering. This $\vartheta$-time is able to approximately capture the daily and weekly seasonality and the low volatility of holidays. High precision is not


