
required as $\vartheta$ is only one among many ingredients of the data cleaning algorithm, many of which are based on rather coarse approximations. This is the definition of $\vartheta$-time:

$$\vartheta = \sum_{\text{all } k} w_k \, \vartheta_k \tag{4.26}$$

with

$$\sum_{\text{all } k} w_k = 1 \tag{4.27}$$

where "all $k$" means "all markets." This is three in our case, but the algorithm also works for any other number of generic markets. The weights $w_k$ are adaptive to the actual behavior of the volatility. A high $w_k$ reflects a high fitness of $\vartheta_k$, which implies that the volatility measured in $\vartheta_k$ has low seasonal variations.
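The weighted combination can be illustrated with a few lines of Python. This is only a sketch: the function name `combined_theta` and the numerical values are hypothetical, and the code assumes that the market-specific times $\vartheta_k$ and the weights $w_k$ have already been computed elsewhere.

```python
def combined_theta(theta_by_market, weights, tol=1e-9):
    """Weighted mean of market-specific business-time values.

    theta_by_market: theta_k for each market k (hypothetical inputs)
    weights:         w_k for each market k, expected to sum to 1
    """
    if len(theta_by_market) != len(weights):
        raise ValueError("need one weight per market")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return sum(w * th for w, th in zip(weights, theta_by_market))


# Made-up values for three markets; the first market has the best fit.
theta = combined_theta([10.0, 12.0, 8.0], [0.5, 0.3, 0.2])
```

The normalization check mirrors the constraint on the weights; any number of markets can be passed in, not only three.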

The determination of the $w_k$ might be done with methods such as the maximum likelihood estimation of a volatility model. However, this would be unreliable given the local convergence issues and the existing modeling limitations of Equation 4.26. The proposed heuristic method always returns an unambiguous solution. The volatility of changes of the filtered variable is measured on all $\vartheta_k$-scales in terms of a variance similar to Equation 4.16:

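$$\sigma_k = \sqrt{\mathrm{EMA}\!\left[\Delta\vartheta_{\text{smooth}};\ \frac{(\delta x)^2}{\delta\vartheta_k + \delta\vartheta_0}\right]} \tag{4.28}$$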

where $\delta\vartheta_k$ is the interval between validated neighbor quotes in $\vartheta_k$-time, $\delta x$ is the corresponding change of the filtered variable, $\delta\vartheta_0$ is defined by Equation 4.17, and the time scale of the EMA is $\vartheta_k$-time. The notation is as in Sections 3.3.5 and 3.4.3. Smoothing with a short range $\Delta\vartheta_{\text{smooth}}$ is necessary to diminish the influence of quote-to-quote noise. The EMA computation assumes a constant value of $(\delta x)^2/(\delta\vartheta_k + \delta\vartheta_0)$ for the whole quote interval. This corresponds to the "next point" interpolation of Equation 3.52.
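A minimal Python sketch of this volatility measurement follows. It rests on several assumptions that are not taken from the text: the "next point" EMA update is written as $\mathrm{EMA}_n = \mu\,\mathrm{EMA}_{n-1} + (1-\mu)\,z_n$ with $\mu = e^{-\delta\vartheta_k/\Delta\vartheta_{\text{smooth}}}$, the EMA is initialized with the first observed value, a square root is taken at the end, and the function and parameter names are hypothetical.

```python
import math


def sigma_k(quotes, smooth_range, dtheta0):
    """Volatility estimate on one theta_k-scale.

    quotes:       list of (theta_k, x) pairs for validated quotes,
                  ordered by increasing theta_k
    smooth_range: the short EMA range (in theta_k-time)
    dtheta0:      small positive offset in the denominator
    """
    if len(quotes) < 2:
        raise ValueError("need at least two quotes")
    ema = None
    prev_theta, prev_x = quotes[0]
    for theta, x in quotes[1:]:
        dtheta = theta - prev_theta
        # Squared change per unit of (offset) time, held constant
        # over the whole quote interval ("next point" assumption).
        z = (x - prev_x) ** 2 / (dtheta + dtheta0)
        if ema is None:
            ema = z  # assumed initialization with the first value
        else:
            mu = math.exp(-dtheta / smooth_range)
            ema = mu * ema + (1.0 - mu) * z
        prev_theta, prev_x = theta, x
    return math.sqrt(ema)


# Regularly spaced quotes with unit changes: z is 0.5 at every step.
steady = sigma_k([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], 5.0, 1.0)
```

Because the integrand is treated as constant over each interval, only the value at the newest quote enters the update, which keeps the computation iterative.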

The fluctuations of the variable $\sigma_k$ indicate the badness of the model. In the case of a bad fit, $\sigma_k$ is often very low (when the $\vartheta_k$-scale expands time) and sometimes very high (when the $\vartheta_k$-scale compresses time). The fluctuations are quantified in terms of the variance $F_k$,

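$$F_k = \mathrm{EMA}\!\left[\Delta\vartheta_r;\ \left(\sigma_k - \mathrm{EMA}[\Delta\vartheta_r;\ \sigma_k]\right)^2\right] = \mathrm{MVar}[\Delta\vartheta_r, 2;\ \sigma_k] \tag{4.29}$$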

where the time scale is $\vartheta_k$-time; the MVar operator is explained in Section 3.3.8.
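The moving variance can be sketched as two nested EMAs that are updated together. The class below is an illustration under assumptions, not the operator of Section 3.3.8 itself: it uses the simple "next point" update $\mu\,\mathrm{EMA} + (1-\mu)\,z$ for both EMAs, initializes the mean with the first value, and the names `MovingVariance`, `time_range`, and `update` are hypothetical.

```python
import math


class MovingVariance:
    """EMA of the squared deviation of a series from its own EMA."""

    def __init__(self, time_range):
        self.time_range = time_range  # EMA range in theta_k-time
        self.mean = None              # EMA of the series
        self.var = 0.0                # EMA of the squared deviations

    def update(self, dtheta, sigma):
        """Feed one new value observed dtheta after the previous one."""
        if self.mean is None:
            self.mean = sigma  # assumed initialization
            return self.var
        mu = math.exp(-dtheta / self.time_range)
        self.mean = mu * self.mean + (1.0 - mu) * sigma
        deviation_sq = (sigma - self.mean) ** 2
        self.var = mu * self.var + (1.0 - mu) * deviation_sq
        return self.var


# A steady volatility series gives (almost) zero fluctuation,
# an alternating one gives a clearly positive value.
steady = MovingVariance(10.0)
for s in [1.0, 1.0, 1.0, 1.0]:
    f_steady = steady.update(1.0, s)

alternating = MovingVariance(10.0)
for s in [1.0, 3.0, 1.0, 3.0]:
    f_alternating = alternating.update(1.0, s)
```

A market whose volatility series behaves like the alternating example would be judged a poorer fit than one behaving like the steady example.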



FIGURE 4.2 The scalar filtering window moves forward in time by including new scalar quotes and dismissing old ones.

The range $\Delta\vartheta_r$ has to be suitably chosen. In our approximation, the fluctuations directly define the weight of the $k$th market:

$$w_k = \frac{1}{F_k \sum_{\text{all } k'} \frac{1}{F_{k'}}} \tag{4.30}$$

which satisfies Equation 4.27 and can be inserted into Equation 4.26.
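As a small numerical illustration, the following Python sketch reads the weight rule as "inversely proportional to the fluctuation $F_k$, normalized so that the weights sum to one." The function name `market_weights` and the input values are hypothetical, and the guard against non-positive fluctuations is an added assumption.

```python
def market_weights(fluctuations):
    """Weights proportional to 1 / F_k, normalized to sum to 1."""
    if any(f <= 0.0 for f in fluctuations):
        raise ValueError("fluctuations must be strictly positive")
    inverse = [1.0 / f for f in fluctuations]
    total = sum(inverse)
    return [v / total for v in inverse]


# Made-up fluctuation values for three markets: the market with the
# smallest fluctuation (best fit) receives the largest weight.
weights = market_weights([0.5, 1.0, 2.0])
```

With these inputs the inverse fluctuations are 2, 1, and 0.5, so the weights are 4/7, 2/7, and 1/7.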

4.5 THE SCALAR FILTERING WINDOW

The scalar filtering window is located at the bottom of the hierarchical structure of the algorithm as shown in Table 4.1. It covers the set of all recent scalar quotes contained in a time interval. This neighborhood of quotes is used to judge the credibility of new incoming scalar quotes. In the course of the analysis, these new quotes are included and old quotes are dismissed at the back end of the window following a certain rule. Thus the window is moving forward in time. This mechanism is illustrated by Figure 4.2.

All the scalar quotes within the window have a provisional credibility value, which is modified with new incoming quotes. When the quotes leave the window, their credibilities are regarded as finally determined. Sufficiently credible quotes are then used to update the statistics needed for adaptivity.
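The life cycle of a quote in the window can be sketched as follows. The dismissal rule is not specified at this point, so the sketch simply assumes a maximum time span; the credibility threshold, the class names, and the list `accepted` (a stand-in for the adaptive statistics) are likewise hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ScalarQuote:
    time: float
    value: float
    credibility: float = 0.5  # provisional while inside the window


class ScalarWindow:
    def __init__(self, max_span, min_credibility=0.5):
        self.max_span = max_span                # assumed dismissal rule
        self.min_credibility = min_credibility  # assumed threshold
        self.quotes = deque()
        self.accepted = []  # stand-in for the adaptive statistics

    def add(self, quote):
        """Include a new quote and dismiss quotes that are too old."""
        self.quotes.append(quote)
        while quote.time - self.quotes[0].time > self.max_span:
            old = self.quotes.popleft()
            # The credibility of a leaving quote is now final.
            if old.credibility >= self.min_credibility:
                self.accepted.append(old)


window = ScalarWindow(max_span=10.0)
window.add(ScalarQuote(0.0, 1.00, credibility=0.9))
window.add(ScalarQuote(5.0, 9.99, credibility=0.1))
window.add(ScalarQuote(12.0, 1.01))
window.add(ScalarQuote(20.0, 1.02))
```

After the four calls, the two oldest quotes have left the window; only the credible one has been passed on to the statistics.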



At the initialization of a filter from scratch, the window is empty. When the first scalar quote enters, it cannot be filtered by pair filtering yet; only the level filter applies.

4.5.1 Entering a New Quote in the Scalar Filtering Window

Whenever a new scalar quote enters the window, an analysis is made based on earlier results and the new quote.

There are two possible ways in which a new quote enters the scalar filtering window:

1. The normal update. A new scalar quote from the data source enters, is analyzed, and finally becomes the newest member of the scalar filtering window. The window variables are updated accordingly. These operations are described by Sections 4.5.2 through 4.5.6.

2. A filter test. A new scalar quote from any source is merely tested. It is analyzed as in a normal update, but it does not become a member of the window. No window variable is changed by this test. Thus we execute the steps of Section 4.5.2 and avoid those of Sections 4.5.3 through 4.5.6. The resulting trust capital of the new scalar quote is returned.
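The difference between the two entry modes can be sketched in Python. The analysis itself is abstracted into a callable `analyze`, a hypothetical placeholder that maps the current window contents and a new quote to a trust capital; the toy rule used in the example is invented for illustration and is not the filter's actual formula.

```python
class FilteringWindow:
    def __init__(self, analyze):
        # analyze(window_quotes, new_quote) -> trust capital
        # (hypothetical placeholder for the real analysis)
        self.analyze = analyze
        self.quotes = []

    def test(self, quote):
        """Filter test: analyze only, leave the window untouched."""
        # A tuple snapshot prevents the analysis from modifying the window.
        return self.analyze(tuple(self.quotes), quote)

    def update(self, quote):
        """Normal update: analyze, then make the quote a window member."""
        trust = self.analyze(tuple(self.quotes), quote)
        self.quotes.append(quote)
        return trust


def toy_analyze(window_quotes, new_quote):
    """Invented rule: penalize the distance to the newest window quote."""
    if not window_quotes:
        return 0.0
    return -abs(new_quote - window_quotes[-1])


window = FilteringWindow(toy_analyze)
first = window.update(1.0)   # enters an empty window
probe = window.test(5.0)     # tested only, window unchanged
second = window.update(1.5)  # becomes the newest member
```

In a fuller implementation the normal update would also revise the trust capitals of the old quotes and the other window variables, which is exactly what the filter test must leave alone.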

4.5.2 The Trust Capital of a New Scalar Quote

The algorithm of the filtering window is organized in an iterative way. Whenever a new quote enters the window, an update is made based on earlier results and an analysis of the new quote.

When the new, $i$th scalar quote arrives, it already satisfies certain basic validity criteria (e.g., a price is not negative) and has possibly been transformed to a logarithmic value. This is ensured by the higher-level quote splitting algorithm explained in Section 4.6. The following filtering operations are done with the incoming $i$th scalar quote:

1. The base trust capital $T_{i0}$ is computed as the result of the level filter, Equation 4.6, if the scalar quote is a bid-ask spread. Otherwise, $T_{i0} = 0$. The resulting $T_{i0}$ of Equation 4.6 is multiplied by a configured constant $c_{\text{level}}$ that determines the importance of level filtering.

2. The new quote is compared to all old quotes of the window through pair filtering steps as described in Section 4.4.3. The trust capitals $T_{ij}$ resulting from Equation 4.13 determine the trust capital $T_i$ of the new quote and also affect the trust capitals $T_j$ of the old quotes.

For computing $T_{ij}$, we need the expected squared value change $V$ from Equation 4.20 and $\Delta\vartheta_{\text{corr}}$ from Equation 4.19 and therefore the number $Q$ of valid quotes in the time interval from quote $j$ to quote $i$. For this, we use the valid-quote age $Q_j$ of the old quotes:

$$Q = Q_j + 1 \tag{4.31}$$


