
model performs well according to the preferred performance measures. In a full backtest, many successive diagnostic statistics should be analysed for their time series properties. Only when backtesting results are stable and robust over time can one place some degree of confidence in the model.

Backtests will need to include a variety of market conditions such as trending markets, stable markets and periods of extreme movements. And since the results are likely to depend on the market regime, a view has to be taken on the likelihood of each regime in the future. Some examples of backtesting a time series model can be found in §12.5.5 and §9.5.1.

A.5.3 Statistical and Operational Evaluation Methods

A predictive test is a single evaluation of model performance based on a comparison of actual data with the values predicted by the model. There are two types of post-sample performance measures: statistical and operational. Statistical evaluation methods compare model predictions with observable quantities, such as asset prices or returns. Common statistical performance measures include:

- Information ratio (IR): the mean prediction error divided by the standard deviation of the prediction error;
- Root mean square error (RMSE): the square root of the mean of the squared prediction errors;
- Likelihood of the prediction: the product of the likelihoods of each point predicted, assuming some form of density function for the quantities being predicted;
- Mean absolute error: the mean of the absolute values of the prediction errors;
- Mean square error: the mean of the squares of the prediction errors;
- Normalized root mean square error: the RMSE divided by the estimated standard deviation of the prediction errors;
- Out-of-sample correlation coefficient: the correlation between the predictions and the actual values, having made the appropriate stationarity transformation if necessary.²⁰

²⁰ For example, in a price prediction model one would not compute the predicted-actual price correlation; it would need to be computed on the respective returns.
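To make these measures concrete, the following sketch computes several of them for a vector of predictions and realized values. It is a minimal illustration using numpy; the function and variable names are chosen for this example, not taken from the text.

```python
import numpy as np

def statistical_measures(predicted, actual):
    """Common post-sample statistical performance measures.

    For a price prediction model, pass returns rather than prices so the
    correlation measure is meaningful (see footnote 20)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    e = actual - predicted                        # prediction errors
    mae = np.mean(np.abs(e))                      # mean absolute error
    mse = np.mean(e ** 2)                         # mean square error
    rmse = np.sqrt(mse)                           # root mean square error
    s = np.std(e, ddof=1)                         # std. dev. of errors
    ir = np.mean(e) / s                           # information ratio
    nrmse = rmse / s                              # normalized RMSE
    corr = np.corrcoef(predicted, actual)[0, 1]   # out-of-sample correlation
    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "IR": ir, "NRMSE": nrmse, "corr": corr}
```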

There are two main concerns with the statistical approach to model evaluation:

- Most of these standard statistical criteria for model accuracy do not distinguish outperformance from underperformance, since they ignore the sign of the prediction error.
- Often the distributional assumptions that underpin statistical evaluation procedures are difficult to justify. The associated hypothesis tests based on these statistics will usually only be valid under certain assumptions, such as returns being normally distributed.

Operational evaluation methods focus on the context in which the prediction is used, by imposing a metric on prediction results. A basic example of an operational evaluation method is the backtesting of a value-at-risk model by counting exceptional losses (§9.5.1). More generally, when predictions are used for trading or hedging purposes, the performance of a trading or hedging metric provides a measure of the model's success.
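For instance, counting exceptions for a daily VaR model amounts to the following sketch (the names and the default coverage level are illustrative; the formal coverage tests are in §9.5.1):

```python
import numpy as np

def count_var_exceptions(returns, var_forecasts, coverage=0.01):
    """Count days on which the realized loss exceeded the VaR forecast.

    `var_forecasts` holds positive VaR numbers, so an exception occurs
    when the return falls below -VaR."""
    exceptions = int(np.sum(np.asarray(returns) < -np.asarray(var_forecasts)))
    expected = coverage * len(returns)   # expected number of exceptions
    return exceptions, expected
```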

Most trading metrics for measuring prediction performance are variants of a profit and loss (P&L) metric. The basic framework of a P&L metric is to define an indicator variable for the position at time $t$ as

$$I_t = \begin{cases} 1 & \text{if the position is long} \\ -1 & \text{if the position is short} \\ 0 & \text{if the position is neutral.} \end{cases}$$

If $p_t$ is the realized asset price at time $t$ then $I_{t-1}(p_t - p_{t-1})$ is the gain or loss on the position at time $t$. Often a P&L metric includes a fixed transaction cost $c$, so the net profit or loss on the position at time $t$ is

$$g_t = I_{t-1}(p_t - p_{t-1}) - c\,|I_t - I_{t-1}|. \tag{A.5.3}$$

The performance of price predictions may be expressed in terms of the total P&L up to time $T$ (the time horizon of the post-sample test), that is, $\text{P\&L}_T = \sum_{t=1}^{T} g_t$. But perhaps a better statistic is the mean P&L, $\bar{g} = \text{P\&L}_T / T$, because it avoids the obvious scaling problem as the length of the post-sample test period increases. One might also wish to adjust for risk, to penalize predictions that give highly variable P&Ls during the test. A common performance measure is the normalized mean P&L over the post-sample testing period, $\bar{g}/s_g$, where $s_g$ is the estimated standard deviation of $g_t$ over the prediction period.
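A sketch of this P&L metric, assuming a series of realized prices and a series of positions already expressed as the indicator values above (the names are illustrative):

```python
import numpy as np

def pnl_metric(prices, positions, cost):
    """Net P&L per (A.5.3): g_t = I_{t-1}(p_t - p_{t-1}) - c|I_t - I_{t-1}|.

    `prices` has length T+1 (p_0, ..., p_T) and `positions` holds the
    indicator values I_0, ..., I_T in {-1, 0, 1}."""
    p = np.asarray(prices, dtype=float)
    I = np.asarray(positions, dtype=float)
    g = I[:-1] * np.diff(p) - cost * np.abs(np.diff(I))
    total = g.sum()                    # P&L_T
    mean = g.mean()                    # mean P&L, avoids the scaling problem
    normalized = mean / g.std(ddof=1)  # normalized mean P&L
    return total, mean, normalized
```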

This is fairly standard so far, but the question of how the price predictions $\hat{p}_t$ are translated into positions has not yet been answered. The definition of the position indicator is perhaps the most flexible part of the definition of a trading metric, since it should reflect as far as possible the actual trading strategy. As a simple example, one could define a single threshold $\tau$ and then set

$$I_{t-1} = \begin{cases} 1 & \text{if } \hat{p}_t > p_{t-1} + \tau \\ -1 & \text{if } \hat{p}_t < p_{t-1} - \tau \\ 0 & \text{otherwise.} \end{cases}$$
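A compact version of this threshold rule might look as follows (a sketch; `tau` and the function name are illustrative):

```python
import numpy as np

def threshold_positions(predictions, lagged_prices, tau):
    """Long if the prediction exceeds the last price by more than tau,
    short if it falls below it by more than tau, neutral otherwise."""
    diff = np.asarray(predictions) - np.asarray(lagged_prices)
    return np.where(diff > tau, 1, np.where(diff < -tau, -1, 0))
```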

Clearly many other definitions of position indicators are possible. Hedging strategies can also be defined in this framework, where rebalancing limits are placed on the option delta or the portfolio beta that is predicted by the model. An example of a P&L metric from a volatility trading perspective was given in §5.1.2.



Appendix 6 Maximum Likelihood Methods

Maximum likelihood is a standard method for fitting the parameters of a density function. It has already been mentioned in this context in §10.2. Under the classical assumptions of linear regression, ordinary least squares estimation and maximum likelihood estimation are equivalent, so there is no explicit need for likelihood methods when estimating linear models or testing linear restrictions on their parameters. However, non-linear statistical models are normally estimated by maximum likelihood because maximum likelihood estimators (MLEs) are almost always consistent (§A.1.3). Models that are usually estimated by maximum likelihood include generalized autoregressive conditional heteroscedasticity (GARCH) models (Chapter 4) and neural networks (§13.2).

A.6.1 The Likelihood Function, MLE and LR Tests

The likelihood of an observation $x$ on a random variable is the value of its density function at $x$, written $f(x, \theta)$, where $\theta = (\theta_1, \ldots, \theta_q)$ are the parameters of the density function. The likelihood function of an independent set of observations $(x_1, \ldots, x_n)$ on the same random variable with density function $f(x, \theta)$ is the product of the likelihoods of each point, that is,

$$L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i, \theta). \tag{A.6.1}$$
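For example, under a normal density assumption the likelihood (A.6.1) is just the product of normal densities evaluated at each observation. The sketch below uses scipy.stats; in practice one works with the log-likelihood, since the raw product underflows for large samples.

```python
import numpy as np
from scipy.stats import norm

def normal_likelihood(x, mu, sigma):
    """Likelihood of the sample x under a N(mu, sigma^2) density:
    the product of the individual densities, as in (A.6.1)."""
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

def normal_log_likelihood(x, mu, sigma):
    """Sum of log densities, numerically safer than the raw product."""
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))
```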

For given random sample data $(x_1, \ldots, x_n)$, the value of the likelihood will depend on $\theta$. Figure A.11a illustrates the likelihood of a random sample for two different values of a parameter vector $\theta$: $\theta_0$ and $\theta_1$. The likelihood of the sample is greater if the parameters take the values $\theta_0$: that is, $L(\theta_0 \mid x_1, \ldots, x_n) > L(\theta_1 \mid x_1, \ldots, x_n)$, since the product of the values of the density is greater when $\theta = \theta_0$ than when $\theta = \theta_1$.

As $\theta$ ranges over all possible values for all parameters the likelihood function describes a $(q + 1)$-dimensional surface. For example, when there is a single parameter $\theta$ the likelihood function describes a curve, such as in Figure A.11b. The greater the value of the likelihood, the more probable are the parameter values, based on the given sample data. Different sample data will give different values of the likelihood, so the values of the parameters that generate the highest likelihood will depend on the choice of the sample data.

The maximum likelihood estimator of $\theta$ is the value of $\theta$ that maximizes the likelihood function, given the sample data:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta \mid x_1, \ldots, x_n).$$
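In simple cases the maximizing values can be found analytically; more generally the maximum is located numerically, conventionally by minimizing the negative log-likelihood. A sketch for the normal sample above (names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_normal_mle(x):
    """Numerical MLE of (mu, sigma) for a normal sample, found by
    minimizing the negative log-likelihood with Nelder-Mead."""
    # abs() keeps the scale parameter positive during the search
    neg_loglik = lambda theta: -np.sum(norm.logpdf(x, loc=theta[0],
                                                   scale=np.abs(theta[1])))
    start = np.array([np.mean(x), np.std(x)])  # moment-based starting values
    result = minimize(neg_loglik, start, method="Nelder-Mead")
    return result.x                            # (mu_hat, sigma_hat)
```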

The likelihood ratio test of a null hypothesis $H_0$: $\theta = \theta_0$ against $H_1$: $\theta = \theta_1$ is based on the statistic


