12.8 Implied F-Ratios for the Various Criteria

Bayes Theorem and Posterior Odds for Model Selection

Bayes' theorem is based on the definition of conditional probability. Let E₁ and E₂ be two events. Then by the definition of conditional probability, we have

P(E₁ | E₂) = P(E₁ ∩ E₂)/P(E₂)   and   P(E₂ | E₁) = P(E₁ ∩ E₂)/P(E₁)

Hence

P(E₁ | E₂) = P(E₂ | E₁)P(E₁)/P(E₂)

Now substitute H (hypothesis about the model that generated the data) for E₁ and D (observed data) for E₂. Then we have

P(H | D) = P(D | H)P(H)/P(D)

Here P(D | H) is the probability of observing the data given that H is true. This is usually called the likelihood. P(H) is our probability that H is true before observing the data (usually called the prior probability). P(H | D) is the probability that H is true after observing the data (usually called the posterior probability). P(D) is the unconditional probability of observing the data (whether H is true or not). Often P(D) is difficult to compute. Hence we write the relation above as

P(H | D) ∝ P(D | H)P(H)

That is: posterior probability varies with (or is proportional to) likelihood times prior probability. If we have two hypotheses H₁ and H₂, then

P(H₁ | D) ∝ P(D | H₁)P(H₁)
P(H₂ | D) ∝ P(D | H₂)P(H₂)

Hence

P(H₁ | D)/P(H₂ | D) = [P(D | H₁)/P(D | H₂)] · [P(H₁)/P(H₂)]   (12.16)

The left-hand side is called posterior odds. The first term on the right-hand side is called the likelihood ratio, and the second term on the right-hand side is called the prior odds. Thus we have:

Posterior odds equals likelihood ratio times prior odds.
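For instance, suppose (purely as a hypothetical illustration) that the observed data are three times as likely under H₁ as under H₂, so the likelihood ratio is 3, and that before seeing the data we regard the two models as equally plausible, so the prior odds are 1. The posterior odds are then 3 × 1 = 3: after seeing the data we consider H₁ three times as probable as H₂, that is, P(H₁ | D) = 0.75 and P(H₂ | D) = 0.25 (assuming H₁ and H₂ are the only models entertained).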

In the problem of choice of regressors, H₁ and H₂ involve some parameters, say β and γ. The likelihood ratio is computed by a weighting procedure, the weights being determined by the prior distributions of these parameter sets. Thus we have



P(H₁ | data)/P(H₂ | data) = [∫ L₁P₁(β) dβ / ∫ L₂P₂(γ) dγ] · [P(H₁)/P(H₂)]   (12.17)

where L₁ and L₂ are the respective likelihoods and P₁ and P₂ are the respective prior distributions for the parameters in models 1 and 2. The first term on the right-hand side is sometimes called the Bayes factor.
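To make the weighting in (12.17) concrete, here is a minimal numerical sketch in Python. Everything in it is hypothetical and chosen only for illustration: the data are simulated, the two hypotheses are taken to be normal models for the mean of the data, and the priors P₁ and P₂ are assumed to be normal densities centered at different values. The integrals are approximated by sums over a grid of parameter values.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(loc=1.0, scale=1.0, size=20)   # hypothetical observed data

    def likelihood(theta, y):
        # likelihood of an N(theta, 1) model for the data y, for each theta in the grid
        return np.prod(np.exp(-0.5 * (y[:, None] - theta) ** 2) / np.sqrt(2 * np.pi), axis=0)

    grid = np.linspace(-6.0, 8.0, 2801)           # grid of parameter values
    d = grid[1] - grid[0]

    # assumed priors: H1 centers the mean at 0, H2 centers it at 2
    prior1 = np.exp(-0.5 * (grid - 0.0) ** 2) / np.sqrt(2 * np.pi)
    prior2 = np.exp(-0.5 * (grid - 2.0) ** 2) / np.sqrt(2 * np.pi)

    lik = likelihood(grid, y)

    # weighted (integrated) likelihoods: the two integrals in (12.17)
    m1 = np.sum(lik * prior1) * d
    m2 = np.sum(lik * prior2) * d

    bayes_factor = m1 / m2
    prior_odds = 1.0                              # equal prior probabilities for H1 and H2
    posterior_odds = bayes_factor * prior_odds
    print(bayes_factor, posterior_odds)

With equal prior odds the posterior odds are simply the Bayes factor; a value greater than 1 favors H₁.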

There have been many suggestions in the literature for P₁ and P₂ and thus for the computation of Bayesian posterior odds.* Since our purpose here is to illustrate how the implied F-ratio changes with the sample size according to some Bayesian arguments, we will present one formula suggested by Leamer. He suggests that the posterior probabilities should satisfy the following properties:

1. There must be no arbitrary constants.

2. The posterior probability of a model should be invariant to linear transformations of the data.

3. There should be a degrees of freedom adjustment: of two models that both yield the same residual sum of squares, the one with the fewer explanatory variables should have the higher posterior probability.

Based on these criteria, Leamer suggests prior distributions P₁ and P₂ and computes the posterior odds as prior odds times the Bayes factor given by (in the notation we are using)

Bayes factor = (RRSS/URSS)^(n/2) · n^(−k₂/2)   (12.18)

We say that the evidence favors the restricted model if this Bayes factor is less than 1. Using the F-ratio defined in (12.14), we get the condition as

F < [(n − k)/k₂](n^(k₂/n) − 1)   (12.19)

Compared with the F-ratios presented in Table 12.4, this criterion produces large changes in the F-ratios as the sample size n changes. The F-ratios implied by Leamer's criterion are presented in Table 12.5.
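Condition (12.19) is easy to evaluate directly. The following short Python sketch (an illustration, not part of the original text) computes the implied critical F-value; the printed values can be checked against the corresponding entries of Table 12.5.

    def leamer_critical_f(n_minus_k, k, k2):
        # Critical F implied by (12.19): F < ((n - k)/k2) * (n**(k2/n) - 1),
        # where n is the sample size, k the number of regressors in the
        # unrestricted model, and k2 the number of regressors being dropped.
        n = n_minus_k + k
        return (n_minus_k / k2) * (n ** (k2 / n) - 1)

    print(round(leamer_critical_f(5, 1, 1), 2))    # 1.74
    print(round(leamer_critical_f(5, 10, 1), 2))   # 0.99
    print(round(leamer_critical_f(5, 1, 3), 2))    # 2.42
    print(round(leamer_critical_f(5, 5, 5), 2))    # 2.16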


"An early survey is in K. M. Gaver and M. S. Geisel, "Discriminating Among Alternative Models: Bayesian and Non-Bayesian Methods," in P. Zarembka (ed.). Frontiers of Econometrics (New York: Academic Press, 1974).




Table 12.5  Critical F-Values Implied by Bayesian Posterior Odds Criterion

                            n − k
                   5       10      50      100
  k₂ = 1
    k = 1        1.74    2.44    4.01    4.68
    k = 3        1.48    2.18    3.89    4.60
    k = 5        1.29    1.98    3.78    4.53
    k = 10       0.99    1.62    3.53    4.37
    5% point of F  6.60    4.96    4.03    3.94
  k₂ = 3
    k = 1        2.42    3.08    4.34    4.90
    k = 3        1.97    2.69    4.20    4.82
    k = 5        1.66    2.40    4.07    4.74
    k = 10       1.20    1.89    3.79    4.56
    5% point of F  5.41    3.71    2.79    2.70
  k₂ = 5
    k = 1        3.45    3.95    4.70    5.13
    k = 3        2.67    3.36    4.54    5.05
    k = 5        2.16    2.93    4.40    4.96
    k = 10       1.47    2.23    4.07    4.76
    5% point of F  5.05    3.33    2.40    2.30

Source: E. E. Leamer, Specification Searches (New York: Wiley, 1978), p. 116.

12.9 Cross-Validation

One of the major criticisms of the different criteria for selection of regressors that we discussed in the preceding section is that a model we choose may be the best for the period used in estimation but may not be the best when we use it for prediction in future periods. To handle this criticism, it is often suggested that we use only part of the data for the purpose of estimation and save the rest for the purpose of prediction and to check the adequacy of the model chosen. We estimate the different models using the first part and then use the estimated parameters to generate predictions for the second part. The model that minimizes the sum of squared prediction errors is then chosen as the best model.

This procedure of splitting the data into two parts, one for estimation and the other for prediction, is called cross-validation. Actually, we have two sets of prediction errors. The prediction errors for the first part (the estimation period) are known as within-sample prediction errors. The sum of squares of these prediction errors is the usual residual sum of squares. The prediction errors from the second part are known as out-of-sample prediction errors. Different criteria in cross-validation depend on the different weights given to the sums of squares of these two sets of prediction errors. A criterion often used is to give equal weights to these two sets of prediction errors.
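As a minimal sketch of this procedure, the following Python fragment fits two candidate regressions by ordinary least squares on the first half of a simulated sample and compares their prediction errors on the second half. The data, the candidate models, and the 50-50 split are hypothetical choices made only for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # hypothetical data: y depends on x1 only, but we compare two candidate models
    n = 40
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)

    # candidate regressor matrices (each includes a constant term)
    models = {
        "y on x1":     np.column_stack([np.ones(n), x1]),
        "y on x1, x2": np.column_stack([np.ones(n), x1, x2]),
    }

    split = n // 2   # first half for estimation, second half for prediction
    for name, X in models.items():
        # estimate by OLS on the estimation part
        beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
        within = np.sum((y[:split] - X[:split] @ beta) ** 2)   # within-sample errors (RSS)
        out = np.sum((y[split:] - X[split:] @ beta) ** 2)      # out-of-sample prediction errors
        # equal weights to the two sets of prediction errors
        print(f"{name}: within = {within:.2f}, out-of-sample = {out:.2f}, total = {within + out:.2f}")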

What the cross-validation procedure does is impose a penalty for parameter instability. If the parameters are not stable between the estimation period and the prediction period, the sum of squared out-of-sample prediction errors will be large even if the sum of squared within-sample prediction errors is small. Thus model selection by cross-validation implies choosing the model that minimizes the usual residual sum of squares plus a penalty for parameter instability.

Instead of splitting the data into two sets and using one for estimation and the other for prediction, we can follow the procedure of using one observation at a time for prediction (and the remaining observations for estimation). That is, in fact, the idea behind the "predicted residuals" discussed earlier in Section 12.4. There the sum of squares of the predicted residuals, PRESS, was suggested as a criterion for model selection.
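A minimal sketch of PRESS along these lines, again with hypothetical simulated data; it simply recomputes the OLS fit n times, dropping one observation each time, rather than using any algebraic shortcut.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 40
    x1 = rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])   # hypothetical model: y on a constant and x1

    def press(X, y):
        # sum of squared predicted residuals: each observation is predicted
        # from an OLS fit that leaves that observation out
        n = len(y)
        total = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            total += float(y[i] - X[i] @ beta) ** 2
        return total

    print(round(press(X, y), 2))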


