
Let $\mathrm{RSS}_j$ denote the residual sum of squares from the $j$th model with $k_j$ explanatory variables. We define

$$\hat{\sigma}_j^2 = \frac{\mathrm{RSS}_j}{n - k_j}$$

an estimate of $\sigma^2$ from the $j$th model. We denote by $\hat{\sigma}^2$ the estimate of $\sigma^2$ from the model that includes all the explanatory variables. We now discuss briefly the criteria listed in Table 12.3.

Theil's Criterion

Theil's criterion is based on the assumption that one of the models considered is the correct model. In this case, if $\hat{\sigma}_j^2 = \mathrm{RSS}_j/(n - k_j)$ is the estimate of $\sigma^2$ from the $j$th model, then $E(\hat{\sigma}_j^2) = \sigma^2$ for the correct model but $E(\hat{\sigma}_j^2) > \sigma^2$ for a misspecified model. Thus, choosing the model with the minimum $\hat{\sigma}_j^2$ will on the average lead us to pick the correct model. Since minimizing $\hat{\sigma}_j^2$ is the same as maximizing $\bar{R}^2$ [see equation (4.22)], we refer to this rule alternatively as the maximum $\bar{R}^2$ rule.

A major problem with this rule is that a model that has all the explanatory variables of the correct model but also a number of irrelevant variables will also give $E(\hat{\sigma}_j^2) = \sigma^2$. Thus the rule will not help us pick the correct model in this case. This is indeed confirmed by the results in Schmidt and Ebbeler concerning the power function of the maximum $\bar{R}^2$ criterion. The probability of picking the correct model is considerably below 1 when the alternative model includes a number of irrelevant variables.

"H. Theil, Economic Forecasts and Policy, 2nd ed. (Amsterdam: North-Holland, 1961). "The proof of this result can be found in several books. See, for instance, G. S. Maddala, Econometrics (New York: McGraw-Hill, 1977), pp. 461-462. It is shown there that a model that has all the explanatory variables of the correct model but also a number of irrelevant variables will also give --) = -.

"R Schmidt, "Calculating the Power of the Minimum Standard Error Choice Criterion," International Economic Review, February 1973, pp. 253-255. The numerical errors in Schmidts paper are corrected in D. H. Ebbeler, "On the Probability of Correct Model Selection Using the Maximum R" Choice Criterion," International Economic Review, June 1975. pp. 516-520. There are three types of misspecifications considered in these papers: omitted variables, irrelevant variables, and wrong variables. For the first two cases, Ebbeler shows that some simple analytical results are available.

Table 12.3 Some Criteria for Choice Among Regression Models

Criterion                     Minimize$^a$

Theil's (1961)                $\mathrm{RSS}_j/(n - k_j)$
Hocking's $S_p$ (1976)        $\mathrm{RSS}_j/[(n - k_j)(n - k_j - 1)]$
Mallows' $C_p$ (1973)         $\mathrm{RSS}_j + 2k_j\hat{\sigma}^2$
Amemiya's PC (1980)           $\mathrm{RSS}_j(n + k_j)/(n - k_j)$
Akaike's AIC (1973, 1977)     $\mathrm{RSS}_j \exp[2(k_j + 1)/n]$

$^a$ $k_j$ = number of explanatory variables in the $j$th model; $\mathrm{RSS}_j$ = residual sum of squares for the $j$th model; $\hat{\sigma}^2$ = (residual sum of squares)/$(n - k)$ in the model that includes all $k$ explanatory variables.



Criteria Based on Minimizing the Mean Square Error of Prediction

Theil's criterion is based on minimizing the standard error of the regression. The following three criteria are based on minimizing the mean-square error of prediction:

1. Mallows' $C_p$ criterion.
2. Hocking's $S_p$ criterion.
3. Amemiya's PC criterion.

Suppose that the correct equation involves $k$ variables and the equation we consider involves $k_j$ $(< k)$ variables. The problem is how to choose the number $k_j$ as well as the particular set of $k_j$ variables. In the prediction criteria we are considering, this is done by minimizing the mean-squared error of prediction $E(y_f - \hat{y}_f)^2$, where $y_f$ is the future value of $y$ and $\hat{y}_f$ is its predicted value. If we denote the future values of the explanatory variables by $x_{if}$, then $E[(y_f - \hat{y}_f)^2 \mid x_{if}]$ is called the conditional MSE (mean-square error) of prediction and $E(y_f - \hat{y}_f)^2$ is called the unconditional MSE of prediction.

To get the unconditional MSE of prediction we have to make some assumptions about $x_{if}$. Amemiya assumes that the regressors for the prediction period are stochastic and that the values of $x_{if}$ have the same behavior as the sample-period values $x_{it}$. Under this assumption he shows that

$$\text{estimate of } E(y_f - \hat{y}_f)^2 = \frac{\mathrm{RSS}_j}{n} + \frac{2k_j}{n}\sigma^2$$

where $\mathrm{RSS}_j$ is the residual sum of squares from the model with $k_j$ regressors.

Now $\sigma^2$ has to be estimated. If we use $\hat{\sigma}^2 = \mathrm{RSS}/(n - k)$, where $\mathrm{RSS}$ is the residual sum of squares from the complete model with all $k$ explanatory variables, we get Mallows' $C_p$ criterion. On the other hand, if we use $\hat{\sigma}_j^2 = \mathrm{RSS}_j/(n - k_j)$, then we get Amemiya's PC criterion. Note, however, that $\mathrm{RSS}_j/(n - k_j)$ is an upward-biased estimate of $\sigma^2$ because it is an estimate from a misspecified regression. On the other hand, $\hat{\sigma}_j^2$ is an unbiased estimate of $\sigma^2$ if it is assumed that the model with $k_j$ variables is the correct model and the model with all $k$ variables includes a number of irrelevant variables. This is the "optimistic" assumption that Amemiya makes. However, if we are comparing different models with different sets of variables by the PC criterion, we cannot make the "optimistic" assumption that every one of these is the true model. This is one of the major problems with Amemiya's PC criterion. It is more reasonable to assume that $\hat{\sigma}^2$ is the appropriate estimate of $\sigma^2$, as assumed in Mallows' $C_p$ criterion.
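The two substitutions just described can be checked numerically. The following sketch (my own, with hypothetical values of $\mathrm{RSS}_j$, $k_j$, and $n$) verifies that replacing $\sigma^2$ by $\hat{\sigma}^2$ in the prediction-MSE estimate reproduces Mallows' $C_p$ up to the constant factor $1/n$, while replacing it by $\mathrm{RSS}_j/(n - k_j)$ reproduces Amemiya's PC.

```python
# Numerical check (not from the text) of the two substitutions for sigma^2,
# using hypothetical values of RSS_j, k_j, n, and the full-model estimate sigma2_hat.
n = 40
sigma2_hat = 80 / (40 - 5)                          # RSS/(n - k) from the full model

def pred_mse_estimate(rss_j, k_j, sigma2):
    """Estimate of E(y_f - yhat_f)^2 = RSS_j/n + (2 k_j / n) sigma^2."""
    return rss_j / n + (2 * k_j / n) * sigma2

for rss_j, k_j in [(120, 2), (95, 3), (82, 4)]:
    mallows = rss_j + 2 * k_j * sigma2_hat          # C_p form from Table 12.3
    pc = rss_j * (n + k_j) / (n * (n - k_j))        # PC (Table 12.3 entry divided by n)
    # sigma^2 -> sigma2_hat gives Mallows' C_p up to the constant factor 1/n ...
    assert abs(pred_mse_estimate(rss_j, k_j, sigma2_hat) - mallows / n) < 1e-12
    # ... while sigma^2 -> RSS_j/(n - k_j) gives Amemiya's PC exactly.
    assert abs(pred_mse_estimate(rss_j, k_j, rss_j / (n - k_j)) - pc) < 1e-12

print("both substitutions verified")
```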

"C. L. Mallows, "Some Comments on C," Technometrics, November 1973, pp. 661-676. The criterion was first suggested by Mallows in 1964 in an unpublished paper. "R. R. Hocking, "The Analysis and Selection of Variables in Multiple Regression," Biometrics, March 1976, pp. 1-49. This has been further discussed in two papers by M. L. Thompson, "Selection of Variables in Multiple Regression, Part 1: A Review and Evaluation," and "Part 2: Chosen Procedures, Computations and Examples," International Statistical Review, Vol. 46, 1978, pp. 1-19 and pp. 129-146. Also see U. Hjorth, "Model Selection and Forward Validation," Scandinavian Journal of Statistics, Vol. 9, 1982, pp. 1-49.

"T. Amemiya, "Selection of Regressors," International Economic Review, Vol. 21, 1980, pp. 331-354.

"We will not be concerned with the proof here. It can be found in Amemiya, "Selection of Regressors."




This is in fact the basis for the maximum $\bar{R}^2$ rule.

T. Kinal and K. Lahiri, "A Note on Selection of Regressors," International Economic Review, Vol. 25, No. 3, October 1984.

L. Breiman and D. Freedman, "How Many Variables Should Be Entered in a Regression Equation?" Journal of the American Statistical Association, March 1983, pp. 131-136.


The important thing to note in the discussion above is that the regressors in the sample period are assumed nonstochastic, whereas they are assumed stochastic in the prediction period. Hocking, by contrast, assumes the regressors to have a multivariate normal distribution and derives the $S_p$ criterion given in Table 12.3 by minimizing the unconditional MSE of prediction. Kinal and Lahiri show that in the stochastic regressor case Amemiya's PC and Mallows' $C_p$ both reduce to Hocking's $S_p$ criterion. Breiman and Freedman give an alternative justification for Hocking's $S_p$ criterion. It is not necessary for us to go through the different ways in which the criteria $C_p$ and $S_p$ have been derived and justified; this discussion can be found in the papers by Thompson cited earlier. The important thing to note is that Amemiya's PC and Mallows' $C_p$ depend on the assumption of nonstochastic regressors, whereas Hocking's $S_p$ depends on the assumption of stochastic regressors. A consequence of this assumption is that (as noted by Kinal and Lahiri) both the $C_p$ and PC criteria require an estimate of the variance $\sigma^2$ of the disturbance in the true model, whereas $S_p$ does not; it requires only an estimate of the variance of the disturbance term in the restricted model.

One more important thing to note is that the maximum $\bar{R}^2$ criterion and the predictive criteria $C_p$, $S_p$, and PC answer two different questions. In the case of the maximum $\bar{R}^2$ criterion, what we are trying to do is pick the "true" model, assuming that one of the models considered is the "true" one. In the case of the prediction criteria, we are interested in "parsimony," and we would like to omit some of the regressors (even if a model that includes them is the true model) if this improves the MSE of prediction. For the latter problem, the question is whether we need to assume the existence of a true model at all. For the $C_p$ and PC criteria, as we have seen, we do need to think in terms of a "true" model. For the $S_p$ criterion (or in the stochastic regressor case), we do not have to worry about whether one of the models is the true model or what regressors it contains.
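A small simulation can illustrate the parsimony point (my own sketch, assuming NumPy; the true model, coefficient values, and sample size are hypothetical): when a regressor enters the true model with a very small coefficient and the sample is small, omitting it typically lowers the out-of-sample MSE of prediction, which is exactly the trade-off the prediction criteria are designed to capture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_reps = 20, 4000
mse_full = mse_short = 0.0

for _ in range(n_reps):
    # "True" model: y depends on x1 strongly and on x2 only weakly.
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    y = 1.0 + 1.0 * x1 + 0.1 * x2 + rng.normal(size=n)

    # One out-of-sample observation with stochastic regressors.
    x1f, x2f = rng.normal(), rng.normal()
    yf = 1.0 + 1.0 * x1f + 0.1 * x2f + rng.normal()

    X_full = np.column_stack([np.ones(n), x1, x2])   # includes the weak regressor
    X_short = np.column_stack([np.ones(n), x1])      # omits it (more parsimonious)
    b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

    mse_full += (yf - b_full @ np.array([1.0, x1f, x2f])) ** 2
    mse_short += (yf - b_short @ np.array([1.0, x1f])) ** 2

# The shorter model usually wins here: the bias from omitting x2 is smaller than the
# extra estimation variance incurred by including it when n is this small.
print("prediction MSE, full model :", mse_full / n_reps)
print("prediction MSE, short model:", mse_short / n_reps)
```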


