




conclude that the results are not statistically significant at the 5% level. Note that the test for stability that we use is the "predictive" test for stability.

Finally, we might consider predicting for the first two quarters of 1961 using equations (7.11) and (7.12). The predictions are:

            Equation (7.11)    Equation (7.12)
1961 I          295.96             298.93
1961 II         302.05             304.73

Thus the predictions from the equation including L are further off from the true values than the predictions from the equation excluding L. Thus, if prediction were the sole criterion, one might as well drop the variable L.

The example above illustrates four different ways of looking at the multicollinearity problem:

1. Correlation between the explanatory variables L and Y, which is high: This suggests that the multicollinearity may be serious. However, we explained earlier the fallacy in looking at just the correlation coefficients between the explanatory variables.

2. Standard errors or t-ratios for the estimated coefficients: In this example the t-ratios are significant, suggesting that multicollinearity might not be serious.

3. Stability of the estimated coefficients when some observations are deleted: Again one might conclude that multicollinearity is not serious, if one uses a 5% level of significance for this test.

4. Examining the predictions from the model: If multicollinearity is a serious problem, the predictions from the model would be worse than those from a model that includes only a subset of the set of explanatory variables.

The last criterion should be applied if prediction is the object of the analysis. Otherwise, it would be advisable to consider the second and third criteria. The first criterion is not useful, as we have so frequently emphasized.
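If prediction is the criterion of interest, the comparison can be carried out mechanically. The sketch below (Python; the data-generating process, the variable names, and the two-period holdout are assumptions made purely for illustration, not the consumption data used in this chapter) fits the model with and without a nearly redundant regressor and compares the holdout prediction errors, alongside the correlation between the regressors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quarterly series: income y and liquid assets l are highly
# collinear, and consumption c depends mostly on income.
n = 40
y = 100 + 5 * np.arange(n) + rng.normal(0, 2, n)
l = 0.5 * y + rng.normal(0, 1, n)               # L tracks Y very closely
c = 10 + 0.9 * y + 0.05 * l + rng.normal(0, 2, n)

train, test = slice(0, n - 2), slice(n - 2, n)  # hold out the last two "quarters"

def fit_and_predict(X, z):
    """Least squares fit on the training span, predictions for the holdout span."""
    beta, *_ = np.linalg.lstsq(X[train], z[train], rcond=None)
    return X[test] @ beta

X_full = np.column_stack([np.ones(n), y, l])    # model including L
X_red = np.column_stack([np.ones(n), y])        # model excluding L

print("corr(Y, L):", np.corrcoef(y, l)[0, 1])
print("holdout errors, Y and L:", fit_and_predict(X_full, c) - c[test])
print("holdout errors, Y only :", fit_and_predict(X_red, c) - c[test])
```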

7.5 Solutions to the Multicollinearity Problem: Ridge Regression

One of the solutions often suggested for the multicollinearity problem is to use what is known as ridge regression, first introduced by Hoerl and Kennard. Simply stated, the idea is to add a constant $\lambda$ to the variances of the explanatory variables before solving the normal equations. For instance, in the example in Section 7.2, we add 5 to $S_{11}$ and $S_{22}$.

A. E. Hoerl and R. W. Kennard, "Ridge Regression: Biased Estimation for Nonorthogonal Problems," and "Ridge Regression: Applications to Nonorthogonal Problems," Technometrics, Vol. 12, 1970, pp. 55-67 and 69-82.

These methods are all reviewed in N. R. Draper and R. Craig Van Nostrand, "Ridge Regression and James-Stein Estimation: Review and Comments," Technometrics, Vol. 21, No. 4, November 1979, pp. 451-466. The authors do not approve of these methods and discuss the shortcomings of each.

It is easy to see that the squared correlation now drops to

$$r_{12}^2 = \frac{S_{12}^2}{(205)(118)} = 0.930$$

Thus the intercorrelations are decreased. One can easily see what a mechanical solution this is. However, there is an enormous literature on ridge regression.
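To see the arithmetic of this device in runnable form, here is a minimal sketch. The values $S_{11} = 200$ and $S_{22} = 113$ are implied by the 205 and 118 above (with $\lambda = 5$); the cross-products $S_{12}$, $S_{1y}$, and $S_{2y}$ are assumed purely to make the example complete and are not taken from the text.

```python
import numpy as np

# S11 and S22 are implied by the 205 and 118 in the text (lam = 5 was added);
# S12, S1y, and S2y are assumed values, used only to make the sketch runnable.
S11, S22, S12 = 200.0, 113.0, 150.0
S1y, S2y = 160.0, 95.0
lam = 5.0

# Intercorrelation of the regressors before and after adding lam to the variances.
r2_before = S12**2 / (S11 * S22)
r2_after = S12**2 / ((S11 + lam) * (S22 + lam))
print("r12^2 before:", round(r2_before, 3), " after:", round(r2_after, 3))

# Ridge estimates: the normal equations with lam added to the diagonal terms,
#   (S11 + lam) b1 + S12 b2 = S1y
#   S12 b1 + (S22 + lam) b2 = S2y
A = np.array([[S11 + lam, S12], [S12, S22 + lam]])
b_ridge = np.linalg.solve(A, np.array([S1y, S2y]))
b_ols = np.linalg.solve(np.array([[S11, S12], [S12, S22]]), np.array([S1y, S2y]))
print("least squares:", b_ols, " ridge:", b_ridge)
```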

The addition of $\lambda$ to the variances produces biased estimators, but the argument is that if the variance can be decreased, the mean-squared error will decline. Hoerl and Kennard show that there always exists a constant $\lambda > 0$ such that

$$\sum_{i=1}^{k} \operatorname{MSE}(\hat{\beta}_i^*) < \sum_{i=1}^{k} \operatorname{MSE}(\hat{\beta}_i)$$

where $\hat{\beta}_i^*$ are the ridge regression estimators of $\beta_i$, $\hat{\beta}_i$ are the least squares estimators, and $k$ is the number of regressors. Unfortunately, $\lambda$ is a function of the regression parameters $\beta_i$ and the error variance $\sigma^2$, which are unknown. Hoerl and Kennard suggest trying different values of $\lambda$ and picking the value of $\lambda$ so that "the system will stabilize" or the "coefficients do not have unreasonable values." Thus subjective arguments are used. Others have suggested obtaining initial estimates of $\beta_i$ and $\sigma^2$ and then using the estimated $\lambda$. This procedure can be iterated, and we get the iterated ridge estimator. The usefulness of these procedures has also been questioned.
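A rough sketch of such an iterated procedure is given below. The text does not commit to a particular formula for the estimated $\lambda$; the plug-in choice $\hat{\lambda} = k\hat{\sigma}^2 / \hat{\beta}'\hat{\beta}$ used here is one common suggestion and should be read as an assumption of this sketch rather than as Hoerl and Kennard's prescription.

```python
import numpy as np

def iterated_ridge(X, y, n_iter=25, tol=1e-8):
    """Sketch of an iterated ridge estimator.

    Start from least squares, plug the current estimates into
    lam = k * sigma2_hat / (beta_hat' beta_hat)  (one common plug-in choice;
    the text leaves the exact formula open), re-estimate beta by ridge, and
    repeat until lam settles down.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # initial OLS estimates
    sigma2 = np.sum((y - X @ beta) ** 2) / (n - k)     # residual variance
    lam = 0.0
    for _ in range(n_iter):
        lam_new = k * sigma2 / (beta @ beta)
        beta = np.linalg.solve(X.T @ X + lam_new * np.eye(k), X.T @ y)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return beta, lam

# Example use on simulated data:
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=50)
print(iterated_ridge(X, y))
```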

One other problem with ridge regression is that it is not invariant to the units of measurement of the explanatory variables or to linear transformations of the variables. If we have two explanatory variables $x_1$ and $x_2$ and we measure $x_1$ in tens and $x_2$ in thousands, it does not make sense to add the same value of $\lambda$ to the variances of both. This problem can be avoided by normalizing each variable by dividing it by its standard deviation. Even if $x_1$ and $x_2$ are measured in the same units, in some cases there are different linear transformations of $x_1$ and $x_2$ that are equally sensible. For instance, as discussed in Section 7.4, equations (7.5), (7.6), and (7.7) are all equivalent and they are all sensible. The ridge estimators, however, will differ depending on which of these forms is used.
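A quick numerical illustration of this lack of invariance, and of the standard-deviation normalization that removes it, might look as follows (the data are simulated; rescaling $x_2$ by 1000 simply mimics the "tens versus thousands" example).

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 50, 1.0
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)       # nearly collinear with x1
y = 2 * x1 + 3 * x2 + rng.normal(size=n)

def ridge(X, y, lam):
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Same data, but with x2 recorded in thousands of units instead of units.
X_a = np.column_stack([x1, x2])
X_b = np.column_stack([x1, x2 / 1000.0])

# If ridge were invariant to units, the second line would reproduce the first
# after converting the x2 coefficient back to the original units; it does not.
print(ridge(X_a, y, lam))
print(ridge(X_b, y, lam) * np.array([1.0, 1 / 1000.0]))

# Normalizing each column by its standard deviation removes the dependence on units.
X_s = X_a / X_a.std(axis=0)
print(ridge(X_s, y, lam))
```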

There are different situations under which ridge regression arises naturally. These throw light on the circumstances under which the method will be useful. We mention three of them.

1. Constrained least squares. Suppose that we estimated the regression coefficients subject to the condition that

$$\sum_i \beta_i^2 = c \qquad\qquad (7.14)$$



W. G. Brown and B. R. Beattie, "Improving Estimates of Economic Parameters by the Use of Ridge Regression with Production Function Applications," American Journal of Agricultural Economics, Vol. 57, 1975, pp. 21-32.

Then we would get something like the ridge regression. The $\lambda$ that we use is the Lagrangian multiplier in the minimization. To see this, suppose that we have two explanatory variables. We get the constrained least squares estimator by minimizing

$$\sum (y - \beta_1 x_1 - \beta_2 x_2)^2 + \lambda(\beta_1^2 + \beta_2^2 - c)$$

where $\lambda$ is the Lagrangian multiplier. Differentiating this expression with respect to $\beta_1$ and $\beta_2$ and equating the derivatives to zero, we get the normal equations

$$2 \sum (y - \beta_1 x_1 - \beta_2 x_2)(-x_1) + 2\lambda\beta_1 = 0$$

$$2 \sum (y - \beta_1 x_1 - \beta_2 x_2)(-x_2) + 2\lambda\beta_2 = 0$$

These equations can be written as

$$(S_{11} + \lambda)\beta_1 + S_{12}\beta_2 = S_{1y}$$

$$S_{12}\beta_1 + (S_{22} + \lambda)\beta_2 = S_{2y}$$

where $S_{11} = \sum x_1^2$, $S_{12} = \sum x_1 x_2$, and so on. Thus we get the ridge regression, and $\lambda$ is the Lagrangian multiplier. The value of $\lambda$ is determined by the criterion $\beta_1^2 + \beta_2^2 = c$. In this case there is a clear-cut procedure for choosing $\lambda$.

It is rarely the case that we would have prior knowledge about the $\beta_i$ in the form $\sum \beta_i^2 = c$. But other, less concrete information can also be used to choose the value of $\lambda$ in ridge regression. Brown and Beattie's ridge regression on production function data used their prior knowledge of the relationship between the signs of the $\beta_i$'s.
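As a numerical check on this equivalence, the following sketch (simulated data and an arbitrary $\lambda$; all values are assumptions for illustration) computes a ridge estimate and verifies that no other coefficient vector with the same squared length $c = \beta'\beta$ attains a smaller residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)      # collinear pair of regressors
y = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

lam = 2.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)   # ridge estimate
c = beta @ beta                                              # implied constraint value

def rss(b):
    return np.sum((y - X @ b) ** 2)

# The ridge estimate should minimize the residual sum of squares among all
# coefficient vectors with the same squared length c.
for angle in np.linspace(0.0, 2 * np.pi, 13):
    b_alt = np.sqrt(c) * np.array([np.cos(angle), np.sin(angle)])
    assert rss(beta) <= rss(b_alt) + 1e-9

print("lambda =", lam, " implied c = beta'beta =", c)
```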

2. Bayesian interpretation. We have not discussed the Bayesian approach to statistics in this book. However, roughly speaking, what the approach does is to combine systematically some prior information on the regression parameters with the sample information. Under this approach we get the ridge regression estimates of the $\beta_i$'s if we assume that the prior information is of the form $\beta_i \sim IN(0, \sigma_\beta^2)$. In this case the ridge constant $\lambda$ is equal to $\sigma^2/\sigma_\beta^2$. Again, $\sigma_\beta^2$ is not known but has to be estimated. However, in almost all economic problems, this sort of prior information (that the means of the $\beta_i$'s are zero) is very unreasonable. This suggests that the simple ridge estimator does not make sense in econometrics (with the Bayesian interpretation). Of course, the assumption that $\beta_i$ has mean zero can be relaxed. But then we get more complicated estimators (generalized ridge estimators).
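This equivalence is easy to verify numerically: with the prior $\beta_i \sim IN(0, \sigma_\beta^2)$ and $\sigma^2$ treated as known, the posterior mean coincides with the ridge estimator using $\lambda = \sigma^2/\sigma_\beta^2$. The sketch below checks this on simulated data (all numbers are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 80, 3
X = rng.normal(size=(n, k))
beta_true = np.array([0.5, -0.2, 0.1])
sigma2, tau2 = 1.0, 0.25           # error variance and prior variance of each beta_i
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Posterior mean under beta_i ~ IN(0, tau2) with sigma2 known:
#   (X'X / sigma2 + I / tau2)^{-1} (X'y / sigma2)
post_mean = np.linalg.solve(X.T @ X / sigma2 + np.eye(k) / tau2, X.T @ y / sigma2)

# Ridge estimator with lambda = sigma2 / tau2.
lam = sigma2 / tau2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

print(np.allclose(post_mean, ridge))   # True: the two coincide
```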

3. Measurement error interpretation. Consider the two-variable model we discussed under constrained least squares. Suppose that we add random errors with zero mean and variance $\lambda$ to both $x_1$ and $x_2$. Since these errors are random, the covariance between $x_1$ and $x_2$ will be unaffected. The variances of $x_1$ and $x_2$


