




will both increase by λ. Thus we get the ridge regression estimator. This interpretation makes the ridge estimator somewhat suspicious. Smith and Campbell say that a one-line summary of this is: "Use less precise data to get more precise estimates."

These are situations in which the ridge regression can be easily justified. In almost all other cases, there is subjective judgment involved. This subjective judgment is sometimes equated to "vague prior information." The Bayesian methods allow a systematic analysis of the data with "vague prior information" but a discussion of these methods is beyond the scope of this book.

Because of the deficiencies of ridge regression discussed above, the method is not recommended as a general solution to the multicollinearity problem. In particular, the simplest form of the method (where a constant λ is added to each variance) is not very useful. Nevertheless, for the sake of curiosity, we will present some results on the method. For the consumption function data in Table 7.1 we estimated the regression equation

$$c_t = \alpha + \beta_0 y_t + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_8 y_{t-8} + u_t$$

Needless to say, the y's (current and lagged income) are highly intercorrelated. The results are presented in Table 7.2. Note that as λ increases, there is a smoothing of the coefficients and the estimate of β_0 declines. The OLS coefficients, of course, are very erratic. But the estimates of β_0 (the portion of current income going into current consumption) are implausibly low with the ridge regression method. The sudden pickup of the coefficients after the fifth quarter is also very implausible. Perhaps we should estimate the effects only up to four lags. The OLS estimates are erratic even with four lags. The computation of the ridge regression estimates with four lags is left as an exercise.

Table 7.2 Ridge Estimates for Consumption Function Data

                               Value of λ
Coefficient    0 (OLS)    0.0002     0.0006     0.0010     0.0014     0.0020
β_0            0.70974    0.42246    0.29302    0.24038    0.21096    0.18489
β_1            0.20808    0.28187    0.22554    0.19578    0.17773    0.16096
β_2            0.27463    0.15615    0.14612    0.13865    0.13324    0.12764
β_3           -0.48068   -0.06079    0.03052    0.05761    0.07060    0.08088
β_4            0.25129   -0.00301    0.02429    0.04473    0.05736    0.06902
β_5           -0.23845   -0.06461   -0.00562    0.02304    0.04010    0.05578
β_6            0.12432    0.01705    0.03600    0.05116    0.06135    0.07138
β_7           -0.11278    0.06733    0.07964    0.08491    0.08862    0.09254
β_8            0.19838    0.12632    0.11941    0.11563    0.11367    0.11220
Σβ_i           0.93453    0.94277    0.94892    0.95189    0.95363    0.95529

"Gary Smith and Frank Campbell, "A Critique of Some Ridge Regression Methods" (with discussion), Journal of the American Statistical Association, Vol. 75, March 1980, pp. 74-103.



7.6 Principal Component Regression

Another solution that is often suggested for the multicollinearity problem is principal component regression, which is as follows. Suppose that we have k explanatory variables x_1, x_2, ..., x_k. Then we can consider linear functions of these variables:

$$z_1 = a_1 x_1 + a_2 x_2 + \cdots + a_k x_k$$
$$z_2 = b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \quad \text{etc.}$$

Suppose we choose the a's so that the variance of z_1 is maximized subject to the condition that

$$a_1^2 + a_2^2 + \cdots + a_k^2 = 1$$

This is called the normalization condition. (It is required, or else the variance of z_1 can be increased indefinitely.) z_1 is then said to be the first principal component. It is the linear function of the x's that has the highest variance (subject to the normalization rule).

The detailed derivation of the principal components is given in the appendix. We will discuss the main features and uses of the method, which are easy to understand without the use of matrix algebra. Further, computer programs are available that compute the principal components (the z's) for any set of variables x_1, x_2, ..., x_k.

The process of maximizing the variance of the linear function z subject to the condition that the sum of squares of the coefficients of the x's is equal to 1 produces k solutions. Corresponding to these we construct k linear functions z_1, z_2, ..., z_k. These are called the principal components of the x's. They can be ordered so that

$$\operatorname{var}(z_1) > \operatorname{var}(z_2) > \cdots > \operatorname{var}(z_k)$$

z_1, the one with the highest variance, is called the first principal component; z_2, with the next highest variance, is called the second principal component; and so on. These principal components have the following properties:

1. var(z_1) + var(z_2) + ··· + var(z_k) = var(x_1) + var(x_2) + ··· + var(x_k).

2. Unlike the x's, which are correlated, the z's are orthogonal or uncorrelated. Thus there is zero multicollinearity among the z's.
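A small numerical sketch of this construction may help. The x's below are simulated, deliberately collinear series (not data from this chapter); the eigenvectors of their covariance matrix supply the coefficient vectors a, b, ..., and the resulting z's exhibit the two properties just listed.

```python
# Sketch: constructing principal components and checking properties 1 and 2.
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
common = rng.normal(size=n)                        # shared factor creates collinearity
X = np.column_stack([common + 0.1 * rng.normal(size=n) for _ in range(k)])
Xc = X - X.mean(axis=0)                            # work with deviations from means

cov = Xc.T @ Xc / n                                # covariance matrix of the x's
eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvectors have unit length,
order = np.argsort(eigvals)[::-1]                  #   i.e. a_1^2 + ... + a_k^2 = 1
A = eigvecs[:, order]                              # columns ordered by decreasing variance
Z = Xc @ A                                         # z_1, ..., z_k

print(np.round(Z.var(axis=0), 4))                  # var(z_1) > var(z_2) > ... > var(z_k)
print(round(Z.var(axis=0).sum(), 4),
      round(Xc.var(axis=0).sum(), 4))              # property 1: the two sums are equal
print(np.round(Z.T @ Z / n, 4))                    # property 2: off-diagonals are ~0
```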

Sometimes it is suggested that instead of regressing y on x_1, x_2, ..., x_k, we should regress y on z_1, z_2, ..., z_k. But this is not a solution to the multicollinearity problem. If we regress on the z's and then substitute the values of the z's in terms of the x's, we finally get the same answers as before. This is similar to the example we considered in Section 7.4. The fact that the z's are uncorrelated does not mean that we will get better estimates of the coefficients in the original regression equation. Thus there is a point in using the principal components only if we regress y on a subset of the z's. But there are some problems with this procedure as well. They are:




1. The first principal component z_1, although it has the highest variance, need not be the one that is most highly correlated with y. In fact, there is no necessary relationship between the order of the principal components and the degree of their correlation with the dependent variable y.

2. One can think of choosing only those principal components that have a high correlation with y and discarding the rest, but the same sort of procedure can be used with the original set of variables x_1, x_2, ..., x_k by first choosing the variable with the highest correlation with y, then the one with the highest partial correlation, and so on. This is what "stepwise regression programs" do.

3. The linear combinations (the z's) often do not have economic meaning. For example, what does 2(income) + 3(price) mean? This is one of the most important drawbacks of the method.

4. Changing the units of measurement of the x's will change the principal components. This problem can be avoided if all the variables are standardized to have unit variance.
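To see concretely the earlier point that regressing on all k principal components and transforming back simply reproduces the OLS estimates (so that nothing is gained unless some components are dropped), here is a minimal sketch with simulated data; none of the numbers refer to the data sets used in this chapter.

```python
# Sketch: regressing y on all principal components and mapping the estimated
# coefficients back to the x's reproduces the OLS coefficients exactly.
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.normal(size=(n, k)) @ np.array([[1.0, 0.9, 0.8],
                                        [0.0, 0.1, 0.1],
                                        [0.0, 0.0, 0.1]])   # collinear regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
Xc, yc = X - X.mean(axis=0), y - y.mean()

beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)   # ordinary least squares

_, A = np.linalg.eigh(Xc.T @ Xc)                     # orthonormal coefficient vectors
Z = Xc @ A                                           # all k principal components
gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)       # regression of y on the z's
beta_back = A @ gamma                                # substitute z's back in terms of x's

print(np.allclose(beta_ols, beta_back))              # True: identical estimates
```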

However, there are some uses for the principal component method in the exploratory stages of an investigation. For instance, suppose that there are many interest rates in the model (since all are measured in the same units, there is no problem with the choice of units of measurement). If the principal component analysis shows that two principal components account for 99% of the variation in the interest rates, and if by looking at the coefficients we can identify them as a short-term component and a long-term component, we can argue that there are only two "latent" variables that account for all the variation in the interest rates. Thus the principal component method gives us some guidance on the question: "How many independent sources of variation are there?" In addition, if we can give an economic interpretation to the principal components, this is useful.
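The following sketch illustrates how this kind of exploratory question can be answered from the eigenvalues alone. The interest-rate series here are simulated from two hypothetical underlying factors (a "short" and a "long" rate); the cumulative share of variance explained then shows that two components account for nearly all of the variation.

```python
# Sketch: how many "latent" factors drive a set of interest-rate series?
import numpy as np

rng = np.random.default_rng(3)
n = 300
short = rng.normal(size=n)                        # hypothetical short-rate factor
long_ = rng.normal(size=n)                        # hypothetical long-rate factor

# Six simulated rates, each driven mainly by one of the two factors.
rates = np.column_stack(
    [short + 0.05 * rng.normal(size=n) for _ in range(3)] +
    [long_ + 0.05 * rng.normal(size=n) for _ in range(3)])
rates = (rates - rates.mean(axis=0)) / rates.std(axis=0)   # same units, unit variance

eigvals = np.sort(np.linalg.eigvalsh(rates.T @ rates / n))[::-1]
share = np.cumsum(eigvals) / eigvals.sum()
print(np.round(share, 3))     # first two entries are close to 1: two latent variables
```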

We illustrate the method with reference to a data set from Malinvaud. We have chosen this data set because it has been used by Chatterjee and Price to illustrate the principal component method. We will also be using this same data set in Chapter 11 to illustrate the errors-in-variables methods.

The data are presented in Table 7.3. First let us estimate an import demand function. The regression of y on x_1, x_2, and x_3 gives the following results:

Variable     Coefficient   Standard Error   t-ratio
x_1              0.032          0.187          0.17
x_2              0.414          0.322          1.29
x_3              0.243          0.285          0.85
Constant       -19.73           4.125         -4.78
R² = 0.973     F_{3,14} = 168.4

E. Malinvaud, Statistical Methods of Econometrics, 2nd ed. (Amsterdam: North-Holland, 1970).

S. Chatterjee and B. Price, Regression Analysis by Example (New York: Wiley, 1977).


