
As an illustration, consider the case where

$$S_{11} = 200 \qquad S_{1y} = 350 \qquad S_{12} = 150 \qquad S_{2y} = 263 \qquad S_{22} = 113$$

so that the normal equations are

$$200\hat{\beta}_1 + 150\hat{\beta}_2 = 350$$
$$150\hat{\beta}_1 + 113\hat{\beta}_2 = 263$$

The solution is $\hat{\beta}_1 = 1$ and $\hat{\beta}_2 = 1$. Suppose that we drop an observation and the new values are

$$S_{11} = 199 \qquad S_{1y} = 347.5 \qquad S_{12} = 149 \qquad S_{2y} = 261.5 \qquad S_{22} = 112$$

Now when we solve the equations

$$199\hat{\beta}_1 + 149\hat{\beta}_2 = 347.5$$
$$149\hat{\beta}_1 + 112\hat{\beta}_2 = 261.5$$

we get $\hat{\beta}_1 = -\tfrac{1}{2}$ and $\hat{\beta}_2 = 3$.

Thus very small changes in the variances and covariances produce drastic changes in the estimates of the regression parameters. It is easy to see that the squared correlation coefficient between the two explanatory variables is given by

$$r_{12}^2 = \frac{(150)^2}{200(113)} = 0.995$$

which is very high.
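As a quick numerical check, the following NumPy sketch solves both sets of normal equations and computes $r_{12}^2$:

```python
import numpy as np

# Normal equations from the full sample:
#   S11*b1 + S12*b2 = S1y
#   S12*b1 + S22*b2 = S2y
A_full = np.array([[200.0, 150.0],
                   [150.0, 113.0]])
b_full = np.array([350.0, 263.0])
print(np.linalg.solve(A_full, b_full))    # -> beta1 = 1.0, beta2 = 1.0

# The same quantities after one observation has been dropped
A_drop = np.array([[199.0, 149.0],
                   [149.0, 112.0]])
b_drop = np.array([347.5, 261.5])
print(np.linalg.solve(A_drop, b_drop))    # -> beta1 = -0.5, beta2 = 3.0

# Squared correlation between x1 and x2: r12^2 = S12^2 / (S11 * S22)
print(round(150.0**2 / (200.0 * 113.0), 4))   # -> 0.9956
```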

In practice, addition or deletion of observations would produce changes in the variances and covariances. Thus one of the consequences of high correlation between $x_1$ and $x_2$ is that the parameter estimates would be very sensitive to the addition or deletion of observations. This aspect of multicollinearity can be checked in practice by deleting or adding some observations and examining the sensitivity of the estimates to such perturbations.
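A sketch of such a perturbation check, using hypothetical data and an illustrative helper `deletion_sensitivity` (neither is from the text), might look as follows:

```python
import numpy as np

def deletion_sensitivity(X, y):
    """Refit the regression with each observation deleted in turn and
    return the smallest and largest value taken by each coefficient."""
    n = len(y)
    estimates = []
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        estimates.append(beta)
    estimates = np.array(estimates)
    return estimates.min(axis=0), estimates.max(axis=0)

# Hypothetical data with two nearly collinear regressors
rng = np.random.default_rng(0)
x1 = rng.normal(size=30)
x2 = x1 + 0.01 * rng.normal(size=30)       # almost identical to x1
y = x1 + x2 + rng.normal(size=30)
X = np.column_stack([np.ones(30), x1, x2])

low, high = deletion_sensitivity(X, y)
print(low.round(2))
print(high.round(2))   # wide ranges for the x1 and x2 coefficients signal instability
```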

One other symptom of the multicollinearity problem that is often mentioned is that the standard errors of the estimated regression coefficients will be very high. However, high values of $r_{12}^2$ need not necessarily imply high standard errors, and conversely, even low values of $r_{12}^2$ can produce high standard errors. In Section 4.3 we derived the standard errors for the case of two explanatory variables, obtaining the following formulas:



$$\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{11}(1 - r_{12}^2)} \qquad (7.1)$$

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{S_{22}(1 - r_{12}^2)} \qquad (7.2)$$

$$\operatorname{cov}(\hat{\beta}_1, \hat{\beta}_2) = \frac{-r_{12}^2\,\sigma^2}{(1 - r_{12}^2)\,S_{12}} \qquad (7.3)$$

where $\sigma^2$ is the variance of the error term. Thus the variance of $\hat{\beta}_1$ will be high if:

1. $\sigma^2$ is high.

2. $S_{11}$ is low.

3. $r_{12}^2$ is high.

Even if $r_{12}^2$ is high, if $\sigma^2$ is low and $S_{11}$ is high, we will not have the problem of high standard errors. On the other hand, even if $r_{12}^2$ is low, the standard errors can be high if $\sigma^2$ is high and $S_{11}$ is low (i.e., there is not sufficient variation in $x_1$). What this suggests is that high values of $r_{12}^2$ do not by themselves tell us whether we have a multicollinearity problem or not.
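A small sketch of formula (7.1), with made-up values of $\sigma^2$, $S_{11}$, and $r_{12}^2$, makes the point concrete:

```python
def var_beta1(sigma_sq, S11, r12_sq):
    # Equation (7.1): var(beta1_hat) = sigma^2 / (S11 * (1 - r12^2))
    return sigma_sq / (S11 * (1.0 - r12_sq))

# High r12^2, but sigma^2 small and S11 large: the variance is still small.
print(var_beta1(sigma_sq=0.1, S11=1000.0, r12_sq=0.95))   # -> 0.002

# Low r12^2, but sigma^2 large and S11 small: the variance is large.
print(var_beta1(sigma_sq=10.0, S11=2.0, r12_sq=0.1))      # -> about 5.56
```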

When we have more than two explanatory variables, the simple correlations among them become even less informative. As an illustration, consider the following example with 20 observations on $x_1$, $x_2$, and $x_3$:

$x_1$ = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0, and 10 zeros)
$x_2$ = (0, 0, 0, 0, 0, 1, 1, 1, 1, 1, and 10 zeros)
$x_3$ = (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, and 10 zeros)

Obviously, $x_3 = x_1 + x_2$ and we have perfect multicollinearity. But we can easily see that $r_{12} = -\tfrac{1}{3}$ and $r_{13} = r_{23} = 1/\sqrt{3} \approx 0.58$, and thus the simple correlations are not high. In the case of more than two explanatory variables, what we have to consider are the multiple correlations of each of the explanatory variables with the other explanatory variables. Note that the standard error formulas corresponding to (7.1) and (7.2) are

$$\operatorname{var}(\hat{\beta}_i) = \frac{\sigma^2}{S_{ii}(1 - R_i^2)} \qquad (7.4)$$

where $\sigma^2$ and $S_{ii}$ are defined as before in the case of two explanatory variables and $R_i^2$ represents the squared multiple correlation coefficient between $x_i$ and the other explanatory variables. Again, it is easy to see that $\operatorname{var}(\hat{\beta}_i)$ will be high if:

1. $\sigma^2$ is high.

2. $S_{ii}$ is low.

3. $R_i^2$ is high.

Thus a high $R_i^2$ is neither necessary nor sufficient for high standard errors, and multicollinearity by itself need not cause high standard errors.
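Returning to the 20-observation example above, the contrast between the simple correlations and the multiple correlation can be verified with a short sketch (here $R_3^2$ is obtained by regressing $x_3$ on $x_1$, $x_2$, and a constant):

```python
import numpy as np

x1 = np.array([1]*5 + [0]*5 + [0]*10, dtype=float)
x2 = np.array([0]*5 + [1]*5 + [0]*10, dtype=float)
x3 = x1 + x2                         # exact linear dependence

# The simple correlations are only moderate ...
print(np.corrcoef([x1, x2, x3]).round(3))   # r12 = -0.333, r13 = r23 = 0.577

# ... but the squared multiple correlation of x3 with (x1, x2) is 1.
X = np.column_stack([np.ones(20), x1, x2])
fitted = X @ np.linalg.lstsq(X, x3, rcond=None)[0]
R3_sq = 1.0 - ((x3 - fitted)**2).sum() / ((x3 - x3.mean())**2).sum()
print(round(R3_sq, 3))                      # -> 1.0
```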

There are several rules of thumb that have been suggested in the literature to detect when multicollinearity can be considered a serious problem. For instance, Klein says: "Intercorrelation of variables is not necessarily a problem unless it is high relative to the overall degree of multiple correlation." By Klein's rule, multicollinearity would be regarded as a problem only if $R_y^2 < R_i^2$, where $R_y^2$ is the squared multiple correlation coefficient between $y$ and the explanatory variables and $R_i^2$ is as defined earlier. However, note that even if $R_y^2 < R_i^2$ we can still have significant partial correlation coefficients (i.e., significant $t$-ratios for the regression coefficients). For example, suppose that the correlations between $y$, $x_1$, and $x_2$ are given by

$$\begin{pmatrix} 1.00 & 0.95 & 0.95 \\ 0.95 & 1.00 & 0.97 \\ 0.95 & 0.97 & 1.00 \end{pmatrix}$$

Then it can be verified that $R_{y \cdot 12}^2 = 0.916$ and $r_{12}^2 = 0.941$. Thus $R_y^2 < r_{12}^2$. But $r_{y1 \cdot 2}^2 = r_{y2 \cdot 1}^2 = 0.14$. Since the relationship between the $t$-ratio and the partial $r^2$ is given by (see Section 4.5)

$$r^2 = \frac{t^2}{t^2 + \text{degrees of freedom}}$$

we will get $t$-values greater than 3 if the number of observations is greater than 60.
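These figures can be reproduced directly from the correlation matrix; the sketch below uses the standard formulas for the squared multiple and partial correlation coefficients (not rederived here) and then finds the smallest degrees of freedom giving $|t| > 3$:

```python
# Correlations among (y, x1, x2) from the example above
r_y1, r_y2, r_12 = 0.95, 0.95, 0.97

# Squared multiple correlation of y with (x1, x2)
R_y_sq = (r_y1**2 + r_y2**2 - 2*r_y1*r_y2*r_12) / (1 - r_12**2)
print(round(R_y_sq, 3))        # -> 0.916, which is less than r12^2 = 0.941

# Squared partial correlation of y and x1, holding x2 fixed
r_y1_2_sq = (r_y1 - r_y2*r_12)**2 / ((1 - r_y2**2) * (1 - r_12**2))
print(round(r_y1_2_sq, 2))     # -> 0.14

# r^2 = t^2 / (t^2 + d.f.)  implies  t^2 = r^2 * d.f. / (1 - r^2)
df = 1
while r_y1_2_sq * df / (1 - r_y1_2_sq) <= 9:    # t^2 > 9 means |t| > 3
    df += 1
print(df)                      # -> 55 degrees of freedom, i.e. roughly 60 observations
```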

We can summarize the preceding discussion as follows:

1. If we have more than two explanatory variables, we should use the $R_i^2$ values to measure the degree of intercorrelation among the explanatory variables, not the simple correlations among the variables.

2. However, whether multicollinearity is a problem for making inferences on the parameters will depend on other factors besides the $R_i^2$'s, as is clear from equation (7.4). What is relevant is the standard errors and $t$-ratios. Of course, if $R_i^2$ is low, we would be better off, but this is only small consolation. It is not appropriate to draw conclusions about whether multicollinearity is a problem just on the basis of the $R_i^2$'s. The $R_i^2$'s serve only as a complaint that the explanatory variables are intercorrelated. Moreover, the $R_i^2$'s depend on the particular parametrization adopted, as we discuss in the next section.

L. R. Klein, An Introduction to Econometrics (Englewood Cliffs, N.J.: Prentice Hall, 1962), p. 101.

Of course, some other measures based on the eigenvalues of the correlation matrix have been suggested in the literature, but a discussion of these is beyond our scope. Further, all such measures are only a "complaint" that the explanatory variables are highly intercorrelated; they do not tell us whether the problem is serious.


