
and p is defined earlier. From this he concludes that the use of a proxy, although generally advisable, may not be a superior strategy to dropping the error-ridden variable altogether. Kinal and Lahiri analyze this problem in greater detail and generality. They give several alternative expressions to Aigner's. They conclude that including even a poor proxy is advisable under a wide range of empirical situations when the alternative is to discard it altogether.

2. The second major qualification is that the proxy variable does not always fall in the pure errors-in-variables case. Usually, the proxy variable is "some variable" that also depends on the same factors, that is, p is of the form

$$p = \alpha x + \delta z + e$$

Since z is unobserved and does not have any natural units of measurement, we will assume that $\delta = 1$. We can then write

$$p = \alpha x + z + e \tag{11.19}$$

We will also assume that $M_{xz}$ is not zero. Now it does not necessarily follow that including the proxy p leads to a smaller bias in the estimator of $\beta$ in equation (11.16). Thus, except in cases where the proxies fall in the category of pure errors in variables, it does not follow that using even a poor proxy is better than using none at all.

3. The third qualification is that the reduction-in-bias argument does not apply if the proxy variable is a dummy variable. This is the case where we do not observe z but we know when it is in different ranges. For instance, we do not know how to measure "effective education" but we use dummies for the amount of education (e.g., grade school, high school, college). In this case it does not necessarily follow that using the proxy results in a smaller bias compared with the omission of z altogether.

4. The reduction-in-bias argument also does not apply if the other explanatory variables are measured with error. In equation (11.16) suppose that the variable x is measured with error so that what we observe is $Z = x + v$. We will assume that $\mathrm{cov}(x, v) = \mathrm{cov}(x, z) = \mathrm{cov}(v, e) = 0$. We can consider two estimates of $\beta$, one using the proxy p and the other omitting it. Now we cannot say anything about the direction of the biases, nor about whether the bias will increase or decrease with the introduction of p.
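The reduction-in-bias argument that these qualifications limit can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it takes equation (11.16) to have the form y = βx + γz + u, takes the pure errors-in-variables proxy to be p = z + e with e independent of everything else, and uses arbitrary parameter values (β = γ = 1, correlation 0.6 between x and z). None of these numbers come from the text.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000
beta, gamma = 1.0, 1.0  # assumed true coefficients (hypothetical values)

# x and z are correlated (correlation 0.6), so dropping z biases beta.
x = rng.normal(size=n)
z = 0.6 * x + 0.8 * rng.normal(size=n)
u = rng.normal(size=n)
y = beta * x + gamma * z + u

# Assumed pure errors-in-variables proxy: p = z + e, e independent noise.
p = z + rng.normal(size=n)

beta_omit = ols(y, x)[1]      # z dropped, no proxy used
beta_proxy = ols(y, x, p)[1]  # proxy p used in place of z

print("estimate of beta, z omitted: ", beta_omit)
print("estimate of beta, proxy used:", beta_proxy)
```

Under these assumed values the probability limits are 1.6 when z is omitted and about 1.37 when the proxy is used, so the proxy reduces the bias without removing it. Changing the proxy equation to the form in (11.19), or adding measurement error to x, is a natural way to probe qualifications 2 and 4.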

T. Kinal and K. Lahiri, "Specification Error Analysis with Stochastic Regressors," Econometrica, Vol. 51, 1983, pp. 1209-1219.

Maddala, Econometrics, p. 161.

See Maddala, Econometrics, pp. 161-162, for an example due to D. M. Grether.

See Maddala, Econometrics, pp. 304-305, referring to the results from the papers by Welch and Griliches on estimation of the effects of schooling on income.

Finis Welch, "Human Capital Theory: Education, Discrimination, and Life Cycles," American Economic Review, May 1975, p. 67; Zvi Griliches, "Estimating the Returns to Schooling: Some Econometric Problems," Econometrica, Vol. 45, 1977, p. 12.



11.6 Proxy Variables

Coefficient of the Proxy Variable

The preceding discussion referred to a situation where our interest was in the coefficients of variables other than the proxy variable. There are also many situations where our interest is in $\gamma$, the coefficient of the unobserved variable in (11.16). Since z is not observed, we cannot think of any natural units of measurement for z. Thus it is not the magnitude of $\gamma$ but the sign of $\gamma$ with which we are concerned. A question we would like to ask is under what conditions the use of the proxy p will give us the correct sign for $\gamma$. To answer this question we need a method of combining subjective assessments of how good the proxies are with objective information on the observed variables.

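This problem has been analyzed by Krasker and Pratt. Consider equation (11.16). Let the correlation coefficient between the unobserved variable z and the proxy variable p be $r^*$. The condition that the coefficient of the proxy variable has the same sign as $\gamma$, the coefficient of the unobserved variable in (11.16), is

$$(r^*)^2 > R^2_{p \cdot x} + 1 - R^2_{p \cdot x, y} \tag{11.20}$$

where $R^2_{p \cdot x}$ is the $R^2$ from a regression of the proxy on the other explanatory variables and $R^2_{p \cdot x, y}$ is the $R^2$ from a regression of the proxy on the other explanatory variables and the dependent variable.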
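Condition (11.20) needs only two auxiliary R² values: one from regressing the proxy on the other explanatory variables, and one from regressing the proxy on those variables plus the dependent variable. The sketch below shows one way the bound might be computed. The function names `r_squared` and `krasker_pratt_threshold` are hypothetical, and the data are synthetic and generated only so that the code runs. They are not the motor vehicle data discussed in this section.

```python
import numpy as np

def r_squared(target, regressors):
    """R^2 from an OLS regression of target on regressors plus an intercept."""
    X = np.column_stack([np.ones(len(target)), regressors])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    centered = target - target.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)

def krasker_pratt_threshold(r2_without_y, r2_with_y):
    """Bound on |r*| implied by (11.20): sqrt(R2_without_y + 1 - R2_with_y)."""
    return float(np.sqrt(r2_without_y + 1.0 - r2_with_y))

# Synthetic illustration (assumed data-generating process, not from the text).
rng = np.random.default_rng(0)
n = 500
others = rng.normal(size=(n, 3))  # the other explanatory variables
proxy = others @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
y = proxy + others @ np.array([0.4, 0.1, -0.2]) + rng.normal(size=n)

r2_without_y = r_squared(proxy, others)
r2_with_y = r_squared(proxy, np.column_stack([others, y]))
threshold = krasker_pratt_threshold(r2_without_y, r2_with_y)

print("R^2, proxy on other regressors:      ", r2_without_y)
print("R^2, proxy on other regressors and y:", r2_with_y)
print("correlation the proxy must exceed:   ", threshold)
```

Because adding the dependent variable can only raise the R² of the auxiliary regression, the bound never exceeds 1. It approaches 1, and so becomes very demanding, when the dependent variable adds little explanatory power for the proxy.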

As an example they consider the determinants of motor vehicle deaths. The data are cross-section data in 1960 for the 48 contiguous states in the United States. The variables are:

$y_i$ = logarithm of the number of motor vehicle deaths per capita in the ith state in 1960

$x_{1i}$ = dummy variable defined to be 1 if the ith state had mandatory motor vehicle inspection, 0 otherwise

$x_{2i}$ = logarithm of per capita gasoline consumption in state i in 1960

$x_{3i}$ = logarithm of the fraction of the ith state's population that was 18-24 in 1960

$x_{4i}$ = logarithm of the number of automobiles (per capita) older than 9 years in state i in 1960

The estimated regression equation (with standard errors in parentheses) is

$$y_i = \underset{(1.77)}{-4.53} - \underset{(0.06)}{0.12}\,x_{1i} + \underset{(0.23)}{1.11}\,x_{2i} + \underset{(0.37)}{1.49}\,x_{3i} + \underset{(0.12)}{0.04}\,x_{4i}$$

$x_{2i}$ is considered a proxy variable for an unobserved variable z, which is "per capita exposure to situations that create the possibility of fatal accidents." The question is whether the coefficient of the proxy has the same sign as the coefficient of this unobserved variable z. The $R^2$ from a regression of $x_{2i}$ on $x_{1i}$, $x_{3i}$, $x_{4i}$ is 0.3895. The $R^2$ from a regression of $x_{2i}$ on $y_i$, $x_{1i}$, $x_{3i}$, $x_{4i}$ is 0.6193. Hence, according to (11.20), we can be sure that the coefficient of $x_{2i}$ has the same sign as the coefficient of z, regardless of other correlations, if we are sure that the correlation between $x_{2i}$ and z exceeds

$$(0.3895 + 1 - 0.6193)^{1/2} = 0.878$$

Krasker and Pratt conclude that "To ensure that the signs of the coefficients coincide with the signs of the unobservable true variables, the proxies must be of much higher quality than could be hoped for in the actual context."

Krasker and Pratt also give alternative formulas and methods of determining the correctness of the sign for the other coefficient $\beta$ in (11.16) as well. Since these expressions are somewhat complicated, we will not reproduce them here. The condition given in (11.20) would enable us to judge the sign of $\gamma$. One reason why they get such a stringent condition is that they relax the usual assumptions made in the errors-in-variables models. They do not make the assumption that the error in p is independent of z, x, or u.

To compute the Krasker-Pratt criterion (11.20) we have to compute the $R^2$s from a regression of the proxy on the other explanatory variables, and a regression of the proxy on the other explanatory variables and the dependent variable.

W. S. Krasker and J. W. Pratt, "Bounding the Effects of Proxy Variables on Regression Coefficients," Econometrica, Vol. 54, 1986, pp. 641-655.



11.7 Some Other Problems

We introduced the simple errors-in-variables model in Section 11.2 as a starting point for our analysis. This model may not be applicable to most economic data because some of its assumptions are violated. The crucial assumptions made in that model are:

1. The errors have zero mean.

2. The covariances between the errors and the systematic parts are zero.

3. The covariances between the errors themselves are zero.

Based on these assumptions, we showed that the OLS estimator $\hat{\beta}$ is not consistent and is biased toward zero. We will now show that:

1. The problem of obtaining consistent estimators can be solved in some cases if we have more equations in which the same error-ridden variable occurs.

2. If we consider correlated errors, the least squares estimators need not be biased toward zero. In fact, the OLS estimator $\hat{\beta}$ may overestimate (rather than underestimate) $\beta$.

Thus the conclusions derived in Section 11.2 need not hold in these cases.
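The claim in Section 11.7 that correlated errors can push the OLS estimator above the true coefficient, rather than toward zero, can be checked with a small simulation. The sketch below assumes a setup that is not spelled out in the text: a true relation y = βx with β = 1, an observed regressor equal to x plus an error, an observed dependent variable equal to y plus an error, and a covariance between the two errors that is either 0 or 0.8. All variances are hypothetical values chosen for illustration.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate(cov_errors, n=200_000, beta=1.0, seed=0):
    """OLS slope when x and y are both observed with (possibly correlated) errors."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(0.0, 1.0, n)  # systematic part, variance 1
    y_true = beta * x_true
    # Error in x has variance 0.5, error in y has variance 2.0 (assumed values).
    cov = np.array([[0.5, cov_errors], [cov_errors, 2.0]])
    errs = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    x_obs = x_true + errs[:, 0]
    y_obs = y_true + errs[:, 1]
    return ols_slope(x_obs, y_obs)

slope_uncorrelated = simulate(cov_errors=0.0)
slope_correlated = simulate(cov_errors=0.8)

print("slope, uncorrelated errors:", slope_uncorrelated)
print("slope, correlated errors:  ", slope_correlated)
```

In this assumed setup the probability limit of the slope is (β + covariance of the errors) / (1 + 0.5), which is about 0.67 with uncorrelated errors (biased toward zero) and 1.2 with an error covariance of 0.8 (an overestimate).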


