
var(β̂₁) = σ²/S₁₁

where S₁₁ = Σx₁² and r₁₂ is the correlation coefficient between x₁ and x₂.

On the other hand,

var(β̃₁) = σ²/[S₁₁(1 − r₁₂²)]

Thus β̂₁ is a biased estimator but has a smaller variance than β̃₁. In fact, the variance would be considerably smaller if r₁₂² is high. However, the estimated standard error need not be smaller for β̂₁ than for β̃₁. This is because σ̂², the estimated variance of the error, can be higher in the misspecified model. It is given by the residual sum of squares divided by the degrees of freedom, and this can be higher (or lower) for the misspecified model.

Let us denote the estimated variances by V̂(β̂₁) and V̂(β̃₁). Then the formula connecting the estimated variances is (ignoring the degrees-of-freedom corrections)

V̂(β̂₁)/V̂(β̃₁) = (1 − r₁₂²)/(1 − r_{y2·1}²)

where r_{y2·1} is the partial correlation coefficient between y and x₂ given x₁. Thus the standard error of β̂₁ will be less than the standard error of β̃₁ only if r₁₂² > r_{y2·1}².
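These relations are easy to check by simulation. The sketch below is not from the text; all parameter values are illustrative assumptions. It draws repeated samples from a true two-regressor model and compares the short-regression estimator (x₂ omitted) with the full-regression estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, b1, b2, sigma = 200, 1.0, 0.5, 1.0

# Fixed regressors in deviation form (so the regressions need no intercept).
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)           # x2 correlated with x1
x1, x2 = x1 - x1.mean(), x2 - x2.mean()

S11 = x1 @ x1
b21 = (x1 @ x2) / S11                        # auxiliary regression coefficient
r12 = (x1 @ x2) / np.sqrt(S11 * (x2 @ x2))   # correlation of x1 and x2

short, full = [], []
for _ in range(5000):
    y = b1 * x1 + b2 * x2 + sigma * rng.normal(size=n)
    short.append((x1 @ y) / S11)             # x2 omitted
    full.append(np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0][0])
short, full = np.array(short), np.array(full)

print(short.mean(), b1 + b21 * b2)                  # biased toward b1 + b21*b2
print(full.mean(), b1)                              # unbiased
print(short.var(), sigma**2 / S11)                  # smaller variance
print(full.var(), sigma**2 / (S11 * (1 - r12**2)))  # larger variance
```

The short-regression estimator centers on β₁ + b₂₁β₂ rather than β₁, but its spread matches σ²/S₁₁, which is smaller than σ²/[S₁₁(1 − r₁₂²)].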

We have considered the case of only one included and one omitted variable. In the case where we have k − 1 included variables and the kth variable omitted, formula (4.16) generalizes to

E(β̂ᵢ) = βᵢ + bᵢβₖ    i = 1, 2, . . . , k − 1    (4.17)

where bᵢ is the regression coefficient of xᵢ in the auxiliary regression of xₖ on x₁, x₂, . . . , x_{k−1}. That is, we consider the regression of the omitted variable xₖ on all the included variables. In the general case¹¹ where we have several included variables and several

¹¹We will not go through the derivation here because the use of matrix notation is unavoidable. Proofs can be found in many textbooks in econometrics. See, for instance, Maddala, Econometrics, p. 461.

where b₂₁ = S₁₂/S₁₁ = Σx₁x₂/Σx₁² is the regression coefficient from a regression of x₂ on x₁.

Thus β̂₁ is a biased estimator of β₁, and the bias is given by

bias = (coefficient of the excluded variable)
    × (regression coefficient in a regression of the excluded variable on the included variable)

If we denote the estimator for β₁ from (4.15) by β̂₁, the variance of β̂₁ is var(β̂₁) = σ²/S₁₁, as given above.



Example 1: Demand for Food in the United States

Consider the estimation of the demand for food in the United States based on the data in Table 4.9. Suppose that the "true" equation is

Q = α + β₁P + β₂Y + u

where Q is food consumption, P is the price of food, and Y is income.

However, we omit the income variable. We get

Q̂ = 89.97 + 0.107P    σ̂ = 2.338
       (11.85)  (0.118)

(Figures in parentheses are standard errors.)

The coefficient of P has the wrong sign. Can this be attributed to the omission of the income variable? The answer is yes, because the coefficient of P is a biased estimate, with the bias given by

bias = (coefficient of the income variable)
    × (the regression coefficient of income on price)

The coefficient of income is expected to be positive. Also, given that the data are time-series data, we would expect a positive correlation between P and Y. Hence the bias is expected to be positive, and this could turn a negative coefficient into a positive one. In this case the regression equation with Y included gives the result

Q̂ = 92.05 − 0.142P + 0.236Y    σ̂ = 1.952
       (5.84)   (0.067)   (0.031)

omitted variables, we have to estimate the "auxiliary" multiple regressions of each of the excluded variables on all the included variables. The bias in each of the estimated coefficients of the included variables will be a weighted sum of the coefficients of all the excluded variables with weights obtained from these auxiliary multiple regressions.

Suppose that we have k explanatory variables, of which the first k₁ are included and the remaining (k − k₁) are omitted. Then the formula corresponding to (4.16) and (4.17) is

E(β̂ᵢ) = βᵢ + Σⱼ bⱼᵢβⱼ    i = 1, 2, . . . , k₁    (4.18)

where the summation is over the omitted variables j = k₁ + 1, . . . , k, and bⱼᵢ is the regression coefficient of the ith included variable in a regression of the jth omitted variable on all the included variables. Note that we pick the coefficients of the ith included variable from the (k − k₁) auxiliary multiple regressions.

The formulas (4.16)-(4.18) can be used to get some rough estimates of the direction of the biases in estimated coefficients when some variables are omitted because of lack of observations or because they are not measurable. We will present two examples, one in which the omitted variable is actually measured and the other in which it is not.
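For the OLS estimates themselves, the sample analogue of (4.18) holds exactly: the short-regression coefficients equal the included-variable coefficients from the full regression plus the auxiliary-regression weights times the omitted-variable coefficients. A small numerical check (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 100, 3, 2                       # k1 included, k2 omitted variables
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2)) + X1[:, :1]   # omitted vars correlated with X1
y = rng.normal(size=n)                      # the identity holds for any y

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols(np.hstack([X1, X2]), y)        # full regression
b_short = ols(X1, y)                        # regression with X2 omitted
# Auxiliary regressions: each omitted variable on all included variables.
B = np.column_stack([ols(X1, X2[:, j]) for j in range(k2)])

# b_short[i] = b_full[i] + sum_j B[i, j] * b_full[k1 + j]  (exact identity)
print(np.max(np.abs(b_short - (b_full[:k1] + B @ b_full[k1:]))))
```

The printed discrepancy is at rounding-error level, since the relation is an algebraic identity of least squares, not merely an expectation.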



(Figures in parentheses are standard errors.) Note that the coefficient of P is now negative. Also, note that the standard error of the coefficient of P is higher in the misspecified model than in the "true" model. This is so despite the fact that the variance of β̂₁ is expected to be smaller in the misspecified model. (Check the relationship between r_{y2·1}² and r₁₂² with the data.)
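The Table 4.9 series are not reproduced here, but the sign reversal is easy to mimic with synthetic data (all numbers below are illustrative assumptions): price and income both trend upward, demand falls with price and rises with income, and omitting income pushes the estimated price coefficient past zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
t = np.arange(n)

# Both series trend upward, hence are positively correlated,
# as with the time-series data in the example.
price = 0.5 * t + rng.normal(scale=5.0, size=n)
income = 1.0 * t + rng.normal(scale=5.0, size=n)

# "True" demand: falls with price, rises with income.
q = 50 - 0.3 * price + 0.25 * income + rng.normal(scale=2.0, size=n)

def ols(y, *cols):
    """OLS with an intercept; returns [const, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(q, price)          # income omitted: price coefficient biased up
b_full = ols(q, price, income)   # income included
print(b_short[1], b_full[1])     # positive vs. negative price coefficient
```

The upward bias, (coefficient of income) × (regression coefficient of income on price), is large enough here to flip the sign, just as in the food-demand regressions.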

Example 2: Production Functions and Management Bias

In the estimation of production functions we have to omit the quality of inputs and managerial inputs because of lack of measurement. Consider the estimation of the production function

y = α + β₁x₁ + β₂x₂ + β₃x₃ + u

where

y = log of output
x₁ = log of labor input
x₂ = log of capital input
x₃ = log of managerial input

Now x₃ is not observable. What will be the effect of this on the estimates of β₁ and β₂? From formula (4.17) we have

E(β̂₁) = β₁ + b₃₁β₃    E(β̂₂) = β₂ + b₃₂β₃

where b₃₁ and b₃₂ are the regression coefficients in the regression of x₃ on x₁ and x₂. Now β₁ + β₂ is often referred to as "returns to scale." Let us denote this by S. The estimated returns to scale, Ŝ, is β̂₁ + β̂₂. Thus

E(Ŝ) = S + (b₃₁ + b₃₂)β₃

Since β₃ is expected to be positive, the bias in the estimation of returns to scale will depend on the sign of b₃₁ + b₃₂. If we assume that managerial input does not increase proportionately with the measured inputs of labor and capital, we would expect b₃₁ + b₃₂ to be negative, and thus there is a downward bias in the estimates of returns to scale.
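A quick simulation of this argument (synthetic data; constructing x₃ so that b₃₁ + b₃₂ < 0 is an assumption made to match the premise above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, b1, b2, b3 = 200, 0.6, 0.4, 0.3   # true returns to scale S = b1 + b2 = 1.0

x1 = rng.normal(size=n)                            # log labor
x2 = 0.5 * x1 + rng.normal(size=n)                 # log capital
# x3 built so that its auxiliary coefficients on x1, x2 sum to < 0.
x3 = -0.2 * x1 - 0.1 * x2 + rng.normal(scale=0.5, size=n)

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b31, b32 = ols(x3, x1, x2)[1:]       # auxiliary regression of x3 on x1, x2

S_hat = []
for _ in range(2000):
    y = b1 * x1 + b2 * x2 + b3 * x3 + 0.5 * rng.normal(size=n)
    c = ols(y, x1, x2)               # x3 omitted
    S_hat.append(c[1] + c[2])

# E(S_hat) = S + (b31 + b32) * b3 < S: downward bias in returns to scale.
print(np.mean(S_hat), b1 + b2 + (b31 + b32) * b3)
```

With b₃ > 0 and b₃₁ + b₃₂ < 0, the average estimated returns to scale falls below the true value of 1, matching E(Ŝ) = S + (b₃₁ + b₃₂)β₃.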

Inclusion of Irrelevant Variables

Consider now the case of inclusion of irrelevant variables. Suppose that the true equation is

y = β₁x₁ + u

but we estimate the equation

y = β₁x₁ + β₂x₂ + v
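Anticipating the standard result for this case: β̂₁ remains unbiased, but its variance is inflated by the factor 1/(1 − r₁₂²) relative to the correct specification. A simulation sketch (illustrative values, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n, b1, sigma = 100, 1.0, 1.0

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # irrelevant but correlated with x1
x1, x2 = x1 - x1.mean(), x2 - x2.mean()    # deviation form, no intercept

S11 = x1 @ x1
r12 = (x1 @ x2) / np.sqrt(S11 * (x2 @ x2))

correct, overfit = [], []
for _ in range(5000):
    y = b1 * x1 + sigma * rng.normal(size=n)   # x2 truly irrelevant
    correct.append((x1 @ y) / S11)
    overfit.append(np.linalg.lstsq(np.column_stack([x1, x2]), y,
                                   rcond=None)[0][0])
correct, overfit = np.array(correct), np.array(overfit)

print(correct.mean(), overfit.mean())                   # both near b1: no bias
print(overfit.var() / correct.var(), 1 / (1 - r12**2))  # variance inflation
```

So, unlike omitting a relevant variable, including an irrelevant one costs precision rather than introducing bias.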


