




[Figure 3.4: Minimization of residual sum of squares in the regression of y on x and of x on y. (a) Regression of y on x. (b) Regression of x on y.]

of y on x and of x on y. A question arises as to which of these is the appropriate one. Following are some general guidelines on this problem.

1. If the model is such that the direction of causation is known (e.g., if we say that advertising expenditures at time t influence sales at time t but not the other way around), we should estimate the regression using sales as the explained variable and advertising expenditures as the explanatory variable. The opposite regression does not make sense. We should estimate this equation whether our objective is to estimate sales for given advertising expenditures or to estimate advertising expenditures for given sales (i.e., always estimate a regression of an effect variable on a causal variable).

2. In problems where the direction of causation is not as clear-cut, and where y and x have a joint normal distribution, both the regression of y on x and the regression of x on y make sense: one should estimate a regression of y on x to predict y given x, and a regression of x on y to predict x given y.

3. In models where y and x are both measured with error, we have to estimate both the regression of y on x and the regression of x on y to get "bounds" on $\beta$. This problem is discussed in Chapter 11.

4. The case of salary discrimination mentioned earlier is an example where the problem can be posed in two different and equally meaningful ways. In such cases both regressions make sense.

5. Sometimes, which regression makes sense depends on how the data are generated. Consider the data presented in Table 3.2: x is labor-hours of work and y is output, and the observations are for different workers. Which of the two regressions makes sense depends on the way the data were generated. If the workers are given some hours of work (x) and the output they produce (y) is observed, then a regression of y on x is the correct one to look at. In this case x is the controlled variable. On the other hand, if the workers are assigned some amount of output (y) and the hours of work (x) that they take to produce that output is observed, it is a regression of x on y that is meaningful. In this case y is the controlled variable. Here what we are considering is an experiment where one of the variables is controlled or fixed. With experimental data, which of the variables should be treated as dependent and which as independent will be clear from the way the experiment was conducted. With most economic data, however, this is not a clear-cut choice. (A numerical sketch of the two regressions follows this list.)
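Concretely, the two regressions fit different lines whenever the correlation is imperfect. The sketch below (simulated data with illustrative values, not the Table 3.2 figures) computes both slopes from the same sums of squares: the slope of y on x is $S_{xy}/S_{xx}$, the slope of x on y is $S_{xy}/S_{yy}$, and their product equals the squared correlation $r^2$, so the two fitted lines coincide only when the correlation is perfect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for illustration: x = labor-hours, y = output.
x = rng.uniform(20, 60, size=50)
y = 5.0 + 0.8 * x + rng.normal(0.0, 4.0, size=50)

Sxx = np.sum((x - x.mean()) ** 2)
Syy = np.sum((y - y.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

b_yx = Sxy / Sxx  # slope from regressing y on x
b_xy = Sxy / Syy  # slope from regressing x on y
r2 = Sxy ** 2 / (Sxx * Syy)

# Plotted in the (x, y) plane, the second line has slope 1/b_xy, which
# exceeds b_yx whenever r^2 < 1; the product b_yx * b_xy equals r^2 exactly.
print(b_yx, 1.0 / b_xy, r2, b_yx * b_xy)
```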



3.5 Statistical Inference in the Linear Regression Model

In Section 3.4 we discussed procedures for obtaining the least squares estimators. To obtain the least squares estimators of $\alpha$ and $\beta$ we do not need to assume any particular probability distribution for the errors $u_i$. But to get interval estimates for the parameters and to test any hypotheses about them, we need to assume that the errors $u_i$ have a normal distribution. As proved in the appendix to this chapter, the least squares estimators are:

1. Unbiased.

2. Of minimum variance among the class of all linear unbiased estimators.

These properties hold even if the errors $u_i$ do not have a normal distribution, provided that the other assumptions we have made are satisfied. These assumptions, we may recall, are as follows (a small simulation check appears after the list):

1. $E(u_i) = 0$.

2. $V(u_i) = \sigma^2$ for all $i$.

3. $u_i$ and $u_j$ are independent for $i \neq j$.

4. $x_i$ are nonstochastic.
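As a check on these claims, here is a small Monte Carlo sketch (the parameter values are my own, chosen for illustration). The errors are drawn from a centered uniform distribution rather than a normal one, since unbiasedness requires only assumptions 1 to 4:

```python
import numpy as np

rng = np.random.default_rng(1)

alpha, beta, sigma = 2.0, 0.5, 1.0
x = np.linspace(0.0, 10.0, 25)        # nonstochastic x_i (assumption 4)
Sxx = np.sum((x - x.mean()) ** 2)

estimates = []
for _ in range(5000):
    # Uniform(-sqrt(3)*sigma, sqrt(3)*sigma) has mean 0 and variance sigma^2,
    # satisfying assumptions 1-3 without normality.
    u = rng.uniform(-np.sqrt(3) * sigma, np.sqrt(3) * sigma, size=x.size)
    y = alpha + beta * x + u
    b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    estimates.append((y.mean() - b * x.mean(), b))

# The averages should be close to (alpha, beta) = (2.0, 0.5).
print(np.mean(estimates, axis=0))
```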

We will now make the additional assumption that the errors $u_i$ are normally distributed, and show how we can get confidence intervals for $\alpha$ and $\beta$ and test any hypotheses about $\alpha$ and $\beta$. We will relegate the proofs of the propositions to the appendix and state the results here.

First, we need the sampling distributions of $\hat{\alpha}$ and $\hat{\beta}$. We can prove that they have a normal distribution (proofs are given in the appendix). Specifically, we have the following results:




$\hat{\alpha}$ and $\hat{\beta}$ are jointly normally distributed with

$$E(\hat{\alpha}) = \alpha, \qquad \operatorname{var}(\hat{\alpha}) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)$$

$$E(\hat{\beta}) = \beta, \qquad \operatorname{var}(\hat{\beta}) = \frac{\sigma^2}{S_{xx}}$$

$$\operatorname{cov}(\hat{\alpha}, \hat{\beta}) = -\frac{\bar{x}}{S_{xx}} \, \sigma^2$$

where $\bar{x}$ is the mean of the $x_i$ and $S_{xx} = \sum_i (x_i - \bar{x})^2$.
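These moment formulas are easy to verify numerically. The sketch below (again with illustrative values I have assumed) evaluates $\operatorname{var}(\hat{\alpha})$, $\operatorname{var}(\hat{\beta})$, and $\operatorname{cov}(\hat{\alpha}, \hat{\beta})$ from the formulas and compares them with the empirical moments of the estimates over repeated samples:

```python
import numpy as np

rng = np.random.default_rng(2)

alpha, beta, sigma2 = 2.0, 0.5, 1.0    # illustrative values
x = np.linspace(0.0, 10.0, 25)
n, xbar = x.size, x.mean()
Sxx = np.sum((x - xbar) ** 2)

# Theoretical moments from the formulas above.
var_a = sigma2 * (1.0 / n + xbar ** 2 / Sxx)
var_b = sigma2 / Sxx
cov_ab = -xbar * sigma2 / Sxx

# Empirical moments of (alpha-hat, beta-hat) over repeated samples.
ab = np.empty((20000, 2))
for i in range(ab.shape[0]):
    y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), size=n)
    b = np.sum((x - xbar) * (y - y.mean())) / Sxx
    ab[i] = (y.mean() - b * xbar, b)

print(np.cov(ab.T))            # empirical [[var_a, cov], [cov, var_b]]
print(var_a, var_b, cov_ab)    # theoretical counterparts
```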

These results would be useful if the error variance $\sigma^2$ were known. Unfortunately, in practice, $\sigma^2$ is not known and has to be estimated. If RSS is the residual sum of squares, then

$$\hat{\sigma}^2 = \frac{\text{RSS}}{n - 2}$$

is an unbiased estimator of $\sigma^2$.

Also,

$$\frac{\text{RSS}}{\sigma^2} \sim \chi^2_{n-2}$$

that is, it has a $\chi^2$-distribution with $(n - 2)$ degrees of freedom.

Further, the distribution of RSS is independent of the distributions of $\hat{\alpha}$ and $\hat{\beta}$. (Proofs of these propositions are relegated to the appendix.)
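Here is a minimal sketch of how these two facts are used in practice (the data are simulated with values I have assumed; scipy's $\chi^2$ quantile function stands in for the tables):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 25)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)
n = x.size

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
rss = np.sum((y - a - b * x) ** 2)

sigma2_hat = rss / (n - 2)     # unbiased estimator of sigma^2

# 95% confidence interval for sigma^2, using RSS/sigma^2 ~ chi^2 with n-2 d.f.
lower = rss / stats.chi2.ppf(0.975, df=n - 2)
upper = rss / stats.chi2.ppf(0.025, df=n - 2)
print(sigma2_hat, (lower, upper))
```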

This result can be used to get confidence intervals for $\sigma^2$ or to test hypotheses about $\sigma^2$, as in the sketch above. However, we are still left with the problem of making inferences about $\alpha$ and $\beta$. For this purpose we use the $t$-distribution.

We know that if we have two variables $x_1 \sim N(0, 1)$ and $x_2 \sim \chi^2_k$ (a $\chi^2$-distribution with $k$ degrees of freedom), with $x_1$ and $x_2$ independent, then

$$t = \frac{x_1}{\sqrt{x_2 / k}}$$

that is, a standard normal divided by the square root of an independent $\chi^2$ averaged over its degrees of freedom, has a $t$-distribution with $k$ d.f.

In this case $(\hat{\beta} - \beta)\big/\sqrt{\sigma^2 / S_{xx}} \sim N(0, 1)$. (We have subtracted the mean and divided by the standard deviation.) Also, $\text{RSS}/\sigma^2 \sim \chi^2_{n-2}$, and the two distributions are independent. Hence the ratio

$$t = \frac{(\hat{\beta} - \beta)\big/\sqrt{\sigma^2 / S_{xx}}}{\sqrt{\text{RSS}\big/\big[(n - 2)\sigma^2\big]}}$$

has a $t$-distribution with $(n - 2)$ d.f. Now $\sigma^2$ cancels out, and writing $\text{RSS}/(n - 2)$ as $\hat{\sigma}^2$, we get the result that $(\hat{\beta} - \beta)\big/\sqrt{\hat{\sigma}^2 / S_{xx}}$ has a $t$-distribution with $(n - 2)$ d.f.
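In code, the same construction gives the usual $t$-statistic and confidence interval for $\beta$. A minimal sketch, with simulated data and scipy's $t$ quantiles standing in for the tables:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 25)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)
n, xbar = x.size, x.mean()

Sxx = np.sum((x - xbar) ** 2)
b = np.sum((x - xbar) * (y - y.mean())) / Sxx
a = y.mean() - b * xbar
sigma2_hat = np.sum((y - a - b * x) ** 2) / (n - 2)

se_b = np.sqrt(sigma2_hat / Sxx)     # standard error of beta-hat

# t-statistic for H0: beta = 0, and a 95% confidence interval for beta,
# both based on (beta-hat - beta)/se_b having a t-distribution with n-2 d.f.
t_stat = b / se_b
t_crit = stats.t.ppf(0.975, df=n - 2)
print(t_stat, (b - t_crit * se_b, b + t_crit * se_b))
```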


