




The method of least squares requires that we should choose $\hat{\alpha}$ and $\hat{\beta}$ as estimates of $\alpha$ and $\beta$, respectively, so that

$$Q = \sum (y_t - \hat{\alpha} - \hat{\beta} x_t)^2 \qquad (3.5)$$

is a minimum. Q is also the sum of squares of the (within-sample) prediction errors when we predict $y_t$ given $x_t$ and the estimated regression equation. We will show in the appendix to this chapter that the least squares estimators have desirable optimal properties. This property is often abbreviated as BLUE (best linear unbiased estimators). We are relegating the proof to the appendix so that readers not interested in proofs can proceed.

The intuitive idea behind the least squares procedure can be described figuratively with reference to Figure 3.2, which gives a graph of the points $(x_t, y_t)$. We pass the regression line through the points in such a way that it is "as close as possible" to the points. The question is what is meant by "close." The procedure of minimizing $Q$ in (3.5) implies that we minimize the sum of squares of the vertical distances of the points from the line. Some alternative methods of measuring closeness are illustrated in Figure 3.4.

With readily available computer programs, readers interested in just obtaining results need not even know how the estimators are derived. However, it is advisable to know a little bit about the derivation of the least squares estimators. Readers not interested in the algebraic detail can go to the illustrative examples.

To minimize $Q$ in equation (3.5) with respect to $\hat{\alpha}$ and $\hat{\beta}$, we equate its first derivatives with respect to $\hat{\alpha}$ and $\hat{\beta}$ to zero. This procedure yields (in the following equations, $\sum$ denotes $\sum_{t=1}^{n}$):



$$\frac{\partial Q}{\partial \hat{\alpha}} = \sum 2(y_t - \hat{\alpha} - \hat{\beta} x_t)(-1) = 0 \quad\text{or}\quad \sum y_t = n\hat{\alpha} + \hat{\beta} \sum x_t \quad\text{or}\quad \bar{y} = \hat{\alpha} + \hat{\beta}\bar{x} \qquad (3.6)$$

$$\frac{\partial Q}{\partial \hat{\beta}} = \sum 2(y_t - \hat{\alpha} - \hat{\beta} x_t)(-x_t) = 0 \quad\text{or}\quad \sum y_t x_t = \hat{\alpha} \sum x_t + \hat{\beta} \sum x_t^2 \qquad (3.7)$$

Equations (3.6) and (3.7) are called the normal equations. Substituting the value of $\hat{\alpha}$ from (3.6) into (3.7), we get

$$\sum y_t x_t = n\bar{x}(\bar{y} - \hat{\beta}\bar{x}) + \hat{\beta} \sum x_t^2$$

or

$$\sum y_t x_t - n\bar{x}\bar{y} = \hat{\beta}\left(\sum x_t^2 - n\bar{x}^2\right) \qquad (3.8)$$
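As a quick numerical check (not part of the original text), the following Python sketch solves the two normal equations (3.6) and (3.7) as a linear system and confirms that the result also minimizes $Q$ over a coarse grid. The data and variable names are made up for illustration.

```python
import numpy as np

# Made-up sample data (purely illustrative), t = 1, ..., n
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)

# Normal equations (3.6)-(3.7) as a 2x2 linear system in (alpha_hat, beta_hat):
#   n*alpha       + (sum x)*beta   = sum y
#   (sum x)*alpha + (sum x^2)*beta = sum x*y
A = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])
alpha_hat, beta_hat = np.linalg.solve(A, b)

# Brute-force check: Q(a, b) = sum (y_t - a - b*x_t)^2, minimized over a coarse grid
def Q(a, b):
    return ((y - a - b * x) ** 2).sum()

grid = [(Q(a, b), a, b)
        for a in np.linspace(alpha_hat - 1, alpha_hat + 1, 201)
        for b in np.linspace(beta_hat - 1, beta_hat + 1, 201)]
_, a_best, b_best = min(grid)

print(alpha_hat, beta_hat)  # solution of the normal equations
print(a_best, b_best)       # grid minimizer of Q, essentially the same point
```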

Let us define

$$S_{yy} = \sum (y_t - \bar{y})^2 = \sum y_t^2 - n\bar{y}^2$$

$$S_{xy} = \sum (x_t - \bar{x})(y_t - \bar{y}) = \sum x_t y_t - n\bar{x}\bar{y}$$

$$S_{xx} = \sum (x_t - \bar{x})^2 = \sum x_t^2 - n\bar{x}^2$$

Then (3.8) can be written as

$$\hat{\beta} S_{xx} = S_{xy} \quad\text{or}\quad \hat{\beta} = \frac{S_{xy}}{S_{xx}}$$

Hence the least squares estimators for $\alpha$ and $\beta$ are

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} \qquad (3.9)$$

and

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \qquad (3.10)$$
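The estimators in (3.9) and (3.10) are straightforward to compute. Below is a minimal Python sketch on made-up data; it checks the two equivalent forms of $S_{xx}$ and $S_{xy}$ and cross-checks the result against numpy.polyfit. The data and names are illustrative, not from the text.

```python
import numpy as np

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n, xbar, ybar = len(x), x.mean(), y.mean()

# S_xx, S_xy, S_yy in deviation form, checked against the raw-moment form
S_xx = ((x - xbar) ** 2).sum()
S_xy = ((x - xbar) * (y - ybar)).sum()
S_yy = ((y - ybar) ** 2).sum()
assert np.isclose(S_xx, (x ** 2).sum() - n * xbar ** 2)
assert np.isclose(S_xy, (x * y).sum() - n * xbar * ybar)

# Least squares estimators (3.9) and (3.10)
beta_hat = S_xy / S_xx
alpha_hat = ybar - beta_hat * xbar

# Cross-check against numpy's degree-1 polynomial fit (returns slope, then intercept)
slope, intercept = np.polyfit(x, y, 1)
assert np.isclose(beta_hat, slope) and np.isclose(alpha_hat, intercept)
print(alpha_hat, beta_hat)
```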

The estimated residuals are

$$\hat{u}_t = y_t - \hat{\alpha} - \hat{\beta} x_t$$

The two normal equations show that these residuals satisfy the equations

$$\sum \hat{u}_t = 0 \quad\text{and}\quad \sum \hat{u}_t x_t = 0$$

The residual sum of squares (to be denoted by RSS) is given by

$$\begin{aligned}
\text{RSS} &= \sum (y_t - \hat{\alpha} - \hat{\beta} x_t)^2 \\
&= \sum \left[(y_t - \bar{y}) - \hat{\beta}(x_t - \bar{x})\right]^2 \\
&= \sum (y_t - \bar{y})^2 + \hat{\beta}^2 \sum (x_t - \bar{x})^2 - 2\hat{\beta} \sum (y_t - \bar{y})(x_t - \bar{x}) \\
&= S_{yy} + \hat{\beta}^2 S_{xx} - 2\hat{\beta} S_{xy}
\end{aligned}$$

But $\hat{\beta} = S_{xy}/S_{xx}$. Hence we have

$$\text{RSS} = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = S_{yy} - \hat{\beta} S_{xy}$$

$S_{yy}$ is usually denoted by TSS (total sum of squares) and $\hat{\beta} S_{xy}$ is usually denoted by ESS (explained sum of squares). Thus

TSS = ESS + RSS

(total) (explained) (residual)
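The decomposition TSS = ESS + RSS, together with the residual properties $\sum \hat{u}_t = 0$ and $\sum \hat{u}_t x_t = 0$, can be verified numerically. The sketch below uses the same made-up data as the earlier examples.

```python
import numpy as np

# Same made-up data as in the earlier sketches
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
xbar, ybar = x.mean(), y.mean()
S_xx = ((x - xbar) ** 2).sum()
S_xy = ((x - xbar) * (y - ybar)).sum()
S_yy = ((y - ybar) ** 2).sum()
beta_hat = S_xy / S_xx
alpha_hat = ybar - beta_hat * xbar

# The residuals satisfy the normal equations: sum u = 0 and sum u*x = 0
u = y - alpha_hat - beta_hat * x
assert np.isclose(u.sum(), 0) and np.isclose((u * x).sum(), 0)

# Sum-of-squares decomposition: TSS = ESS + RSS
TSS = S_yy
RSS = (u ** 2).sum()
ESS = beta_hat * S_xy
assert np.isclose(TSS, ESS + RSS)
print(TSS, ESS, RSS)
```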

Some other authors like to use RSS to denote the regression sum of squares and ESS to denote the error sum of squares. The confusing thing here is that both the words "explained" and "error" start with the letter "e," and the words "regression" and "residual" start with the letter "r." However, we prefer to use RSS for residual sum of squares and ESS for explained sum of squares.² We will reserve the word residual to denote $\hat{u}_t = y_t - \hat{\alpha} - \hat{\beta} x_t$ and the word error to denote the disturbance $u_t$ in equation (3.3). Thus the residual is the estimated error.

²This is also the notation used in J. Johnston, Econometric Methods, 3rd ed. (New York: McGraw-Hill, 1984).

The proportion of the total sum of squares explained is denoted by $r_{xy}^2$, where $r_{xy}$ is called the correlation coefficient. Thus $r_{xy}^2 = \text{ESS}/\text{TSS}$ and $1 - r_{xy}^2 = \text{RSS}/\text{TSS}$. If $r_{xy}^2$ is high (close to 1), then $x$ is a good "explanatory" variable for $y$. The term $r_{xy}^2$ is called the coefficient of determination and must fall between 0 and 1 for any given regression. If $r_{xy}^2$ is close to zero, the variable $x$ explains very little of the variation in $y$. If $r_{xy}^2$ is close to 1, the variable $x$ explains most of the variation in $y$.

The coefficient of determination $r_{xy}^2$ is given by

$$r_{xy}^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\text{TSS} - \text{RSS}}{\text{TSS}} = \frac{\hat{\beta} S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$
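As a sanity check (again on the made-up data used above), the sketch below computes $r_{xy}^2$ through each of the equivalent expressions and as the square of the correlation coefficient returned by numpy.corrcoef; all of them agree.

```python
import numpy as np

# Same made-up data as before
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
xbar, ybar = x.mean(), y.mean()
S_xx = ((x - xbar) ** 2).sum()
S_xy = ((x - xbar) * (y - ybar)).sum()
S_yy = ((y - ybar) ** 2).sum()
beta_hat = S_xy / S_xx
RSS = S_yy - beta_hat * S_xy

# Equivalent expressions for the coefficient of determination
r2_from_rss  = (S_yy - RSS) / S_yy            # (TSS - RSS)/TSS
r2_from_ess  = beta_hat * S_xy / S_yy         # ESS/TSS
r2_from_s    = S_xy ** 2 / (S_xx * S_yy)      # S_xy^2 / (S_xx * S_yy)
r2_from_corr = np.corrcoef(x, y)[0, 1] ** 2   # squared correlation coefficient
assert np.allclose([r2_from_rss, r2_from_ess, r2_from_s], r2_from_corr)
print(r2_from_corr)
```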

Summary

The estimates for the regression coefficients are

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} \quad\text{and}\quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

The residual sum of squares is given by

$$\text{RSS} = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = S_{yy} - \hat{\beta} S_{xy} = S_{yy}(1 - r_{xy}^2)$$

and the coefficient of determination is given by

$$r_{xy}^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$

The least squares estimators $\hat{\beta}$ and $\hat{\alpha}$ yield an estimated straight line that has a smaller RSS than any other straight line.
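The last statement of the summary can be illustrated numerically: perturbing the least squares coefficients in any direction never lowers the RSS. The sketch below, on the same made-up data, checks this for a set of random perturbations.

```python
import numpy as np

# Same made-up data; alpha_hat and beta_hat computed from (3.9) and (3.10)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
xbar, ybar = x.mean(), y.mean()
beta_hat = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
alpha_hat = ybar - beta_hat * xbar

def rss(a, b):
    """Residual sum of squares of the straight line y = a + b*x."""
    return ((y - a - b * x) ** 2).sum()

# Any other line (here, random perturbations of the estimates) has a larger RSS
rng = np.random.default_rng(0)
least_squares_rss = rss(alpha_hat, beta_hat)
for da, db in rng.normal(scale=0.5, size=(1000, 2)):
    assert rss(alpha_hat + da, beta_hat + db) >= least_squares_rss
print(least_squares_rss)
```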

The Reverse Regression

We have until now considered the regression of $y$ on $x$. This is called the direct regression. Sometimes one has to consider the regression of $x$ on $y$ as well. This is called the reverse regression. The reverse regression has been advocated in the analysis of sex (or race) discrimination in salaries. For instance, if

$y$ = salary

$x$ = qualifications
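A small sketch (with made-up, purely illustrative salary and qualification figures) contrasts the direct and reverse regression slopes. The two regressions generally give different fitted lines, and the product of the slope of $y$ on $x$ and the slope of $x$ on $y$ equals $r_{xy}^2$, a standard identity not stated on this page.

```python
import numpy as np

# Made-up salary/qualification figures, purely illustrative
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # x = qualifications
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])  # y = salary
xbar, ybar = x.mean(), y.mean()
S_xx = ((x - xbar) ** 2).sum()
S_yy = ((y - ybar) ** 2).sum()
S_xy = ((x - xbar) * (y - ybar)).sum()

# Direct regression of y on x: slope S_xy / S_xx
slope_direct = S_xy / S_xx
# Reverse regression of x on y: slope S_xy / S_yy (a different fitted line)
slope_reverse = S_xy / S_yy

# The product of the two slopes equals the coefficient of determination r_xy^2
assert np.isclose(slope_direct * slope_reverse, S_xy ** 2 / (S_xx * S_yy))
print(slope_direct, 1.0 / slope_reverse)  # the reverse line, rewritten in y-on-x form, differs
```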



