




The least squares estimators $\hat\beta_1$ and $\hat\beta_2$ from this misspecified equation are given by

$$\hat\beta_1 = \frac{S_{22}S_{1y} - S_{12}S_{2y}}{S_{11}S_{22} - S_{12}^2} \qquad \hat\beta_2 = \frac{S_{11}S_{2y} - S_{12}S_{1y}}{S_{11}S_{22} - S_{12}^2}$$

where $S_{11} = \sum x_{1i}^2$, $S_{1y} = \sum x_{1i}y_i$, $S_{12} = \sum x_{1i}x_{2i}$, and so on. Since $y = \beta_1 x_1 + u$, we have $E(S_{2y}) = \beta_1 S_{12}$ and $E(S_{1y}) = \beta_1 S_{11}$. Hence we get

$$E(\hat\beta_1) = \beta_1 \quad \text{and} \quad E(\hat\beta_2) = 0$$

Thus we get unbiased estimates for both parameters. This result, coupled with the earlier results regarding the bias introduced by the omission of relevant variables, might lead us to believe that it is better to include variables (when in doubt) rather than exclude them. However, this is not so: although the inclusion of irrelevant variables has no effect on the bias of the estimators, it does affect their variances. The variance of the estimator of $\beta_1$ from the correct equation, say $\hat\beta_1^*$, is given by

$$V(\hat\beta_1^*) = \frac{\sigma^2}{S_{11}}$$

On the other hand, from the misspecified equation we have

$$V(\hat\beta_1) = \frac{\sigma^2}{S_{11}(1 - r_{12}^2)}$$

where $r_{12}$ is the correlation between $x_1$ and $x_2$. Thus $\operatorname{var}(\hat\beta_1) \geq \operatorname{var}(\hat\beta_1^*)$ unless $r_{12} = 0$. Hence we get unbiased but inefficient estimates by including the irrelevant variable. It can be shown that the estimator of the residual variance that we use is an unbiased estimator of $\sigma^2$, so there is no further bias arising from the use of the estimated variance from the misspecified equation. (See Maddala, Econometrics, p. 157.)

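A short simulation sketch (illustrative data and variable names of my own, with numpy assumed) makes both claims concrete: across repeated samples from the true model $y = \beta_1 x_1 + u$, the estimate of $\beta_1$ from the misspecified equation that also includes an irrelevant $x_2$ stays centered on $\beta_1$, but its sampling variance is inflated by roughly $1/(1 - r_{12}^2)$.

```python
import numpy as np

# Monte Carlo sketch: including an irrelevant regressor x2 leaves the estimate of
# beta_1 unbiased but inflates its variance by roughly 1 / (1 - r12^2).
rng = np.random.default_rng(0)
n, reps, beta1, sigma = 50, 5000, 2.0, 1.0

# Fixed regressors in deviation form, matching the S_ij notation in the text.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x1, x2 = x1 - x1.mean(), x2 - x2.mean()
S11, S22, S12 = np.sum(x1 * x1), np.sum(x2 * x2), np.sum(x1 * x2)
r12_sq = S12 ** 2 / (S11 * S22)

b_correct, b_misspec = [], []
for _ in range(reps):
    y = beta1 * x1 + sigma * rng.normal(size=n)            # true model: x2 is irrelevant
    b_correct.append(np.sum(x1 * y) / S11)                 # beta_1-hat from the correct equation
    bhat, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
    b_misspec.append(bhat[0])                              # beta_1-hat from the misspecified equation

print("mean (correct)     :", np.mean(b_correct))          # both close to beta1 = 2.0
print("mean (misspecified):", np.mean(b_misspec))
print("variance ratio     :", np.var(b_misspec) / np.var(b_correct))
print("1 / (1 - r12^2)    :", 1.0 / (1.0 - r12_sq))        # theoretical inflation factor
```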
4.10 Degrees of Freedom and $\bar{R}^2$

If we have $n$ observations and estimate three regression parameters as in equation (4.1), we can see from the normal equations (4.2)-(4.4) that the estimated residuals $\hat{u}_i$ satisfy three linear restrictions:

$$\sum \hat{u}_i = 0 \qquad \sum \hat{u}_i x_{1i} = 0 \qquad \sum \hat{u}_i x_{2i} = 0 \qquad (4.19)$$

In essence, only $(n - 3)$ residuals are free to vary, because given any $(n - 3)$ residuals the remaining three can be obtained by solving equations (4.19). We express this by saying that there are $(n - 3)$ degrees of freedom.
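A quick numerical check of (4.19), as an illustrative sketch with simulated data: fitting the three-parameter regression and forming the residuals, the three sums below vanish up to rounding error, so only $n - 3$ residuals are free to vary.

```python
import numpy as np

# Check the three restrictions in (4.19): the OLS residuals from a regression with
# an intercept and two explanatory variables sum to zero and are orthogonal to
# each regressor, so only n - 3 of them are free to vary.
rng = np.random.default_rng(1)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])      # three estimated parameters
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ coef                           # estimated residuals

# All three sums are zero up to floating-point error.
print(np.sum(u_hat), np.sum(u_hat * x1), np.sum(u_hat * x2))
```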



The estimate of the error variance $\sigma^2$ is the residual sum of squares divided by the degrees of freedom:

$$\hat\sigma^2 = \frac{\text{residual sum of squares (RSS)}}{\text{degrees of freedom}}$$

(In the limiting case where the number of parameters estimated is equal to the number of observations, we get $\hat\sigma^2 = 0/0$.)

As we increase the number of explanatory variables, RSS declines, but there is a decrease in the degrees of freedom as well. What happens to $\hat\sigma^2$ depends on the proportionate decrease in the numerator and the denominator. Thus there will be a point at which $\hat\sigma^2$ actually starts increasing as we add more explanatory variables. It is often suggested that we should choose the set of variables for which $\hat\sigma^2$ is a minimum. We discuss the rationale behind this procedure in Chapter 12.
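The following sketch (simulated data, with only the first regressor actually relevant) shows the two forces at work: RSS never increases as regressors are added, while $\hat\sigma^2 = \text{RSS}/(n - k - 1)$ may level off or rise once the additional regressors contribute nothing.

```python
import numpy as np

# RSS never increases as regressors are added, but sigma^2-hat = RSS / d.f. can
# stop falling (or rise) once the added regressors are irrelevant.
rng = np.random.default_rng(2)
n = 25
X_all = rng.normal(size=(n, 6))
y = 1.0 + 2.0 * X_all[:, 0] + rng.normal(size=n)   # only the first regressor matters

for k in range(1, 7):
    X = np.column_stack([np.ones(n), X_all[:, :k]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    sigma2_hat = rss / (n - k - 1)                 # degrees of freedom: n - k - 1
    print(f"k = {k}: RSS = {rss:8.3f}   sigma^2-hat = {sigma2_hat:.3f}")
```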

This is also the reason why, in multiple regression problems, it is customary to report what is known as the adjusted $R^2$, denoted by $\bar{R}^2$. The measure $R^2$ defined earlier keeps on increasing (until it reaches 1.0) as we add extra explanatory variables and thus does not take account of the degrees-of-freedom problem.

$\bar{R}^2$ is simply $R^2$ adjusted for degrees of freedom. It is defined by the relation

$$1 - \bar{R}^2 = \frac{n - 1}{n - k - 1}\,(1 - R^2) \qquad (4.20)$$

where $k$ is the number of regressors. We subtract $(k + 1)$ from $n$ because we estimate a constant term in addition to the coefficients of these $k$ regressors. We can write (4.20) as

$$\frac{(1 - \bar{R}^2)S_{yy}}{n - 1} = \frac{(1 - R^2)S_{yy}}{n - k - 1} = \hat\sigma^2 \qquad (4.21)$$

Since $S_{yy}$ and $n$ are constant, as we increase the number of regressors included in the equation, $\hat\sigma^2$ and $(1 - \bar{R}^2)$ move in the same direction, and hence $\hat\sigma^2$ and $\bar{R}^2$ move in opposite directions. Thus the set of variables that gives the minimum $\hat\sigma^2$ is also the set that maximizes $\bar{R}^2$.
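The link can be verified directly. The helper below (my own naming, with illustrative simulated data) computes $R^2$, $\bar{R}^2$ from (4.20), and $\hat\sigma^2$ for a sequence of nested specifications, and asserts the identity (4.21); whichever specification minimizes $\hat\sigma^2$ therefore maximizes $\bar{R}^2$.

```python
import numpy as np

def fit_stats(y, X_vars):
    """OLS with an intercept; return R^2, adjusted R^2 via (4.20), and sigma^2-hat."""
    n, k = len(y), X_vars.shape[1]
    X = np.column_stack([np.ones(n), X_vars])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / syy
    r2_bar = 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)        # equation (4.20)
    sigma2_hat = rss / (n - k - 1)
    # Identity (4.21): (1 - adjusted R^2) * Syy / (n - 1) equals sigma^2-hat.
    assert np.isclose((1.0 - r2_bar) * syy / (n - 1), sigma2_hat)
    return r2, r2_bar, sigma2_hat

rng = np.random.default_rng(3)
n = 40
X_all = rng.normal(size=(n, 4))
y = 0.5 + 1.0 * X_all[:, 0] + 0.5 * X_all[:, 1] + rng.normal(size=n)

# The specification with the smallest sigma^2-hat also has the largest adjusted R^2.
for k in range(1, 5):
    r2, r2_bar, s2 = fit_stats(y, X_all[:, :k])
    print(f"k = {k}: R^2 = {r2:.4f}  adj R^2 = {r2_bar:.4f}  sigma^2-hat = {s2:.4f}")
```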

Also, from equation (4.20) we can easily see that if $R^2 < k/(n - 1)$, then $1 - R^2 > (n - k - 1)/(n - 1)$ and hence $1 - \bar{R}^2 > 1$. Thus $\bar{R}^2$ is negative! For example, with 2 explanatory variables and 21 observations, if $R^2 < 0.1$, $\bar{R}^2$ will be negative.
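A quick check of this example, taking $R^2 = 0.08$ (an assumed value just below the $k/(n - 1) = 0.1$ threshold):

```python
# Check of the example: n = 21 observations, k = 2 regressors, and an assumed
# R^2 of 0.08, which is below k / (n - 1) = 0.1.
n, k, r2 = 21, 2, 0.08
r2_bar = 1 - (n - 1) / (n - k - 1) * (1 - r2)
print(r2_bar)   # about -0.022: the adjusted R^2 is indeed negative
```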

There is a relationship between the $t$-tests and $F$-tests outlined earlier and $\bar{R}^2$. If the $t$-ratio for the coefficient of any variable is less than 1, then dropping that variable will increase $\bar{R}^2$. More generally, if the $F$-ratio for any set of variables is less than 1, then dropping this set of variables from the regression equation will increase $\bar{R}^2$. Since the single-variable case is a special case of the many-variable case, we will prove the latter result. Equation (4.21) shows the relationship between $\bar{R}^2$ and $\hat\sigma^2$, so instead of asking whether dropping the variables will increase $\bar{R}^2$, we can equally ask whether $\hat\sigma^2$ will decrease.

Let $\hat\sigma_r^2$ be the estimate of $\sigma^2$ when we drop $r$ regressors. Then, since (as we saw earlier) the estimate of the residual variance is the residual sum of squares divided by the degrees of freedom, we have

$$\hat\sigma_r^2 = \frac{\text{restricted residual sum of squares}}{n - (k - r) - 1}$$

Since the unrestricted residual sum of squares is $(n - k - 1)\hat\sigma^2$ and the restricted residual sum of squares is $[n - (k - r) - 1]\hat\sigma_r^2$, the $F$-test outlined earlier is given by

$$F = \frac{\{(n - k + r - 1)\hat\sigma_r^2 - (n - k - 1)\hat\sigma^2\}/r}{(n - k - 1)\hat\sigma^2/(n - k - 1)} = \frac{\{(n - k + r - 1)\hat\sigma_r^2 - (n - k - 1)\hat\sigma^2\}/r}{\hat\sigma^2}$$

Solving for $\hat\sigma_r^2/\hat\sigma^2$ yields

$$\frac{\hat\sigma_r^2}{\hat\sigma^2} = \frac{a + F}{a + 1} \qquad \text{where } a = \frac{n - k - 1}{r}$$

Thus $\hat\sigma_r^2 \gtrless \hat\sigma^2$ according as $F \gtrless 1$. What this says is that if the $F$-ratio associated with a set of explanatory variables is less than 1, we can increase $\bar{R}^2$ by dropping that set of variables. Since, with 1 degree of freedom in the numerator, $F = t^2$, this means that if the absolute value of the $t$-ratio for any explanatory variable is less than 1, dropping that variable will increase $\bar{R}^2$. However, we have to be careful about the distinction between $t$-ratios for individual coefficients and $F$-ratios for sets of coefficients, so we will discuss the relationship between the $t$ and $F$ ratios.
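The identity just derived is easy to confirm numerically; the sketch below (simulated data and helper names of my own) computes $F$ from the restricted and unrestricted residual sums of squares and checks that $\hat\sigma_r^2/\hat\sigma^2 = (a + F)/(a + 1)$, so dropping the $r$ variables lowers $\hat\sigma^2$ (and raises $\bar{R}^2$) exactly when $F < 1$.

```python
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares from an OLS fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef) ** 2)

rng = np.random.default_rng(4)
n, k, r = 40, 4, 2                                    # k regressors; drop the last r of them
X_vars = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X_vars[:, 0] + X_vars[:, 1] + rng.normal(size=n)   # last r are irrelevant

X_full = np.column_stack([np.ones(n), X_vars])
X_restr = np.column_stack([np.ones(n), X_vars[:, :k - r]])

urss, rrss = ols_rss(y, X_full), ols_rss(y, X_restr)
sigma2 = urss / (n - k - 1)                           # unrestricted sigma^2-hat
sigma2_r = rrss / (n - (k - r) - 1)                   # restricted sigma^2-hat

F = ((rrss - urss) / r) / sigma2                      # F-ratio for the r dropped variables
a = (n - k - 1) / r
print("sigma2_r / sigma2 :", sigma2_r / sigma2)
print("(a + F) / (a + 1) :", (a + F) / (a + 1))       # identical, as derived above
print("F =", F, "-> dropping the r variables raises adjusted R^2 iff F < 1")
```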

There are two cases that create problems:

Case 1. The $t$-ratios are all less than 1 but the $F$-ratio is greater than 1.

Case 2. The $t$-ratios are all greater than 1 but the $F$-ratio for a set of variables is less than 1.

Case 1 occurs when the explanatory variables are highly intercorrelated. (This is called multicollinearity, which we discuss in Chapter 7.) In this case the fact that all the $t$-ratios are less than 1 does not mean that we can increase $\bar{R}^2$ by dropping all the variables: once we drop one variable, the other $t$-ratios will change.
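Case 1 can be reproduced by simulation (a constructed example, not taken from the text): with two nearly collinear regressors, in a sizable share of samples both individual $t$-ratios fall below 1 even though the joint $F$-ratio for the pair is well above 1, so dropping both variables would lower $\bar{R}^2$.

```python
import numpy as np

# Case 1 by simulation: with two nearly collinear regressors, both t-ratios are
# often below 1 even though the joint F-ratio for the pair is far above 1.
rng = np.random.default_rng(5)
n, reps = 30, 2000
count = 0

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.06 * rng.normal(size=n)               # corr(x1, x2) about 0.998
    y = x1 + x2 + 2.0 * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    rss_full = np.sum((y - X @ coef) ** 2)
    sigma2 = rss_full / (n - 3)
    t = coef[1:] / np.sqrt(sigma2 * np.diag(XtX_inv)[1:])   # t-ratios of x1 and x2

    rss_0 = np.sum((y - y.mean()) ** 2)               # restricted model: intercept only
    F = ((rss_0 - rss_full) / 2) / sigma2             # joint F-ratio for the pair

    if np.all(np.abs(t) < 1) and F > 1:
        count += 1

print("share of samples with both |t| < 1 but F > 1:", count / reps)
```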

In Case 2, though we cannot increase $\bar{R}^2$ by dropping any one variable, it might be possible to get a higher $\bar{R}^2$ by dropping a set of explanatory variables. Suppose that we have a regression equation in which all the explanatory variables have $t$-ratios greater than 1. Obviously, we cannot increase $\bar{R}^2$ by dropping any one of the variables. But how do we know whether we can increase $\bar{R}^2$ by dropping some set of variables without searching over all the sets and subsets?

To answer this question we state a simple rule that gives the relationship between the $t$ and $F$ ratios. Consider a set of $k$ variables that are candidates for exclusion, and let $F(k, n)$ be the $F$-ratio associated with these variables ($n$ is the sample size). Then the rule says: if $F(k, n) < c$, the absolute $t$-value of each of the discarded variables must be less than $\sqrt{kc}$; that is, if $F(k, n) < 1$, the absolute $t$-value of each of the $k$ variables is less than $\sqrt{k}$. The converse, however, is not true. Thus if we do not have at least $k$ variables with absolute $t$-values less than $\sqrt{k}$, there is no set of $k$ variables whose $F$-ratio is less than 1, and hence no set of $k$ variables whose deletion will increase $\bar{R}^2$.

See Potluri Rao, "On a Correspondence Between $t$ and $F$ Values in Multiple Regression," The American Statistician, Vol. 30, No. 4, 1976, pp. 190-191. We just present the results here; those interested in the derivation can refer to Rao's paper.

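To see the rule in action, the sketch below (simulated data, plain numpy, my own helper names) computes the $F$-ratio for a candidate set of $k$ variables along with the individual $t$-ratios of those variables and prints the bound $\sqrt{kF}$; every $|t|$ in the set stays below it, so an $F$-ratio below 1 forces every $|t|$ below $\sqrt{k}$.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit."""
    return np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)

rng = np.random.default_rng(6)
n = 40
Z = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * Z[:, 0] + rng.normal(size=n)        # Z[:, 1:] are candidates for exclusion

X_full = np.column_stack([np.ones(n), Z])
rss_full = rss(y, X_full)
sigma2 = rss_full / (n - X_full.shape[1])            # residual variance from the full model

drop = [2, 3, 4, 5]                                  # positions in X_full of the k dropped variables
k = len(drop)
X_restr = np.delete(X_full, drop, axis=1)
F = ((rss(y, X_restr) - rss_full) / k) / sigma2      # F-ratio for the candidate set

XtX_inv = np.linalg.inv(X_full.T @ X_full)
coef = XtX_inv @ X_full.T @ y
t = coef[drop] / np.sqrt(sigma2 * np.diag(XtX_inv)[drop])

print("F-ratio for the set:", F)
print("|t| of dropped vars:", np.abs(t))
print("bound sqrt(k * F)  :", np.sqrt(k * F))        # every |t| above stays below this bound
```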


