
$$\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{c_1^2 S_{22} - 2c_1 c_2 S_{12} + c_2^2 S_{11}}{S_{11}S_{22} - S_{12}^2}\right] = 0.07\left[1 + \frac{1}{23} + \frac{48 + 64 + 48}{80}\right] = 0.07(3.04) = 0.213$$

which is higher because of the last term turning positive. If $x_1$ and $x_2$ are highly correlated, we will observe wide discrepancies in the variances of the prediction error for the same Euclidean distance of the value of $x_0$ from the sample mean. Thus the simple relationship we found in the case of simple regression does not hold in multiple regression.
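A minimal Python sketch of this computation follows; it assumes the sums of squares of the illustrative example in Section 4.3 ($S_{11} = S_{22} = 12$, $S_{12} = 8$, $\hat{\sigma}^2 = 1.4/20 = 0.07$, $n = 23$) and two hypothetical prediction points at the same Euclidean distance from the sample means.

```python
# Variance of the prediction error in the two-regressor model:
#   sigma^2 * [1 + 1/n + (c1^2*S22 - 2*c1*c2*S12 + c2^2*S11) / (S11*S22 - S12^2)]
# where c_i = x_i0 - xbar_i. The values below are assumed to be those of
# the illustrative example of Section 4.3; substitute your own data.

n = 23
sigma2 = 1.4 / (n - 3)          # = 0.07, estimated residual variance
S11, S22, S12 = 12.0, 12.0, 8.0

def pred_var(c1, c2):
    delta = S11 * S22 - S12 ** 2
    quad = (c1 ** 2 * S22 - 2 * c1 * c2 * S12 + c2 ** 2 * S11) / delta
    return sigma2 * (1 + 1 / n + quad)

# Two points at the same Euclidean distance from (xbar1, xbar2):
print(pred_var(2.0, 2.0))    # ~0.101  (cross-product term negative)
print(pred_var(2.0, -2.0))   # ~0.213  (cross-product term positive, much wider)
```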

4.8 Analysis of Variance and Tests of Hypotheses

In Section 4.3, result 6, we discussed an $F$-test to test hypotheses about $\beta_1$ and $\beta_2$. An alternative expression for this test is defined by the statistic

$$F = \frac{(\text{RRSS} - \text{URSS})/r}{\text{URSS}/(n - k - 1)}$$

where URSS = unrestricted residual sum of squares

RRSS = restricted residual sum of squares obtained by imposing the restrictions of the hypothesis

r = number of restrictions imposed by the hypothesis

The derivation of this test is given in the appendix to this chapter. A proof in the general case can be found in graduate textbooks in this area (see, for instance, Maddala, Econometrics, pp. 457-460).

As an illustration, consider the hypothesis $\beta_1 = 1.0$, $\beta_2 = 0$ in the illustrative example in Section 4.3. The unrestricted residual sum of squares is URSS = 1.4. To get the restricted residual sum of squares RRSS, we have to minimize

$$\sum_i (y_i - \alpha - 1.0\,x_{1i})^2$$

Since both $\beta_1$ and $\beta_2$ are specified, there is only $\alpha$ to be estimated, and we get $\hat{\alpha} = \bar{y} - 1.0\,\bar{x}_1$. Thus

$$\text{RRSS} = \sum_i \left[(y_i - \bar{y}) - (x_{1i} - \bar{x}_1)\right]^2 = S_{yy} + S_{11} - 2S_{1y} = 10.0 + 12.0 - 2(10.0) = 2.0$$

Also, we have $r = 2$, $n = 23$, $k = 2$. Hence

$$F = \frac{(2.0 - 1.4)/2}{1.4/20} = \frac{0.3}{0.07} \approx 4.29$$

which is exactly the value we obtained in Section 4.3.
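As a computational check, here is a short Python sketch of this $F$-test; the scipy tail-probability call is an added convenience, not part of the original example.

```python
from scipy import stats

# Illustrative example of Section 4.3: test H0: beta1 = 1.0, beta2 = 0.
n, k, r = 23, 2, 2           # observations, regressors, restrictions
URSS = 1.4                   # unrestricted residual sum of squares

# The restricted model y = alpha + 1.0*x1 + u gives
# RRSS = Syy + S11 - 2*S1y (sums of squares/cross-products about the means).
Syy, S11, S1y = 10.0, 12.0, 10.0
RRSS = Syy + S11 - 2 * S1y   # = 2.0

F = ((RRSS - URSS) / r) / (URSS / (n - k - 1))
print(F)                                   # 0.3 / 0.07 = 4.29
print(1 - stats.f.cdf(F, r, n - k - 1))    # p-value with (2, 20) d.f.
```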

In the special case where the hypothesis is

$$\beta_1 = \beta_2 = \cdots = \beta_k = 0$$





we have $\text{URSS} = S_{yy}(1 - R^2)$ and $\text{RRSS} = S_{yy}$. Hence the test is given by

$$F = \frac{\left[S_{yy} - S_{yy}(1 - R^2)\right]/k}{S_{yy}(1 - R^2)/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \tag{4.14}$$

which has an $F$-distribution with $k$ and $(n - k - 1)$ degrees of freedom. What this test does is test the hypothesis that none of the $x$'s influence $y$; that is, the regression equation is useless. Of course, a rejection of this hypothesis leaves us with the question: which of the $x$'s are useful in explaining $y$?
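A minimal sketch of (4.14) in Python, applied to the illustrative example of Section 4.3 (where $R^2 = 1 - 1.4/10.0 = 0.86$):

```python
from scipy import stats

def overall_f(r_squared, n, k):
    """F-test of (4.14) for H0: beta_1 = beta_2 = ... = beta_k = 0."""
    F = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
    p = 1 - stats.f.cdf(F, k, n - k - 1)   # tail area with (k, n-k-1) d.f.
    return F, p

# Illustrative example of Section 4.3: R^2 = 1 - URSS/Syy = 1 - 1.4/10 = 0.86
print(overall_f(0.86, 23, 2))   # F ~ 61.4: the regression is clearly not useless
```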

It is customary to present this test in the form of an analysis of variance table similar to Table 3.3, which we considered for simple regression. This is shown in Table 4.5. What we do is decompose the variance of $y$ into two components: that due to the explanatory variables (i.e., due to regression) and that which is unexplained (i.e., residual).

As an illustration, consider the hospital cost regression in Section 4.6. The analysis of variance table is given in Table 4.6. The $F$-value is highly significant: the 1% point from the $F$-tables with 8 and 168 d.f. is 2.51, and the observed $F$ of 9.33 is much higher.

Of course, we reject the hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_8 = 0$. All this means is that the case-mix variables are important in explaining the variation in average cost per case between the hospitals. But it does not say which variables are important.

Table 4.5 Analysis of Variance for the Multiple Regression Model

| Source of Variation | Sum of Squares, SS | Degrees of Freedom, d.f. | Mean Square, SS/d.f. | $F$ |
|---|---|---|---|---|
| Regression | $R^2 S_{yy}$ | $k$ | $R^2 S_{yy}/k = \text{MS}_1$ | $\text{MS}_1/\text{MS}_2$ |
| Residual | $(1 - R^2)S_{yy}$ | $n - k - 1$ | $(1 - R^2)S_{yy}/(n - k - 1) = \text{MS}_2$ | |
| Total | $S_{yy}$ | $n - 1$ | | |

Table 4.6 Analysis of Variance for the Hospital Cost Regression in Section 4.6

| Source of Variation | Sum of Squares, SS | Degrees of Freedom, d.f. | Mean Square, SS/d.f. | $F$ |
|---|---|---|---|---|
| Regression | 10,357 | 8 | 1294.625 | 1294.625/138.756 = 9.33 |
| Residual | 23,311 | 168 | 138.756 | |
| Total | 33,668 | 176 | | |
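The arithmetic of Table 4.6 is easy to reproduce from the two sums of squares alone; a minimal sketch in Python:

```python
# ANOVA arithmetic for the hospital cost regression (figures from Table 4.6).
reg_ss, resid_ss = 10_357.0, 23_311.0
reg_df, resid_df = 8, 168

ms_regression = reg_ss / reg_df      # 1294.625
ms_residual = resid_ss / resid_df    # 138.756
F = ms_regression / ms_residual      # 9.33

print(f"Total SS = {reg_ss + resid_ss:,.0f} on {reg_df + resid_df} d.f.")
print(f"F = {F:.2f}  (1% point of F(8, 168) is about 2.51)")
```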



Nested and Nonnested Hypotheses

The hypotheses we are interested in testing can usually be classified under two categories: nested and nonnested. Consider, for instance, the regression models

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u \qquad \text{(model 1)}$$

$$y = \beta_1 x_1 + \beta_2 x_2 + u \qquad \text{(model 2)}$$

A test of the hypothesis $H_0\colon \beta_3 = 0$ versus $H_1\colon \beta_3 \neq 0$ is a test of the hypothesis that the data are generated by model 2 versus that the data are generated by model 1. Such a hypothesis is called a nested hypothesis because the parameters in model 2 form a subset of the parameters in model 1. A hypothesis

$$H_0\colon \beta_1 + \beta_2 + \beta_3 = 0 \quad \text{versus} \quad H_1\colon \beta_1 + \beta_2 + \beta_3 \neq 0$$

can also be called a nested hypothesis because we can reparametrize the original equation as

$$y = (\beta_1 + \beta_2 + \beta_3)x_1 + \beta_2(x_2 - x_1) + \beta_3(x_3 - x_1) + u = \gamma x_1 + \beta_2(x_2 - x_1) + \beta_3(x_3 - x_1) + u$$

where $\gamma = \beta_1 + \beta_2 + \beta_3$. Now consider the parameter set as $(\gamma, \beta_2, \beta_3)$ and test $H_0\colon \gamma = 0$ versus $H_1\colon \gamma \neq 0$. Similarly, if we have the hypothesis

$$H_0\colon \beta_1 + \beta_2 + \beta_3 = 0, \quad \beta_2 - \beta_3 = 0$$

we reparametrize the original equation by defining $\gamma_1 = \beta_1 + \beta_2 + \beta_3$, $\gamma_2 = \beta_2 - \beta_3$, $\gamma_3 = \beta_3$, so that $\beta_3 = \gamma_3$, $\beta_2 = \gamma_2 + \gamma_3$, $\beta_1 = \gamma_1 - \gamma_2 - 2\gamma_3$, and the original model becomes

$$y = (\gamma_1 - \gamma_2 - 2\gamma_3)x_1 + (\gamma_2 + \gamma_3)x_2 + \gamma_3 x_3 + u = \gamma_1 x_1 + \gamma_2(x_2 - x_1) + \gamma_3(x_3 + x_2 - 2x_1) + u$$

Now we consider the parameter set $(\gamma_1, \gamma_2, \gamma_3)$, and $H_0$ specifies the values of $\gamma_1$ and $\gamma_2$.
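This kind of reparametrization is purely mechanical and can be verified symbolically; a minimal sketch with sympy (an added check, not part of the original text):

```python
import sympy as sp

b1, b2, b3, x1, x2, x3 = sp.symbols("beta1 beta2 beta3 x1 x2 x3")
g1 = b1 + b2 + b3      # gamma_1
g2 = b2 - b3           # gamma_2
g3 = b3                # gamma_3

original = b1 * x1 + b2 * x2 + b3 * x3
reparam = g1 * x1 + g2 * (x2 - x1) + g3 * (x3 + x2 - 2 * x1)

# The two systematic parts are identical, so testing H0 amounts to
# testing gamma_1 = 0 and gamma_2 = 0 in the reparametrized equation.
print(sp.simplify(original - reparam))   # 0
```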

Suppose, on the other hand, that we consider the two regression models:

$$y = \beta_1 x_1 + u_1 \qquad \text{(model 3)}$$

$$y = \beta_2 x_2 + u_2 \qquad \text{(model 4)}$$

$H_0$: the data are generated by model 3

$H_1$: the data are generated by model 4

Now the parameter set specified by model 3 is not a subset of the parameter set specified by model 4. Hence we say that hypotheses $H_0$ and $H_1$ are nonnested.

In the following sections we consider nested hypotheses only. The problem of selection of regressors, for instance, whether to include $x_1$ only or $x_2$ only, is


