
Consider the regression equation

$$\hat{y} = 10.0 + 0.9x$$

where $y$ = consumer expenditures and $x$ = disposable income, estimated from $n = 12$ observations. We are given $\hat{\sigma}^2 = 0.01$, $\bar{x} = 200$, and $S_{xx} = 4000$. Given $x_0 = 250$, our prediction of $y_0$ is

$$\hat{y}_0 = 10.0 + 0.9(250) = 235$$

The standard error of the prediction is

$$\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} = \sqrt{0.01\left[1 + \frac{1}{12} + \frac{(250 - 200)^2}{4000}\right]} = 0.131$$

Since $t = 2.228$ from the $t$-tables with 10 degrees of freedom, the 95% confidence interval for $y_0$ is $235 \pm 2.228(0.131) = 235 \pm 0.29$, that is, $(234.71, 235.29)$.
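These numbers are easy to reproduce from the summary statistics alone. The following Python sketch computes the same interval; the variable names and the use of scipy are our own choices, not from the text:

```python
from math import sqrt

from scipy import stats

# Summary statistics from the consumption-function example above.
n = 12             # number of observations (so n - 2 = 10 d.f.)
sigma2_hat = 0.01  # estimate of the error variance
x_bar, S_xx = 200.0, 4000.0
x0 = 250.0

y0_hat = 10.0 + 0.9 * x0  # point prediction: 235.0

# var(prediction error) = sigma^2 * (1 + 1/n + (x0 - x_bar)^2 / S_xx)
se = sqrt(sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / S_xx))  # ~0.131

t_val = stats.t.ppf(0.975, df=n - 2)  # ~2.228
print(y0_hat - t_val * se, y0_hat + t_val * se)  # ~ (234.71, 235.29)
```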

Prediction of Expected Values

Sometimes, given $x_0$, it is not $y_0$ but $E(y_0)$ that we would be interested in predicting; that is, we are interested in the mean of $y_0$, not $y_0$ as such. We will give an illustrative example of this. Since $E(y_0) = \alpha + \beta x_0$, we would be predicting this by $\hat{E}(y_0) = \hat{\alpha} + \hat{\beta}x_0$, which is the same as the $\hat{y}_0$ that we considered earlier. Thus our prediction would be the same whether it is $y_0$ or $E(y_0)$ that we want to predict. However, the prediction error would be different, and the variance of the prediction error would be different (actually smaller). This means that the confidence intervals we generate would be different. The prediction error now is

$$E(y_0) - \hat{E}(y_0) = (\alpha - \hat{\alpha}) + (\beta - \hat{\beta})x_0$$

Note that this is the same as the prediction error $y_0 - \hat{y}_0$ with $u_0$ missing. Since the variance of $u_0$ is $\sigma^2$, we have to subtract $\sigma^2$ from the error variance we obtained earlier. Thus

$$\operatorname{var}[E(y_0) - \hat{E}(y_0)] = \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$$

The standard error of the prediction is given by the square root of this expression after substituting $\hat{\sigma}^2$ for $\sigma^2$. The confidence intervals are given by $\hat{E}(y_0) \pm t \cdot \text{SE}$, where $t$ is the appropriate $t$-value.
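Since the two error variances differ only by the $\operatorname{var}(u_0) = \sigma^2$ term, both standard errors can come from one small helper. The following sketch is ours; the function name and arguments are not from the text:

```python
from math import sqrt

def prediction_se(sigma2_hat, n, x_bar, S_xx, x0, mean_response=False):
    """Standard error of the prediction at x0.

    With mean_response=False this is the SE for predicting the
    individual value y0; with mean_response=True it is the SE for
    predicting E(y0), where the var(u0) = sigma^2 term drops out.
    """
    factor = 1 / n + (x0 - x_bar) ** 2 / S_xx
    if not mean_response:
        factor += 1.0  # the var(u0) term, present only for individual y0
    return sqrt(sigma2_hat * factor)
```

For instance, with the consumption-function numbers above, `prediction_se(0.01, 12, 200, 4000, 250)` gives 0.131, while setting `mean_response=True` gives the smaller value 0.084.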

Illustrative Example

Consider the example of the athletic sportswear store considered in Section 3.3 earlier. The regression equation estimated from the data for 5 months presented there is

$$\hat{y} = 1.0 + 1.2x \qquad \bar{x} = 3.0 \qquad S_{xx} = 10 \qquad \text{RSS} = 8.8$$

where $y$ = sales revenue (in thousands of dollars) and $x$ = advertising expenditures (in hundreds of dollars).

Suppose that the sales manager wants us to predict what the sales revenue will be if advertising expenditures are increased to \$600. He would also like a 90% confidence interval for his prediction. We have $x_0 = 6$. Hence

$$\hat{y}_0 = 1.0 + 1.2(6) = 8.2$$

The variance of the prediction error is

$$\sigma^2\left[1 + \frac{1}{5} + \frac{(6 - 3)^2}{10}\right] = 2.1\sigma^2$$

Since $\hat{\sigma}^2 = \text{RSS}/(n - 2) = 8.8/3 = 2.93$, the standard error of the prediction is

$$\sqrt{2.1(2.93)} = \sqrt{6.153} = 2.48$$

The 5% point from the $t$-distribution with 3 d.f. is 2.353. The 90% confidence interval for $y_0$ given $x_0 = 6$ is, therefore,

$$8.2 \pm 2.353(2.48) = (2.36, 14.04)$$

Thus the 90% confidence interval for sales revenue if advertising expenditures are \$600 is (\$2360, \$14,040).
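The same interval can be checked numerically. A minimal sketch, with the summary statistics hard-coded and scipy assumed available:

```python
from math import sqrt

from scipy import stats

n = 5
sigma2_hat = 8.8 / 3  # RSS / (n - 2), approximately 2.93
x_bar, S_xx, x0 = 3.0, 10.0, 6.0

y0_hat = 1.0 + 1.2 * x0  # 8.2
se = sqrt(sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / S_xx))  # ~2.48
t_val = stats.t.ppf(0.95, df=n - 2)  # 2.353: 5% in each tail for a 90% CI
print(y0_hat - t_val * se, y0_hat + t_val * se)  # ~ (2.36, 14.04)
```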

Consider now the case where the sales manager wants us to predict the average sales per month over the next two years when advertising expenditures are $600 per month. He also wants a 90% confidence interval for the prediction.

Now we are interested in predicting $E(y_0)$, not $y_0$. The prediction is still given by $1.0 + 1.2(6) = 8.2$. The variance of the prediction error is now

$$\sigma^2\left[\frac{1}{5} + \frac{(6 - 3)^2}{10}\right] = 1.1\sigma^2$$

Substituting $\hat{\sigma}^2 = 2.93$ as before and taking the square root, we now get the standard error as

$$\text{SE}[\hat{E}(y_0)] = \sqrt{1.1(2.93)} = 1.795$$

The 90% confidence interval now is

$$8.20 \pm 2.353(1.795) = 8.20 \pm 4.22$$

Thus the 90% confidence interval for the average sales is (\$3980, \$12,420). Note that this confidence interval is narrower than the one we obtained earlier for $y_0$.
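The corresponding numerical check for the mean prediction differs only in dropping the leading 1 from the variance factor:

```python
from math import sqrt

from scipy import stats

n = 5
sigma2_hat = 8.8 / 3
x_bar, S_xx, x0 = 3.0, 10.0, 6.0

ey0_hat = 1.0 + 1.2 * x0  # the prediction of E(y0) is still 8.2

# The leading 1, i.e. var(u0), is dropped when predicting the mean.
se = sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / S_xx))  # ~1.795
t_val = stats.t.ppf(0.95, df=n - 2)
print(ey0_hat - t_val * se, ey0_hat + t_val * se)  # ~ (3.98, 12.42), up to rounding
```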

3.8 Outliers

Very often it happens that the estimates of the regression parameters are influenced by a few extreme observations, or outliers. This problem can be detected if we study the residuals from the estimated regression equation. Actually, a detailed analysis of residuals should accompany every estimated equation. Such an analysis will enable us to see whether we are justified in making the following assumptions:

1. The error variance $V(u_i) = \sigma^2$ is the same for all $i$. This problem is treated in Chapter 5.

2. The error terms are serially independent. This problem is treated in Chapter 6.

3. The regression relationship is linear. This problem is treated in Section 3.9.

What we will be concerned with here is detecting outlying observations using an analysis of residuals. A more detailed discussion of the analysis of residuals is given in Chapter 12. Actually, what we are doing is a diagnostic checking of our patient (the regression equation) to see whether anything is wrong.

An outlier is an observation that is far removed from the rest of the observations. Such an observation is usually generated by some unusual factors. However, when we use the least squares method, this single observation can produce substantial changes in the estimated regression equation. In the case of a simple regression we can detect outliers simply by plotting the data. In the case of multiple regression, however, such plotting is not possible, and we have to analyze the residuals.
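As a concrete sketch of such a residual check, the function below fits a simple regression by least squares and flags observations whose residuals are unusually large. The function name and the two-standard-deviation cutoff are our own illustrative choices, not a rule from the text:

```python
import numpy as np

def flag_outliers(x, y, threshold=2.0):
    """Fit y = a + b x by least squares and return the indices of
    observations whose residuals exceed `threshold` times the
    standard deviation of the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    a = y.mean() - b * x.mean()
    residuals = y - (a + b * x)
    return np.flatnonzero(np.abs(residuals) > threshold * residuals.std())
```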

A good example showing that a simple presentation of the regression equation with the associated standard errors and $r^2$ does not give us the whole picture is given by Anscombe.* There are four data sets, presented in Table 3.5. The values of $x$ for the first three data sets are the same. All four data sets give the same regression equation.

*F. J. Anscombe, "Graphs in Statistical Analysis," The American Statistician, Vol. 27, No. 1, February 1973, pp. 17-21.

Table 3.5 Four Data Sets That Give an Identical Regression Equation

Data Set:        1-3      1      2      3      4      4
Variable:          x      y      y      y      x      y
Observation
     1          10.0   8.04   9.14   7.46    8.0   6.58
     2           8.0   6.95   8.14   6.77    8.0   5.76
     3          13.0   7.58   8.74  12.74    8.0   7.71
     4           9.0   8.81   8.77   7.11    8.0   8.84
     5          11.0   8.33   9.26   7.81    8.0   8.47
     6          14.0   9.96   8.10   8.84    8.0   7.04
     7           6.0   7.24   6.13   6.08    8.0   5.25
     8           4.0   4.26   3.10   5.39   19.0  12.50
     9          12.0  10.84   9.13   8.15    8.0   5.56
    10           7.0   4.82   7.26   6.42    8.0   7.91
    11           5.0   5.68   4.74   5.73    8.0   6.89
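Anscombe's point is easy to verify numerically: a least squares fit to each of the four data sets in Table 3.5 returns essentially the same equation, approximately $\hat{y} = 3.0 + 0.5x$, even though plots of the four data sets look entirely different. A sketch using numpy (our own code, not from the text):

```python
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
for i, y in enumerate(ys, start=1):
    x = np.asarray(x4 if i == 4 else x123, dtype=float)
    b, a = np.polyfit(x, y, 1)  # returns (slope, intercept)
    print(f"data set {i}: y-hat = {a:.2f} + {b:.3f} x")
# every data set prints: y-hat = 3.00 + 0.500 x
```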


