
Thus

f(x, y) = f_1(y) \, f_2(x \mid y)

where

f_1(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\left[ -\frac{(y - m_y)^2}{2\sigma_y^2} \right]

and

f_2(x \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma_x \sqrt{1 - \rho^2}} \exp\left\{ -\frac{\left[ x - m_x - \rho(\sigma_x/\sigma_y)(y - m_y) \right]^2}{2\sigma_x^2 (1 - \rho^2)} \right\}
Thus we see that the marginal distribution of y is normal with mean m_y and variance σ_y². The conditional distribution of x given y is also normal with

mean = m_x + \rho \frac{\sigma_x}{\sigma_y} (y - m_y)

variance = \sigma_x^2 (1 - \rho^2)

The conditional distribution of y given x is obtained by interchanging x and y in the foregoing relationships.
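As a quick numerical check of these formulas (a sketch, not part of the text; the parameter values m_x = 1, m_y = 2, σ_x = 2, σ_y = 3, ρ = 0.6 are made up for illustration), one can simulate a large bivariate normal sample and compare the mean and variance of x among observations with y near a chosen value against the theoretical conditional moments:

```python
import numpy as np

# Illustrative parameter values (assumed, not from the text)
m_x, m_y = 1.0, 2.0
s_x, s_y = 2.0, 3.0
rho = 0.6

rng = np.random.default_rng(0)
cov = np.array([[s_x**2, rho * s_x * s_y],
                [rho * s_x * s_y, s_y**2]])
xy = rng.multivariate_normal([m_x, m_y], cov, size=1_000_000)
x, y = xy[:, 0], xy[:, 1]

# Condition on y close to a chosen value y0 (a narrow band stands in for y = y0)
y0 = 4.0
band = np.abs(y - y0) < 0.05
x_band = x[band]

# Theoretical conditional moments of x given y = y0
cond_mean = m_x + rho * (s_x / s_y) * (y0 - m_y)
cond_var = s_x**2 * (1 - rho**2)

print("simulated mean:", x_band.mean(), " theoretical:", cond_mean)
print("simulated var: ", x_band.var(), " theoretical:", cond_var)
```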

Thus for a bivariate normal distribution, both the marginal and conditional distributions are univariate normal. Note that the converse need not be true; that is, if the marginal distributions of X and Y are normal, it does not necessarily follow that the joint distribution of X and Y is bivariate normal. In fact, there are many nonnormal bivariate distributions for which the marginal distributions are both normal.
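The converse can be made concrete with a standard construction (a sketch, not taken from the text or from the papers cited below): let X be standard normal and Y = SX, where S is an independent random sign. Both marginals are N(0, 1), but X + Y places probability 1/2 on the single point 0, which is impossible if (X, Y) were bivariate normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.standard_normal(n)           # X ~ N(0, 1)
s = rng.choice([-1.0, 1.0], size=n)  # independent random sign
y = s * x                            # Y = S*X also has a N(0, 1) marginal

# Both marginals look standard normal ...
print("mean, var of X:", x.mean(), x.var())
print("mean, var of Y:", y.mean(), y.var())

# ... but the joint distribution is not bivariate normal:
# X + Y equals 0 whenever S = -1, i.e. with probability 1/2,
# which cannot happen for a nondegenerate normal sum.
print("fraction with X + Y exactly 0:", np.mean(x + y == 0.0))
```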

Galton's Result and the Regression Fallacy

Consider now the mean of Y for a given value of X. We have seen that it is given by

E(Y \mid X = x) = m_y + \rho \frac{\sigma_y}{\sigma_x} (x - m_x)

"This result is more general. For the multivariate normal distribution we can show that all marginal and conditional distributions are also normal.

"C. J. Kowalski, "Non-normal Bivariate Distributions with Normal Marginals," The American Statistician, Vol. 27, No. 3, June 1973, pp. 103-106; K. V. Mardia, Families of Bivariate Distributions (London: Charles Griffin, 1970).



The slope of this line is ρσ_y/σ_x, and if σ_y = σ_x, since ρ < 1 we have the result that the slope is less than 1, as observed by Galton. By the same token, if we consider E(X | Y = y) we get

E(X \mid Y = y) = m_x + \rho \frac{\sigma_x}{\sigma_y} (y - m_y)

Since we have assumed that σ_x = σ_y, the slope of this line is also less than unity (note that we are taking dx/dy as the slope in this case). Thus if Galton had considered the conditional means of parents' heights for given values of offspring's heights, he would have found a "regression" of parents' heights toward the mean. It is not clear what Galton would have labeled this regression.

Such "regression" toward average is often found when considering variables that are jointly normally distributed and that have almost the same variance. This has been a frequent finding in the case of test scores. For example, if

: = score on the first test

= score on the second test

then considering the conditional means of for given values of x shows a regression toward the mean in the second test. This does not mean that the students abilities are converging toward the mean. This finding in the case of test scores has been named a regression fallacy by the psychologist Thorn-dike."
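A small simulation (a sketch with made-up numbers, not from the text) illustrates the point: each student's two scores are the same fixed ability plus independent noise, so no ability changes at all, yet the second-test averages of the extreme first-test groups move toward the overall mean.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

ability = rng.normal(70, 10, n)        # each student's true ability (never changes)
test1 = ability + rng.normal(0, 8, n)  # first test  = ability + noise
test2 = ability + rng.normal(0, 8, n)  # second test = ability + independent noise

# Average second-test score for students who scored very high or very low first
high = test1 > 85
low = test1 < 55
print("high group: mean test1 =", test1[high].mean(), " mean test2 =", test2[high].mean())
print("low group:  mean test1 =", test1[low].mean(), " mean test2 =", test2[low].mean())
# The second-test means are pulled toward the overall mean of 70 even though
# no individual ability has moved: this is the "regression fallacy".
```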

This, then, is the story of the term "regression." The term as it is used now has no implication that the slope be less than 1.0, nor even the implication of linearity.

R. L. Thorndike, "Regression Fallacies in the Matched Group Experiment," Psychometrika, Vol. 7, No. 2, 1942, pp. 85-102.

Summary

1. The present chapter discusses the simple linear regression model with one explained variable and one explanatory variable. The term regression literally means "backwardation," but that is not the way it is used today, although that is the way it originated in statistics. A brief history of the term is given in Section 3.1, and it is discussed in greater detail in Section 3.12 under the title "regression fallacy."

2. Two methods, the method of moments (Section 3.3) and the method of least squares (Section 3.4), are described for the estimation of the parameters in the linear regression model. A third method, the method of maximum likelihood (ML), is presented in the appendix. For the normal linear regression model all of them give the same results.
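As an illustration of item 2 (a sketch with artificial data; the variable names are not from the text), the method-of-moments/normal-equations formulas and a generic least squares routine give identical estimates of the intercept and slope:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)   # y = alpha + beta*x + u with alpha = 2, beta = 0.5

# Method of moments / least squares normal equations:
#   beta_hat = S_xy / S_xx,  alpha_hat = ybar - beta_hat * xbar
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
alpha_hat = y.mean() - beta_hat * x.mean()

# The same fit obtained from a generic least squares routine
coef = np.polyfit(x, y, 1)   # returns [slope, intercept]

print("normal equations: alpha =", alpha_hat, " beta =", beta_hat)
print("np.polyfit:       alpha =", coef[1], " beta =", coef[0])
```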

3. Given two variables y and x, there is always the question of whether to regress y on x or to regress x on y. This question can always be answered if we know how the data were generated or if we know the direction of causality. In some cases both regressions make sense. These issues are discussed in the latter part of Section 3.4.
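A short continuation of the same kind of sketch (artificial data again) shows why the choice in item 3 matters: the slope from regressing y on x is not the reciprocal of the slope from regressing x on y, and the product of the two slopes equals r².

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 2, 1000)
y = 1.0 + 0.8 * x + rng.normal(0, 1, 1000)

b_yx = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # slope from regressing y on x
b_xy = np.cov(x, y)[0, 1] / np.var(y, ddof=1)   # slope from regressing x on y
r = np.corrcoef(x, y)[0, 1]

print("b_yx =", b_yx, " b_xy =", b_xy)
print("b_yx * b_xy =", b_yx * b_xy, " r^2 =", r**2)
```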

4. In Sections 3.5 and 3.6 we discuss the procedures for obtaining

(a) Standard errors of the coefficients.

(b) Confidence intervals for the parameters.

(c) Tests of hypotheses about the parameters.

The exact formulas need not be repeated here because they are summarized in the respective sections. The main results that need to be presented are:

(a) The estimates of the regression parameters with their standard errors. Sometimes the t-ratios are given instead of the standard errors. If we are interested in obtaining confidence intervals, the standard errors are more convenient. If we are interested in tests of hypotheses, presentation of the t-ratios is sometimes more convenient.

(b) The coefficient of determination r².

(c) SEE or SER. This is an estimate of the standard deviation σ of the error term.

However, these statistics by themselves are not sufficient. In Section 3.8 we give an example of four different data sets that give the same regression output.
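The computations listed in item 4 can be sketched as follows (artificial data; the 5% significance level is assumed for illustration), using the usual formulas σ̂² = RSS/(n − 2) and SE(β̂) = σ̂/√S_xx:

```python
import numpy as np
from scipy import stats   # used only for the t critical value

rng = np.random.default_rng(5)
n = 30
x = rng.uniform(0, 10, n)
y = 1.0 + 0.6 * x + rng.normal(0, 1.5, n)

Sxx = np.sum((x - x.mean())**2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
alpha_hat = y.mean() - beta_hat * x.mean()

resid = y - alpha_hat - beta_hat * x
sigma2_hat = np.sum(resid**2) / (n - 2)   # estimate of the error variance
se_beta = np.sqrt(sigma2_hat / Sxx)       # standard error of the slope

t_ratio = beta_hat / se_beta              # test of H0: beta = 0
t_crit = stats.t.ppf(0.975, df=n - 2)     # 95% two-sided critical value

print("beta_hat =", beta_hat, " SE =", se_beta, " t =", t_ratio)
print("95% CI: (", beta_hat - t_crit * se_beta, ",", beta_hat + t_crit * se_beta, ")")
```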

5. While considering the predictions from the linear regression model, it is important to note whether we are obtaining predictions for a particular value of y or for the mean value of y. Although the point prediction is the same for the two cases, the variance of the prediction error and the confidence intervals we generate will be different. This is illustrated with an example in Section 3.7. Sometimes we are interested in the inverse prediction: prediction of x given y. This problem is discussed in Section 3.10.
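A sketch of item 5 with artificial data: the point prediction at a new value x_0 is the same in both cases, but the variance for predicting an individual y carries an extra σ̂² term relative to the variance for predicting the mean of y.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)

Sxx = np.sum((x - x.mean())**2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
alpha_hat = y.mean() - beta_hat * x.mean()
sigma2_hat = np.sum((y - alpha_hat - beta_hat * x)**2) / (n - 2)

x0 = 7.0
y0_hat = alpha_hat + beta_hat * x0   # identical point prediction in both cases

var_mean = sigma2_hat * (1/n + (x0 - x.mean())**2 / Sxx)        # predicting E(y | x0)
var_indiv = sigma2_hat * (1 + 1/n + (x0 - x.mean())**2 / Sxx)   # predicting an individual y

print("point prediction:", y0_hat)
print("SE for mean of y:     ", np.sqrt(var_mean))
print("SE for individual y:  ", np.sqrt(var_indiv))
```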

6. In regression analysis it is important to examine the residuals and see whether there are any systematic patterns in them. Such analysis would be useful in detecting outliers and judging whether the linear functional form is appropriate. The problem of detection of outliers and what to do with them is discussed in Section 3.8. In Section 3.9 we discuss different functional forms where the least squares model can be used with some transformations of the data.
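A sketch of the residual check described in item 6 (artificial data; the log transformation is just one of the functional forms treated in Section 3.9): fitting a straight line to data that are actually log-linear leaves a systematic pattern in the residuals, which largely disappears after regressing log y on x.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.sort(rng.uniform(0, 5, 200))
y = np.exp(0.5 + 0.4 * x + rng.normal(0, 0.1, 200))   # truly log-linear relationship

def ols_residuals(xv, yv):
    """Fit yv = a + b*xv by least squares and return the residuals."""
    b = np.sum((xv - xv.mean()) * (yv - yv.mean())) / np.sum((xv - xv.mean())**2)
    a = yv.mean() - b * xv.mean()
    return yv - a - b * xv

# Linear fit of y on x: residuals show curvature (correlated with x squared)
res_linear = ols_residuals(x, y)
# Fit of log y on x: the systematic pattern largely disappears
res_log = ols_residuals(x, np.log(y))

print("corr(residuals, x^2), linear fit:", np.corrcoef(res_linear, x**2)[0, 1])
print("corr(residuals, x^2), log fit:   ", np.corrcoef(res_log, x**2)[0, 1])
```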

7. Throughout the chapter, the explanatory variable is assumed to be fixed (or a nonrandom variable). In Section 3.10 (optional) we discuss briefly what happens if this assumption is relaxed.

8. The last three sections and the appendix, which contains some derivations of the results and a discussion of the ML estimation method and the LR test, can be omitted by beginning students.


