given a new applicant with specified socioeconomic characteristics, we would want to predict whether the applicant will get a loan or not. Let us define a linear function

$$Z = \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k$$

Then it is intuitively clear that to get the best discrimination between the two groups, we would want to choose the $\lambda_i$ so that the ratio

$$\frac{\text{between-group variance of } Z}{\text{within-group variance of } Z}$$

is a maximum. Fisher* suggested an analogy between this problem and multiple regression analysis. He suggested that we define a dummy variable

$$y_i = \begin{cases} \dfrac{n_2}{n_1 + n_2} & \text{if the individual belongs to } \pi_1 \text{ (first group)} \\[2ex] -\dfrac{n_1}{n_1 + n_2} & \text{if the individual belongs to } \pi_2 \text{ (second group)} \end{cases}$$

Now estimate the multiple regression equation

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

Get the residual sum of squares RSS. Then

$$\hat{\lambda}_i = \frac{n_1 + n_2 - 2}{\text{RSS}}\,\hat{\beta}_i \qquad i = 1, 2, \ldots, k$$

Thus, once we have the regression coefficients and the residual sum of squares from the dummy dependent variable regression, we can very easily obtain the discriminant function coefficients.**

The linear probability model is only slightly different from the formulation of Fisher. In the linear probability model we define

$$y_i = \begin{cases} 1 & \text{if the individual belongs to } \pi_1 \\ 0 & \text{if the individual belongs to } \pi_2 \end{cases}$$

This merely amounts to adding $n_1/(n_1 + n_2)$ to each observation of $y_i$ as defined by Fisher. Thus only the estimate of the constant term changes.

*R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, 1936, pp. 179-188.

**We are omitting the algebraic details here. They can be found in Maddala, Limited-Dependent, pp. 18-21. Also, the tests of significance for the coefficients of the linear discriminant function or the linear probability model discussed later are described there.
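As a concrete check of these relationships, here is a minimal NumPy sketch (simulated data; the group sizes, means, and coefficient values are all hypothetical) that regresses Fisher's dummy variable on the x's, converts the regression output into discriminant-function coefficients using the formula above, and verifies that the 0/1 formulation changes only the constant term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups differing in the means of k = 2 characteristics
n1, n2 = 60, 40
X1 = rng.normal(loc=[1.0, 0.5], scale=1.0, size=(n1, 2))   # group pi_1
X2 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n2, 2))   # group pi_2
X = np.vstack([X1, X2])

# Fisher's dummy dependent variable:
#   n2/(n1+n2) for members of pi_1, -n1/(n1+n2) for members of pi_2
y_fisher = np.concatenate([np.full(n1, n2 / (n1 + n2)),
                           np.full(n2, -n1 / (n1 + n2))])

# OLS regression of the dummy on the x's (with a constant)
Z = np.column_stack([np.ones(n1 + n2), X])
beta = np.linalg.lstsq(Z, y_fisher, rcond=None)[0]
resid = y_fisher - Z @ beta
rss = resid @ resid

# Discriminant-function coefficients from the regression output
lam = (n1 + n2 - 2) / rss * beta[1:]
print("lambda:", lam)

# The linear probability model (0/1 dummy) shifts the dependent variable
# by the constant n1/(n1+n2), so only the intercept estimate changes
y01 = np.concatenate([np.ones(n1), np.zeros(n2)])
beta01 = np.linalg.lstsq(Z, y01, rcond=None)[0]
print("slopes equal:", np.allclose(beta[1:], beta01[1:]))   # True
```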



8.9 The Probit and Logit Models

An alternative approach is to assume that we have a regression model

$$y_i^* = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + u_i \qquad (8.12)$$

where $y_i^*$ is not observed. It is commonly called a "latent" variable. What we observe is a dummy variable $y_i$ defined by

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (8.13)$$

The probit and logit models differ in the specification of the distribution of the error term in (8.12). The difference between the specification (8.12) and the linear probability model is that in the linear probability model we analyze the dichotomous variables as they are, whereas in (8.12) we assume the existence of an underlying latent variable for which we observe a dichotomous realization. For instance, if the observed dummy variable is whether or not the person is employed, $y_i^*$ would be defined as "propensity or ability to find employment." Similarly, if the observed dummy variable is whether or not the person has bought a car, then $y_i^*$ would be defined as "desire or ability to buy a car." Note that both of the examples we have given involve "desire" and "ability." Thus the explanatory variables in (8.12) would contain variables that explain both these elements.
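To make the latent-variable setup concrete, here is a minimal simulation sketch (NumPy; the coefficient values and the schooling variable are hypothetical) in which an unobserved $y_i^*$ generates the observed 0/1 variable $y_i$ as in (8.12) and (8.13):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# One hypothetical explanatory variable, e.g. years of schooling
x = rng.normal(12.0, 2.0, size=n)

# Latent "propensity to find employment": y* = b0 + b1*x + u
b0, b1 = -6.0, 0.5
u = rng.normal(0.0, 1.0, size=n)   # error scaled so that var(u) = 1
y_star = b0 + b1 * x + u

# y* itself is never observed; only its dichotomous realization is
y = (y_star > 0).astype(int)
print("share with y = 1:", y.mean())

# Multiplying y* by any positive constant leaves y unchanged,
# which is why the betas are identified only up to a positive multiple
assert np.array_equal(y, (3.0 * y_star > 0).astype(int))
```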

Note from (8.13) that multiplying $y_i^*$ by any positive constant does not change $y_i$. Hence if we observe $y_i$, we can estimate the $\beta$'s in (8.12) only up to a positive multiple. Hence it is customary to assume var$(u_i) = 1$. This fixes the scale of $y_i^*$. From the relationships (8.12) and (8.13) we get

$$P_i = \text{Prob}(y_i = 1) = \text{Prob}\left[u_i > -\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)\right] = 1 - F\left[-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)\right]$$

where $F$ is the cumulative distribution function of $u$. If the distribution of $u_i$ is symmetric, since $1 - F(-Z) = F(Z)$, we can write

$$P_i = F\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right) \qquad (8.14)$$

Since the observed $y_i$ are just realizations of a binomial process with probabilities given by (8.14) and varying from trial to trial (depending on $x_i$), we can write the likelihood function as



$$L = \prod_{y_i = 1} P_i \prod_{y_i = 0} (1 - P_i) \qquad (8.15)$$

The functional form for $F$ in (8.14) will depend on the assumption made about the error term $u$. If the cumulative distribution of $u_i$ is logistic, we have what is known as the logit model. In this case, writing $Z_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$,

$$F(Z_i) = \frac{e^{Z_i}}{1 + e^{Z_i}} \qquad (8.16)$$

Hence

$$\log \frac{F(Z_i)}{1 - F(Z_i)} = Z_i$$

Note that for the logit model

$$\log \frac{P_i}{1 - P_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$$

The left-hand side of this equation is called the log-odds ratio. Thus the log-odds ratio is a linear function of the explanatory variables. For the linear probability model it is $P_i$ that is assumed to be a linear function of the explanatory variables.
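As a quick numerical check of this claim, the sketch below (with hypothetical index values) evaluates the logistic cdf of (8.16) and confirms that taking log-odds recovers the linear index exactly:

```python
import numpy as np

def logistic_cdf(z):
    """F(z) = e^z / (1 + e^z), the logit specification (8.16)."""
    return np.exp(z) / (1.0 + np.exp(z))

# Hypothetical values of the linear index Z_i = beta0 + sum_j beta_j * x_ij
z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
p = logistic_cdf(z)

# The log-odds ratio is exactly the linear index
log_odds = np.log(p / (1 - p))
print(np.allclose(log_odds, z))   # True
```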

If the errors $u_i$ in (8.12) follow a normal distribution, we have the probit model (it should more appropriately be called the normit model, but the word probit was used first in the biometrics literature). In this case

$$F(Z_i) = \int_{-\infty}^{Z_i} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \qquad (8.17)$$
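Under this normality assumption, $F$ in (8.17) is simply the standard normal cdf. A small sketch (using SciPy, which provides this cdf as scipy.stats.norm.cdf) verifies the integral numerically:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def probit_cdf(z):
    """Evaluate F(z) from (8.17) by numerical integration."""
    val, _ = quad(lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi),
                  -np.inf, z)
    return val

for z in (-1.5, 0.0, 0.8):
    # The two columns agree: (8.17) is the standard normal cdf
    print(z, probit_cdf(z), stats.norm.cdf(z))
```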

Maximization of the likelihood function (8.15) for either the probit or the logit model is accomplished by nonlinear estimation methods. There are now several computer programs available for probit and logit analysis, and these programs are very inexpensive to run.

The likelihood function (8.15) is concave† (does not have multiple maxima), and hence any starting values of the parameters would do. It is customary to start the iterations for the logit and probit models with the estimates from the linear probability model.
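A minimal sketch of this estimation strategy (simulated data and a hand-rolled Newton-Raphson iteration, rather than any particular packaged program): maximize the logit likelihood (8.15), starting from the linear probability model estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated data: a constant and one regressor (values hypothetical)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.3, 1.2])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta))))

# Starting values: OLS on the 0/1 dummy, i.e. the linear probability model
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Newton-Raphson on the logit log-likelihood; because (8.15) is concave
# for the logit, any starting point leads to the same maximum
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    score = X.T @ (y - p)                          # gradient of the log-likelihood
    hess = -(X * (p * (1 - p))[:, None]).T @ X     # Hessian (negative definite)
    step = np.linalg.solve(hess, score)
    beta = beta - step
    if np.max(np.abs(step)) < 1e-10:
        break

print("logit ML estimates:", beta)
```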

Since the cumulative normal and the logistic distributions are very close to each other except at the tails, we are not likely to get very different results using (8.16) or (8.17), that is, the logit or the probit method, unless the samples are large (so that we have enough observations at the tails). However, the estimates of the parameters $\beta_j$ from the two methods are not directly comparable. Since the logistic distribution has variance $\pi^2/3$, the estimates of $\beta_j$ obtained from the logit model have to be multiplied by $\sqrt{3}/\pi$ to be comparable to the estimates obtained from the probit model.
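For example, assuming some hypothetical logit estimates, the conversion is a single rescaling:

```python
import numpy as np

# Hypothetical logit estimates (e.g. from an iteration like the one above)
beta_logit = np.array([0.52, 2.08])

# The logistic distribution has variance pi^2/3 and the standard normal
# variance 1, so rescale the logit estimates to put them on the probit scale
beta_comparable = beta_logit * np.sqrt(3) / np.pi
print(beta_comparable)   # approximately the probit estimates
```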

†This is proved for a general model in J. W. Pratt, "Concavity of the Log-Likelihood," Journal of the American Statistical Association, 1981, pp. 137-159.



