
estimates obtained from the probit model (where we normalize σ to be equal to 1).

Amemiya suggests that the logit estimates be multiplied by 1/1.6 = 0.625 instead of $\sqrt{3}/\pi$, saying that this transformation produces a closer approximation between the logistic distribution and the distribution function of the standard normal. He also suggests that the coefficients of the linear probability model and the coefficients of the logit model are related by the relations:

$$\beta_{LP} \approx 0.25\,\beta_{L} \quad \text{except for the constant term}$$

$$\beta_{LP} \approx 0.25\,\beta_{L} + 0.5 \quad \text{for the constant term}$$

Thus if we need to make the linear probability coefficients comparable to the probit coefficients, we need to multiply them by 2.5 and subtract 1.25 from the constant term. Alternative ways of comparing the models would be:

1. To calculate the sum of squared deviations from predicted probabilities.

2. To compare the percentages correctly predicted.

3. To look at the derivatives of the probabilities with respect to a particular independent variable.

Illustrative Example

As an illustration, we consider data on a sample of 750 mortgage applications in the Columbia, South Carolina, metropolitan area. There were 500 loan applications accepted and 250 loan applications rejected. We define

$$y_i = \begin{cases} 1 & \text{if the loan application was accepted} \\ 0 & \text{if the loan application was rejected} \end{cases}$$

Three models were estimated: the linear probability model, the logit model, and the probit model. The explanatory variables were:

AI = applicant's and coapplicant's income (10 dollars)

XMD = debt minus mortgage payment (10 dollars)

DF = dummy variable, 1 for female, 0 for male

DR = dummy variable, 1 for nonwhite, 0 for white

DS = dummy variable, 1 for single, 0 otherwise

DA = age of house (10 years)

NNWP = percent nonwhite in the neighborhood (× 10)

NMFI = neighborhood mean family income (10 dollars)

NA = neighborhood average age of homes (10 years)

T. Amemiya, "Qualitative Response Models: A Survey," Journal of Economic Literature, 1981, p. 1488.

This example is from G. S. Maddala and R. P. Trost, "On Measuring Discrimination in Loan Markets," Housing Finance Review, 1982, pp. 245-268.



Table 8.3 Comparison of the Probit, Logit, and Linear Probability Models: Loan Data from South Carolina

Variable    Linear Probability Model    Logit Model      Probit Model
AI           1.489 (4.69)                2.254 (4.60)     2.030 (4.73)
XMD         -1.509 (5.74)               -1.170 (5.57)    -1.773 (5.67)
DF           0.140 (0.78)                0.563 (0.87)     0.206 (0.95)
DR          -0.266 (1.84)               -0.240 (1.60)    -0.279 (1.66)
DS          -0.238 (1.75)               -0.222 (1.51)    -0.274 (1.70)
DA          -1.426 (3.52)               -1.463 (3.34)    -1.570 (3.29)
NNWP        -1.762 (0.74)               -2.028 (0.80)    -2.360 (0.85)
NMFI         0.150 (0.23)                0.149 (0.20)     0.194 (0.25)
NA          -0.393 (1.34)               -0.386 (1.25)    -0.425 (1.26)
Constant     0.501                       0.363            0.488

Note: Figures in parentheses are t-ratios, not standard errors.

The results are presented in Table 8.3.

The coefficients of the probit model were left as they were computed. The other coefficients were adjusted as follows:

1. The coefficients of the logit model were multiplied by 0.625.

2. The coefficients of the linear probability model were multiplied by 2.5 throughout and then 1.25 was subtracted from the constant term.

These are the adjustments described in the text. The three sets of coefficients reported in Table 8.3 are not much different from each other (particularly those of the logit and the probit models).
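The adjustments above follow Amemiya's rules stated earlier: logit coefficients are multiplied by 0.625, and linear probability coefficients by 2.5, with 1.25 then subtracted from the constant. The Python sketch below applies these rules to a dictionary of coefficients. It also shows that the factors 2.5 and 1.25 follow from composing the linear-probability-to-logit relations with the 0.625 factor. Finally, it gives a rough numerical comparison of how closely $\Lambda(1.6z)$ and $\Lambda(\pi z/\sqrt{3})$ track $\Phi(z)$. The function names and the coefficient values in the demo are hypothetical, chosen only for illustration.

```python
import math

# Amemiya's suggested factor: logit coefficients times 0.625 (= 1/1.6)
LOGIT_TO_PROBIT = 0.625
# The variance-matching alternative mentioned in the text
SQRT3_OVER_PI = math.sqrt(3) / math.pi


def lpm_to_logit(coefs, const="Constant"):
    """Invert beta_LP = 0.25*beta_L (slopes) and beta_LP = 0.25*beta_L + 0.5 (constant)."""
    out = {name: 4.0 * b for name, b in coefs.items()}
    out[const] = 4.0 * (coefs[const] - 0.5)
    return out


def logit_to_probit(coefs):
    """Multiply every logit coefficient, constant included, by 0.625."""
    return {name: LOGIT_TO_PROBIT * b for name, b in coefs.items()}


def lpm_to_probit(coefs, const="Constant"):
    """Multiply by 2.5 throughout, then subtract 1.25 from the constant."""
    out = {name: 2.5 * b for name, b in coefs.items()}
    out[const] -= 1.25
    return out


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_gap(c, lo=-6.0, hi=6.0, steps=12001):
    """Largest |Lambda(c*z) - Phi(z)| over an evenly spaced grid of z values."""
    gap = 0.0
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(logistic_cdf(c * z) - normal_cdf(z)))
    return gap


if __name__ == "__main__":
    lpm = {"AI": 0.4, "Constant": 0.6}  # hypothetical LPM estimates
    print(lpm_to_probit(lpm))
    print(logit_to_probit(lpm_to_logit(lpm)))  # same numbers, up to rounding
    # 0.625 scaling means Phi(z) is compared with Lambda(1.6 z);
    # sqrt(3)/pi scaling means Phi(z) is compared with Lambda(pi z / sqrt(3)).
    print(max_gap(1.6), max_gap(1 / SQRT3_OVER_PI))
```

On this grid the first gap comes out smaller than the second, which is consistent with Amemiya's remark that 1/1.6 gives the closer approximation. The grid search is only a rough numerical check, not a proof.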

One can compare the three models by comparing the R²s. We will illustrate this with another example in the next section, after defining the different measures of R² for the qualitative dependent variable models.
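The three alternative comparisons listed earlier (squared deviations, percentage correctly predicted, and derivatives of the probabilities) need only the observed outcomes, the fitted probabilities, and the coefficients. A minimal Python sketch follows. The 0.5 cutoff for counting a prediction as correct is a common convention assumed here, not something the text specifies. The function names and demo data are hypothetical.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def sum_squared_deviations(y, p):
    """Sum of (y_i - P_i)^2 over observations."""
    return sum((y_i - p_i) ** 2 for y_i, p_i in zip(y, p))


def percent_correct(y, p, cutoff=0.5):
    """Share of observations where P_i >= cutoff exactly when y_i = 1 (assumed rule)."""
    hits = sum(1 for y_i, p_i in zip(y, p) if (p_i >= cutoff) == (y_i == 1))
    return 100.0 * hits / len(y)


def marginal_effect(model, index, beta_j):
    """dP/dx_j evaluated at a given value of the index x'beta."""
    if model == "linear":
        return beta_j                      # constant derivative
    if model == "logit":
        p = 1.0 / (1.0 + math.exp(-index))
        return p * (1.0 - p) * beta_j      # Lambda'(index) = P(1 - P)
    if model == "probit":
        return normal_pdf(index) * beta_j  # Phi'(index) = phi(index)
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    y = [1, 0, 1, 1]             # hypothetical outcomes
    p = [0.9, 0.2, 0.4, 0.7]     # hypothetical fitted probabilities
    print(sum_squared_deviations(y, p), percent_correct(y, p))
    print([marginal_effect(m, 0.0, 1.0) for m in ("linear", "logit", "probit")])
```

At an index value of zero the logit derivative is $0.25\beta_j$, which matches the linear probability relation $\beta_{LP} \approx 0.25\beta_L$. The probit derivative there is $\phi(0)\beta_j \approx 0.399\beta_j$, so after the 0.625 rescaling it is also about $0.25\beta_L$.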

After getting the estimates from the three models, it is always desirable to adjust the coefficients so that they are all on a comparable level. The linear probability model can be estimated by any multiple regression program. As for the logit and probit models, there are many computer programs available now (TSP, RATS, LIMDEP, etc.).

The Problem of Disproportionate Sampling

In many applications of the logit, probit, or linear probability models, it happens that the number of observations in one of the groups is much smaller than the number in the other group. For instance, in an analysis of bank failures, the number of failed banks would be much smaller than the number of solvent banks. In a study of unemployment, the number of unemployed persons is much smaller than the number of employed persons. Thus either we have to get a very large data set (which is what is done in the studies of unemployment based on census tapes) or we have to sample the two groups at different sampling rates. For instance, in an analysis of bank failures, all the failed banks are considered in the analysis, but only a small percentage of the solvent banks are sampled. Thus the two groups are sampled at different rates. In the example of loan applications in Columbia, South Carolina (for which we presented estimates in Table 8.3), the sampling was actually at different rates. There were 4600 applications in the accepted category and 250 applications in the rejected category. To have enough observations on females and blacks in the rejected category, it was decided to include all the 250 observations from the rejected category and get a random sample of 500 observations from the accepted category. Thus the sampling rate was 100% for the rejected group and 10.87% for the accepted group.

Robert B. Avery and Gerald A. Hanweck, "A Dynamic Analysis of Bank Failures," Bank Structure and Competition (Chicago: Federal Reserve Bank of Chicago, 1984), pp. 380-395.

For a discussion of this point, see Maddala, Limited-Dependent, pp. 90-91. On p. 91 there is an error: the constant term should be decreased (not increased).

In such cases a question arises as to how one should analyze the data. It has been commonly suggested that one should use a weighted logit (or probit or linear probability) model similar to the weighted least squares method we discussed under heteroscedasticity in Chapter 5. For instance, Avery and Hanweck argue: "In addition, because failed and non-failed banks were sampled at different rates, it was also necessary to weight observations in estimation" (p. 387). However, this is not a correct procedure. The usual logit model can be used without any change even with unequal sampling rates. Thus the results presented in Table 8.3 are based on the usual estimation procedures with no weighting used.

Regarding the estimation of the coefficients of the explanatory variables, if we use the logit model, the coefficients are not affected by the unequal sampling rates for the two groups. It is only the constant term that is affected. In Table 8.3 the logit coefficients are all correct, except for the constant term, which needs to be decreased by $\log p_1 - \log p_0$, where $p_1$ and $p_0$ are the proportions of observations chosen from the two groups for which $y_i = 1$ and $y_i = 0$, respectively, and the logarithm is the natural logarithm. In the example in Table 8.3, the constant term for the logit model has to be decreased by $\log(0.1087) - \log(1.0) = -2.22$. Since the coefficients in Table 8.3 have been adjusted (as described earlier), we have to multiply this by 0.625. Thus the decrease in the constant term is -1.39, that is, an increase of 1.39.
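The intercept correction for the logit model is simple arithmetic. The sketch below reproduces the numbers in the text from the sampling rates (500 of 4600 accepted applications, all 250 rejected ones). It then applies the shift to the Table 8.3 logit constant, 0.363, which is already on the 0.625 scale. The helper name is hypothetical.

```python
import math


def intercept_shift(p1, p0):
    """log p1 - log p0: amount by which the logit constant must be decreased."""
    return math.log(p1) - math.log(p0)


p1 = 500 / 4600    # sampling rate for y = 1 (accepted applications)
p0 = 250 / 250     # sampling rate for y = 0 (rejected applications)
shift = intercept_shift(p1, p0)          # about -2.22
scaled = 0.625 * shift                   # about -1.39 on the rescaled (probit-comparable) level
corrected_constant = 0.363 - scaled      # Table 8.3 logit constant, decreased by the scaled shift

print(round(p1, 4), round(shift, 2), round(scaled, 2), round(corrected_constant, 2))
```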

Note that this result is valid for the logit model, not for the probit model or the linear probability model. However, even for these models, although one cannot derive the results analytically, it appears that the slope coefficients are not much affected by unequal sampling rates.
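The claim that unequal sampling rates leave the logit slopes unchanged and shift only the constant can be checked by simulation. The sketch below uses assumed values, not the loan data. It draws a logit population with constant 1 and slope 1, keeps every $y_i = 0$ observation, and keeps each $y_i = 1$ observation with probability 0.1 (Bernoulli subsampling rather than a fixed count). It then fits an ordinary, unweighted logit by Newton-Raphson. The estimated slope should stay near 1, while the constant should land near $1 + \log 0.1$, so subtracting $\log p_1 - \log p_0$ recovers it. All parameter values, the seed, and the function name are hypothetical.

```python
import math
import random

ALPHA, BETA = 1.0, 1.0   # assumed population constant and slope
P1, P0 = 0.1, 1.0        # assumed sampling rates for y = 1 and y = 0


def fit_logit(xs, ys, max_iter=50, tol=1e-10):
    """Unweighted logit of y on (1, x) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r, w = y - p, p * (1.0 - p)
            g0 += r            # score for the constant
            g1 += r * x        # score for the slope
            h00 += w           # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det   # Newton step: information^{-1} * score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


rng = random.Random(12345)
xs, ys = [], []
for _ in range(100_000):
    x = rng.gauss(0.0, 1.0)
    y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-(ALPHA + BETA * x))) else 0
    keep_rate = P1 if y == 1 else P0       # choice-based (disproportionate) sampling
    if rng.random() < keep_rate:
        xs.append(x)
        ys.append(y)

a_hat, b_hat = fit_logit(xs, ys)
a_corrected = a_hat - (math.log(P1) - math.log(P0))
print(len(xs), round(a_hat, 3), round(b_hat, 3), round(a_corrected, 3))
```

The sketch says nothing about the probit or linear probability models, for which, as noted above, no such analytical result is available.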

Weighting the observations is the correct procedure if there is a heteroskedasticity problem. There is no reason why the unequal sampling proportions should cause a heteroskedasticity problem. Thus weighting the observations is clearly not a correct solution. If our interest is mainly in examining which vari-


