
least squares method. In this case the model is called the linear probability model. Another method, called the "linear discriminant function," is related to the linear probability model. The other alternative is to say that there is an underlying or latent variable y* which we do not observe. What we observe is

y = 1 if y* > 0
y = 0 otherwise

This is the idea behind the logit and probit models. First we discuss these methods and then give an illustrative example.
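To make the latent-variable idea concrete, here is a minimal Python sketch with entirely hypothetical data and coefficients: only the indicator y, never y* itself, is observed, and drawing the errors from a logistic distribution corresponds to the logit model.

```python
import numpy as np

# A minimal sketch of the latent-variable formulation (hypothetical data):
# we never observe y_star, only the indicator y = 1 when y_star > 0.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_star = 0.5 + 1.2 * x + rng.logistic(size=500)  # logistic errors -> logit model
y = (y_star > 0).astype(int)                     # what we actually observe

print("share of observations with y = 1:", y.mean())
```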

8.8 The Linear Probability Model and the Linear Discriminant Function

The Linear Probability Model

The term linear probability model is used to denote a regression model in which the dependent variable is a dichotomous variable taking the value 1 or zero. For the sake of simplicity we consider only one explanatory variable, x.

The variable is an indicator variable that denotes the occurrence or nonoccurrence of an event. For instance, in an analysis of the determinants of unemployment, we have data on each person that shows whether or not the person is employed, and we have some explanatory variables that determine the state of employment. Here the event under consideration is employment. We define the dichotomous variable

y = 1 if the person is employed
y = 0 otherwise

Similarly, in an analysis of bankruptcy of firms, we define

y = 1 if the firm is bankrupt
y = 0 otherwise

We write the model in the usual regression framework as

y_i = βx_i + u_i    (8.11)

with E(u_i) = 0. The conditional expectation E(y_i | x_i) is equal to βx_i. This has to be interpreted in this case as the probability that the event will occur given x_i. The calculated value of y_i from the regression equation (i.e., ŷ_i = β̂x_i) will then give the estimated probability that the event will occur given the particular value of x_i. In practice these estimated probabilities can lie outside the admissible range (0, 1).
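As an illustration, the following Python sketch (hypothetical data, a single regressor and no intercept, matching equation (8.11)) fits the linear probability model by ordinary least squares and counts how many fitted "probabilities" fall outside (0, 1).

```python
import numpy as np

# Hypothetical data for the linear probability model y_i = beta * x_i + u_i.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=200)
p = np.clip(0.2 * x, 0.0, 1.0)        # true probability of the event
y = rng.binomial(1, p)                # observed 0/1 outcome

# OLS estimate of beta (single regressor, no intercept): sum(x*y) / sum(x*x).
beta_hat = (x @ y) / (x @ x)
y_hat = beta_hat * x                  # estimated probabilities

print("beta_hat =", round(beta_hat, 3))
print("fitted values outside (0, 1):", int(np.sum((y_hat < 0) | (y_hat > 1))))
```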

Since y_i takes the value 1 or zero, the errors u_i in equation (8.11) can take only two values, (1 - βx_i) and (-βx_i). Also, with the interpretation we have given equation (8.11), and the requirement that E(u_i) = 0, the respective probabilities of these events are βx_i and (1 - βx_i). Thus we have



    u_i           f(u_i)
    1 - βx_i      βx_i
    -βx_i         1 - βx_i

Hence

var(u_i) = βx_i(1 - βx_i)² + (1 - βx_i)(-βx_i)² = βx_i(1 - βx_i)
         = E(y_i)[1 - E(y_i)]

Because of this heteroskedasticity problem the ordinary least squares (OLS) estimates of β from equation (8.11) will not be efficient. We can use the following two-step procedure. First estimate (8.11) by least squares.

Next compute ŷ_i(1 - ŷ_i) and use weighted least squares; that is, defining w_i = √[ŷ_i(1 - ŷ_i)], we regress y_i/w_i on x_i/w_i (a sketch of this two-step procedure is given after the list below). The problems with this procedure are:

1. ŷ_i(1 - ŷ_i) in practice may be negative, although in large samples this will be so with a very small probability, since ŷ_i(1 - ŷ_i) is a consistent estimator of E(y_i)[1 - E(y_i)].

2. Since the errors u_i are obviously not normally distributed, there is a problem with the application of the usual tests of significance. As we will see in the next section, on the linear discriminant function, they can be justified only under the assumption that the explanatory variables have a multivariate normal distribution.

3. The most important criticism is with the formulation itself: that the conditional expectation E(y_i | x_i) be interpreted as the probability that the event will occur. In many cases E(y_i | x_i) can lie outside the limits (0, 1).
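The sketch below illustrates the two-step procedure just described, using the same hypothetical single-regressor setup as above; observations whose estimated variance ŷ_i(1 - ŷ_i) is not positive are simply dropped, which is one common (though ad hoc) way of dealing with problem 1.

```python
import numpy as np

# Hypothetical data, single regressor, no intercept, as in equation (8.11).
rng = np.random.default_rng(1)
x = rng.uniform(0.5, 4.0, size=200)
y = rng.binomial(1, np.clip(0.2 * x, 0.0, 1.0))

# Step 1: ordinary least squares gives preliminary fitted values y_hat.
beta_ols = (x @ y) / (x @ x)
y_hat = beta_ols * x

# Step 2: weight by w_i = sqrt(y_hat_i * (1 - y_hat_i)) and regress
# y_i / w_i on x_i / w_i, dropping observations with nonpositive variance.
var_hat = y_hat * (1.0 - y_hat)
keep = var_hat > 0
w = np.sqrt(var_hat[keep])
y_w, x_w = y[keep] / w, x[keep] / w
beta_wls = (x_w @ y_w) / (x_w @ x_w)

print("OLS estimate of beta:", round(beta_ols, 3))
print("WLS estimate of beta:", round(beta_wls, 3))
```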

The limitations of the linear probability model are illustrated in Figure 8.3, which shows the bunching up of points along y = 0 and y = 1. The predicted values can easily lie outside the interval (0, 1), and the prediction errors can be very large.

In the 1960s and early 1970s the linear probability model was widely used, mainly because it is a model that can be easily estimated using multiple regression analysis. Some others used discriminant analysis, not noting that this method is very similar to the linear probability model.

"A. S. Goldberger, Econometric (New York: Wiley, 1964), p. 250.

"R. G. McGilvray, "Estimating the Linear Probability Function," Econometrica, 1970, pp.

775-776.

"This bunching of points is described in M. Nerlove and S. J. Press, "Univariate and Multivariate Log-Linear and Logistic Models." Report R-I306-EDA/NIH. Rand Corporation. Santa Monica, CaliL, December 1973.



[Figure 8.3 Predictions from the linear probability model, showing the fitted linear regression line and a more reasonable regression line.]

For instance, Meyer and Pifer analyzed bank failures using the linear probability model, and Altman analyzed bankruptcy of manufacturing corporations using discriminant analysis. In both studies the bankrupt banks or corporations were taken and a paired sample of nonbankrupt banks or corporations was chosen (i.e., for each bankrupt bank or corporation, a similarly situated nonbankrupt bank or corporation was found). Then the linear probability model or linear discriminant function was estimated.

Since the linear probability model and linear discriminant function are closely related, we will discuss the latter here.

The Linear Discriminant Function

Suppose that we have n individuals for whom we have observations on the explanatory variables, and we observe that n_1 of them belong to a group π_1 and n_2 of them belong to a second group π_2, where n_1 + n_2 = n. We want to construct a linear function of the variables that we can use to predict that a new observation belongs to one of the two groups. This linear function is called the linear discriminant function.
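For concreteness, here is a small Python sketch of one standard way of computing a linear discriminant function for two groups (Fisher's rule, with entirely hypothetical two-variable data): the coefficients are proportional to S⁻¹(x̄₁ - x̄₂), where S is the pooled within-group covariance matrix, and a new observation is assigned to the group whose mean gives the closer score.

```python
import numpy as np

# Hypothetical two-variable data for two groups pi_1 and pi_2.
rng = np.random.default_rng(2)
n1, n2 = 60, 40
cov = [[1.0, 0.3], [0.3, 1.0]]
group1 = rng.multivariate_normal([2.0, 1.0], cov, size=n1)
group2 = rng.multivariate_normal([0.0, 0.0], cov, size=n2)

mean1, mean2 = group1.mean(axis=0), group2.mean(axis=0)

# Pooled within-group covariance matrix S.
S = ((group1 - mean1).T @ (group1 - mean1)
     + (group2 - mean2).T @ (group2 - mean2)) / (n1 + n2 - 2)

# Discriminant coefficients, proportional to S^{-1}(mean1 - mean2).
coef = np.linalg.solve(S, mean1 - mean2)

# Classify a new observation by comparing its score with the midpoint cutoff.
x_new = np.array([1.5, 0.8])
score, cutoff = coef @ x_new, coef @ (mean1 + mean2) / 2.0
print("assign to group 1" if score > cutoff else "assign to group 2")
```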

As an example, suppose that we have data on a number of loan applicants and we observe that n_1 of them were granted loans and n_2 of them were denied loans. We also have data on the socioeconomic characteristics of the applicants. Now

"Paul A. Meyer and Howard W. Pifer, "Prediction of Bank Failures," Journal of Finance. 1970, pp. 853-868.

"Edward 1. Altman, "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy," Journal of Finance, September 1968.


