


D1 = 1 for group 2, 0 for groups 1 and 3
D2 = 1 for group 3, 0 for groups 1 and 2

It can be easily checked that by substituting the values for D1 and D2 in (8.3), we get the intercepts α1, α2, and α3, respectively, for the three groups. Note that in combining the three equations, we are assuming that the slope coefficient is the same for all groups and that the error term has the same distribution for the three groups.

If there is a constant term in the regression equation, the number of dummies defined should always be one less than the number of groupings by that category, because the constant term is the intercept for the base group and the coefficients of the dummy variables measure differences in intercepts, as can be seen from equation (8.3). In that equation the constant term measures the intercept for the first group, the constant term plus the coefficient of D1 measures the intercept for the second group, and the constant term plus the coefficient of D2 measures the intercept for the third group. We have chosen group 1 as the base group, but any one group may be chosen. The coefficients of the dummy variables measure the differences in the intercepts from that of the base group. If we do not introduce a constant term in the regression equation, we can define a dummy variable for each group, and in this case the coefficients of the dummy variables measure the intercepts for the respective groups. If we include both the constant term and three dummies, we will be introducing perfect multicollinearity and the regression program will not run (or will omit one of the dummies automatically).
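These points can be checked numerically. The following Python sketch (with simulated data and made-up intercepts, so all numbers are purely illustrative) fits a regression with a constant and two dummies, recovers the three group intercepts, and shows that adding a dummy for every group alongside the constant makes the regressor matrix rank-deficient:

```python
import numpy as np

# Simulated data: three groups with intercepts 2, 5, 9 and a common slope 0.8.
# All values here are made up for illustration.
rng = np.random.default_rng(0)
n = 300
group = rng.integers(1, 4, size=n)              # group labels 1, 2, 3
x = rng.normal(10.0, 2.0, size=n)
alpha = np.array([2.0, 5.0, 9.0])               # "true" intercepts a1, a2, a3
y = alpha[group - 1] + 0.8 * x + rng.normal(0.0, 0.1, size=n)

# Group 1 is the base group: D1 = 1 for group 2, D2 = 1 for group 3.
D1 = (group == 2).astype(float)
D2 = (group == 3).astype(float)
X = np.column_stack([np.ones(n), x, D1, D2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)       # OLS estimates
const, slope, g2, g3 = b
# const estimates a1; const + g2 estimates a2; const + g3 estimates a3.

# Constant term plus a dummy for every group: perfect multicollinearity,
# since the three dummy columns sum to the column of ones.
D0 = (group == 1).astype(float)
X_bad = np.column_stack([np.ones(n), x, D0, D1, D2])
rank = np.linalg.matrix_rank(X_bad)             # 4, not 5: rank-deficient
```

The dummy coefficients g2 and g3 estimate the differences α2 − α1 and α3 − α1, matching the interpretation of the coefficients described above.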

As yet another example, suppose that we have data on consumption and income for a number of households. In addition, we have data on:

1. S: the sex of the head of the household.

2. A: age of the head of the household, which is given in three categories: < 25 years, 25 to 50 years, and > 50 years.

3. E: education of the head of household, also in three categories: < high school, high school but < college degree, college degree.

We include these qualitative variables in the form of dummy variables:

D1 = 1 if sex is male, 0 if female
D2 = 1 if age < 25 years, 0 otherwise
D3 = 1 if age between 25 and 50 years, 0 otherwise
D4 = 1 if < high school degree, 0 otherwise
D5 = 1 if high school degree but < college degree, 0 otherwise

For each category the number of dummy variables is one less than the number of classifications. Then we run the regression equation

Y = α + βX + γ1D1 + γ2D2 + γ3D3 + γ4D4 + γ5D5 + u

The assumption made in the dummy-variable method is that it is only the intercept that changes for each group, not the slope coefficient (i.e., the coefficient of X).

The intercept term for each individual is obtained by substituting the appropriate values for D1 through D5. For instance, for a male, age < 25, with a college degree, we have D1 = 1, D2 = 1, D3 = 0, D4 = 0, D5 = 0, and hence the intercept is α + γ1 + γ2. For a female, age > 50 years, with a college degree, we have D1 = D2 = D3 = D4 = D5 = 0, and hence the intercept term is just α.
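This bookkeeping can be sketched in a few lines of Python (the function name and the category encodings are mine, not from the text):

```python
def dummies(sex, age, educ):
    """Map a household head's characteristics to (D1, ..., D5).

    sex: 'male' or 'female'; age: years; educ: 'none' (< high school),
    'hs' (high school but < college degree), or 'college'.
    """
    D1 = 1 if sex == 'male' else 0          # sex dummy
    D2 = 1 if age < 25 else 0               # age < 25 years
    D3 = 1 if 25 <= age <= 50 else 0        # age 25 to 50 years
    D4 = 1 if educ == 'none' else 0         # < high school degree
    D5 = 1 if educ == 'hs' else 0           # high school but < college
    return D1, D2, D3, D4, D5

# Male, age < 25, with a college degree: intercept a + g1 + g2.
print(dummies('male', 22, 'college'))       # (1, 1, 0, 0, 0)
# Female, age > 50, with a college degree: intercept a alone.
print(dummies('female', 60, 'college'))     # (0, 0, 0, 0, 0)
```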

The dummy-variable method is also used if one has to take care of seasonal factors. For example, if we have quarterly data on X and Y, we fit the regression equation

Y = α + βX + λ1D1 + λ2D2 + λ3D3 + u

where D1, D2, and D3 are seasonal dummies defined by

D1 = 1 for the first quarter, 0 for others
D2 = 1 for the second quarter, 0 for others
D3 = 1 for the third quarter, 0 for others

If we have monthly data, we use 11 seasonal dummies:

D1 = 1 for January, 0 for others
D2 = 1 for February, 0 for others
  etc.

If we feel that, say, December (because of Christmas shopping) is the only month with a strong seasonal effect, we use only one dummy variable:

D = 1 for December, 0 for other months
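Constructing such seasonal dummies from the period of each observation is mechanical. A short sketch (the quarter and month sequences below are invented sample data):

```python
# Quarterly data: three dummies, with the fourth quarter as the base period,
# whose intercept is the constant term.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]             # quarter of each observation
D1 = [1 if q == 1 else 0 for q in quarters]     # first quarter
D2 = [1 if q == 2 else 0 for q in quarters]     # second quarter
D3 = [1 if q == 3 else 0 for q in quarters]     # third quarter

# Monthly data where only December is believed to matter: a single dummy.
months = [10, 11, 12, 1, 2, 12]                 # month of each observation
D_dec = [1 if m == 12 else 0 for m in months]
```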



Illustrative Example

The Environmental Protection Agency (EPA) publishes auto mileage estimates that are designed to help car buyers compare the relative fuel efficiency of different models. Does the EPA estimate provide all the information necessary for comparing the relative fuel efficiency of the different models? To investigate this problem Lovell estimated the following regressions.

M = 7.952 + 0.693 EPA        R² = 0.74
    (1.735) (0.061)

(Figures in parentheses are standard errors.)

M = 22.008 - 0.002W - 2.760(S/A) + 3.280(G/D) + 0.415 EPA        R² = 0.82
    (5.349)  (0.001)  (0.708)      (1.413)      (0.097)

where

M = miles per gallon as reported by Consumers Union based on road tests
W = weight of the vehicle (pounds)
S/A = dummy variable equal to 0 for standard transmission and 1 for automatic transmission
G/D = dummy variable equal to 0 for gas and 1 for diesel power
EPA = mileage estimate by the EPA

The variables W, S/A, and G/D all have the correct signs and are significant, showing that the EPA did not use all the information available in giving its estimates of fuel efficiency.
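To see what the second equation implies, one can plug hypothetical car specifications into it. In the sketch below the weights and the EPA figure are invented, and the weight coefficient is taken as the reported value rounded to 0.002:

```python
def cu_mpg(weight, automatic, diesel, epa):
    """Predicted road-test mpg from the second fitted equation above
    (coefficients as reported; inputs here are hypothetical)."""
    return 22.008 - 0.002 * weight - 2.760 * automatic + 3.280 * diesel + 0.415 * epa

# Two hypothetical gas-powered cars with the same EPA estimate of 30 mpg:
heavy_auto = cu_mpg(3000, automatic=1, diesel=0, epa=30)    # heavier, automatic
light_manual = cu_mpg(2000, automatic=0, diesel=0, epa=30)  # lighter, manual
# The equation predicts different road-test mileage despite equal EPA ratings,
# which is the sense in which the EPA estimate alone is not sufficient.
```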

Two More Illustrative Examples

We will discuss two more examples using dummy variables. They are meant to illustrate two points worth noting:

1. In some studies with a large number of dummy variables it becomes somewhat difficult to interpret the signs of the coefficients because they seem to have the wrong signs. The first example illustrates this problem.

2. Sometimes the introduction of dummy variables produces a drastic change in the slope coefficient. The second example illustrates this point.

The examples are rather old and outdated but they establish the points we wish to make.

The first example is a study of the determinants of automobile prices. Griliches regressed the logarithm of new passenger car prices on various spec-

M. C. Lovell, "Tests of the Rational Expectations Hypothesis," The American Economic Review, March 1986, p. 120.

Z. Griliches, "Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality Change," Government Price Statistics, Hearings, U.S. Congress, Joint Economic Committee (Washington, D.C.: U.S. Government Printing Office, 1961). Further results on this problem can be found in M. Ohta and Z. Griliches, "Automobile Prices Revisited: Extensions of the Hedonic Hypothesis," in N. Terleckyj (ed.), Household Behavior and Consumption (New York: National Bureau of Economic Research, 1975).


