
Hypothesis-Testing Search

Suppose that we estimate a Cobb-Douglas production function as in Section 4.11 and test the hypothesis of constant returns to scale ($\alpha + \beta = 1$ in that example). If the hypothesis is rejected, as it was in that example, we do not change the specification of the model. If it is not rejected, we change the specification and estimate a production function with constant returns to scale imposed.
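The two-step procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the production function is taken to be log-linear in labor and capital, the data are simulated, the names `ols` and `crs_pretest` are hypothetical, and the critical value 1.96 is the large-sample normal approximation for a 5 percent test.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and their estimated covariance matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (X.shape[0] - X.shape[1])
    return beta, sigma2 * np.linalg.inv(X.T @ X)

def crs_pretest(log_q, log_l, log_k, t_crit=1.96):
    """Test 'elasticities sum to one'; re-estimate under the restriction if not rejected."""
    n = len(log_q)
    X = np.column_stack([np.ones(n), log_l, log_k])
    beta, cov = ols(X, log_q)
    r = np.array([0.0, 1.0, 1.0])  # picks out the sum of the two elasticities
    t_stat = (r @ beta - 1.0) / np.sqrt(r @ cov @ r)
    if abs(t_stat) > t_crit:
        # Hypothesis rejected: keep the original specification.
        return "unrestricted", beta, t_stat
    # Not rejected: impose the restriction by regressing log(Q/K) on log(L/K).
    Xr = np.column_stack([np.ones(n), log_l - log_k])
    b, _ = ols(Xr, log_q - log_k)
    return "restricted", np.array([b[0], b[1], 1.0 - b[1]]), t_stat

# Simulated data (hypothetical): true elasticities 0.9 and 0.6, so they sum to 1.5.
rng = np.random.default_rng(0)
n = 200
log_l = rng.normal(size=n)
log_k = rng.normal(size=n)
log_q = 0.5 + 0.9 * log_l + 0.6 * log_k + rng.normal(scale=0.1, size=n)

choice, coefs, t_stat = crs_pretest(log_q, log_l, log_k)
print(choice, np.round(coefs, 3), round(float(t_stat), 2))
```

With these simulated data the hypothesis is rejected and the unrestricted model is kept. When the hypothesis is not rejected, the function returns coefficients whose two elasticities sum to one by construction.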

E. E. Leamer, Specification Searches (New York: Wiley, 1978).

G. E. Mizon, "Model Selection Procedures," in M. J. Artis and A. R. Nobay (eds.), Studies in Current Economic Analysis (Oxford: Basil Blackwell, 1977), Chap. 4.

K. R. Sawyer, "The Theory of Econometric Model Selection," unpublished doctoral dissertation, Australian National University, 1980.

G. S. Maddala (ed.), "Model Selection," Journal of Econometrics, Vol. 16, 1981.

The area of model selection is quite vast in its scope and includes diagnostic checking and specification testing (the other two areas we are discussing in this chapter). There are many references on the topic, among them a book by Leamer, a paper by Mizon, the dissertation by Sawyer, and a special volume of the Journal of Econometrics. Since it is impossible to cover this vast area, we discuss Leamer's classification of the different types of model searches usually attempted, and also Hendry's ideas behind data-based simplification of models. In Sections 12.7 to 12.9 we go through two particular aspects of model selection:

1. Selection of regressors.

2. Use of cross-validation techniques.

Leamer talks of six types of specification searches that are usually undertaken in the process of model selection. The differences among the searches are minor, but the classification is useful for organizing our ideas. The different searches are:

Type of Search                          Purpose
(1) Hypothesis-testing search           Choosing a "true model"
(2) Interpretive search                 Interpreting the sample evidence on many intercorrelated variables
(3) Simplification search               Constructing a "fruitful" model
(4) Proxy variable search               Choosing between different measures of the same set of hypothetical variables
(5) Data selection search               Selecting the appropriate data set for estimation and prediction
(6) Post-data model construction        Improving an existing model

Many of these searches have been discussed in previous chapters, but we give further examples here.



Interpretive Search

Sometimes the coefficients of the model do not make economic sense but the imposition of some constraints does. For instance, based on data for 150 households, Leamer* estimated the demand for oranges as

$$
\log D_i = \underset{(1.0)}{3.1} + \underset{(0.20)}{0.83} \log E_i + \underset{(0.15)}{0.01} \log P_i - \underset{(0.60)}{0.56} \log \pi_i \qquad R^2 = 0.20
$$

where
$D_i$ = purchases of oranges by household $i$
$E_i$ = total expenditures by household $i$
$P_i$ = price of oranges
$\pi_i$ = price of grapefruit (a substitute commodity)

(Figures in parentheses are standard errors.) The coefficients of the price variables are insignificant and have the "wrong" sign. Also, the sum of the coefficients (0.83 + 0.01 - 0.56 = 0.28) is rather far from zero. If there is no money illusion, then multiplying $E_i$, $P_i$, and $\pi_i$ by the same factor should not produce any change in $D_i$. This implies that the sum of the coefficients of these variables should be zero. (This is known as the "homogeneity postulate.") Imposing this condition, Leamer gets the result:

$$
\log D_i = \underset{(0.9)}{4.2} + \underset{(0.19)}{0.52} \log E_i - \underset{(0.14)}{0.61} \log P_i + \underset{(0.31)}{0.09} \log \pi_i \qquad R^2 = 0.19
$$

The $R^2$ has fallen only slightly and the coefficients all have the right signs. Income (measured by expenditures $E_i$) and price $P_i$ are significant. The constraint has improved both the specification and the interpretation.
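The claims about significance and the homogeneity postulate can be checked by simple arithmetic on the reported figures. The sketch below uses the coefficients and standard errors from the two orange-demand equations (the standard errors are read as 1.0, 0.20, 0.15, 0.60 and 0.9, 0.19, 0.14, 0.31). The helper names are hypothetical, and "significant" is taken, as a rough rule of thumb, to mean a t-ratio above about 2 in absolute value.

```python
# Order of entries: constant, log E, log P, log pi (grapefruit price).
unrestricted = {"coef": [3.1, 0.83, 0.01, -0.56], "se": [1.0, 0.20, 0.15, 0.60]}
restricted = {"coef": [4.2, 0.52, -0.61, 0.09], "se": [0.9, 0.19, 0.14, 0.31]}

def t_ratios(eq):
    """Coefficient divided by its standard error, term by term."""
    return [c / s for c, s in zip(eq["coef"], eq["se"])]

def slope_sum(eq):
    """Sum of the three slope coefficients (the constant is excluded)."""
    # Adding 0.0 turns a possible negative zero into a plain zero for printing.
    return round(sum(eq["coef"][1:]), 2) + 0.0

for name, eq in [("unrestricted", unrestricted), ("restricted", restricted)]:
    ts = ", ".join(f"{t:.2f}" for t in t_ratios(eq))
    print(f"{name}: t-ratios = [{ts}], sum of slopes = {slope_sum(eq):.2f}")
```

In the unrestricted equation both price t-ratios are below 1 in absolute value. In the restricted equation the expenditure and orange-price t-ratios exceed 2 in absolute value, and the slopes sum to zero as the constraint requires.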

Simplification Search

In the equation above, the coefficient of $\log \pi_i$ is not significant. Dropping this variable and imposing the homogeneity constraint, that is, assuming the other two coefficients to be equal in value and opposite in sign, we get

$$
\log D_i = \underset{(0.8)}{3.7} + \underset{(0.18)}{0.58} \log (E_i / P_i) \qquad R^2 = 0.18
$$

The $R^2$ is only slightly smaller and we have a simplified equation. This is called a simplification search. The purpose of this search is to find a simple but useful model.
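The form of the simplified equation follows from the two restrictions by substitution. The coefficient symbols $\alpha$, $\beta_E$, $\beta_P$, and $\beta_\pi$ are notation introduced here for illustration; they do not appear in the original. Starting from the general equation

$$
\log D_i = \alpha + \beta_E \log E_i + \beta_P \log P_i + \beta_\pi \log \pi_i ,
$$

dropping the grapefruit price sets $\beta_\pi = 0$, and homogeneity then requires $\beta_E + \beta_P = 0$, that is, $\beta_P = -\beta_E$. Substituting,

$$
\log D_i = \alpha + \beta_E (\log E_i - \log P_i) = \alpha + \beta_E \log (E_i / P_i) ,
$$

so that a single slope coefficient remains to be estimated.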

Proxy Variable Search

In econometric work an investigator is faced with several definitions of the same variable. There are several definitions of money supply, several definitions of income, and so on. Further, some variables like education, ability, and risk are not directly measurable and we have to use some proxies for them. We are thus left with the problem of choosing among the different proxies. In the example of demand for oranges that Leamer considered, one has to choose between money income $Y_i$ and expenditures $E_i$ as measures of the household's true income. The estimated equations he gets are

$$
\log D_i = \underset{(1.1)}{6.2} + \underset{(0.21)}{0.85} \log Y_i - \underset{(0.13)}{0.67} \log P_i \qquad R^2 = 0.15
$$

$$
\log D_i = \underset{(1.0)}{5.2} + \underset{(0.18)}{1.1} \log E_i - \underset{(0.16)}{0.45} \log P_i \qquad R^2 = 0.18
$$

The $R^2$ is higher when $E_i$ is used, suggesting that $E_i$ is a better proxy than $Y_i$ for "true income."

*Leamer, Specification Searches, p. 8.
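The comparison of proxies by goodness of fit can be illustrated on simulated data. This is a sketch under assumptions, not a reconstruction of Leamer's data: the "true income" variable, the two proxies, and all parameter values are hypothetical. Because both regressions have the same dependent variable and the same number of regressors, their $R^2$ values are directly comparable.

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS fit of y on X (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)

rng = np.random.default_rng(1)
n = 2000
true_income = rng.normal(size=n)  # unobserved in practice; simulated here
log_p = rng.normal(size=n)
log_d = 1.0 + 1.0 * true_income - 0.5 * log_p + rng.normal(scale=0.5, size=n)

# Two hypothetical proxies that measure true income with different amounts of error.
proxy_noisy = true_income + rng.normal(scale=1.0, size=n)
proxy_close = true_income + rng.normal(scale=0.3, size=n)

const = np.ones(n)
r2_noisy = r_squared(np.column_stack([const, proxy_noisy, log_p]), log_d)
r2_close = r_squared(np.column_stack([const, proxy_close, log_p]), log_d)
print(f"R^2 with noisy proxy: {r2_noisy:.3f}")
print(f"R^2 with close proxy: {r2_close:.3f}")
```

The proxy with the smaller measurement error gives the higher $R^2$, which is the logic behind preferring one measure of income to the other.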

Data Selection Search

Often, in econometric work we have different data sets from which we can estimate the same relationship. A question often arises as to whether we can pool the different data sets and get more efficient estimates of the parameters. In Section 4.6 we gave some examples where the data sets referred to prewar and postwar years. There we found significant differences in the coefficients between the two periods which suggested that the data should not be pooled. This is an example of a data-selection search.
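One standard way to check whether two data sets can be pooled is an F-test of the equality of the coefficients across them (the Chow test). The sketch below is an assumption about how such a check might be coded, not a reproduction of the examples in Section 4.6. The data are simulated and the function names are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(X1, y1, X2, y2):
    """F-statistic for H0: both data sets share the same k coefficients."""
    k = X1.shape[1]
    n1, n2 = len(y1), len(y2)
    rss_pooled = rss(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    rss_separate = rss(X1, y1) + rss(X2, y2)
    return ((rss_pooled - rss_separate) / k) / (rss_separate / (n1 + n2 - 2 * k))

# Simulated "prewar" and "postwar" samples (hypothetical) with different coefficients.
rng = np.random.default_rng(2)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x2])
y1 = 1.0 + 1.0 * x1 + rng.normal(scale=0.5, size=n)
y2 = 3.0 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

f_stat = chow_f(X1, y1, X2, y2)
print(f"Chow F({X1.shape[1]}, {2 * n - 2 * X1.shape[1]}) = {f_stat:.1f}")
```

A value of the statistic that is large relative to the critical value of the F-distribution with k and n1 + n2 - 2k degrees of freedom argues against pooling, as it does for these simulated samples.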

Post-data Model Construction

This is what Leamer calls "Sherlock Holmes" inference. In response to a question by Dr. Watson about the likely perpetrators of the crime, Sherlock Holmes replied: "No data yet. ... It is a capital mistake to theorize before you have all the evidence. It biases the judgments." According to the traditional statistical theory, on the other hand, it is a "capital mistake to view the facts before you have all the theories. It biases the judgments." Any theory that is postulated after looking at a particular data set cannot be tested using the same data set, because doing so would amount to double counting. On the other hand, Sherlock Holmes would argue that the set of viable alternative hypotheses is immense and the set of hypotheses formulated before the data set is observed can be incomplete. There is always the risk that the data favor some unspecified hypothesis. Hence the data evidence is used to construct a set of "empirically relevant" hypotheses, thereby reducing the cost of formulating a comprehensive set of hypotheses and the risk of not identifying the "best" hypothesis.

In almost all econometric work investigators do something similar to what Sherlock Holmes does. They formulate some hypotheses, then observe that the coefficients of some variables have wrong signs or implausible magnitudes or that the residuals have a peculiar pattern. Then they introduce more explanatory variables or impose some constraints on the parameters. A question


