2.9 TESTING OF HYPOTHESES 29

2. A hypothesis which says that a parameter has a specified value is called a point hypothesis. A hypothesis which says that a parameter lies in a specified interval is called an interval hypothesis. For instance, if μ is the population mean, then μ = 4 is a point hypothesis, and 4 < μ < 7 is an interval hypothesis.

3. A hypothesis test is a procedure that answers the question of whether the observed difference between the sample value and the hypothesized population value is real or due to chance variation. For instance, if the hypothesis says that the population mean is μ = 6 and the sample mean is x̄ = 8, then we want to know whether this difference is real or due to chance variation.

4. The hypothesis we are testing is called the null hypothesis and is often denoted by H₀. The alternative hypothesis is denoted by H₁. The probability of rejecting H₀ when, in fact, it is true is called the significance level. To test whether the observed difference between the data and what is expected under the null hypothesis is real or due to chance variation, we use a test statistic. A desirable criterion for the test statistic is that its sampling distribution be tractable, preferably with tabulated probabilities. Tables are already available for the normal, t, χ², and F distributions, and hence the test statistics chosen are often those that have these distributions.

5. The observed significance level or P-value is the probability of getting a value of the test statistic that is as extreme as or more extreme than the observed value of the test statistic. This probability is computed on the basis that the null hypothesis is correct. For instance, consider a sample of n independent observations from a normal population with mean μ and variance σ². We want to test

H₀: μ = 7  against  H₁: μ ≠ 7

The test statistic we use is

t = √n(x̄ − μ)/S

which has a t-distribution with (n − 1) degrees of freedom. Suppose that n = 25, x̄ = 10, S = 5. Then under the assumption that H₀ is true, the observed value of t is t₀ = √25(10 − 7)/5 = 3. Since high positive values of t are evidence against the null hypothesis H₀, the P-value is [since degrees of freedom (n − 1) = 24]

P = Prob(t₂₄ > 3)

This is the observed significance level.
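As a check, the P-value in this example can be computed numerically. The sketch below is not from the text: the function names are my own, and it approximates Prob(t₂₄ > 3) by integrating the Student t density with the trapezoid rule, so it needs only the Python standard library.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_tail_prob(t_obs, df, upper=60.0, steps=200_000):
    """Approximate Prob(t_df > t_obs) by trapezoid-rule integration on [t_obs, upper]."""
    h = (upper - t_obs) / steps
    total = 0.5 * (t_density(t_obs, df) + t_density(upper, df))
    for i in range(1, steps):
        total += t_density(t_obs + i * h, df)
    return total * h

# The example in the text: n = 25, x-bar = 10, S = 5, testing H0: mu = 7.
n, xbar, S, mu0 = 25, 10.0, 5.0, 7.0
t0 = math.sqrt(n) * (xbar - mu0) / S   # observed t statistic: 3.0
p_value = t_tail_prob(t0, df=n - 1)    # Prob(t_24 > 3), roughly 0.003
```

A P-value this small says that, if H₀ were true, a sample this discrepant would be very rare, which is why the result would be reported as highly significant.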

6. It is common practice to say simply that the result of the test is (statistically) significant or not significant and not to report the actual P-values. The meaning of the two terms is as follows:

*This terminology is unfortunate because "null" means "zero, void, insignificant, amounts to nothing, etc." A hypothesis μ = 0 can be called a null hypothesis, but a hypothesis μ = 100 should not be called a "null" hypothesis. However, this is the standard terminology that was introduced in the 1930s by the statisticians Jerzy Neyman and E. S. Pearson.



                                           Reality
Result of the Test             H₀ Is True                 H₀ Is False
Significant (reject H₀)        Type I error or α error    Correct conclusion
Not significant                Correct conclusion         Type II error or β error
(do not reject H₀)

*A question of historical interest is: "How did the numbers 0.05 and 0.01 creep into all these textbooks?" The answer is that they were suggested by the famous statistician Sir R. A. Fisher (1890–1962), the "father" of modern statistics, and his prescription has been followed ever since.

Statistically significant. Sampling variation is an unlikely explanation of the discrepancy between the null hypothesis and sample values.

Statistically insignificant. Sampling variation is a likely explanation of the discrepancy between the null hypothesis and the sample value.

Also, the terms significant and highly significant are customarily used to denote "significant at the 0.05 level" and "significant at the 0.01 level," respectively. However, consider two cases where the P-values are 0.055 and 0.045, respectively. Then in the former case one would say that the results are "not significant" and in the latter case one would say that the results are "significant," although the sample evidence is only marginally different in the two cases. Similarly, two tests with P-values of 0.80 and 0.055 will both be considered "not significant," although there is a tremendous difference in the compatibility of the sample evidence with the null hypothesis in the two cases. That is why many computer programs print out P-values.

7. Statistical significance and practical significance are not the same thing. A result that is highly significant statistically may be of no practical significance at all. For instance, suppose that we consider a shipment of cans of cashews with an expected mean weight of 450 g. If the actual sample mean of weights is 449.5 g, the difference may be practically insignificant, but it could be highly statistically significant if we have a large enough sample or a small enough sampling variance (note that the test statistic has √n in the numerator and S in the denominator). On the other hand, in the case of precision instruments, suppose a part is expected to be of length 10 cm and a sample has a mean length of 9.9 cm. If n is small and S is large, the difference might not be statistically significant, but it could be practically very significant: the shipment could simply be useless.
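The cashew example can be made concrete. In the sketch below, the sample sizes (16 and 10,000) and the sample standard deviation S = 5 are illustrative assumptions, not values from the text; the point is only that the same 0.5 g shortfall yields wildly different t statistics.

```python
import math

def t_statistic(n, xbar, mu0, S):
    """The test statistic t = sqrt(n) * (x-bar - mu0) / S from the text."""
    return math.sqrt(n) * (xbar - mu0) / S

# The same 0.5 g shortfall from the hypothesized 450 g mean weight:
t_small_n = t_statistic(n=16, xbar=449.5, mu0=450.0, S=5.0)      # -0.4: nowhere near significant
t_large_n = t_statistic(n=10_000, xbar=449.5, mu0=450.0, S=5.0)  # -10.0: overwhelmingly significant
```

The discrepancy from the null value is identical in both cases; only the √n factor changes, which is exactly why statistical significance alone says nothing about practical importance.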

8. It is customary to reject the null hypothesis when the test statistic is statistically significant at a chosen significance level and not to reject H₀ when the test statistic is not statistically significant at the chosen significance level. (There is, however, some controversy on this issue, which we discuss later in item 9.) In reality, H₀ may be either true or false. Corresponding to the two cases of reality and the two conclusions drawn, we have the four possibilities shown in the accompanying table.




There are two possible errors that we can make:

1. Rejecting H₀ when it is true. This is called the type I error or α error.

2. Not rejecting H₀ when it is not true. This is called the type II error or β error.

Thus

α = Prob(rejecting H₀ | H₀ is true)

β = Prob(not rejecting H₀ | H₀ is not true)

α is just the significance level, defined earlier. (1 − β) is called the power of the test. The power of the test cannot be computed unless the alternative hypothesis H₁ is specified; that is, "H₀ is not true" means that H₁ is true. For example, consider the problem of testing the hypothesis

H₀: μ = 10  against  H₁: μ = 15

for a normal population with mean μ and variance σ². The test statistic we use is t = √n(x̄ − μ)/S. From the sample data we get the values of n, x̄, and S. To calculate α we use μ = 10, and to calculate β we use μ = 15. The two errors are

α = Prob(t > t* | μ = 10)   and   β = Prob(t < t* | μ = 15)

where t* is the cutoff point of t that we use to reject or not reject H₀. The distributions of t under H₀ and H₁ are shown in Figure 2.1. In our example α is the right-hand tail area from the distribution of t under H₀ and β is the left-hand tail area from the distribution of t under H₁. If the alternative hypothesis H₁ says that μ < 10, then the distribution of t under H₁ would be to the left of the distribution of t under H₀. In this case α would be the left-hand tail area of the distribution of t under H₀.

[Figure: the sampling distribution of t under H₀ and, to its right, the sampling distribution of t under H₁; the cutoff point t* divides the axis into "Do not reject H₀" (left) and "Reject H₀" (right).]

Figure 2.1. Type I and type II errors in testing a hypothesis.
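The α and β of Figure 2.1 can be approximated by Monte Carlo simulation. In the sketch below, σ = 10 is an illustrative assumption (the text does not fix σ), and t* = 1.711 is the one-sided 5% cutoff of the t distribution with 24 degrees of freedom.

```python
import math
import random
import statistics

random.seed(12345)  # make the simulation reproducible

def t_stat(sample, mu0):
    """t = sqrt(n) * (x-bar - mu0) / S for one sample."""
    n = len(sample)
    return math.sqrt(n) * (statistics.mean(sample) - mu0) / statistics.stdev(sample)

n, mu0, mu1, sigma = 25, 10.0, 15.0, 10.0
t_star = 1.711   # one-sided 5% cutoff for t with n - 1 = 24 degrees of freedom
reps = 20_000

# alpha: proportion of rejections (t > t*) when H0 (mu = 10) is actually true
alpha_hat = sum(
    t_stat([random.gauss(mu0, sigma) for _ in range(n)], mu0) > t_star
    for _ in range(reps)
) / reps

# beta: proportion of non-rejections (t <= t*) when H1 (mu = 15) is actually true
beta_hat = sum(
    t_stat([random.gauss(mu1, sigma) for _ in range(n)], mu0) <= t_star
    for _ in range(reps)
) / reps
# alpha_hat comes out close to 0.05, and 1 - beta_hat is the power of the test
```

Raising t* shrinks α but inflates β, and vice versa, which is the trade-off the two shaded tail areas in Figure 2.1 depict.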


