Computers and Chemical Engineering 24 (2000) 2037–2039
Letter to the Editor
www.elsevier.com/locate/compchemeng

Statistical tests and confidence intervals

Dear Editor,

I am writing this letter to your journal because I am very puzzled. I know that the contents of this communication are not completely within the scope of your journal; however, I also submitted it to other, perhaps more suitable, journals as a short communication. The editors of chemical engineering journals said that it was also out of the scope of their journals and suggested I try journals of statistics. The editors of statistics journals said that it was an interesting argument for chemical engineering journals due to the style of its presentation. However, in my opinion the argument covered in this note is fundamental, because it clarifies the real significance of a statistical test and the meaning of a confidence interval for the parameters of a model.

These tests play an important role in many applications of chemical engineering and are used, together with their associated confidence intervals, in many model-building programs. I therefore think that this argument can also be useful to people who are interested in model analysis and numerical applications. Note, in fact, that the book 'Numerical Recipes' (Press, Flannery, Teukolsky & Vetterling, 1988) devotes two chapters to statistical problems.

In the following note I want to highlight some of the mistakes that can be found in statistics books and papers when statistical tests or confidence intervals for parameters are used.

Let us consider a typical test. For the sake of simplicity, I will look at the Student t-test, even though similar considerations can be made for any other statistical test, for example an F-test. To begin with, we have to select two hypotheses: the first, H0, is traditionally called the null hypothesis, while the second is the alternative hypothesis, Ha.
The two hypotheses are complementary, so if the first is true, the second is false and vice versa. In the case of a t-test, for example, the null hypothesis could be H0: μ = μ0 and the alternative Ha: μ > μ0.

A type I error is made if the null hypothesis is rejected when it is true. The risk of a type I error is usually denoted by α. A type II error is made if the null hypothesis is accepted when the alternative hypothesis is true. The risk of a type II error is usually denoted by β.

The test is performed by calculating an experimental value of the variable t, t_ex. This is done by using a sample of experiments, choosing a reasonable value for the risk α and comparing the value t_ex with the theoretical value t_α found in the appropriate table.

What can we state if t_ex < t_α? The following mistake is found in only a few books: we accept the null hypothesis with a confidence 1 − α (Berenson, Levine & Rindskopf, 1988). In other words, it is a known mistake to state that the risk of a type II error is, in this situation, β = 1 − α. The correct answer can be found in many modern books: while we accept the null hypothesis, we cannot evaluate the risk of doing so (Arnold, 1990; Mendenhall, Wackerly & Scheaffer, 1990).

What can we state if t_ex > t_α? In all of the books I read, I found the following answer: we can reject the null hypothesis H0 with a risk α (Arnold, 1990). Very often the sentence continues: the risk of a type II error in accepting the alternative hypothesis, Ha, is β = α (Berenson et al., 1988; Hines & Montgomery, 1990; Wonnacott & Wonnacott, 1990).

In my opinion, both of the previous statements are wrong. In this situation, the correct statement is: if t_ex belongs to the theoretical population used to build the table of t_α, then only α of the time will it result that t_ex > t_α. This statement is correct because it is a tautology. So it is reasonable and correct to say: t_ex does not belong to the theoretical population of t, and the risk of an error is α.
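The mechanics of the test described above can be sketched in a few lines of Python. The sample data, the value μ0 and the critical value are illustrative assumptions; t_α = 1.833 is the tabulated one-sided 5% point of Student's t with 9 degrees of freedom:

```python
import math

def t_statistic(sample, mu0):
    """Compute t_ex = (x_bar - mu0) / sqrt(s^2 / N)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return (x_bar - mu0) / math.sqrt(s2 / n)

# Hypothetical sample of N = 10 measurements.
sample = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7, 5.0, 5.2]
mu0 = 4.8        # null hypothesis H0: mu = mu0
t_alpha = 1.833  # tabulated t for alpha = 0.05, 9 d.o.f., one-sided

t_ex = t_statistic(sample, mu0)
if t_ex > t_alpha:
    print(f"t_ex = {t_ex:.3f} > t_alpha: reject H0")
else:
    print(f"t_ex = {t_ex:.3f} <= t_alpha: do not reject H0")
```

Note that the comparison t_ex > t_α is all the procedure delivers; what that comparison licenses us to say is exactly the question discussed next.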
But we cannot know why t_ex does not belong to the theoretical population of t. Perhaps the cause is the falsity of the null hypothesis H0, perhaps not. There is no way of knowing why t_ex > t_α. Is it because the population of my experimental sample has one of the infinite distributions that are not Gaussian, or because some experiments are outliers, or something else? There is an infinite number of possible reasons different from the falsity of H0. We cannot presume that only H0: μ = μ0 is false. How can we confirm all of the other hypotheses necessary to state that, if H0 is true, then t_ex belongs to the theoretical population used to build t? Obviously we cannot use any other statistical test because, in this case, the argument is circular. Nor can we assume the other hypotheses to be axioms. If we cannot discover why t_ex does not belong to the theoretical population of t, then the value of the risk α is not conservative: it could turn out that t_ex > t_α much more than α of the time even when the hypothesis H0: μ = μ0 is true, although with a different type of distribution.

0098-1354/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0098-1354(00)00602-5

So it is a mistake to state that, if t_ex > t_α, the null hypothesis H0: μ = μ0 must be rejected with a risk α.

It is possible and correct, but in my opinion misleading, to state: if t_ex > t_α, I reject the null hypothesis H0 with a risk α, where the null hypothesis is now defined as μ = μ0 together with the sum of all the (infinite) hypotheses required for considering t_ex as belonging to the theoretical population used to build t.

The correct and unambiguous answer when t_ex > t_α is: I reject the null hypothesis H0: μ = μ0, but I do not know the risk of a type I error.

What about the type II error when t_ex > t_α? What is the risk in accepting the alternative hypothesis?
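Before turning to the type II error, the claim that α is not conservative can be illustrated with a small Monte Carlo sketch: even when μ = μ0 holds exactly, the rate at which t_ex > t_α matches α only if the implicit hypotheses (normality among them) also hold. The sample size, the number of trials and the critical value t_α = 1.833 (one-sided 5% point, 9 degrees of freedom) are illustrative assumptions:

```python
import math
import random

def t_statistic(sample, mu0):
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return (x_bar - mu0) / math.sqrt(s2 / n)

def rejection_rate(draw, mu0, n=10, trials=20000, t_alpha=1.833):
    """Fraction of trials with t_ex > t_alpha when mu = mu0 is true."""
    hits = sum(t_statistic([draw() for _ in range(n)], mu0) > t_alpha
               for _ in range(trials))
    return hits / trials

random.seed(0)
# Gaussian population with mean mu0 = 0: the rate is close to alpha = 0.05.
print("normal:     ", rejection_rate(lambda: random.gauss(0.0, 1.0), 0.0))
# Exponential population with mean 1 (so mu0 = 1 is still true): the rate
# is no longer alpha, because the normality hypothesis fails.
print("exponential:", rejection_rate(lambda: random.expovariate(1.0), 1.0))
```

The second rate differs from the nominal α even though μ = μ0 is exactly true, which is the author's point: the quoted risk presupposes every implicit hypothesis behind the table of t_α.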
If you consider the statement μ = μ0 as the null hypothesis, H0, you cannot accept the alternative hypothesis, Ha, with a risk β = α because, in that case, you could not reject the null hypothesis, H0, with a risk α. If, on the other hand, you consider as the null hypothesis, H0, the statement μ = μ0 together with the sum of all the (infinite) hypotheses required for considering t_ex as belonging to the theoretical population used to build t, then you cannot formulate an alternative hypothesis, Ha, because in that case we would have an infinite number of alternative hypotheses.

In conclusion: the correct behaviour is to avoid quantifying the type I error when we reject the null hypothesis. We can continue to use the type I error only if we extend the null hypothesis to include all of the (infinite) hypotheses required to consider t_ex as belonging to the theoretical population used to build t, but in this case the rejection is a tautology. We can never quantify a type II error. The introduction of the risk β for a type II error is always a big mistake.

An important consequence of the previous considerations concerns the meaning of the confidence interval and confidence region for parameters. Let us consider the simplest confidence interval:

x̄ − t_α √(s²/N) ≤ μ ≤ x̄ + t_α √(s²/N)

What is the meaning of the 1 − α confidence interval for the parameter μ? In some books, I found the following demonstration for building the confidence interval of the parameter μ, and for those authors its meaning is: I have a confidence 1 − α that the true value of the parameter μ is inside the previous interval (Press et al., 1988; Mendenhall et al., 1990). Both the demonstration and the sentence are definitely wrong.

We start with a one-sided t-test with the null hypothesis H0: μ = μ0 + δ and the alternative Ha: μ < μ0 + δ. We choose the value of δ as the limit for the test, t_ex = t_α*, with α* = α/2 = β/2.
In this way, if t_ex < t_α*, I reject the null hypothesis H0: μ = μ0 + δ with a type I error risk α/2 and I accept the alternative hypothesis Ha: μ < μ0 + δ with the same risk β/2.

The next step is to perform a second t-test with the null hypothesis H0: μ = μ0 − δ and the alternative Ha: μ > μ0 − δ. We choose the value of δ as the limit for the test, t_ex = t_α*, with α* = α/2 = β/2. In this way, if t_ex > t_α*, I reject the null hypothesis H0: μ = μ0 − δ with a type I error risk α/2 and I accept the alternative hypothesis Ha: μ > μ0 − δ with the same risk β/2.

If t_ex < t_α* one accepts with a risk α/2 that μ < μ0 + δ, and if t_ex > t_α* one accepts with a risk α/2 that μ > μ0 − δ, so one can conclude: I have a confidence 1 − α that the true value of the parameter μ is inside the previous interval.

The previous reasoning is completely wrong because we can never quantify a type II error. Thus, it is a big mistake to say: I have a confidence 1 − α that the true value of the parameter μ is inside the previous interval.

In some other books, I found the following meaning for the 1 − α confidence interval of the parameter μ (Berenson et al., 1988; Arnold, 1990; Hines & Montgomery, 1990; Wonnacott & Wonnacott, 1990): if we repeat the sample of experiments many times and each time we calculate the confidence interval, then 1 − α of the time the interval will contain the parameter μ = μ0, while only α of the time will the parameter μ = μ0 be outside the interval. This version is also completely wrong.

The correct statement is: if we repeat the sample of experiments many times for a theoretical population normally distributed with μ = μ0, and with all of the implicit hypotheses necessary for building a variable of type t, and each time we calculate

μ0 ± t_α √(s²/N)

then this interval will contain the value of x̄ 1 − α of the time, while only α of the time will the value of x̄ be outside the interval. This statement is correct, but it is also a tautology, because it is the definition of the value of t_α. We cannot exchange μ0 with x̄!
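The tautological "correct statement" above can be checked with a small simulation. The sketch below samples repeatedly from a normal population with known μ = μ0 and counts how often the interval μ0 ± t_α √(s²/N) contains x̄; the sample size, the number of repetitions and the tabulated value t_α = 2.262 (two-sided 95% point, 9 degrees of freedom) are illustrative assumptions:

```python
import math
import random

def covers(mu0, n=10, t_alpha=2.262):
    """One repetition: does mu0 +/- t_alpha*sqrt(s^2/n) contain x_bar?"""
    sample = [random.gauss(mu0, 1.0) for _ in range(n)]
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    half_width = t_alpha * math.sqrt(s2 / n)
    return mu0 - half_width <= x_bar <= mu0 + half_width

random.seed(0)
trials = 20000
coverage = sum(covers(mu0=5.0) for _ in range(trials)) / trials
print(f"coverage: {coverage:.3f}")  # close to 1 - alpha = 0.95
```

Note that the interval here is centred on the known μ0 and covers x̄, not the other way round; the simulation verifies only the definition of t_α.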
We could do this only if it were possible to control a type II error risk and accept some hypotheses with an assigned risk, or if it were possible to assume that x̄ satisfies all the implicit hypotheses necessary to build a variable of type t. Neither of these possibilities is permitted.

I think that it is possible to verify the contradiction in the meaning usually attributed to a confidence interval in another way. Let us consider a two-sided t-test and assume the equality sign:

|t_ex| = t_α

From this equation, we can obtain the following:

x̄ − t_α √(s²/N) ≤ μ ≤ x̄ + t_α √(s²/N)

Anyone who looks on this relation as a confidence interval is used to saying: I have a confidence 1 − α that the true value of the parameter μ is inside the previous interval. However, if you consider the original t-test, you know that it is an error to state: I have a confidence 1 − α that the null hypothesis is true. Thus the previous sentence is incorrect.

In conclusion, the meaning of the 1 − α confidence interval for the parameter μ is far weaker than usually reported in statistics books. Obviously, what we have seen for a simple confidence interval is also valid for the confidence region of the parameters of a model (linear or non-linear).

If the meaning of a confidence interval or region is practically a tautology, is there any point in its evaluation? In my opinion there is, but only in a negative sense: if the interval or region of confidence is very large, we can conclude that the model and/or the experimental sample is poor and we have to improve it. When the interval or region of confidence is very small, this information is practically useless.

In this regard, I would like to add another consideration. Confidence intervals or regions are evaluated under the hypothesis of exact algebra. In other words, the equations for their evaluation are valid for classical analysis, where round-off error is not present.
For example, the confidence regions for the parameters of a linear model are based on the equation

V(b) = (FᵀF)⁻¹ s²

for the variance–covariance matrix. From this equation, and using an F-test, we obtain the confidence region for the parameters. The parameters are usually calculated by numerically solving the normal system of equations

FᵀF b = Fᵀy

or, even better, via a factorisation (QR or SVD) of the matrix F. In each case, we obtain an estimate of the parameters with an error that depends on the product of the machine epsilon of the floating-point precision and the condition number of the matrix FᵀF. If this condition number is large (as occurs very frequently in model problems, owing to correlation between the parameters), it is necessary to add this circle of uncertainty to the classical ellipsoid coming from the statistical point of view. If we plan some experiments to reduce the confidence region, it could be important not only to try to reduce the volume of the ellipsoid, as in the classical methods, but also to make the ellipsoid more spherical, since the condition number depends on the ratio of the maximum and minimum eigenvalues of the matrix FᵀF.

References

Arnold, S. F. (1990). Mathematical statistics. Englewood Cliffs, NJ: Prentice-Hall.
Berenson, M. L., Levine, D. M., & Rindskopf, D. (1988). Applied statistics. Englewood Cliffs, NJ: Prentice-Hall.
Hines, W. W., & Montgomery, D. C. (1990). Probability and statistics in engineering and management science. New York: Wiley.
Mendenhall, W., Wackerly, D. D., & Scheaffer, R. L. (1990). Mathematical statistics with applications. Boston: PWS-KENT.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1988). Numerical recipes in C. Cambridge: Cambridge University Press.
Wonnacott, T. H., & Wonnacott, R. J. (1990). Introductory statistics. New York: Wiley.

G. Buzzi Ferraris
26 May 2000
Ist. Chim. Ind. e Ing. Chim., Politecnico di Milano,
Piazza Leonardo da Vinci 32, 20133 Milan, Italy