Computers and Chemical Engineering 24 (2000) 2037–2039
Letter to the Editor
www.elsevier.com/locate/compchemeng
Statistical tests and confidence intervals
Dear Editor,
I am writing this letter to your journal because I am very puzzled. I know that the contents of this communication are not completely in the scope of your journal; however, I also submitted it to other, perhaps more suitable journals, as a short communication. The editors of chemical engineering journals said that it was also out of the scope of their journals and suggested I try journals of statistics. The editors of statistics journals said that it was an interesting argument for chemical engineering journals due to the style of its presentation. However, in my opinion the argument covered in this note is fundamental because it clarifies the real significance of a statistical test and the meaning of a confidence interval for parameters in a model.
These tests play an important role in many applications of chemical engineering and are used, together with the corresponding confidence intervals, in many programs for building models. Thus I think that this argument can also be useful to people who are interested in model analysis and numerical applications. In fact, note that the book 'Numerical Recipes' (Press, Flannery, Teukolsky & Vetterling, 1988) devotes two chapters to statistics problems.
In the following note I want to highlight some of the mistakes which can be found in statistics books and papers when statistical tests or confidence intervals for parameters are used.
Let us consider a typical test. For the sake of simplicity, I will look at the Student t-test, even though similar considerations can be made for any other statistical test, for example an F-test.
To begin with, we have to select two hypotheses: the first, H₀, is traditionally called the null hypothesis, while the second one is the alternative hypothesis, Hₐ. The two hypotheses are complementary, so if the first is true, the second is false and vice versa.
In the case of a t-test, for example, the null hypothesis could be H₀: μ = μ₀ and the alternative Hₐ: μ > μ₀.
A type I error is made if the null hypothesis is rejected when it is true. The risk of a type I error is usually denoted by α.
A type II error is made if the null hypothesis is accepted when the alternative hypothesis is true. The risk of a type II error is usually denoted by β.
The test is performed by calculating an experimental value of the variable t, t_ex. This is done by using a sample of experiments, choosing a reasonable value for the risk α, and comparing the value t_ex with the theoretical value t_α found in the appropriate table.
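As an illustration, the computation of t_ex for a one-sample test can be sketched as follows; the sample values, the hypothesised mean and the table value t_α are invented for the example:

```python
import math
import statistics

def t_statistic(sample, mu0):
    # t_ex = (x_bar - mu0) / sqrt(s^2 / N)
    n = len(sample)
    xbar = statistics.mean(sample)
    s2 = statistics.variance(sample)  # sample variance, N - 1 denominator
    return (xbar - mu0) / math.sqrt(s2 / n)

# hypothetical sample of N = 8 experiments, testing H0: mu = 5.0
sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1]
t_ex = t_statistic(sample, 5.0)

t_alpha = 1.895  # one-sided table value for alpha = 0.05, 7 degrees of freedom
reject = t_ex > t_alpha
```

Here t_ex ≈ 1.414 < t_α, so the usual procedure would not reject H₀.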
What can we state if t_ex < t_α?
The following mistake is usually only found in a few books: we accept the null hypothesis with a confidence 1−α (Berenson, Levine & Rindskopf, 1988). In other words, it is a known mistake to state that the risk of a type II error is, in this situation, β = 1−α. The correct answer can be found in many modern books: whilst we accept the null hypothesis, we cannot evaluate the risk of doing that (Arnold, 1990; Mendenhall, Wackerly & Scheaffer, 1990).
What can we state if t_ex > t_α?
In all of the books I read, I found the following answer: we can reject the null hypothesis H₀ with a risk α (Arnold, 1990). Very often the sentence continues: the risk of a type II error in accepting the alternative hypothesis, Hₐ, is β = α (Berenson et al., 1988; Hines & Montgomery, 1990; Wonnacott & Wonnacott, 1990).
In my opinion, both of the previous statements are wrong.
In this situation, the correct statement is: if t_ex belongs to the theoretical population used to build up the table of t_α, then only α% of the times will it result that t_ex > t_α. This statement is correct because it is a tautology. So it is reasonable and correct to say: t_ex does not belong to the theoretical population of t, and the risk for an error is α.
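The tautology can be checked numerically. In a minimal sketch where every hypothesis really does hold (Gaussian population, independent observations, μ = μ₀), the event t_ex > t_α occurs very close to α% of the times; the population parameters, sample size, seed and table value are chosen arbitrarily:

```python
import math
import random
import statistics

random.seed(0)

def t_statistic(sample, mu0):
    n = len(sample)
    return (statistics.mean(sample) - mu0) / math.sqrt(statistics.variance(sample) / n)

t_alpha = 1.895  # one-sided table value for alpha = 0.05, 7 degrees of freedom
trials = 20000
# draw N = 8 observations from a Gaussian population with mu = mu0 = 5.0
exceed = sum(
    t_statistic([random.gauss(5.0, 0.2) for _ in range(8)], 5.0) > t_alpha
    for _ in range(trials)
)
rate = exceed / trials  # close to alpha = 0.05 when all the hypotheses hold
```

If the Gaussian population is replaced by some other distribution, the empirical rate can depart from the nominal α, which is exactly the point made in the text.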
But we cannot know why t_ex does not belong to the theoretical population of t. Perhaps the cause is the falsity of the null hypothesis H₀, perhaps not. There is no way of knowing why t_ex > t_α. Is it because the population of my experiment sample has one of the infinite distributions that are not Gaussian, or because some experiments are outliers, or something else? There is an infinite number of possible reasons which are different from the falsity of H₀. We cannot presume that only H₀: μ = μ₀ is false. How can we confirm all of the other hypotheses necessary to state that if H₀ is true
then t_ex belongs to the theoretical population used to build t? Obviously we cannot use any other statistical test because, in this case, the argument is circular. Nor can we assume the other hypotheses to be axioms. If we cannot discover why t_ex does not belong to the theoretical population of t, then the value of the risk α is not conservative: it could turn out that t_ex > t_α much more than α% of the times even when the hypothesis H₀: μ = μ₀ is true, although with a different type of distribution.
So it is a mistake to state that if t_ex > t_α the null hypothesis H₀: μ = μ₀ must be rejected with a risk α.
It is possible and correct, but in my opinion misleading, to state: if t_ex > t_α, I reject the null hypothesis H₀ with a risk α, where the null hypothesis is now defined as μ = μ₀ and the sum of all the (infinite) hypotheses required for considering t_ex as belonging to the theoretical population used to build t.
The correct and unambiguous answer when t_ex > t_α is: I reject the null hypothesis H₀: μ = μ₀, but I do not know the risk for a type I error.
What about the type II error when t_ex > t_α? What is the risk in accepting the alternative hypothesis?
If you consider the statement μ = μ₀ as the null hypothesis, H₀, you cannot accept the alternative hypothesis, Hₐ, with a risk β = α because, in that case, you could not reject the null hypothesis, H₀, with a risk α.
If, on the other hand, you consider as the null hypothesis, H₀, the statement μ = μ₀ together with the sum of all the (infinite) hypotheses required for considering t_ex as belonging to the theoretical population used to build t, then you cannot formulate an alternative hypothesis, Hₐ, because, in that case, we would have an infinite number of alternative hypotheses.
In conclusion:
The correct behaviour is to avoid quantifying the
type I error when we reject the null hypothesis.
We can continue to use the type I error only if we extend the null hypothesis to include all of the (infinite) hypotheses required to consider t_ex as belonging to the theoretical population used to build t, but in this case the rejection is a tautology.
We can never quantify a type II error. The introduction of the risk β for a type II error is always a big mistake.
An important consequence of the previous considerations influences the meaning of the confidence interval and confidence region for parameters.
Let us consider the simplest confidence interval:

x̄ − t_α √(s²/N) ≤ μ ≤ x̄ + t_α √(s²/N)

What is the meaning of the 1−α confidence interval for the parameter μ?
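Numerically, the interval is built from the sample mean and variance; the sample and the two-sided table value below are invented for the sketch:

```python
import math
import statistics

# hypothetical sample of N = 8 experiments
sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1]
n = len(sample)
xbar = statistics.mean(sample)

t_alpha = 2.365  # two-sided table value for alpha = 0.05, N - 1 = 7 degrees of freedom
half_width = t_alpha * math.sqrt(statistics.variance(sample) / n)
interval = (xbar - half_width, xbar + half_width)  # x_bar -+ t_alpha * sqrt(s^2 / N)
```

The question discussed below is what, exactly, such an interval is entitled to mean.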
In some books, I found the following demonstration for building the confidence interval of the parameter μ, and the meaning attributed to it by the author in this case is: I have a confidence 1−α that the true value of the parameter μ is internal to the previous interval (Press et al., 1988; Mendenhall et al., 1990). Both the demonstration and the sentence are definitely wrong.
We start with a one-sided t-test with the null hypothesis H₀: μ = μ₀ + δ and the alternative Hₐ: μ < μ₀ + δ. We choose the value of δ as the limit for the test: t_ex = t_α* and the value of α* = α/2 = β/2. In this way, if t_ex < t_α* I reject the null hypothesis H₀: μ = μ₀ + δ with a type I error risk α/2 and I accept the alternative hypothesis Hₐ: μ < μ₀ + δ with the same risk β/2.
The next step is to perform a second t-test with the null hypothesis H₀: μ = μ₀ − δ and the alternative Hₐ: μ > μ₀ − δ. We choose the value of δ as the limit for the test: t_ex = t_α* and the value of α* = α/2 = β/2. In this way, if t_ex > t_α* I reject the null hypothesis H₀: μ = μ₀ − δ with a type I error risk α/2 and I accept the alternative hypothesis Hₐ: μ > μ₀ − δ with the same risk β/2.
If t_ex < t_α* one accepts with a risk α/2 that μ < μ₀ + δ, and if t_ex > t_α* one accepts with a risk α/2 that μ > μ₀ − δ, so one can conclude: I have a confidence 1−α that the true value of the parameter μ is internal to the previous interval.
The previous reasoning is completely wrong because we can never quantify a type II error. Thus, it is a big mistake to say: I have a confidence 1−α that the true value of the parameter μ is internal to the previous interval.
In some other books, I found the following meaning for the 1−α confidence interval of the parameter μ (Berenson et al., 1988; Arnold, 1990; Hines & Montgomery, 1990; Wonnacott & Wonnacott, 1990): if we repeat the sample of experiments many times and each time we calculate the confidence interval, then 1−α of the times the confidence interval will contain the parameter μ = μ₀, while only α of the times the parameter μ = μ₀ will be outside the interval.
This version is also completely wrong.
The correct statement is: if we repeat the sample of experiments many times for a theoretical population normally distributed with μ = μ₀ and with all of the implicit hypotheses necessary for building a variable of type t, and each time we calculate:

μ₀ ± t_α √(s²/N)

then this interval will contain 1−α of the times the value of x̄, while only α of the times the value of x̄ will be outside the interval.
This statement is correct, but it is also a tautology because it is the definition of the value of t_α.
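This tautological reading can be verified by simulation: with a Gaussian population at μ = μ₀ and all the implicit hypotheses satisfied, the interval μ₀ ± t_α √(s²/N) captures x̄ about 1−α of the times. The population parameters, seed and table value in this sketch are arbitrary:

```python
import math
import random
import statistics

random.seed(1)
mu0, sigma, n = 5.0, 0.2, 8
t_alpha = 2.365  # two-sided table value for alpha = 0.05, 7 degrees of freedom

trials = 20000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    half = t_alpha * math.sqrt(statistics.variance(sample) / n)
    # the interval is centred on mu0, not on x_bar
    if mu0 - half <= xbar <= mu0 + half:
        covered += 1
coverage = covered / trials  # close to 1 - alpha = 0.95
```

Note that the simulation centres the interval on μ₀ and asks whether x̄ falls inside; nothing in it licenses the exchanged statement discussed next.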
We cannot exchange μ₀ with x̄! We could do this only if it were possible to control a type II error risk
and accept some hypotheses with an assigned risk or if
it were possible to assume that x̄ satisfies all the implicit
hypotheses necessary to build a variable of type t.
Neither of these possibilities is permitted.
I think that it is possible to verify the contradiction in the meaning usually attributed to a confidence interval in another way.
Let us consider a two-sided t-test and assume the equality sign:

|t_ex| = t_α

From this equation, we can obtain the following:

x̄ − t_α √(s²/N) ≤ μ ≤ x̄ + t_α √(s²/N)
Whoever looks at this relation as a confidence interval is used to saying: I have a confidence 1−α that the true value of the parameter μ is internal to the previous interval. However, if you consider the original t-test, you know that it is an error to state: I have a confidence 1−α that the null hypothesis is true. Thus the previous sentence is incorrect.
In conclusion, the meaning of the 1−α confidence interval for the parameter μ is far weaker than usually reported in statistics books. Obviously, what we have seen for a simple confidence interval is also valid for the confidence region of the model (linear or non-linear) parameters.
If the meaning of a confidence interval or region is practically a tautology, is there any point in its evaluation? In my opinion there is, but only in a negative sense: if the interval or the region of confidence is very large, we can conclude that the model and/or the experimental sample is poor and we have to improve it. When the interval or the region of confidence is very small, this information is practically useless.
In this regard, I would like to add another consideration. The confidence intervals or regions are evaluated under the hypothesis of exact algebra. In other words, the equations for their evaluation are valid for the classic analysis where round-off errors are not present. For example, the confidence regions for the parameters of a linear model are based on the equation:

V(b) = (FᵀF)⁻¹ s²

for the variance and covariance matrix. From this equation, and using an F-test, we obtain the region of confidence for the parameters.
The parameters are usually calculated by numerically solving the normal system of equations:

FᵀF b = Fᵀ y

or, even better, via factorisation (QR or SVD) of the matrix F.
In each case, we obtain an estimation of the parameters with an error that depends on the product of the MachEps of the floating point precision and the condition number of the matrix FᵀF. If this condition number is large (as occurs very frequently in model problems due to correlation between the parameters), it is necessary to add this circle of uncertainty to the classic ellipsoid coming from a statistical point of view.
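A minimal sketch of the conditioning point, for a hypothetical two-parameter linear model whose columns of F are nearly collinear (the matrix entries are invented); the spectral condition number of FᵀF is obtained from the closed-form eigenvalues of a 2×2 symmetric matrix:

```python
import math

def condition_number_2x2(F):
    # Build FtF = [[a, b], [b, c]] and return lambda_max / lambda_min.
    a = sum(row[0] * row[0] for row in F)
    b = sum(row[0] * row[1] for row in F)
    c = sum(row[1] * row[1] for row in F)
    disc = math.sqrt((a - c) ** 2 + 4.0 * b * b)
    lam_max = (a + c + disc) / 2.0
    lam_min = (a + c - disc) / 2.0
    return lam_max / lam_min

# nearly collinear columns, i.e. strongly correlated parameters
F = [[1.0, 1.000],
     [1.0, 1.001],
     [1.0, 0.999]]
kappa = condition_number_2x2(F)
# kappa is of order 1e6 here: the floating point uncertainty on the
# parameters then scales roughly like MachEps * kappa
```

With MachEps ≈ 2.2e−16 in double precision, a condition number of this size already erodes several significant digits of the parameter estimates, which is the uncertainty to be added to the statistical ellipsoid.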
If we plan some experiments to reduce the region of confidence, it could be important not only to try to reduce the volume of the ellipsoid as in the classical methods, but also to make the ellipsoid more spherical, since the condition number depends on the ratio of the maximum and minimum eigenvalues of the matrix FᵀF.
References
Arnold, S. F. (1990). Mathematical statistics. Englewood Cliffs, NJ:
Prentice-Hall.
Berenson, M. L., Levine, D. M., & Rindskopf, D. (1988). Applied
statistics. Englewood Cliffs, NJ: Prentice-Hall.
Hines, W. W., & Montgomery, D. C. (1990). Probability and statistics
in engineering and management science. New York: Wiley.
Mendenhall, W., Wackerly, D. D., & Scheaffer, R. L. (1990). Mathematical statistics with applications. Boston: PWS-KENT.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T.
(1988). Numerical recipes in C. Cambridge: Cambridge University
Press.
Wonnacott, T. H., & Wonnacott, R. J. (1990). Introductory statistics.
New York: Wiley.
G. Buzzi Ferraris
26 May 2000
Ist. Chim. Ind. e Ing. Chim.,
Politecnico di Milano,
Piazza Leonardo da Vinci 32,
20133 Milan,
Italy