The atmospheric sciences literature is full of bad practice
concerning statistical hypothesis testing. Hopefully, after
this course, you will not contribute to the continuation of
these bad habits! Here are some of the classic mistakes
that are often made in the literature:
- ``The results ... are statistically significant''
Complete failure to state clearly which hypotheses are
being tested and the level of significance. In addition,
the way this is stated treats statistical inference
as just a way of rubber-stamping results that the
author found interesting. This is not what inference
is about.
- ``... and are 95% significant''
What the author is trying to say is that the null hypothesis
can be rejected at the 0.05 level of significance. In
statistics, levels of significance are kept small
(e.g. 0.05 or 0.01), whereas levels of confidence
are generally large (e.g. 95% or 99%). This
abuse of convention is particularly bad in the atmospheric
science literature (as also noted by von Storch and Zwiers 1999).
- ``the results are not significant at the 0.05 level but
are significant at the 0.10 level''
The idea of hypothesis testing is that you FIX the level
of significance BEFORE looking at the data. Choosing the level
after you have seen the p-values, so as to be able to reject
the null hypothesis, will lead to too many rejections
(a short simulation sketch after this list illustrates the inflation).
If the p-values are quite large (i.e. greater than 0.01)
then it is good practice to quote the values and then let
the reader make the decision.
- ``however, the results are significant in certain regions''
This is stating the obvious since the more data samples (variables)
you look at, the more chance you will have of being able
to reject the null hypothesis. With large gridded data sets, there
is a real danger of ``data mining'' and fooling yourself (and others)
into thinking you have found something statistically significant.
The total number of data samples or variables examined needs
to be taken into account when doing many tests at the same time
(the grid-point sketch after this list illustrates the problem and a
simple adjustment). See Wilks (1995) for a discussion of these kinds
of multiplicity and dimensionality problems.
- ``the null hypothesis cannot be rejected and so must be true''
Wrong. Either the null hypothesis is true OR your data sample
simply did not provide enough evidence to reject it (for example,
because the sample is too small; the final sketch after this list
shows such a case). If the null hypothesis cannot be rejected, all
you can say is that the ``data are not inconsistent with the null
hypothesis'' (remember this useful phrase!).
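The following short simulation (a minimal sketch in Python, not part of the
original notes; the sample sizes and random seed are arbitrary) illustrates
the third mistake above. When the null hypothesis is true, testing at a level
fixed in advance at 0.05 rejects in about 5% of samples, whereas "relaxing"
the level to 0.10 whenever that happens to permit a rejection is simply
testing at the 0.10 level, and rejects in about 10% of samples:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments = 10000      # number of simulated studies (arbitrary)
    n = 30                     # observations per study (arbitrary)

    fixed = 0                  # level fixed at 0.05 before seeing the data
    moved = 0                  # level moved to 0.10 after seeing the p-value

    for _ in range(n_experiments):
        x = rng.normal(loc=0.0, scale=1.0, size=n)   # null hypothesis is TRUE
        p = stats.ttest_1samp(x, popmean=0.0).pvalue
        if p < 0.05:
            fixed += 1
        if p < 0.10:   # "not significant at 0.05 but significant at 0.10"
            moved += 1

    print("rejection rate, level fixed at 0.05:", fixed / n_experiments)  # ~0.05
    print("rejection rate, level moved to 0.10:", moved / n_experiments)  # ~0.10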
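The multiplicity problem in the fourth item can be seen with a second sketch
(again Python; the grid size and sample length are hypothetical). Even though
the null hypothesis of zero mean is true at every grid point, about 5% of the
points appear locally "significant" at the 0.05 level. A Bonferroni-type
adjustment, one simple way of accounting for the number of tests, removes
essentially all of these spurious rejections:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_gridpoints = 1000        # hypothetical number of grid points
    n_years = 40               # hypothetical sample length at each point

    # Pure noise: the null hypothesis (zero mean) holds at every grid point.
    data = rng.normal(size=(n_gridpoints, n_years))
    pvalues = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

    locally_sig = np.sum(pvalues < 0.05)                  # expect roughly 50
    bonferroni  = np.sum(pvalues < 0.05 / n_gridpoints)   # expect roughly 0

    print("locally significant grid points:", locally_sig)
    print("significant after Bonferroni adjustment:", bonferroni)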
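Finally, a sketch of why failure to reject does not mean the null hypothesis
is true (the last item above). Here the null hypothesis is deliberately false,
but with only a handful of observations the test has little power, so most
samples fail to reject; the correct statement in such cases is simply that the
data are not inconsistent with the null hypothesis:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_experiments = 10000      # number of simulated studies (arbitrary)

    rejections = 0
    for _ in range(n_experiments):
        # True mean is 0.3, so the null hypothesis (mean = 0) is FALSE ...
        x = rng.normal(loc=0.3, scale=1.0, size=10)  # ... but only 10 observations
        if stats.ttest_1samp(x, popmean=0.0).pvalue < 0.05:
            rejections += 1

    # The rejection rate (the power) is small, roughly 0.15 here, so most
    # samples fail to reject even though the null hypothesis is false.
    print("fraction of samples rejecting the false null:", rejections / n_experiments)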
David Stephenson
2005-09-30