The atmospheric sciences literature is full of bad practice
concerning statistical hypothesis testing. Hopefully, after
this course, you will not contribute to the continuation of
these bad habits! Here are some of the classic mistakes
that are often made in the literature:
- ``The results ... are statistically significant''
Complete failure to state clearly which hypotheses are
being tested and the level of significance. In addition,
the way this is stated treats statistical inference
as just a way of rubber-stamping results that the
author found interesting. This is not what inference
is about.
- ``... and are 95% significant''
What the author is trying to say is that the null hypothesis
can be rejected at the 0.05 level of significance. In
statistics, levels of significance are kept small
(e.g. 0.05 or 0.01), whereas levels of confidence
are generally large (e.g. 95% or 99%). This
abuse of convention is particularly bad in the atmospheric
science literature (as also noted by von Storch and Zwiers 1999).
- ``the results are not significant at the 0.05 level but
are significant at the 0.10 level''
The idea of hypothesis testing is that you FIX the level
of significance BEFORE looking at the data. Choosing the level
after you have seen the p-values, so as to be able to reject
the null hypothesis, will lead to too many rejections
(a short simulation sketch after this list illustrates the inflation).
If the p-values are quite large (i.e. greater than 0.01)
then it is good practice to quote the values and then let
the reader make the decision.
- ``however, the results are significant in certain regions''
This is stating the obvious since the more data samples (variables)
you look at, the more chance you will have of being able
to reject the null hypothesis. With large gridded data sets, there
is a real danger of ``data mining'' and fooling yourself (and others)
into thinking you have found something statistically significant.
The total number of data samples or variables examined needs
to be taken into account when doing many tests at the same time
(the grid-point sketch after this list illustrates the problem and a
simple adjustment). See Wilks (1995) for a discussion of these kinds
of multiplicity and dimensionality problems.
- ``the null hypothesis cannot be rejected and so must be true''
Wrong. Either the null hypothesis is true OR your data sample
simply did not provide enough evidence to reject it (for example,
because the sample is too small; the final sketch after this list
shows such a case). If the null hypothesis cannot be rejected, all
you can say is that the ``data are not inconsistent with the null
hypothesis'' (remember this useful phrase!).
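The following short simulation (a minimal sketch in Python, not part of the
original notes; the sample sizes and random seed are arbitrary) illustrates
the third mistake above. When the null hypothesis is true, testing at a level
fixed in advance at 0.05 rejects in about 5% of samples, whereas "relaxing"
the level to 0.10 whenever that happens to permit a rejection is simply
testing at the 0.10 level, and rejects in about 10% of samples:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments = 10000      # number of simulated studies (arbitrary)
    n = 30                     # observations per study (arbitrary)

    fixed = 0                  # level fixed at 0.05 before seeing the data
    moved = 0                  # level moved to 0.10 after seeing the p-value

    for _ in range(n_experiments):
        x = rng.normal(loc=0.0, scale=1.0, size=n)   # null hypothesis is TRUE
        p = stats.ttest_1samp(x, popmean=0.0).pvalue
        if p < 0.05:
            fixed += 1
        if p < 0.10:   # "not significant at 0.05 but significant at 0.10"
            moved += 1

    print("rejection rate, level fixed at 0.05:", fixed / n_experiments)  # ~0.05
    print("rejection rate, level moved to 0.10:", moved / n_experiments)  # ~0.10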
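The multiplicity problem in the fourth item can be seen with a second sketch
(again Python; the grid size and sample length are hypothetical). Even though
the null hypothesis of zero mean is true at every grid point, about 5% of the
points appear locally "significant" at the 0.05 level. A Bonferroni-type
adjustment, one simple way of accounting for the number of tests, removes
essentially all of these spurious rejections:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_gridpoints = 1000        # hypothetical number of grid points
    n_years = 40               # hypothetical sample length at each point

    # Pure noise: the null hypothesis (zero mean) holds at every grid point.
    data = rng.normal(size=(n_gridpoints, n_years))
    pvalues = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

    locally_sig = np.sum(pvalues < 0.05)                  # expect roughly 50
    bonferroni  = np.sum(pvalues < 0.05 / n_gridpoints)   # expect roughly 0

    print("locally significant grid points:", locally_sig)
    print("significant after Bonferroni adjustment:", bonferroni)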
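Finally, a sketch of why failure to reject does not mean the null hypothesis
is true (the last item above). Here the null hypothesis is deliberately false,
but with only a handful of observations the test has little power, so most
samples fail to reject; the correct statement in such cases is simply that the
data are not inconsistent with the null hypothesis:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_experiments = 10000      # number of simulated studies (arbitrary)

    rejections = 0
    for _ in range(n_experiments):
        # True mean is 0.3, so the null hypothesis (mean = 0) is FALSE ...
        x = rng.normal(loc=0.3, scale=1.0, size=10)  # ... but only 10 observations
        if stats.ttest_1samp(x, popmean=0.0).pvalue < 0.05:
            rejections += 1

    # The rejection rate (the power) is small, roughly 0.15 here, so most
    # samples fail to reject even though the null hypothesis is false.
    print("fraction of samples rejecting the false null:", rejections / n_experiments)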
David Stephenson
2005-09-30