Next: Alternative hypotheses Up: Statistical hypothesis testing Previous: Getting rid of straw Contents

Decision procedure

**Figure:** Schematic showing the sampling density function of a test statistic assuming a certain null hypothesis. The critical levels are shown for an (unusually large) level of significance $\alpha=0.25$ for visually clarity.

Statistical testing generally uses a suitable test statistic that can be calculated for the sample. Under the null hypothesis, the test statistic is assumed to have a sampling distribution that tails off to zero for large positive and negative values of . Fig. 6.1 shows the sampling distribution of a typical test statistic such as the Z-score $Z=(\overline{X}-\mu)/\sigma$ , which is used to test hypotheses about the mean. The decision-making procedure in classical statistical inference proceeds by the following well-defined steps:

Set up the most reasonable null hypothesis $H_0: X\sim f(.)$
Specify the level of significance $\alpha$ you are prepared to accept. This is the probability that the null hypothesis will be rejected even if it really is true (e.g. the probability of convicting innocent people) and so is generally small (e.g. 5%).
Use to calculate the sampling distribution $T\sim f_T(.)$ of your desired test statistic
Calculate the p-value $p=Pr\{\Vert T\Vert\geq t\}$ of your observed sample value of the test statistic. The p-value gives the probability of finding samples of data even less consistent with the null hypothesis than your particular sample assuming is true i.e. the area in the tails of the probability density beyond the observed value of the test statistic. So if you observe a particularly large value for your test statistic, then the p-value will be very small since it will be rare to find data that gave even larger values for the test statistic.
Reject the null hypothesis if the p-value is less than the level of significance $p<\alpha$ on the grounds that the data are inconsistent with the null hypothesis at the $\alpha$ level of significance. Otherwise, do not reject the null hypothesis since the ``data are not inconsistent'' with it.

The level of significance defines a rejection region (critical region) in the tails of the sampling distribution of the test statistic. If the observed value of the test statistic lies in the rejection region, the p-value is less than $\alpha$ and the null hypothesis is rejected. If the observed value of the test statistic lies closer to the centre of the distribution, then the p-value is greater than or equal to $\alpha$ and the null hypothesis can not be rejected. All values of that have p-values greater than or equal to $\alpha$ define the $(1-\alpha)100\%$ confidence interval.

Example: Are meteorologists taller or shorter than other people ?

Let us try and test the hypothesis that Reading meteorologists have different mean heights to other people in Reading based on the small sample of data presented in Table 2.1. Assume that we know that the population of all people in Reading have heights that are normally distributed with a population mean of $\mu_0=$ 170cm and a population standard deviation of $\sigma_0=$ 30cm. So the null hypothesis is that our sample of meteorologists have come from this population, and the alternative hypothesis is that they come from a population with a different mean height. Mathematically the hypotheses can be stated as:

$\displaystyle H_0: \mu=\mu_0$			(6.1)
$\displaystyle H_1: \mu\neq\mu_0$

where $\mu_0=170$ cm. Let's choose a typical level of significance equal to 0.05. Under the null hypothesis, the sampling distribution of sample means should follow $\overline{X}\sim N(\mu_0,\sigma_0^2/n)$ , and hence the test statistic $Z=(\overline{X}-\mu_0)/(\sigma_0/\sqrt{n})\sim N(0,1)$ . Based on our sample of data presented in Table 2.1, the mean height is $\overline{X}=174.3cm$ and so the test statistic

is equal to 0.48, in other words, the mean of our sample is only 0.48 standard errors greater than the population mean. The p-value, i.e. the area in the tails of the density curve beyond this value, is given by $2(1-\Phi(\vert z\vert)$ and so for a

of 0.48 the p-value is 0.63, which is to say that there is a high probability of finding data less consistent with the null hypothesis than our sample. The p-value is clearly much larger than the significance level and so we can not reject the null hypothesis in this case - at the 0.05 level of significance, our data is not inconsistent with coming from the population in Reading. Based on this small sample data, we can not say that meteorologists have different mean heights to other people in Reading.

Next: Alternative hypotheses Up: Statistical hypothesis testing Previous: Getting rid of straw Contents

David Stephenson 2005-09-30