Empirical quantiles

One way of obtaining resistant statistics is to use the empirical quantiles (percentiles/fractiles). The quantile $x_p$ (this term was first used by Kendall, 1940) of a distribution is the number such that a proportion $p$ of the values are less than or equal to $x_p$. For example, the 0.25 quantile $x_{0.25}$ (also referred to as the 25th percentile or lower quartile) is the value such that 25% of all the values fall at or below that value.

Empirical quantiles can be most easily constructed by sorting (ranking) the data into ascending order to obtain a sequence of order statistics $x_{(1)} \leq x_{(2)} \leq \ldots \leq x_{(n)}$ as shown in Figure 2.1b. The $p$'th quantile is then obtained by taking the rank $r=(n+1)p$'th order statistic $x_{((n+1)p)}$ (or an average of neighbouring values if $r$ is not an integer):

$\displaystyle x_p = \left\{ \begin{array}{ll} x_{((n+1)p)} & \mbox{if $(n+1)p$ is an integer} \\ 0.5\,(x_{([(n+1)p])}+x_{([(n+1)p]+1)}) & \mbox{otherwise} \end{array}\right. \qquad (2.5)$

where $p$ is the probability $\Pr(X\leq x_p)=r/(n+1)$ and $[(n+1)p]$ is the greatest integer not exceeding $(n+1)p$. Note that the empirical probability $p$ is only defined at discrete values; quantiles for other values of $p$ can be obtained either by interpolation ( $1\leq p(n+1)\leq n$ ) or by extrapolation ( $p(n+1)<1$ or $p(n+1)>n$ ). The use of $n+1$ rather than $n$ in the denominator of $p$ prevents issuing probabilities that are either zero or one (i.e. perfect certainty) based on only a finite sample of data. As an example, the quartiles of the height example are given by $x_{0.25}=x_{(3)}=171$ (lower quartile), $x_{0.5}=x_{(6)}=175$ (median), and $x_{0.75}=x_{(9)}=180$ (upper quartile).
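The rule in Equation 2.5 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `empirical_quantile`, the tolerance used to decide whether $(n+1)p$ is an integer, and the 11-value `sample` are all assumptions made for the example (the sample is invented and is not the height data of Table 2.1). Extrapolation is not attempted; the sketch simply refuses ranks outside $1\leq (n+1)p\leq n$.

```python
import math


def empirical_quantile(data, p):
    """Empirical p-quantile using the rank r = (n+1)p rule of Eq. (2.5)."""
    x = sorted(data)              # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    r = (n + 1) * p               # (possibly fractional) rank

    # Guard against floating-point noise such as 0.7 * 10 != 7 exactly.
    nearest = round(r)
    if math.isclose(r, nearest, rel_tol=0.0, abs_tol=1e-9):
        r = nearest

    if r < 1 or r > n:
        raise ValueError("rank (n+1)p lies outside 1..n; extrapolation needed")

    k = math.floor(r)             # [(n+1)p], greatest integer not exceeding r
    if k == r:
        return x[k - 1]           # integer rank: take x_(r) (lists are 0-based)
    return 0.5 * (x[k - 1] + x[k])  # otherwise average the two neighbours


# Hypothetical sample with n = 11, so (n+1)p is an integer for the quartiles.
sample = [12, 5, 7, 3, 9, 15, 1, 11, 13, 17, 19]

print(empirical_quantile(sample, 0.25))  # x_(3) = 5
print(empirical_quantile(sample, 0.50))  # x_(6) = 11
print(empirical_quantile(sample, 0.75))  # x_(9) = 15
print(empirical_quantile(sample, 0.30))  # rank 3.6: 0.5 * (5 + 7) = 6.0
```

With $n=11$ the quartile ranks are 3, 6 and 9, which mirrors the structure of the height example. Library routines often interpolate linearly between neighbouring order statistics instead of averaging them, so their results can differ from this sketch when $(n+1)p$ is not an integer.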

**Figure:** Diagram showing how the empirical distribution is obtained for the heights given in Table 2.1. All heights are relative to a reference height of 150 cm in order to make the differences more apparent.

Unlike the arithmetic mean, the median $x_{0.5}$ is not at all influenced by the exact value of the largest values and so provides a resistant measure of the central location. Likewise, a resistant measure of the scale can be obtained using the Inter-Quartile Range (IQR) given by the difference between the upper and lower quartiles $x_{0.75}-x_{0.25}$ . In the asymptotic limit of large sample size ( $n\rightarrow\infty$ ), for normally (Gaussian) distributed variables (see Chapter 4), the sample median tends to the sample mean and the sample IQR tends to approximately 1.35 times the sample standard deviation. Resistant measures of skewness and kurtosis also exist, such as the dimensionless Yule-Kendall skewness statistic

$\displaystyle \gamma_{YK} = \frac{x_{0.25}-2x_{0.5}+x_{0.75}}{x_{0.75}-x_{0.25}} \qquad (2.6)$

and the Moors kurtosis statistic

$\displaystyle \tau_M = \frac{(x_{0.875} - x_{0.625}) + (x_{0.375} - x_{0.125})}{x_{0.75}-x_{0.25}}$
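The quantile-based measures above (median, IQR, Yule-Kendall skewness and Moors kurtosis) can be computed together from the seven octiles. The sketch below is only illustrative: `resistant_summary` and the two samples are hypothetical, and it relies on Python's standard `statistics.quantiles` with `method="exclusive"`, which places quantiles at rank $(n+1)p$ but interpolates linearly between neighbouring order statistics rather than averaging them as in Equation 2.5. The two rules agree when the fractional part of the rank is 0 or 0.5, as it is for this 11-value sample.

```python
import statistics


def resistant_summary(data):
    """Median, IQR, Yule-Kendall skewness and Moors kurtosis from octiles."""
    # Seven cut points: the 0.125, 0.25, ..., 0.875 quantiles.
    e1, e2, e3, e4, e5, e6, e7 = statistics.quantiles(
        data, n=8, method="exclusive"
    )
    iqr = e6 - e2  # upper quartile minus lower quartile
    return {
        "median": e4,
        "iqr": iqr,
        "yule_kendall": (e2 - 2 * e4 + e6) / iqr,
        "moors": ((e7 - e5) + (e3 - e1)) / iqr,
    }


# Hypothetical sample, and a copy whose largest value is replaced by an outlier.
sample = [12, 5, 7, 3, 9, 15, 1, 11, 13, 17, 19]
contaminated = [190 if v == 19 else v for v in sample]

clean = resistant_summary(sample)
dirty = resistant_summary(contaminated)

print(clean)
print(dirty)
print(statistics.mean(sample), statistics.mean(contaminated))
```

In this example the median, the IQR and the Yule-Kendall skewness are identical for the two samples, whereas the arithmetic mean shifts substantially. The Moors statistic does change, because with only 11 values the 0.875 quantile is still an average involving the largest observation.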

Other resistant measures based on all the quantiles also exist, such as L-moments, but these are beyond the scope of this course; refer to Wilks (1995) and von Storch and Zwiers (1999) for more discussion.

David Stephenson 2005-09-30