
Summary of statistical notation

Mathematical notation in statistics can often be a source of confusion. Here is a brief summary of some commonly used conventions:

Upper case Roman letters are used to denote random variables (e.g. $X$, $Y$, $Z$, etc.), whereas lower case Roman letters are used to denote their specific values (e.g. $x$, $y$, and $z$). For example, the probability that a (generic) random variable $X$ exceeds a (specific) value $x$ is written $\Pr(X>x)$. Sample data such as measurements are specific values and are therefore denoted by lower case Roman letters (e.g. a data sample with values $\{x_1,x_2,\ldots,x_n\}$).
Unobservable quantities such as model parameters and noise are denoted using lower case Greek letters. For example, the linear regression model that explains the random variable $Y$ in terms of $X$ is written $Y=X\beta+\alpha+\epsilon$, where $\alpha$ and $\beta$ are model parameters and $\epsilon$ is the noise.
The hat symbol is used to denote estimated and predicted values. For example, $\hat{\beta}$ is an estimate of the model parameter $\beta$, and $\hat{Y}$ is a prediction of the random variable $Y$.
Bold face upper case Roman letters are used to denote data matrices containing multiple variables. For example, the $(n\times p)$ data matrix ${\bf X}$ contains elements $x_{ij}$ where row index $i=1,2,\ldots,n$ refers to the data items/objects, and column index $j=1,2,\ldots,p$ refers to the variables.
The symbol $n$ is often used to denote the sample size (the number of data objects in the sample).
The population mean $\mu_X$ is written in terms of the expectation operator as $\mu_X=E(X)$, whereas the sample mean is denoted using an overline (e.g. $\overline{x}$). Some climate studies use the bra-ket notation of quantum mechanics (e.g. $\langle X\rangle$) to denote expectation, but this non-standard practice should be avoided.
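
The distinction between a random variable $X$ and a specific value $x$ can be sketched in Python. For illustration only, this assumes that $X$ is standard normal; the distribution, the sample size and the helper names `prob_exceeds` and `empirical_prob_exceeds` are hypothetical choices, not part of these notes. The exact $\Pr(X>x)$ follows from the distribution of $X$, whereas the observed data $x_1,\ldots,x_n$ only give an estimate of it.

```python
from statistics import NormalDist
from typing import Sequence

# Assumption for illustration: the random variable X is standard normal.
X = NormalDist(mu=0.0, sigma=1.0)

def prob_exceeds(x: float) -> float:
    """Pr(X > x): probability that the random variable X exceeds the specific value x."""
    return 1.0 - X.cdf(x)

def empirical_prob_exceeds(sample: Sequence[float], x: float) -> float:
    """Fraction of the observed values x_1, ..., x_n that exceed x."""
    return sum(xi > x for xi in sample) / len(sample)

# Sample data x_1, ..., x_n: specific realised values, hence lower case.
n = 1000
sample = X.samples(n, seed=1)

print(prob_exceeds(1.0))                    # about 0.159
print(empirical_prob_exceeds(sample, 1.0))  # a sample estimate of the same probability
```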
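
The Greek-letter and hat conventions can be illustrated by fitting the regression model $Y=X\beta+\alpha+\epsilon$ to simulated data. This sketch assumes ordinary least squares as the estimation method and invents "true" values $\alpha=1$, $\beta=2$ and Gaussian noise $\epsilon$ purely so there is something to estimate; `fit_ols` and `predict` are hypothetical helper names.

```python
import random
from typing import Sequence, Tuple

def fit_ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares estimates (alpha_hat, beta_hat) for Y = X*beta + alpha + eps."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def predict(alpha_hat: float, beta_hat: float, x: float) -> float:
    """Prediction Y_hat for a given value x of X, using the estimated parameters."""
    return x * beta_hat + alpha_hat

# Simulated data with assumed "true" parameters (unobservable in practice).
rng = random.Random(42)
alpha, beta = 1.0, 2.0
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [x * beta + alpha + rng.gauss(0.0, 0.5) for x in xs]  # noise eps ~ N(0, 0.5^2)

alpha_hat, beta_hat = fit_ols(xs, ys)
y_hat = predict(alpha_hat, beta_hat, 5.0)
print(alpha_hat, beta_hat, y_hat)  # close to, but not exactly, 1, 2 and 11
```

The hatted quantities change with every new sample, while $\alpha$ and $\beta$ stay fixed; that is the point of distinguishing them notationally.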
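
A small illustration of the $(n\times p)$ data matrix convention, using made-up numbers rather than real data. Python lists are indexed from 0, so the hypothetical helper `x(i, j)` translates the 1-based row index $i$ (data objects) and column index $j$ (variables) used in the text.

```python
# Data matrix X: n rows (data objects), p columns (variables).
# The values are invented for illustration only.
X = [
    [12.1, 3.4, 0.7],  # object i = 1
    [11.8, 2.9, 0.9],  # object i = 2
    [13.0, 3.1, 0.4],  # object i = 3
    [12.5, 3.8, 0.6],  # object i = 4
]
n = len(X)     # sample size: number of data objects (rows)
p = len(X[0])  # number of variables (columns)

def x(i: int, j: int) -> float:
    """Element x_ij using the 1-based indices of the text (i = 1..n, j = 1..p)."""
    return X[i - 1][j - 1]

def column(j: int) -> list:
    """All n observed values of variable j."""
    return [X[i][j - 1] for i in range(n)]

print(n, p, x(2, 1))  # 4 3 11.8
```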
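
To contrast the population mean $\mu_X=E(X)$ with the sample mean $\overline{x}$, this sketch assumes a hypothetical fair six-sided die. $E(X)$ is computed exactly from the probabilities and is a fixed number, whereas $\overline{x}$ is computed from $n$ simulated rolls and changes from sample to sample.

```python
import random
from fractions import Fraction
from statistics import fmean

# Hypothetical example: X is the score of a fair six-sided die.
support = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)

# Population mean mu_X = E(X) = sum over k of k * Pr(X = k).
mu_X = sum(k * prob for k in support)

# Sample mean x_bar = (x_1 + ... + x_n) / n from simulated rolls.
rng = random.Random(0)
n = 500
sample = [rng.choice(support) for _ in range(n)]
x_bar = fmean(sample)

print(mu_X)   # 7/2
print(x_bar)  # close to 3.5, but generally not exactly equal
```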


David Stephenson 2005-09-30