
Expectation, (co-)variance, and correlation

If probabilities are known for all events in event space, it is possible to calculate the expectation (population mean) of a random variable $ X$
$\displaystyle E(X) = \sum_{i} x_i \Pr(A_i) = \mu_X$ (3.3)

where $ x_i$ is the value taken by the random variable $ X$ for event $ A_i$; i.e. $ A_i = \{X = x_i\}$. As an example, if there is a one in a thousand chance of winning a lottery prize of £1500 and each lottery ticket costs £2, then the expectation (expected long-term profit per ticket) is $ 0.001\times £(1500-2)+0.999\times(-£2) = -£0.50$. A useful property of expectation is that the expectation of any linear combination of two random variables is simply the linear combination of their respective expectations
$\displaystyle E(aX+bY) = aE(X)+bE(Y)$ (3.4)

where $ a$ and $ b$ are (non-random) constants. Note also that $ E(XY)=E(X)E(Y)$ if $ X$ and $ Y$ are independent random variables.
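
The expectation formula (3.3), the lottery example, and the linearity property (3.4) can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `expect`, the two-dice example, and the constants `a` and `b` are assumptions chosen for illustration. Exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product


def expect(pmf, g=lambda outcome: outcome):
    """Return E(g) = sum of g(outcome) * Pr(outcome) over a finite event space.

    `pmf` maps each outcome to its probability (hypothetical helper).
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(g(outcome) * p for outcome, p in pmf.items())


# Lottery example from the text: net profit per ticket, in pounds.
lottery = {1500 - 2: Fraction(1, 1000), -2: Fraction(999, 1000)}
expected_profit = expect(lottery)

# Assumed example: two independent fair dice, each pair (x, y) has probability 1/36.
dice = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}
a, b = 2, -3  # arbitrary non-random constants

e_x = expect(dice, lambda o: o[0])
e_y = expect(dice, lambda o: o[1])
e_lin = expect(dice, lambda o: a * o[0] + b * o[1])  # E(aX + bY)
e_xy = expect(dice, lambda o: o[0] * o[1])           # E(XY)

print(expected_profit)               # expected long-term profit per ticket
print(e_lin == a * e_x + b * e_y)    # linearity, equation (3.4)
print(e_xy == e_x * e_y)             # product rule for independent X and Y
```

Linearity holds for any joint distribution, whereas the product rule in the last line relies on the two dice being independent.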

The expectation can also be used to define the population variance

$\displaystyle Var(X) = E((X-E(X))^2) = E(X^2)-(E(X))^2 = \sigma_X^2$ (3.5)

which provides a very useful measure of the overall uncertainty in the random variable. The variance of a linear combination of two random variables is given by
$\displaystyle Var(aX+bY) = a^2Var(X)+b^2Var(Y)+2abCov(X,Y)$ (3.6)

where $ a$ and $ b$ are (non-random) constants. The quantity $ Cov(X,Y)=E((X-E(X))(Y-E(Y)))$ is known as the covariance of $ X$ and $ Y$ and is equal to zero for independent variables. The covariance can be expressed as
$\displaystyle Cov(X,Y) = Cor(X,Y)\sqrt{Var(X)Var(Y)}$ (3.7)

where $ Cor(X,Y)$ is a dimensionless number lying between -1 and 1 known as the correlation between $ X$ and $ Y$. Correlation is widely used to measure the amount of linear association between two variables.
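
Equations (3.5) to (3.7) can be verified on a small joint distribution. The sketch below is illustrative only: the joint probabilities for two dependent 0/1 variables, the constants `a` and `b`, and the helper names `expect`, `var`, and `cov` are all assumptions, not taken from the notes.

```python
from fractions import Fraction
from math import sqrt


def expect(pmf, g):
    """E(g(X, Y)) for a joint distribution given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


def var(pmf, g):
    """Var(g) = E((g - E(g))^2), the first form in equation (3.5)."""
    mean = expect(pmf, g)
    return expect(pmf, lambda x, y: (g(x, y) - mean) ** 2)


def cov(pmf, g, h):
    """Cov(g, h) = E((g - E(g)) (h - E(h)))."""
    mean_g, mean_h = expect(pmf, g), expect(pmf, h)
    return expect(pmf, lambda x, y: (g(x, y) - mean_g) * (h(x, y) - mean_h))


def first(x, y):
    return x  # the random variable X


def second(x, y):
    return y  # the random variable Y


# Assumed joint distribution: X and Y tend to take the same value.
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}
a, b = 2, -3  # arbitrary non-random constants

var_x, var_y = var(joint, first), var(joint, second)
cov_xy = cov(joint, first, second)

# Equation (3.5): the two expressions for the variance agree.
shortcut_var_x = expect(joint, lambda x, y: x ** 2) - expect(joint, first) ** 2

# Equation (3.6): variance of aX + bY computed directly and from the formula.
var_direct = var(joint, lambda x, y: a * x + b * y)
var_formula = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy

# Equation (3.7) rearranged: Cor(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)).
cor_xy = float(cov_xy) / sqrt(float(var_x * var_y))

print(var_x == shortcut_var_x, var_direct == var_formula, cor_xy)
```

Because the covariance here is positive and `a` and `b` have opposite signs, the cross term in (3.6) reduces the variance of the combination below what it would be for independent variables.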

Note that the quantities $ E(.)$ and $ Var(.)$ refer specifically to population parameters and NOT to sample means and variances. To avoid confusion, the sample mean of an observed variable $ x$ is denoted by $ \overline{x} $ and the sample variance is denoted by $ s_x^2$. Sample covariance is denoted by $ s_{xy}$ and sample correlation is denoted by $ r_{xy}$. These provide estimates of the population quantities but should never be confused with them!
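
The difference between population parameters and their sample estimates can be seen by simulation. The following sketch rests on assumptions that are not in the notes: a fair six-sided die as the population, a fixed random seed, and the common $ n-1$ denominator for the sample variance and covariance. The function names are hypothetical.

```python
import random
from math import sqrt


def sample_mean(xs):
    """x-bar: the arithmetic mean of the observations."""
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """s_xy, using the n - 1 denominator (an assumed convention)."""
    x_bar, y_bar = sample_mean(xs), sample_mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_var(xs):
    """s_x^2: the sample covariance of a variable with itself."""
    return sample_cov(xs, xs)


def sample_cor(xs, ys):
    """r_xy = s_xy / sqrt(s_x^2 * s_y^2)."""
    return sample_cov(xs, ys) / sqrt(sample_var(xs) * sample_var(ys))


POP_MEAN = 3.5      # E(X) for a fair die
POP_VAR = 35 / 12   # Var(X) for a fair die

rng = random.Random(0)  # fixed seed so that reruns draw the same samples
for n in (10, 1000, 100000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    # The sample values vary from sample to sample; the population values do not.
    print(n, sample_mean(rolls), sample_var(rolls), "population:", POP_MEAN, POP_VAR)
```

The sample mean and variance change with every new sample and only approach the fixed population values as the sample size grows, which is why the two sets of symbols are kept separate.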


David Stephenson 2005-09-30