
Expectation, (co-)variance, and correlation

If probabilities are known for all events in the event space, it is possible to calculate the expectation (population mean) of a random variable $X$:

$\displaystyle E(X) = \sum_{i} x_i \Pr(A_i) = \mu_X \qquad (3.3)$

where $x_i$ is the value taken by the random variable $X$ for event $A_i$; i.e. $A_i = \{X = x_i\}$. As an example, if there is a one in a thousand chance of winning a lottery prize of £1500 and each lottery ticket costs £2, then the expectation (expected long-term profit) is $0.001\times$ £(1500-2) $+\,0.999\times$ (-£2) $=$ -£0.50.

A useful property of expectation is that the expectation of any linear combination of two random variables is simply the linear combination of their respective expectations

$\displaystyle E(aX+bY) = aE(X)+bE(Y) \qquad (3.4)$

where $a$ and $b$ are (non-random) constants. Note also that $E(XY) = E(X)E(Y)$ if $X$ and $Y$ are independent random variables.
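The definition of expectation, the lottery example, and the linearity property can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper `expectation`, the second variable `coin`, and the constants `a` and `b` are hypothetical choices made for illustration, and exact fractions are used to avoid rounding. It also checks the product rule $E(XY)=E(X)E(Y)$ for a joint distribution built under an independence assumption.

```python
from fractions import Fraction
from itertools import product


def expectation(pmf):
    """Return E(X) = sum_i x_i * Pr(A_i) for a dict mapping value -> probability."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())


# Lottery example from the text: prize of 1500 with probability 1/1000, ticket costs 2.
p_win = Fraction(1, 1000)
profit = {1500 - 2: p_win, -2: 1 - p_win}
expected_profit = expectation(profit)  # exact value of the expected profit per ticket

# Hypothetical second variable Y: a fair coin paying 0 or 1 (assumption for illustration).
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Joint distribution of (X, Y), ASSUMING X and Y are independent.
joint = {
    (x, y): px * py
    for (x, px), (y, py) in product(profit.items(), coin.items())
}

# Linearity (3.4): E(aX + bY) compared with a E(X) + b E(Y).
a, b = 3, 10  # arbitrary non-random constants
lhs = sum((a * x + b * y) * p for (x, y), p in joint.items())
rhs = a * expectation(profit) + b * expectation(coin)

# Product rule for independent variables: E(XY) compared with E(X) E(Y).
e_xy = sum(x * y * p for (x, y), p in joint.items())
e_x_e_y = expectation(profit) * expectation(coin)

print(expected_profit, lhs == rhs, e_xy == e_x_e_y)
```

Linearity holds for any joint distribution, whereas the product rule relies on the independence assumption used to build `joint`.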

The expectation can also be used to define the population variance

$\displaystyle Var(X) = E((X-E(X))^2) = E(X^2)-(E(X))^2 = \sigma_X^2 \qquad (3.5)$

which provides a very useful measure of the overall uncertainty in the random variable. The variance of a linear combination of two random variables is given by

$\displaystyle Var(aX+bY) = a^2Var(X)+b^2Var(Y)+2abCov(X,Y) \qquad (3.6)$

where $a$ and $b$ are (non-random) constants. The quantity $Cov(X,Y) = E((X-E(X))(Y-E(Y)))$ is known as the covariance of $X$ and $Y$ and is equal to zero for independent variables. The covariance can be expressed as

$\displaystyle Cov(X,Y) = Cor(X,Y)\sqrt{Var(X)Var(Y)} \qquad (3.7)$

where $Cor(X,Y)$ is a dimensionless number lying between -1 and 1 known as the correlation between $X$ and $Y$. Correlation is widely used to measure the amount of linear association between two variables.
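As a rough illustration of the population variance, covariance, and correlation, the sketch below evaluates them for a small joint distribution and checks the variance of a linear combination against the formula in (3.6). The joint probabilities, the helper `expect`, and the constants `a` and `b` are hypothetical and chosen only for illustration; covariance is computed here as $E((X-E(X))(Y-E(Y)))$.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint distribution Pr(X = x, Y = y) for two dependent 0/1 variables.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}


def expect(g):
    """Return E(g(X, Y)) under the joint distribution above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)

# Population variances and covariance as expectations of centred products.
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Second form of the variance in (3.5): E(X^2) - (E(X))^2.
var_x_alt = expect(lambda x, y: x ** 2) - mean_x ** 2

# Variance of a linear combination, computed directly and via (3.6).
a, b = 2, -3  # arbitrary non-random constants
mean_combo = a * mean_x + b * mean_y
var_combo = expect(lambda x, y: (a * x + b * y - mean_combo) ** 2)
var_combo_formula = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy

# Correlation from (3.7), rearranged: Cor = Cov / sqrt(Var(X) Var(Y)).
cor_xy = float(cov_xy) / sqrt(var_x * var_y)

print(var_x, cov_xy, var_combo, cor_xy)
```

Because the two variables in this example tend to take the same value, the covariance and correlation come out positive; swapping the probabilities of the diagonal and off-diagonal cells would make them negative.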

Note that the quantities $E(X)$ and $Var(X)$ refer specifically to population parameters and NOT sample means and variances. To avoid confusion, the sample mean of an observed variable $x$ is denoted by $\overline{x}$ and the sample variance is denoted by $s_x^2$. Sample covariance is denoted $s_{xy}$ and sample correlation is denoted by $r_{xy}$. These provide estimates of the population quantities but should never be confused with them!
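The sample quantities can be computed directly from observed data, as in the minimal sketch below. The data values are hypothetical, and the divisor $n-1$ is an assumption: it is a common convention for sample variance and covariance, but the text above does not state which divisor it intends.

```python
from math import sqrt

# Hypothetical paired observations (illustrative values only).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)

# Sample means.
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample variances and covariance, ASSUMING the n - 1 divisor.
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample correlation: the divisor cancels, so r_xy does not depend on that choice.
r_xy = s_xy / sqrt(s2_x * s2_y)

print(x_bar, s2_x, s_xy, r_xy)
```

These values are estimates computed from one particular sample; a different sample from the same population would generally give different numbers, whereas $E(X)$, $Var(X)$, $Cov(X,Y)$ and $Cor(X,Y)$ are fixed properties of the population.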


David Stephenson 2005-09-30