## Jacq Christmas

**Email:** J.T.Christmas@exeter.ac.uk

- **PhD, Machine Learning** *University of Exeter, 2011.*
- **MSc, Applied Artificial Intelligence** *University of Exeter, 2007.*
- **Worked in industry** *Johnson & Johnson, Tesco, Kodak, JP Morgan, etc.*
- **BSc (Hons), Computer Science** *University of Exeter, ????.*

# Statistical data analysis and modelling

## Locating and tracking objects in videos

**Fig 0**: An example frame from a cricket video showing a tagged cricket near its burrow. The blue ellipse identifies where the tracking algorithm has located the cricket.

The aim of this research project is to track an entire population of wild field crickets (*Gryllus campestris*), where the genetic code of each individual is known, in order to study the interaction between behaviour and genetics. The crickets are too small to carry a tracking device, so to study their natural behaviour a human-readable tag is attached to each cricket when it reaches adulthood, and any movement in the area around its burrow is captured on video.

The result is tens of thousands of hours of video, which needs to be analysed to determine whether crickets are present and, if so, what their tag identifications are, where they go and how they interact.

A summary of the procedure for processing a single video clip is as follows:

- preprocess the video to identify foreground pixels (based on movement)
- track the cricket(s) through each frame
- home in on the location of the tag
- identify the characters on the tag

Figure 0 shows an example frame from a good quality video of a tagged cricket, overlaid with a blue ellipse generated by the tracking algorithm.
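The first step of the procedure, movement-based foreground detection, can be illustrated with a minimal sketch. This is not the project's actual preprocessing code: it assumes greyscale frames stored as nested lists, uses a per-pixel median over time as the background estimate, and the names `foreground_mask`, `centroid` and the `threshold` value are hypothetical.

```python
from statistics import median


def foreground_mask(frames, threshold=30):
    """Return one boolean mask per frame, marking pixels that differ from the
    per-pixel median background by more than `threshold` grey levels."""
    height, width = len(frames[0]), len(frames[0][0])
    # The median over time approximates the static background at each pixel.
    background = [[median(f[r][c] for f in frames) for c in range(width)]
                  for r in range(height)]
    return [[[abs(f[r][c] - background[r][c]) > threshold
              for c in range(width)]
             for r in range(height)]
            for f in frames]


def centroid(mask):
    """Return the mean (row, column) of the foreground pixels, or None if empty."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, on in enumerate(row) if on]
    if not points:
        return None
    return (sum(r for r, _ in points) / len(points),
            sum(c for _, c in points) / len(points))


# Toy clip (assumed data): three 2x2 frames, one bright moving pixel in frame 1.
frames = [
    [[10, 10], [10, 10]],
    [[10, 200], [10, 10]],
    [[10, 10], [10, 10]],
]
masks = foreground_mask(frames)
location = centroid(masks[1])  # crude position estimate for the moving object
```

A real tracker would fit a shape such as the ellipse in Figure 0 to the foreground region and link detections across frames; the centroid here is only a stand-in for that step.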

## Satellite image reconstruction

The mission of NASA’s Sea-viewing Wide Field-of-view Sensor (SeaWiFS) is to measure the concentration of phytoplankton, and hence chlorophyll, at the surface of the oceans, which is important to understanding the Earth’s carbon cycle. The chlorophyll in plankton absorbs certain wavelengths of sunlight, reducing the light reflected back to SeaWiFS at those wavelengths. One of these wavelengths, close to the absorption maximum of chlorophyll, is 443 nm.

Figure 1 shows one of the 39 SeaWiFS 443nm images supplied by NASA. Each shows the same stretch of coastline in the northern region of the Gulf of Mexico, around the city of New Orleans. In each of the images some parts of the sea are obscured by cloud and so the absorption data is missing in those areas. The aim is to estimate the missing values and to calculate a measure of how accurate these estimates are believed to be.

Using a method similar to that of Everson and Sirovich (1995), the estimates and their variances were calculated using Probabilistic PCA (PPCA; Tipping & Bishop 1999).
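For reference, the standard PPCA model can be written as below. This is a sketch of the textbook formulation with the model parameters treated as known, and not necessarily the exact computation used for the SeaWiFS images. The symbols $\mathbf{W}$, $\boldsymbol{\mu}$ and $\sigma^2$, and the subscripts $o$ and $m$ for the observed and missing pixels of an image $\mathbf{x}$, are notation assumed here.

$$
\mathbf{x} = \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon},
\qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})
$$

Splitting the rows of $\mathbf{W}$ and $\boldsymbol{\mu}$ into observed and missing parts, the latent variables are inferred from the observed pixels only and then used to predict the missing ones:

$$
\mathbf{M} = \mathbf{W}_o^{\top}\mathbf{W}_o + \sigma^2 \mathbf{I},
\qquad
\hat{\mathbf{z}} = \mathbf{M}^{-1}\mathbf{W}_o^{\top}(\mathbf{x}_o - \boldsymbol{\mu}_o)
$$

$$
\hat{\mathbf{x}}_m = \mathbf{W}_m \hat{\mathbf{z}} + \boldsymbol{\mu}_m,
\qquad
\operatorname{Cov}[\mathbf{x}_m \mid \mathbf{x}_o] = \sigma^2\left(\mathbf{I} + \mathbf{W}_m \mathbf{M}^{-1} \mathbf{W}_m^{\top}\right)
$$

The diagonal of the covariance gives a per-pixel variance, which is one way to express how accurate each estimate is believed to be.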

## Robust autoregression

Autoregression (AR) is a tool commonly used to understand and predict time series data. Traditionally the excitation noise is modelled as a Gaussian. However, real-world data may not be Gaussian in nature, and it is known that Gaussian models are adversely affected by the presence of outliers. We introduce a Bayesian AR model in which the excitation noise is assumed to be Student-t distributed. Variational Bayesian approximations to the posterior distributions of the model parameters are used to overcome the intractable integrations inherent in the Bayesian model. Independent Automatic Relevance Determination (ARD) priors over each of the AR coefficients are used to estimate the model order.
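The model described above can be summarised in equations. The notation is assumed here for illustration ($a_k$ for the AR coefficients, $p$ for the maximum order, $\lambda$ and $\nu$ for the precision and degrees of freedom of the noise, $\alpha_k$ for the ARD precisions); the published paper should be consulted for the exact parameterisation and priors.

$$
x_t = \sum_{k=1}^{p} a_k x_{t-k} + \epsilon_t,
\qquad \epsilon_t \sim \mathcal{St}(0, \lambda, \nu),
\qquad a_k \sim \mathcal{N}(0, \alpha_k^{-1})
$$

A common way to make such a model tractable is to write the Student-t distribution as a scale mixture of Gaussians, with one latent scale $u_t$ per observation:

$$
\epsilon_t \mid u_t \sim \mathcal{N}\!\left(0, (\lambda u_t)^{-1}\right),
\qquad u_t \sim \operatorname{Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right)
$$

Under this representation an outlier is explained by a small $u_t$, which down-weights that observation, while a large inferred $\alpha_k$ shrinks the corresponding coefficient $a_k$ towards zero and so indicates that the lag is not needed.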

Figure 4 compares the AR coefficients calculated using the Gaussian and Student-t assumptions for the excitation noise for a set of EEG signals.

**For details see** J. Christmas & R.M. Everson: "Robust autoregression: Student-t innovations using variational Bayes", in IEEE Transactions on Signal Processing, in press 2010. Matlab code for this model is here.

**Fig 4**: Comparison of coefficients calculated for a set of 58 EEG traces using (left) Gaussian AR and (right) Student-t AR. Note how much more homogeneous the Student-t AR coefficients are; an important consequence of this is that power spectral densities calculated from the AR coefficients estimated with Student-t excitations are considerably more consistent across a subject than estimates using Gaussian excitations.
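For context, the power spectral density implied by a set of AR coefficients follows from the standard result below. The notation is assumed here: $a_k$ are the coefficients of the model $x_t = \sum_{k=1}^{p} a_k x_{t-k} + \epsilon_t$, $\sigma^2$ is the variance of the excitation noise, and $f$ is the frequency in cycles per sample.

$$
P(f) = \frac{\sigma^2}{\left| 1 - \sum_{k=1}^{p} a_k e^{-2\pi i f k} \right|^2},
\qquad |f| \le \tfrac{1}{2}
$$

Because $P(f)$ depends on the data only through the coefficients and the noise variance, more homogeneous coefficient estimates translate directly into more consistent spectra.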

## Modelling multivariate time series data

While a set of observations modelled using PPCA may have some temporal ordering (as they do in the series of SeaWiFS satellite images above), changing the ordering of the observations makes no difference to the principal components. In order to capture this temporal information we have created a new model, PPCA-AR, which combines PPCA with another linear model, autoregression (AR). Figure 5 shows the improved prediction of missing data of this spatial-temporal model compared with the purely spatial PPCA model.
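The order-invariance of principal components can be checked numerically. The sketch below uses synthetic random-walk data and hypothetical helper names (`covariance`, `lag1_autocorrelation`); it is an illustration of the point, not part of the PPCA-AR model. Since principal components are the eigenvectors of the sample covariance matrix, an unchanged covariance implies unchanged components.

```python
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)]
            for i in range(d)]


def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a one-dimensional sequence."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[t] - mean) * (values[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


rng = random.Random(0)
# Assumed toy data: a 3-dimensional random walk, so the time order carries information.
series, state = [], [0.0, 0.0, 0.0]
for _ in range(200):
    state = [s + rng.gauss(0, 1) for s in state]
    series.append(state)

shuffled = series[:]
rng.shuffle(shuffled)

# The covariance (and hence the principal components) ignores the ordering...
difference = max_abs_difference(covariance(series), covariance(shuffled))
# ...whereas a temporal statistic such as the lag-1 autocorrelation does not.
original_lag1 = lag1_autocorrelation([row[0] for row in series])
shuffled_lag1 = lag1_autocorrelation([row[0] for row in shuffled])
```

Here `difference` is zero up to floating-point rounding, while the two lag-1 autocorrelations generally differ; that temporal structure is what an autoregressive component is able to capture.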

**For details see** J. Christmas & R.M. Everson: "Temporally Coupled Principal Component Analysis: A Probabilistic Autoregression Method", accepted for publication in the Proceedings of 2010 IEEE World Congress on Computational Intelligence, Barcelona, July 18-23 2010.

**Fig 5**: Reconstruction of missing data by two different PPCA models (light and dark blue), compared with PPCA-AR (red) for two dimensions (those that have the greatest and smallest variances respectively) from 22-dimensional data. Vertical grey stripes indicate where values are missing and the black line shows the actual values. The bar chart (bottom) shows the total number of values missing across all dimensions.

## Other interests

- Have applied ant colony optimisation (ACO) to the problem of finding polygenic and epistatic associations for type 2 diabetes within a chromosome. The algorithm found two previously-known and one possible, previously-unknown association.
- Am developing a genetic program (GP) which creates C-like programs.

## Papers

- J. Christmas & R.M. Everson: "Robust autoregression: Student-t innovations using variational Bayes", *IEEE Transactions on Signal Processing*, vol 59, pp 48-57, 2011.
- J. Christmas, E.C. Keedwell, T.M. Frayling, J.R.B. Perry: "Ant colony optimisation to identify genetic variant association with type 2 diabetes", *Information Sciences*, vol 181, issue 9, pp 1609-1622, 2011.
- J. Christmas & R.M. Everson: "Temporally Coupled Principal Component Analysis: A Probabilistic Autoregression Method", *Proceedings of the International Joint Conference on Neural Networks (IJCNN)*, Barcelona, July 18-23 2010.

## Acknowledgements

Many thanks to

- Dr Sonia Gallegos, US Naval Research Laboratory at NASA’s Stennis Space Center, Mississippi, for providing the SeaWiFS images.
- Dr Aureliu Lavric, School of Psychology, University of Exeter, for providing the EEG data.