## Machine Learning Group

Prof. Richard Everson
Dr. Jonathan Fieldsend

Image and video compression.
Yousra Almathami
Tracking and prediction.
Dr. Jacq Christmas
Statistical data analysis and modelling.
Andrew Clark
Sparse modelling, multi-objective optimisation, Bioinformatics, Approximate Bayesian Computation.
Mark Cook
Richard Fredlund
Active learning.
Joao Leitao
Conditional random fields.
Will Reckhouse
David Walker
Multi-objective optimisation, high-dimensional visualisation.

# Research overview

A common theme in all the research areas described below is how we handle the uncertainty inherent in most applications. In all these areas we seek to apply statistical methods not just to estimate the answers to our questions, but also to quantify how much error there might be in those estimates.

Below is a brief summary of some of the projects we have been working on. All of them use one or more of the following methods:

• Statistical data analysis and modelling
• Search and optimisation
• Multi-objective optimisation
• Signal processing
• Time-series analysis
• Change-point analysis
• Clustering
• Active learning
• Visualisation

## Accident and Emergency (A&E) patient care outcome

Given measurements of the vital signs of a patient arriving at A&E in an ambulance, and historical records from other patients, we have developed methods for working out the probability that this patient will die, or predicting their quality of life after discharge from hospital.

## Finding underlying structure in EEG traces

Given the EEG traces (measurements of the electrical signals generated by brain activity) from a number of individuals who are all performing the same mental task, we have been developing methods for identifying and isolating the underlying signal caused by the brain working on that task.

## Predicting missing data in satellite images

Given a series of satellite images, taken over a period of time, which show the amount of phytoplankton at the surface of the same stretch of ocean, we have developed methods for estimating the phytoplankton levels in the areas of the images that are obscured by cloud.

## Face recognition

Given a set of example photographs of different (known) individuals, we have been developing methods for identifying the individual in a new photograph.

## Identification of fraudulent credit card transactions

We have developed methods for optimising a credit card fraud detection system to identify fraudulent transactions whilst minimising the number of genuine transactions queried.

## Visualising complex data

People are used to drawing two- or sometimes three-dimensional graphs for visualising data, but this becomes impossible when there are four or more variables to consider. We have been developing methods for visualising and understanding data where there are many variables.

## Air traffic control collision alert system

There are many occasions in business where there are multiple conflicting objectives to be optimised; quality versus cost is a common one. UK air traffic control uses a system to alert controllers when aircraft are likely to come into conflict with one another. This system is tuned by setting hundreds of parameters. Because the system cannot be certain that a conflict will occur, the parameters must be set to optimise the trade-off between maximising the number of true alerts and minimising the number of false alerts. We have developed methods for finding the optimal trade-off and visualising the effects of changing the parameter values, enabling the controllers to determine the optimal settings for their selected trade-off.
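The trade-off between true and false alerts can be illustrated with a small sketch. It is not the group's method: it only shows the standard idea of keeping the non-dominated parameter settings, those for which no other setting gives at least as many true alerts with no more false alerts. The function name `pareto_front` and the alert rates are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

Setting = Tuple[float, float]  # (true alert rate, false alert rate)


def pareto_front(settings: List[Setting]) -> List[Setting]:
    """Return the non-dominated settings, sorted by false alert rate.

    Setting B dominates setting A if B has a true alert rate at least as
    high, a false alert rate at least as low, and is strictly better on
    one of the two.
    """
    front = []
    for i, (true_i, false_i) in enumerate(settings):
        dominated = False
        for j, (true_j, false_j) in enumerate(settings):
            if j == i:
                continue
            no_worse = true_j >= true_i and false_j <= false_i
            strictly_better = true_j > true_i or false_j < false_i
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append((true_i, false_i))
    return sorted(front, key=lambda s: s[1])


# Hypothetical alert rates measured for six candidate parameter settings.
candidates = [
    (0.60, 0.05), (0.80, 0.10), (0.75, 0.12),
    (0.90, 0.30), (0.85, 0.35), (0.60, 0.08),
]
print(pareto_front(candidates))
# [(0.6, 0.05), (0.8, 0.1), (0.9, 0.3)]
```

Each setting on the front represents a different balance between the two objectives; choosing among them is left to the controllers.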

## Identifying changes to mobile phone network KPIs

To alter the performance of a complicated mobile phone network, a company can change one or more of the hundreds of parameters that control it. Having applied the changes, they then want to identify the overall effect they have had on the 28 Key Performance Indicators (KPIs) of the network. We have developed a method which identifies change-points in the KPIs.
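As a minimal sketch of what a change-point is, the following locates a single shift in the mean of one KPI series by trying every split and keeping the one with the smallest total squared error. This is a textbook least-squares approach used here for illustration only, not the method developed by the group; the function name and the KPI values are hypothetical.

```python
from typing import Sequence


def best_change_point(series: Sequence[float]) -> int:
    """Return the index k that best splits series into series[:k] and series[k:].

    The best split minimises the summed squared deviation of each segment
    from its own mean. A split is always returned, even when the series
    contains no real change, so a practical method would also need to test
    whether the change is significant.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    def squared_error(segment: Sequence[float]) -> float:
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = squared_error(series[:k]) + squared_error(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical daily values of one KPI; the level shifts after the fifth day.
kpi = [10.1, 9.9, 10.0, 10.2, 9.8, 12.1, 11.9, 12.0, 12.2, 11.8]
print(best_change_point(kpi))  # 5
```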

## Searching for genetic causes of type 2 diabetes

In monogenic diseases, such as cystic fibrosis, one genetic marker completely predicts a person’s disease status. In polygenic diseases, genetic markers increase or decrease the probability of disease. In epistatic diseases, there is a complicated interaction between different genetic markers which increases or decreases the probability of disease. Type 2 diabetes is polygenic and may be epistatic. We have developed a method for identifying the combinations of genetic markers that increase or decrease a person’s likelihood of developing type 2 diabetes, given the genetic makeup of one chromosome for each of a set of cases and controls.

## Chemical analysis of wine varieties

Given the results of a chemical analysis of wines grown in the same region in Italy, we have developed methods for identifying which wines are chemically similar to each other, so that their grape varieties may be deduced.
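Grouping wines by similarity is a clustering problem. The sketch below uses plain k-means, a standard clustering algorithm chosen for illustration and not necessarily the one the group used. It assumes each wine is described by two hypothetical chemical measurements, and it starts from the first `k` wines as centroids so that the result is deterministic.

```python
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Assign each point to one of k clusters and return the cluster labels."""
    # Deterministic initialisation (an assumption made for this sketch):
    # the first k points act as the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical measurements (two chemical properties) for six wines.
wines = [(1.0, 1.1), (5.0, 5.2), (1.2, 0.9), (5.1, 4.9), (0.9, 1.0), (4.8, 5.0)]
print(kmeans(wines, k=2))  # [0, 1, 0, 1, 0, 1]
```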

## The “cocktail party problem”

Given the soundtracks recorded by a number of microphones in a room in which a group of people are talking, we have developed methods for splitting out the individual voices.

## Active learning

In order to teach a computer to identify patterns in data it must be given a set of training examples. It can be very expensive and/or time-consuming to gather the examples. The aim of active learning is to determine which new examples will give the most benefit when added to the training set, before the expense of gathering them is incurred. We have developed a novel, principled framework for understanding this paradigm and are beginning to apply it to practical problems.
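One simple way to decide which examples to gather next is uncertainty sampling: ask for labels on the examples the current model is least sure about. The sketch below shows only this common baseline, not the framework described above. It assumes a binary classifier has already produced a probability for each unlabelled example; the function name and the probabilities are hypothetical.

```python
from typing import List, Sequence


def select_queries(probabilities: Sequence[float], batch_size: int = 1) -> List[int]:
    """Return the indices of the examples whose predicted probability is closest to 0.5.

    A probability near 0.5 means the classifier is most uncertain, so a
    label for that example is expected to be the most informative.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical predicted probabilities for five unlabelled examples.
pool_probabilities = [0.95, 0.10, 0.52, 0.80, 0.30]
print(select_queries(pool_probabilities, batch_size=2))  # [2, 4]
```

Only the selected examples are then sent for labelling, so the cost of gathering is incurred where it is expected to help most.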