# Statistics - overview

Research in this group falls within the research areas of probability, statistics and statistical bioinformatics. These areas overlap considerably, and our research topics span all three.

We have internationally recognised expertise in shape analysis and related areas. Building on these strengths, we have established a Centre for Statistical Bioinformatics to encourage high-profile interdisciplinary collaboration. We also collaborate with other subject areas across the University, including environmental sciences, transport studies, engineering, biochemistry, epidemiology and biostatistics.

Our research areas include:

### Shape analysis

This is a large area of current research activity that aims to develop methods for describing shapes and for succinctly summarising their variability and changes through time. The field is an extension of multivariate analysis, but it is complicated by the fact that the shape of an object is invariant under translation, rotation and changes of scale. Of particular interest is the case where the outline of an object can be described using landmarks. Applications include automatic chromosome identification and analysis, fitting the outline of a human hand in a noisy image, and describing growth in mouse vertebrae. These projects involve the development of suitable mathematical models, including the complex Bingham distribution, the use of deformable templates, and analyses based on Kendall and Bookstein coordinates.

There have been substantial developments in face recognition with plastic surgery in mind. Three-dimensional pictures of the head can be produced from a laser scanner. Two questions arise: (i) how to average such pictures, and (ii) how to predict the effect of changes made in plastic surgery. In collaboration, we aim to produce a statistical summary of shape changes.

A continuing project with the Department of Anatomy is related to the effect of evolution on size and shape. This is examined through a selection process on mice through several generations using data on vertebrae available as digital images. Further research is needed on the optimum selection of landmarks and the analysis of the evolutionary hypothesis, as well as analysis without using landmarks. These are two-dimensional problems and their extension to three dimensions would be extremely useful in anatomy and anthropology.
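The invariance of shape under translation, rotation and scaling can be illustrated with Bookstein coordinates for two-dimensional landmarks. The sketch below is only an illustration, not the group's own code: it stores landmarks as complex numbers and assumes one common convention, in which the first two landmarks (the baseline) are sent to -0.5 and +0.5. The function names and the example triangle are hypothetical.

```python
import cmath


def bookstein_coordinates(landmarks):
    """Map 2D landmarks (complex numbers) to Bookstein coordinates.

    Assumed convention: the first two landmarks form the baseline and are
    sent to -0.5 and +0.5; the remaining landmarks, expressed in that frame,
    describe the shape.
    """
    z1, z2 = landmarks[0], landmarks[1]
    if z1 == z2:
        raise ValueError("baseline landmarks must be distinct")
    return [(z - z1) / (z2 - z1) - 0.5 for z in landmarks[2:]]


def similarity_transform(landmarks, shift, angle, scale):
    """Translate, rotate and rescale a configuration of landmarks."""
    factor = scale * cmath.exp(1j * angle)
    return [factor * z + shift for z in landmarks]


# Hypothetical triangle and an arbitrarily moved, rotated and enlarged copy.
triangle = [0 + 0j, 2 + 0j, 1 + 1j]
moved = similarity_transform(triangle, shift=3 - 4j, angle=0.7, scale=2.5)

original_shape = bookstein_coordinates(triangle)
moved_shape = bookstein_coordinates(moved)
```

Both configurations give the same coordinates up to floating-point error, because the translation, rotation and scale cancel in the ratio `(z - z1) / (z2 - z1)`.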

### Spatial linear models

The spatial linear model is fundamental to a number of techniques used in image processing, for example for locating gold or ore deposits or for creating maps. There are many unresolved problems in this area, such as the behaviour of maximum likelihood estimators and predictors, and diagnostic tools. There are strong connections between kriging predictors for the spatial linear model and spline methods of interpolation and smoothing. The two-dimensional version of splines/kriging can be used to construct deformations of the plane, which are of key importance in shape analysis.
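As a rough illustration of a kriging predictor, the sketch below implements simple kriging at a handful of two-dimensional sites. Several things here are assumptions made for the example: the mean is taken as known (zero), the covariance is squared-exponential with unit variance and length scale, and the names `covariance`, `solve` and `simple_kriging` are hypothetical. It uses only the Python standard library.

```python
import math


def covariance(s, t, variance=1.0, length_scale=1.0):
    """Assumed squared-exponential covariance between two 2D sites."""
    d2 = (s[0] - t[0]) ** 2 + (s[1] - t[1]) ** 2
    return variance * math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def simple_kriging(sites, values, target, mean=0.0):
    """Predict the value at target from observations at sites (known mean)."""
    k_matrix = [[covariance(s, t) for t in sites] for s in sites]
    k_vector = [covariance(s, target) for s in sites]
    weights = solve(k_matrix, k_vector)
    return mean + sum(w * (v - mean) for w, v in zip(weights, values))


# Hypothetical observations at the corners of the unit square.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 0.5, 1.5]

at_observed_site = simple_kriging(sites, values, sites[2])
far_away = simple_kriging(sites, values, (100.0, 100.0))
```

With no measurement-error term, the predictor reproduces the observed value at an observed site and falls back to the assumed mean far from the data, which is the same interpolating behaviour that links kriging to spline methods.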

### Robustness

One of the classic problems of robustness theory involves the simultaneous estimation of location and scatter from a set of multivariate data. More work is needed to better understand questions of influence, uniqueness and breakdown. A standard class of estimators used in multivariate robustness is the class of M-estimators. These have good local robustness properties (i.e. they are not sensitive to a few gross outliers in the data) but a poor breakdown point, which is bounded above by 1/(p+1) in p dimensions. A new class of estimators, called constrained M-estimators, overcomes these theoretical problems at the price of being more difficult to compute in practice. More work is needed to understand and fine-tune the behaviour of these estimators.
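The idea behind an M-estimator can be seen in a much simpler setting than the multivariate one discussed above. The sketch below is a univariate simplification, not a constrained M-estimator: it computes a Huber-type M-estimate of location by iterative reweighting, with the scale held fixed at the normalised median absolute deviation. The tuning constant 1.345, the function name `huber_location` and the data are assumptions chosen for illustration.

```python
import statistics


def huber_location(data, c=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location with scale fixed at the normalised MAD."""
    centre = statistics.median(data)
    mad = statistics.median([abs(x - centre) for x in data])
    scale = 1.4826 * mad  # makes the MAD consistent for normal data
    if scale == 0:
        return centre
    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - centre) / scale
            # Full weight for small residuals, down-weighting for large ones.
            weights.append(1.0 if r <= c else c / r)
        new_centre = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_centre - centre) < tol:
            return new_centre
        centre = new_centre
    return centre


# Hypothetical sample: eight values near 10 and two gross outliers.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 55.0, 60.0]

plain_mean = statistics.mean(sample)
robust_centre = huber_location(sample)
```

On this sample the mean is pulled to about 19.5 by the two outliers, while the M-estimate stays close to 10.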

### Classification

Classification or discrimination involves learning a rule whereby a new observation can be assigned to one of a set of pre-defined classes. Current approaches can be grouped into three historical strands: statistical, machine learning and neural network. The classical statistical methods make distributional assumptions. Many other methods are distribution-free and require some regularisation so that the rule performs well on unseen data. Recent interest has focused on how well classification methods generalise. For example, two related distribution-free methods are the k-nearest neighbour classifier and the kernel density estimation approach. In both methods, there are several important problems: the choice of smoothing parameter(s) or k, and the choice of appropriate metrics or selection of variables. These problems can be addressed by cross-validation methods, but these are computationally slow. An analysis of the relationship with a neural net approach (LVQ) should yield faster methods.
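Choosing k for the k-nearest neighbour classifier by cross-validation can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance, leave-one-out cross-validation, a tiny made-up two-class data set, and hypothetical function names `knn_predict` and `loo_error`.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify query by majority vote among its k nearest training points."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out cross-validation error rate for a given k."""
    mistakes = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out observation i
        if knn_predict(rest, point, k) != label:
            mistakes += 1
    return mistakes / len(data)


# Hypothetical data: two well-separated classes of four points each.
data = [
    ((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"), ((0.3, 0.2), "a"),
    ((2.0, 2.0), "b"), ((2.2, 1.9), "b"), ((1.9, 2.3), "b"), ((2.1, 2.1), "b"),
]

errors = {k: loo_error(data, k) for k in (1, 3, 7)}
best_k = min(errors, key=errors.get)
```

Each candidate k requires one prediction per held-out observation, and each prediction sorts the remaining data, which is why this approach becomes slow on large data sets. Here k = 7 fails on every point because each held-out observation is outvoted by the four points of the other class.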

### Analysis of complex computer models

Computer codes are used in scientific research to study and predict the behaviour of complex systems. Their run times often make uncertainty and sensitivity analyses impractical because of the thousands of runs that are conventionally required, so efficient techniques have been developed based on a statistical representation of the code. Working alongside modellers from other disciplines, we have employed and improved these techniques in a number of ways. The models we have worked with include an oil reservoir simulator, a global vegetation model and a jet engine simulator. Future challenges include statistically modelling the gap between computer simulators and reality, accounting for extreme events using simulators, and handling stochastic simulator output.

### Bayesian statistics

Bayesian statistics provides a framework within which uncertainties in real-world problems can be quantified, whether they are aleatory or epistemic. The basic Bayesian paradigm is the updating of prior knowledge given observed data. This simple idea leads to complicated philosophical and mathematical questions, which have been the focus of much research attention over the past twenty years, driven by increased computing power and a growing number of applications of Bayesian methods. Within our research group, a number of staff are working on areas that fall under this umbrella. Topics include Bayesian networks and other probabilistic graphical models, the development of efficient posterior sampling, nonparametric regression techniques, the quantification of expert beliefs (expert elicitation) and applied subjective analyses.
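The updating of prior knowledge given observed data can be shown in its simplest form with a conjugate model. The sketch below assumes a Beta prior on an unknown success probability and binomial data. The prior parameters, the counts and the function names are all hypothetical.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief, centred on 0.5, and hypothetical data.
prior = (2.0, 2.0)
posterior = update_beta(*prior, successes=7, failures=3)

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)
```

The posterior mean, 9/14, lies between the prior mean of 0.5 and the observed proportion of 0.7, showing how the data shift the prior belief without replacing it.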

For further information on our research please visit our statistics research group pages.

### Research seminars and colloquia

All upcoming statistics seminars can be found in our events section.

- Mathematical Biology and Medicine seminars
- Probability, Stochastic Modelling and Financial Mathematics seminars
- Leeds Annual Statistical Research workshops
- Royal Statistical Society - Leeds/Bradford local group
- Colloquia

### Research team

If you are interested in collaborating with us or joining our research team, please get in touch. View all members of the statistics department.

### PhD projects

We have opportunities for prospective PhD students. Potential projects can be found in our postgraduate research opportunities directory.