Bayesian alignment using hierarchical models, with applications in protein bioinformatics
1 School of Mathematics, University of Bristol, Bristol BS8 1TW, U.K. p.j.green{at}bristol.ac.uk
2 Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, U.K. k.v.mardia{at}leeds.ac.uk
An important problem in shape analysis is to match configurations of points in space after filtering out some geometrical transformation. In this paper we introduce hierarchical models for such tasks, in which the points in the configurations are either unlabelled or have at most a partial labelling constraining the matching, and in which some points may only appear in one of the configurations. We derive procedures for simultaneous inference about the matching and the transformation, using a Bayesian approach. Our hierarchical model is based on a Poisson process for hidden true point locations; this leads to considerable mathematical simplification and efficiency of implementation of EM and Markov chain Monte Carlo algorithms. We find a novel use for classical distributions from directional statistics in a conditionally conjugate specification for the case where the geometrical transformation includes an unknown rotation. Throughout, we focus on the case of affine or rigid motion transformations. Under a broad parametric family of loss functions, an optimal Bayesian point estimate of the matching matrix can be constructed that depends only on a single parameter of the family. Our methods are illustrated by two applications from bioinformatics. The first problem is of matching protein gels in two dimensions, and the second consists of aligning active sites of proteins in three dimensions. In the latter case, we also use information related to the grouping of the amino acids, as an example of a more general capability of our methodology to include partial labelling information. We discuss some open problems and suggest directions for future work.
Key Words: Bioinformatics; Markov chain Monte Carlo; Matching; Poisson process; Protein gel; Protein structure; Shape analysis; Von Mises-Fisher distribution.
Received August 2004. Revised December 2005.
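To make the alignment task in the abstract concrete, the sketch below handles the simplest special case: two-dimensional configurations whose matching is already known, related by an unknown rigid motion (rotation plus translation). It uses ordinary least-squares (Procrustes) fitting, which is a swapped-in illustration and not the Bayesian procedure of the paper; that procedure infers the matching and the transformation simultaneously. The function names `fit_rigid_2d` and `apply_rigid_2d` are hypothetical and chosen only for this example.

```python
import math


def fit_rigid_2d(x, y):
    """Least-squares rotation angle and translation taking x onto y.

    x, y: equal-length lists of (u, v) pairs, where x[i] is assumed
    to be matched to y[i] (the matching is taken as known here).
    """
    n = len(x)
    cx = (sum(p[0] for p in x) / n, sum(p[1] for p in x) / n)
    cy = (sum(p[0] for p in y) / n, sum(p[1] for p in y) / n)

    # Accumulate dot and cross products of the centred configurations.
    dot = 0.0
    cross = 0.0
    for (xu, xv), (yu, yv) in zip(x, y):
        au, av = xu - cx[0], xv - cx[1]
        bu, bv = yu - cy[0], yv - cy[1]
        dot += au * bu + av * bv
        cross += au * bv - av * bu

    # The angle maximising cos(theta)*dot + sin(theta)*cross.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation that sends the rotated centroid of x to the centroid of y.
    t = (cy[0] - (c * cx[0] - s * cx[1]),
         cy[1] - (s * cx[0] + c * cx[1]))
    return theta, t


def apply_rigid_2d(points, theta, t):
    """Rotate each point by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * u - s * v + t[0], s * u + c * v + t[1]) for u, v in points]


if __name__ == "__main__":
    x = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    y = apply_rigid_2d(x, 0.3, (2.0, -1.0))
    print(fit_rigid_2d(x, y))
```

When the matching is unknown or only partially constrained, and some points appear in only one configuration, this closed-form fit no longer applies directly, which is the setting the hierarchical model is designed for.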