© 1999 by Biometrika Trust
The choice of variables in multivariate regression: a non-conjugate Bayesian decision theory approach
A1 Institute of Mathematics and Statistics, University of Kent at Canterbury, Canterbury, Kent CT2 7NF, UK. E-mail: philip.j.brown@ukc.ac.uk
A2 Department of Statistical Science, University College London, London WC1E 6BT, UK. E-mail: tom@stats.ucl.ac.uk
A3 Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA. E-mail: mvannucci@stat.tamu.edu
We consider the choice of explanatory variables in multivariate linear regression. Our approach balances prediction accuracy against the costs attached to variables, in a multivariate version of the decision theory approach pioneered by Lindley (1968). We also employ a non-conjugate proper prior distribution for the parameters of the regression model, extending the standard normal-inverse Wishart prior by adding a component of error that cannot be explained by any number of predictor variables, thus avoiding the determinism identified by Dawid (1988). Simulated annealing and fast updating algorithms are used to search for good subsets when there are very many regressors. The technique is illustrated on a near infrared spectroscopy example involving 39 observations and 300 explanatory variables. The example demonstrates the effectiveness of multivariate regression as opposed to separate univariate regressions. It also emphasises that, within a Bayesian framework, more variables than observations can be utilised.
Key Words: Bayesian decision theory; Determinism; Multivariate regression; Near infrared spectroscopy; Non-conjugate distribution; Prediction; Quadratic loss; Simulated annealing; Utility
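The subset search mentioned in the abstract can be pictured with a minimal sketch of simulated annealing over variable subsets. This is not the authors' algorithm: the function name anneal_subset, the single-flip proposal, the geometric cooling schedule, and the additive toy_loss (hypothetical per-variable gains minus a hypothetical unit cost per variable) are all assumptions made for illustration, and the toy loss merely stands in for the Bayesian expected loss plus variable costs described in the paper.

```python
import math
import random


def anneal_subset(loss, p, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Minimise loss(subset) over subsets of {0, ..., p-1} by simulated annealing.

    Illustrative sketch only: single-variable flips with geometric cooling.
    """
    rng = random.Random(seed)
    current = frozenset()
    current_loss = loss(current)
    best, best_loss = current, current_loss
    temp = t0
    for _ in range(n_iter):
        j = rng.randrange(p)
        proposal = current ^ {j}  # add variable j if absent, drop it if present
        proposal_loss = loss(proposal)
        delta = proposal_loss - current_loss
        # Always accept improvements; accept worse subsets with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = proposal, proposal_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss
        temp *= cooling
    return best, best_loss


# Hypothetical numbers: reduction in prediction loss from including each
# variable, and a flat cost charged per included variable.
GAINS = [5.0, 3.0, 0.5, 0.2, 0.1, 0.0]
COST_PER_VARIABLE = 1.0


def toy_loss(subset):
    """Unexplained loss plus total cost of the included variables (toy stand-in)."""
    explained = sum(GAINS[j] for j in subset)
    return sum(GAINS) - explained + COST_PER_VARIABLE * len(subset)


if __name__ == "__main__":
    subset, value = anneal_subset(toy_loss, p=len(GAINS))
    print(sorted(subset), value)
```

In this toy setting a variable is worth including only when its gain exceeds its cost, which is the trade-off between prediction accuracy and variable cost that the abstract describes. A realistic loss would not be additive across variables, which is why a stochastic search is useful when there are very many regressors.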