Biometrika Advance Access originally published online on September 24, 2009
Biometrika 2009 96(4):835-845; doi:10.1093/biomet/asp047
Bayesian lasso regression
Department of Statistics, The Ohio State University, Columbus, Ohio 43210, U.S.A. hans{at}stat.osu.edu
Received for publication 1 July 2008. Revision received 1 March 2009.
The lasso estimate for linear regression corresponds to a posterior mode when independent, double-exponential prior distributions are placed on the regression coefficients. This paper introduces new aspects of the broader Bayesian treatment of lasso regression. A direct characterization of the posterior distribution of the regression coefficients is provided, and computation and inference under this characterization are shown to be straightforward. Emphasis is placed on point estimation using the posterior mean, which facilitates prediction of future observations via the posterior predictive distribution. It is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions. A new Gibbs sampler for Bayesian lasso regression is introduced.
Key Words: Double-exponential distribution; Gibbs sampler; L1 penalty; Laplace distribution; Markov chain Monte Carlo; Posterior predictive distribution; Regularization.
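The mode-versus-mean distinction drawn in the abstract can be illustrated in the simplest setting of a single coefficient. The sketch below is not the paper's Gibbs sampler or its posterior characterization; it is a grid approximation under assumed conditions: one observation y ~ N(beta, sigma2) and a double-exponential prior with density proportional to exp(-tau |beta|). The function names, the parameters `sigma2` and `tau`, and the grid settings are all hypothetical choices made for illustration. In this setting the posterior mode should coincide with the soft-thresholded value of y, while the posterior mean generally does not.

```python
import numpy as np


def soft_threshold(y, t):
    """Lasso solution for one coefficient: shrink y toward zero by t."""
    return np.sign(y) * max(abs(y) - t, 0.0)


def posterior_mode_and_mean(y, sigma2=1.0, tau=1.0, half_width=20.0, n=400001):
    """Grid approximation of the posterior of beta (illustrative assumption:
    y ~ N(beta, sigma2), prior density proportional to exp(-tau * |beta|))."""
    grid = np.linspace(-half_width, half_width, n)
    # Unnormalised log posterior: Gaussian log likelihood plus Laplace log prior.
    log_post = -(y - grid) ** 2 / (2.0 * sigma2) - tau * np.abs(grid)
    # Normalise on the grid; subtracting the maximum avoids overflow in exp.
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    mode = float(grid[np.argmax(log_post)])
    mean = float(np.sum(grid * weights))
    return mode, mean


if __name__ == "__main__":
    for y in (0.5, 3.0):
        mode, mean = posterior_mode_and_mean(y)
        # With sigma2 = 1 and tau = 1 the lasso threshold is tau * sigma2 = 1.
        print(f"y={y}: lasso={soft_threshold(y, 1.0):.4f}, "
              f"mode={mode:.4f}, mean={mean:.4f}")
```

For a small observation such as y = 0.5 the mode sits at zero while the mean stays away from zero, which is one way to see why predictions based on the posterior mean need not agree with those based on the lasso estimate.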