Biometrika 2008 95(3):759-771; doi:10.1093/biomet/asn034
© 2008 Biometrika Trust

Extended Bayesian information criteria for model selection with large model spaces

Jiahua Chen

Department of Statistics, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada jhchen{at}stat.ubc.ca

Zehua Chen

Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546 stachenz{at}nus.edu.sg

Received for publication 1 January 2007. Revision received 1 January 2008.

The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayesian information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayesian information criteria are extremely useful for variable selection in problems with a moderate sample size but with a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research.

Key Words: Bayesian paradigm • Consistency • Genome-wide association study • Tournament approach • Variable selection
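
The abstract does not state the criterion itself. As a rough illustration only, the sketch below assumes the commonly cited form of the extended criterion, EBIC_gamma = -2 log L + k log n + 2 gamma log C(p, k), where k of p candidate covariates are in the model, n is the sample size and gamma in [0, 1] weights the model-space term (gamma = 0 recovers the ordinary criterion). The function names and all numbers are hypothetical and are not taken from the article.

```python
import math


def log_binomial(p: int, k: int) -> float:
    """Natural log of the binomial coefficient C(p, k), via log-gamma."""
    if not 0 <= k <= p:
        raise ValueError("need 0 <= k <= p")
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_lik: float, k: int, n: int, p: int, gamma: float = 1.0) -> float:
    """Extended BIC for a model using k of p candidate covariates.

    log_lik is the maximized log-likelihood of the model.
    The formula is the assumed form described in the lead-in;
    gamma = 0 reduces it to the ordinary BIC.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is assumed to lie in [0, 1]")
    bic = -2.0 * log_lik + k * math.log(n)
    # Extra penalty: log of the number of models with exactly k covariates.
    return bic + 2.0 * gamma * log_binomial(p, k)


# Hypothetical comparison: n = 200 samples, p = 1000 candidate covariates.
n, p = 200, 1000
for gamma in (0.0, 1.0):
    small = ebic(log_lik=-310.0, k=2, n=n, p=p, gamma=gamma)
    large = ebic(log_lik=-298.0, k=6, n=n, p=p, gamma=gamma)
    choice = "2-covariate" if small < large else "6-covariate"
    print(f"gamma={gamma}: prefers the {choice} model")
```

With these made-up likelihoods, the ordinary criterion favours the larger model while the extended criterion favours the smaller one, which is the kind of behaviour the abstract describes when the number of covariates is large relative to the sample size.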



This article has been cited by other articles:


W. Li and Z. Chen
Multiple-Interval Mapping for Quantitative Trait Loci With a Spike in the Trait Distribution
Genetics, May 1, 2009; 182(1): 337 - 342.