Biometrika 2009 96(2):307-322; doi:10.1093/biomet/asp016

© 2009 Biometrika Trust

Article

Hierarchically penalized Cox regression with grouped variables

S. Wang and B. Nan

Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A. sijiwang{at}umich.edu bnan{at}umich.edu

N. Zhou and J. Zhu

Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A. nfzhou{at}umich.edu jizhu{at}umich.edu

Received for publication 1 December 2007. Revision received 1 October 2008.

In many biological and other scientific applications, predictors are often naturally grouped. For example, in biological applications, assayed genes or proteins are grouped by biological roles or biological pathways. When studying the dependence of survival outcome on these grouped predictors, it is desirable to select variables at both the group level and the within-group level. In this article, we develop a new method to address the group variable selection problem in the Cox proportional hazards model. Our method not only effectively removes unimportant groups, but also maintains the flexibility of selecting variables within the identified groups. We also show that the new method offers the potential for achieving the asymptotic oracle property.

Key Words: Cox model • Group variable selection • Lasso • Microarray • Oracle property • Regularization
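
The abstract does not state the form of the penalty. As a rough illustration of selecting variables at both the group level and the within-group level, the sketch below adds a penalty to the negative Cox log partial likelihood. The penalty form (the square root of each group's L1 norm), the function names, and the tuning parameter `lam` are assumptions made for this example, not the article's specification. The likelihood is written for data with no tied event times.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood, assuming no tied event times."""
    eta = X @ beta                       # linear predictors
    order = np.argsort(-time)            # sort subjects by descending time
    eta_sorted = eta[order]
    event_sorted = event[order]
    # In descending-time order the risk set of subject i is subjects 0..i,
    # so log(sum of exp(eta)) over the risk set is a running log-sum-exp.
    log_risk = np.logaddexp.accumulate(eta_sorted)
    return -np.sum(event_sorted * (eta_sorted - log_risk))

def hierarchical_penalty(beta, groups, lam):
    """Assumed penalty: lam * sum over groups of sqrt(L1 norm of the group)."""
    return lam * sum(np.sqrt(np.abs(beta[idx]).sum()) for idx in groups)

def objective(beta, X, time, event, groups, lam):
    """Penalized criterion to be minimized over beta."""
    return (neg_log_partial_likelihood(beta, X, time, event)
            + hierarchical_penalty(beta, groups, lam))

# Toy usage: four predictors in two groups of two (hypothetical data).
X = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.5],
              [1.0, 1.0, 0.5, 0.5]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 1])
groups = [[0, 1], [2, 3]]
value = objective(np.zeros(4), X, time, event, groups, lam=2.0)
```

Under this assumed penalty, a group whose coefficients are all zero contributes nothing, while a nonzero group is still penalized coefficient by coefficient through its L1 norm, so individual variables inside a retained group can also be shrunk to zero.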



