Adaptive regularization using the entire solution surface
School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street S. E., Minneapolis, Minnesota 55455, U.S.A. swu{at}stat.umn.edu xshen{at}stat.umn.edu charlie{at}stat.umn.edu
Received for publication 1 November 2007. Revision received 1 January 2009.
Several sparseness penalties have been suggested for delivery of good predictive performance in automatic variable selection within the framework of regularization. All assume that the true model is sparse. We propose a penalty, a convex combination of the L1- and L∞-norms, that adapts to a variety of situations including sparseness and nonsparseness, grouping and nongrouping. The proposed penalty performs grouping and adaptive regularization. In addition, we introduce a novel homotopy algorithm utilizing subgradients for developing regularization solution surfaces involving multiple regularizers. This permits efficient computation and adaptive tuning. Numerical experiments are conducted using simulation. In simulated and real examples, the proposed penalty compares well against popular alternatives.
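As a rough illustration of the penalty described above, the sketch below evaluates a convex combination of the L1-norm and the L∞-norm of a coefficient vector. The abstract does not give the exact parametrization, so the function name `l1_linf_penalty`, the overall strength `lam` and the mixing weight `alpha` are assumptions made for this example only, and the homotopy algorithm itself is not reproduced here.

```python
def l1_linf_penalty(beta, lam, alpha):
    """Evaluate lam * ((1 - alpha) * ||beta||_1 + alpha * ||beta||_inf).

    Hypothetical parametrization: `lam` >= 0 is an overall regularization
    strength and `alpha` in [0, 1] is the convex-combination weight.
    Neither name comes from the source abstract.
    """
    if lam < 0:
        raise ValueError("lam must be nonnegative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    abs_beta = [abs(b) for b in beta]
    l1_norm = sum(abs_beta)                 # sum of absolute values
    linf_norm = max(abs_beta, default=0.0)  # largest absolute value

    return lam * ((1.0 - alpha) * l1_norm + alpha * linf_norm)


if __name__ == "__main__":
    beta = [1.0, -2.0, 3.0]
    # alpha = 0 gives a pure L1 (lasso-type) penalty,
    # alpha = 1 gives a pure L-infinity penalty.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, l1_linf_penalty(beta, lam=1.0, alpha=alpha))
```

Under this assumed parametrization, `alpha = 0` penalizes every coefficient individually, which favours sparseness, while `alpha = 1` penalizes only the largest coefficient in absolute value, which tends to tie the large coefficients together. Intermediate weights trade off the two behaviours.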
Key Words: Homotopy; Lasso; L1-norm; L∞-norm; Subgradient; Support vector machine; Variable grouping and selection