Biometrika Advance Access published online on November 19, 2007

Biometrika, doi:10.1093/biomet/asm077
© 2007 Biometrika Trust

Probability estimation for large-margin classifiers

Junhui Wang and Xiaotong Shen

School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A. wangjh@stat.umn.edu, xshen@stat.umn.edu

Yufeng Liu

Department of Statistics and Operations Research, Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A. yfliu@email.unc.edu

Received for publication 1 April 2006. Revision received 1 May 2007.

Large margin classifiers have proven to be effective in delivering high predictive accuracy, particularly those focusing on the decision boundaries and bypassing the requirement of estimating the class probability given input for discrimination. As a result, these classifiers may not directly yield an estimated class probability, which is of interest itself. To overcome this difficulty, this article proposes a novel method for estimating the class probability through sequential classifications, by using features of interval estimation of large-margin classifiers. The method uses sequential classifications to bracket the class probability to yield an estimate up to the desired level of accuracy. The method is implemented for support vector machines and ψ-learning, in addition to an estimated Kullback–Leibler loss for tuning. A solution path of the method is derived for support vector machines to reduce further its computational cost. Theoretical and numerical analyses indicate that the method is highly competitive against alternatives, especially when the dimension of the input greatly exceeds the sample size. Finally, an application to leukaemia data is described.
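The bracketing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that the abstract does not state: that a classifier trained with cost 1 − π on the positive class and π on the negative class predicts +1 at x roughly when p(x) > π, that the weights lie on an equally spaced grid j/m, and that the estimate is the midpoint of the resulting bracket. The names `bracket_probability` and `sign_at` are hypothetical, and the fitted weighted classifiers are replaced here by an oracle that knows the true probability.

```python
from typing import Callable


def bracket_probability(sign_at: Callable[[float], int], m: int = 10) -> float:
    """Estimate p(x) from the signs of m - 1 weighted classifiers at a point x.

    sign_at(pi) is assumed to return +1 when the classifier trained with
    weight pi predicts the positive class at x (suggesting p(x) > pi),
    and -1 otherwise. The grid and midpoint rule are illustrative choices.
    """
    grid = [j / m for j in range(1, m)]
    # Largest weight at which x is still classified as positive (0 if none).
    lower = max((pi for pi in grid if sign_at(pi) > 0), default=0.0)
    # Smallest weight at which x is classified as negative (1 if none).
    upper = min((pi for pi in grid if sign_at(pi) < 0), default=1.0)
    # The probability is bracketed by [lower, upper]; report the midpoint.
    return 0.5 * (lower + upper)


if __name__ == "__main__":
    true_p = 0.37  # hypothetical class probability at some input x

    def oracle(pi: float) -> int:
        # Stand-in for a fitted weighted large-margin classifier at x.
        return 1 if true_p > pi else -1

    for m in (10, 100):
        print(m, bracket_probability(oracle, m))
```

With an ideal classifier at every weight, the error of this sketch is at most 1/(2m), so a finer grid trades more classifier fits for a tighter bracket; this is one way to read "up to the desired level of accuracy".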

Key Words: Function estimation • High dimension and low sample size • Interval estimate • Tuning • Weighting




