Biometrika Advance Access originally published online on April 1, 2009
Biometrika 2009 96(2):469-478; doi:10.1093/biomet/asp007
© 2009 Biometrika Trust

Scale adjustments for classifiers in high-dimensional, low sample size settings

Yao-Ban Chan and Peter Hall

Department of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia. y.chan@ms.unimelb.edu.au, P.Hall@ms.unimelb.edu.au

Received for publication 1 October 2007. Revision received 1 September 2008.
Abstract

Distance-based classifiers are generally considered to be effective at discriminating between populations that differ in location. Indeed, nearest-neighbour methods and the support vector machine are frequently used in very high-dimensional problems involving gene expression data, where it is believed that elevated levels of expression convey much of the information for classification. However, one problem inherent to distance-based classifiers is that scale differences can mask location differences. In consequence, such classifiers can have poor performance if the information for classification accumulates through a large number of relatively small location differences in data components, rather than via large differences. In this paper, we show that a simple adjustment for scale, applicable to a variety of distance-based classifiers, can remedy the problem. For some classifiers, such as those based on the support vector machine or the centroid method, scale corrections are important primarily in the case of small training-sample sizes. However, for other classifiers, including those based on nearest-neighbour and average-distance methods, scale adjustments are helpful more generally.

Key Words: Average-distance classifier • Centroid method • Distance-based classifier • Location difference • Nearest-neighbour method • Support vector machine
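The masking effect described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the abstract does not state the form of the adjustment, so the correction used here (subtracting half the mean pairwise squared distance within each training sample, which estimates that class's total variance) is an assumption, as are the Gaussian populations, the function names and all parameter values. It compares an average-distance classifier with and without that correction when the two classes differ by a small shift in every component and also differ in scale.

```python
import random


def sample(n, d, mean, sd, rng):
    """Draw n points in d dimensions with independent N(mean, sd^2) components."""
    return [[rng.gauss(mean, sd) for _ in range(d)] for _ in range(n)]


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_sq_dist(z, train):
    """Average squared distance from z to the points of a training sample."""
    return sum(sq_dist(z, x) for x in train) / len(train)


def scale_offset(train):
    """Half the mean squared distance between distinct training points.

    This estimates the total variance of the class and is the (assumed)
    scale correction subtracted from the average squared distance.
    """
    n = len(train)
    total = sum(sq_dist(train[i], train[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1))


def error_rates(seed=0, d=1000, n_train=10, n_test=40, shift=0.6, sd_y=1.5):
    """Return (unadjusted, scale-adjusted) test error rates.

    Class X has components N(0, 1); class Y has components N(shift, sd_y^2).
    All settings are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    train_x = sample(n_train, d, 0.0, 1.0, rng)
    train_y = sample(n_train, d, shift, sd_y, rng)
    tests = [(z, "X") for z in sample(n_test, d, 0.0, 1.0, rng)]
    tests += [(z, "Y") for z in sample(n_test, d, shift, sd_y, rng)]
    off_x, off_y = scale_offset(train_x), scale_offset(train_y)

    plain_errors = adjusted_errors = 0
    for z, label in tests:
        dx, dy = mean_sq_dist(z, train_x), mean_sq_dist(z, train_y)
        plain = "X" if dx <= dy else "Y"
        adjusted = "X" if dx - off_x <= dy - off_y else "Y"
        plain_errors += plain != label
        adjusted_errors += adjusted != label
    return plain_errors / len(tests), adjusted_errors / len(tests)


if __name__ == "__main__":
    plain, adjusted = error_rates()
    print(f"unadjusted error: {plain:.2f}, scale-adjusted error: {adjusted:.2f}")
```

With these settings the squared shift per component (0.36) is smaller than the gap between the two variances (2.25 - 1 = 1.25), so the unadjusted rule is expected to assign nearly every point to the lower-variance class, while the corrected rule compares estimates of the squared distance to each class mean and can use the accumulated small location differences.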

