Biometrika Advance Access originally published online on August 5, 2007
Biometrika 2007 94(3):760-766; doi:10.1093/biomet/asm050
Copyright © 2007 Biometrika Trust
Miscellanea
The high-dimension, low-sample-size geometric representation holds under mild conditions
Department of Statistics, University of Georgia, Athens, Georgia 30602, U.S.A.
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A.
Department of Epidemiology and Health Policy Research, University of Florida, Gainesville, Florida 32610, U.S.A.
Department of Biostatistics, University of Washington, Seattle, Washington 98195, U.S.A.
jyahn{at}stat.uga.edu
marron{at}email.unc.edu
Keith.Muller{at}biostat.ufl.edu
yychi{at}u.washington.edu
Received for publication 1 July 2005.
Revision received 1 February 2007.
Abstract
High-dimension, low-sample-size datasets have geometrical properties that differ from those of traditional low-dimensional data. In an asymptotic study in which the dimension increases while the sample size stays fixed, Hall et al. (2005) showed that the data vectors are approximately located at the vertices of a regular simplex in a high-dimensional space. A perhaps unappealing aspect of their result is the underlying assumption, which requires the variables, viewed as a time series, to be almost independent. We establish an equivalent geometric representation under much milder conditions, using asymptotic properties of sample covariance matrices. We discuss implications of the results, such as the use of principal component analysis in a high-dimensional space, the extension to nonindependent samples, and the binary classification problem.
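A small simulation can make the simplex picture concrete. The sketch below assumes i.i.d. standard Gaussian coordinates, which is the nearly independent setting of Hall et al. (2005) and not the milder conditions established in this paper. It fixes the sample size n and lets the dimension d grow: after scaling by √d, each vector's length approaches 1 and each pairwise distance approaches √2, so the n points sit near the vertices of a regular simplex. The function name `simplex_summary` is hypothetical, chosen only for illustration.

```python
import numpy as np


def simplex_summary(n: int, d: int, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Scaled norms and scaled pairwise distances of n random points in R^d.

    Assumption (illustrative only): every coordinate is i.i.d. N(0, 1),
    i.e. the nearly independent setting of Hall et al. (2005).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))      # n data vectors of dimension d

    # n x n Gram matrix scaled by d; under this assumption it tends to the identity.
    G = X @ X.T / d
    sq = np.diag(G)                      # ||X_i||^2 / d

    # Squared pairwise distances from the Gram matrix:
    # ||X_i - X_j||^2 / d = G_ii + G_jj - 2 G_ij
    D2 = sq[:, None] + sq[None, :] - 2.0 * G
    iu = np.triu_indices(n, k=1)         # each unordered pair i < j once

    norms = np.sqrt(sq)                          # tends to 1 as d grows
    dists = np.sqrt(np.clip(D2[iu], 0.0, None))  # tends to sqrt(2) as d grows
    return norms, dists


if __name__ == "__main__":
    # Fixed n, increasing d: the scaled norms and distances concentrate.
    for d in (10, 1_000, 100_000):
        norms, dists = simplex_summary(n=5, d=d)
        print(d, norms.round(3), dists.round(3))
```

Working with the n × n Gram matrix rather than the d × d covariance keeps the cost linear in d, which is the natural choice when d is much larger than n.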
Key Words: High-dimension, low-sample-size; Large p small n; Linear discrimination; Sample covariance matrix.
This article has been cited by other articles:
M. C. Wu, L. Zhang, Z. Wang, D. C. Christiani, and X. Lin. Sparse linear discriminant analysis for simultaneous testing for the significance of a gene set/pathway and gene selection. Bioinformatics, May 1, 2009; 25(9): 1145-1151.
