Variable selection in clustering via Dirichlet process mixture models
1 Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A. sinae{at}stat.tamu.edu

2 Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6021, U.S.A. mtadesse{at}cceb.upenn.edu

3 Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A. mvannucci{at}stat.tamu.edu
Abstract
The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a model-based method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. We update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. We explore the performance of the methodology on simulated data and illustrate an application with a DNA microarray study.
Key Words: Bayesian inference; Clustering; Dirichlet process mixture model; DNA microarray data analysis; Variable selection.
Received December 2004. Revised March 2006.
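As an illustration of the kind of Metropolis update the abstract mentions for the variable selection index, the following is a minimal sketch and not the authors' implementation. It assumes a simple add/delete move that flips one entry of the latent binary vector at a time. The function names `metropolis_flip` and `toy_log_post` and the toy scores are hypothetical; in the actual method the log posterior would depend on the data and on the current cluster allocation, neither of which is specified in the abstract.

```python
import math
import random


def metropolis_flip(gamma, log_post, rng):
    """One Metropolis step on a binary inclusion vector (hypothetical helper).

    Proposes flipping a single randomly chosen entry (an add/delete move).
    The proposal is symmetric, so the move is accepted with probability
    min(1, post(proposal) / post(current)).
    """
    j = rng.randrange(len(gamma))
    proposal = list(gamma)
    proposal[j] = 1 - proposal[j]
    log_ratio = log_post(proposal) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return list(gamma)


def toy_log_post(gamma):
    # Assumed stand-in for the posterior of gamma: it rewards including
    # variables 0 and 1 and penalises including the remaining three.
    signal = [2.0, 2.0, -2.0, -2.0, -2.0]
    return sum(s * g for s, g in zip(signal, gamma))


rng = random.Random(0)
gamma = [0, 0, 0, 0, 0]
counts = [0] * len(gamma)
n_iter = 5000
for _ in range(n_iter):
    gamma = metropolis_flip(gamma, toy_log_post, rng)
    counts = [c + g for c, g in zip(counts, gamma)]

# Fraction of iterations in which each variable was selected.
inclusion_freq = [c / n_iter for c in counts]
print(inclusion_freq)
```

In this toy setting the first two variables should be selected in most iterations and the other three rarely. In a full sampler, a step like this would alternate with updates of the cluster allocations, for which the abstract names a split-merge technique.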