Nonparametric variance estimation in the analysis of microarray data: a measurement error approach
Department of Statistics, Texas A&M University, College Station, Texas 77843-3143, U.S.A. carroll{at}stat.tamu.edu
Department of Statistics and Applied Probability, University of California, Santa Barbara, California 93106, U.S.A. yuedong{at}pstat.ucsb.edu
Received for publication 1 March 2007. Revision received 1 December 2007.
We investigate the effects of measurement error on the estimation of nonparametric variance functions. We show that either ignoring measurement error or directly applying the simulation extrapolation (SIMEX) method leads to inconsistent estimators. Nevertheless, the direct SIMEX method can reduce bias relative to a naive estimator. We further propose a permutation SIMEX method that leads to consistent estimators in theory. The performance of both SIMEX methods depends on approximations to the exact extrapolants. Simulations show that both SIMEX methods perform better than ignoring measurement error. The methodology is illustrated using microarray data from colon cancer patients.
Key Words: Heteroscedasticity; Local polynomial regression; Measurement error; Microarray; Nonparametric regression; Permutation SIMEX; Simulation-extrapolation; Variance function estimation
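To make the general SIMEX idea concrete, the following is a minimal sketch of generic simulation extrapolation for a much simpler problem than the one studied here: the slope of a linear regression when the covariate is observed with additive normal error of known standard deviation. It is not the direct or permutation SIMEX estimator of the variance function described in the abstract. The function name `simex_slope`, the grid of λ values, the number of simulations and the quadratic extrapolant are all illustrative assumptions. The quadratic is only an approximation to the exact extrapolant, which is one reason the extrapolated estimate may retain some bias.

```python
import numpy as np


def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0),
                n_sim=200, seed=0):
    """Generic SIMEX sketch for a linear-regression slope (hypothetical helper).

    w        observed covariate, assumed to be x + N(0, sigma_u^2) error
    y        response
    sigma_u  measurement error standard deviation, assumed known
    Returns (naive_slope, simex_slope).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)

    mean_slopes = []
    for lam in lambdas:
        if lam == 0.0:
            # lambda = 0 is the naive fit that ignores measurement error
            mean_slopes.append(np.polyfit(w, y, 1)[0])
            continue
        # Simulation step: add extra error with variance lam * sigma_u^2,
        # refit, and average the slope over n_sim simulated data sets
        slopes = []
        for _ in range(n_sim):
            w_b = w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size)
            slopes.append(np.polyfit(w_b, y, 1)[0])
        mean_slopes.append(np.mean(slopes))
    mean_slopes = np.array(mean_slopes)

    # Extrapolation step: fit an approximate (quadratic) extrapolant in lambda
    # and evaluate it at lambda = -1, the notional error-free case
    coef = np.polyfit(lambdas, mean_slopes, 2)
    return mean_slopes[0], np.polyval(coef, -1.0)
```

In this toy setting the naive slope is attenuated towards zero, and the extrapolated value would typically lie closer to the true slope without recovering it exactly.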