Biometrika Advance Access originally published online on September 26, 2008
Biometrika 2008 95(4):961-977; doi:10.1093/biomet/asn036
© 2008 Biometrika Trust

Estimating the false discovery rate using the stochastic approximation algorithm

Faming Liang

Department of Statistics, Texas A&M University, College Station, Texas 77843, U.S.A., fliang@stat.tamu.edu

Jian Zhang

Institute of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, Kent CT2 7NF, U.K., j.zhang@kent.ac.uk

Received for publication 1 May 2006. Revision received 1 February 2008.

Testing of multiple hypotheses involves statistics that are strongly dependent in some applications, but most work on this subject is based on the assumption of independence. We propose a new method for estimating the false discovery rate of multiple hypothesis tests, in which the density of test scores is estimated parametrically by minimizing the Kullback–Leibler distance between the unknown density and its estimator using the stochastic approximation algorithm, and the false discovery rate is estimated using the ensemble averaging method. Our method is applicable under general dependence between test statistics. Numerical comparisons between our method and several competitors, conducted on simulated and real data examples, show that our method achieves more accurate control of the false discovery rate in almost all scenarios.

Key Words: Ensemble averaging • False discovery rate • Microarray data analysis • Multiple hypothesis testing • Stochastic approximation



