Biometrika Advance Access originally published online on January 26, 2009
Biometrika 2009 96(1):221-228; doi:10.1093/biomet/asn073
Miscellanea
A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome
Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, U.S.A. rsong{at}bios.unc.edu zhou{at}bios.unc.edu kosorok{at}unc.edu
Received for publication 1 June 2007. Revision received 1 August 2008.
Outcome-dependent sampling designs have been shown to be a cost-effective way to enhance study efficiency. We show that the outcome-dependent sampling design with a continuous outcome can be viewed as an extension of two-stage case-control designs to the continuous-outcome case. We further show that two-stage outcome-dependent sampling has a natural link with the missing-data and biased-sampling frameworks. Through the use of semiparametric inference and missing-data techniques, we show that a certain semiparametric maximum-likelihood estimator is computationally convenient and achieves the semiparametric efficient information bound. We demonstrate this both theoretically and through simulation.
Key Words: Biased sampling; Empirical process; Maximum likelihood estimation; Missing data; Outcome-dependent; Profile likelihood; Two-stage sampling
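To illustrate the link between two-stage outcome-dependent sampling and missing data described in the abstract, here is a minimal Python sketch of one common form of the design. It is an assumption-laden illustration, not the authors' procedure: the function name `two_stage_ods`, the sample sizes, the decile cut-points, and the linear data-generating model are all hypothetical choices made for the example. Stage one observes the continuous outcome for everyone; stage two measures the covariate on a simple random sample plus supplemental draws from the two tails of the outcome, so the covariate is missing by design for everyone else.

```python
import random


def two_stage_ods(y, n_srs, n_low, n_high, low_cut, high_cut, seed=0):
    """Pick second-stage subjects: a simple random sample plus
    supplemental draws from the lower and upper tails of the outcome.

    All argument names are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    # Simple random sample from the whole first-stage cohort.
    srs = set(rng.sample(idx, n_srs))
    # Subjects not yet selected whose outcome lies in a tail.
    low_pool = [i for i in idx if i not in srs and y[i] < low_cut]
    high_pool = [i for i in idx if i not in srs and y[i] > high_cut]
    # Supplemental outcome-dependent samples (capped by pool size).
    low = rng.sample(low_pool, min(n_low, len(low_pool)))
    high = rng.sample(high_pool, min(n_high, len(high_pool)))
    return sorted(srs | set(low) | set(high))


# Stage 1: the outcome y is observed for all N subjects.
# The model y = 0.5 * x + noise is an assumed toy model.
gen = random.Random(1)
N = 2000
x = [gen.gauss(0.0, 1.0) for _ in range(N)]
y = [0.5 * xi + gen.gauss(0.0, 1.0) for xi in x]

# Tail cut-points at the empirical 10th and 90th percentiles (assumed).
y_sorted = sorted(y)
low_cut, high_cut = y_sorted[int(0.1 * N)], y_sorted[int(0.9 * N)]

# Stage 2: the covariate x is measured only on the selected subjects.
selected = two_stage_ods(y, n_srs=200, n_low=50, n_high=50,
                         low_cut=low_cut, high_cut=high_cut)
chosen = set(selected)
# Missing-data view: x is None wherever it was not measured.
x_obs = [x[i] if i in chosen else None for i in range(N)]
```

Because selection depends only on the fully observed outcome, the unmeasured covariates are missing at random, which is the kind of structure that lets missing-data techniques be brought to bear; the oversampled tails are what make the second-stage sample a biased sample of the cohort.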