Biometrika Advance Access originally published online on October 12, 2009
Biometrika 2009 96(4):991-997; doi:10.1093/biomet/asp040
Miscellanea
Semiparametric methods for evaluating risk prediction markers in case-control studies
Fred Hutchinson Cancer Research Center, Public Health Sciences, 1100 Fairview Avenue N., Seattle, Washington 98109-1024, U.S.A. yhuang{at}fhcrc.org mspepe{at}u.washington.edu
Received for publication 1 March 2008. Revision received 1 January 2009.
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods.
Key Words: Biased sampling; Biomarker; Case-control; Predictiveness curve; Risk prediction; Semiparametric method
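As a rough illustration of the quantity described in the abstract, the predictiveness curve can be read as the quantile function of risk in the population: R(v) is the risk value below which a fraction v of subjects fall. The sketch below is not the semiparametric estimator developed in the paper. It is a simple reweighted empirical version, and it assumes that the disease prevalence is known from an external source and that each subject's calibrated risk has already been computed. The function name `predictiveness_curve` and all input values are hypothetical.

```python
def predictiveness_curve(risks, is_case, prevalence, grid):
    """Return R(v) for each v in grid: the v-th population quantile of risk.

    Case-control sampling over-represents cases, so each case is given
    weight prevalence / n_cases and each control (1 - prevalence) / n_controls.
    The prevalence is an assumed, externally supplied value.
    """
    n_case = sum(1 for d in is_case if d)
    n_ctrl = len(is_case) - n_case
    w_case = prevalence / n_case
    w_ctrl = (1.0 - prevalence) / n_ctrl

    # Sort subjects by risk, carrying each subject's sampling weight along.
    pairs = sorted((r, w_case if d else w_ctrl) for r, d in zip(risks, is_case))

    curve = []
    for v in grid:
        cumulative = 0.0
        quantile = pairs[-1][0]  # fall back to the largest risk
        for r, w in pairs:
            cumulative += w
            if cumulative >= v - 1e-12:  # small tolerance for rounding
                quantile = r
                break
        curve.append(quantile)
    return curve


# Hypothetical data: three controls, two cases, assumed prevalence of 10%.
risks = [0.05, 0.10, 0.20, 0.60, 0.80]
is_case = [0, 0, 0, 1, 1]
curve = predictiveness_curve(risks, is_case, 0.10, [0.25, 0.50, 0.92])
```

A curve that stays near zero for most of its range and rises steeply at the end corresponds to the wide risk distribution that the abstract associates with better risk stratification.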