© 2002 by Biometrika Trust
Overestimation of the receiver operating characteristic curve for logistic regression
Department of Statistics, University of Warwick, Coventry, CV4 7AL, U.K.
jbc@stats.warwick.ac.uk, phil@stats.warwick.ac.uk
Logistic regression is often used to find a linear combination of covariates which best discriminates between two groups or populations. The receiver operating characteristic (ROC) curve is a good way of assessing the performance of the resulting score, but using the same data both to fit the score and to calculate its ROC curve leads to an over-optimistic estimate of the performance which the score would give if it were validated on a sample of future cases. The paper studies the extent of this overestimation, and suggests a shrinkage correction for the ROC curve itself and for the area under the curve. The correction is consistent with Efron's formula for the bias in the error rate of a binary prediction rule. Two medical examples are discussed.
Key Words: Logistic regression; ROC; Screening score; Shrinkage
Received April 2000. Revised September 2001
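
The overestimation described in the abstract can be seen in a small simulation. The sketch below is only an illustration under assumed settings, not the paper's shrinkage correction: it fits a logistic score to a small training sample, computes the area under the ROC curve on that same sample (the apparent value), and compares it with the area obtained on a large independent sample. The function names (`fit_logistic`, `auc`, `simulate`), the sample sizes, the number of covariates and the true coefficients are all hypothetical choices made for this example.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Logistic regression by Newton-Raphson; an intercept column is added.

    The tiny ridge term is an assumption of this sketch, used only to keep
    the linear solve numerically stable.
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        hess = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, Xd.T @ (y - p))
    return beta


def auc(score, y):
    """Area under the empirical ROC curve (Mann-Whitney form, ties count 1/2)."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def simulate(n_train=60, n_test=2000, p=5, n_rep=200, seed=0):
    """Return (mean apparent AUC, mean validated AUC) over n_rep replicates."""
    rng = np.random.default_rng(seed)
    # Hypothetical truth: only the first covariate carries any signal.
    slopes = np.zeros(p)
    slopes[0] = 0.5

    def draw(n):
        X = rng.standard_normal((n, p))
        prob = 1.0 / (1.0 + np.exp(-(X @ slopes)))
        y = (rng.random(n) < prob).astype(float)
        return X, y

    apparent, validated = [], []
    for _ in range(n_rep):
        X_tr, y_tr = draw(n_train)
        X_te, y_te = draw(n_test)
        beta = fit_logistic(X_tr, y_tr)
        # The intercept does not affect the ROC curve, so score with slopes only.
        apparent.append(auc(X_tr @ beta[1:], y_tr))
        validated.append(auc(X_te @ beta[1:], y_te))
    return float(np.mean(apparent)), float(np.mean(validated))


if __name__ == "__main__":
    app, val = simulate()
    print(f"mean apparent AUC : {app:.3f}")
    print(f"mean validated AUC: {val:.3f}")
```

Under these assumed settings the apparent area should exceed the validated one on average, and the gap would be expected to widen as the number of covariates grows relative to the training sample size.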