Biometrika Advance Access first published online on November 25, 2007
This version published online on December 8, 2007
Biometrika, doi:10.1093/biomet/asm078
Predicting Future Responses Based on Possibly Mis-Specified Working Models
Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A. tcai{at}hsph.harvard.edu
Department of Preventive Medicine, Northwestern University, Chicago, Illinois 60611, U.S.A. lutian{at}northwestern.edu
Cardiovascular Division, Brigham & Women's Hospital, Boston, Massachusetts 02115, U.S.A. ssolomon{at}rics.bwh.harvard.edu
Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A. wei{at}sdac.harvard.edu
Received for publication 1 July 2006.
Revision received 1 May 2007.
Abstract
Under a general regression setting, we propose an optimal unconditional prediction procedure for future responses. The resulting prediction intervals or regions have a desirable average coverage level over a set of covariate vectors of interest. When the working model is not correctly specified, the traditional conditional prediction method is generally invalid. On the other hand, one can empirically calibrate the above unconditional procedure and also obtain its crossvalidated counterpart. Various large and small sample properties of these unconditional methods are examined analytically and numerically. We find that the K-fold crossvalidated procedure performs exceptionally well even for cases with rather small sample sizes. The new proposals are illustrated with two real examples, one with a continuous response and the other with a binary outcome.
Key Words: Heterogeneous regression; K-fold crossvalidation; Mis-specified regression model; Optimal prediction region; Prediction error rate
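
As a rough illustration of the idea summarized in the abstract, the sketch below calibrates a prediction interval for average (unconditional) coverage by K-fold crossvalidation under a deliberately mis-specified linear working model. It is not the authors' procedure: the symmetric interval form, the use of absolute out-of-fold residuals, the simulated data, and the names `kfold_interval_halfwidth` and `simulate` are all assumptions made for this example.

```python
import numpy as np


def kfold_interval_halfwidth(X, y, k=5, alpha=0.1, seed=0):
    """Half-width c such that [yhat - c, yhat + c] has roughly 1 - alpha
    average coverage, estimated from out-of-fold absolute residuals.
    Illustrative sketch only; a least-squares linear working model is assumed."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    abs_resid = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit the working model on the training folds only.
        beta, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Record prediction errors on the held-out fold.
        abs_resid[test_idx] = np.abs(y[test_idx] - X[test_idx] @ beta)
    # Empirical (1 - alpha) quantile of the crossvalidated errors.
    return float(np.quantile(abs_resid, 1 - alpha))


def simulate(n, rng):
    """Hypothetical data: the true mean is quadratic, the working model is linear."""
    x = rng.uniform(-2.0, 2.0, size=n)
    y = 1.0 + x + 0.5 * x**2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    return X, y


rng = np.random.default_rng(1)
X, y = simulate(400, rng)
c = kfold_interval_halfwidth(X, y, k=5, alpha=0.1)

# Refit on all data, then check average coverage on fresh responses.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
X_new, y_new = simulate(20000, rng)
coverage = np.mean(np.abs(y_new - X_new @ beta) <= c)
print(f"half-width {c:.2f}, average coverage {coverage:.3f}")
```

The coverage checked here is averaged over the covariate distribution, not guaranteed at each covariate value, which is the distinction between unconditional and conditional prediction that the abstract draws.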