Biometrika Advance Access originally published online on November 25, 2007
Biometrika 2008 95(1):75-92; doi:10.1093/biomet/asm078
© 2007 Biometrika Trust

Articles

Predicting future responses based on possibly mis-specified working models

Tianxi Cai

Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A. tcai{at}hsph.harvard.edu

Lu Tian

Department of Preventive Medicine, Northwestern University, Chicago, Illinois 60611, U.S.A. lutian{at}northwestern.edu

Scott D. Solomon

Cardiovascular Division, Brigham & Women's Hospital, Boston, Massachusetts 02115, U.S.A. ssolomon{at}rics.bwh.harvard.edu

L.J. Wei

Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A. wei{at}sdac.harvard.edu

Received for publication 1 July 2006. Revision received 1 May 2007.

Under a general regression setting, we propose an optimal unconditional prediction procedure for future responses. The resulting prediction intervals or regions have a desirable average coverage level over a set of covariate vectors of interest. When the working model is not correctly specified, the traditional conditional prediction method is generally invalid. On the other hand, one can empirically calibrate the above unconditional procedure and also obtain its crossvalidated counterpart. Various large and small sample properties of these unconditional methods are examined analytically and numerically. We find that the K-fold crossvalidated procedure performs exceptionally well even for cases with rather small sample sizes. The new proposals are illustrated with two real examples, one with a continuous response and the other with a binary outcome.

Key Words: Heterogeneous regression • K-fold crossvalidation • Mis-specified regression model • Optimal prediction region • Prediction error rate
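
The idea of calibrating a prediction interval by K-fold crossvalidation under a possibly mis-specified working model can be illustrated with a short sketch. This is not the authors' procedure, whose details are not given in the abstract; it is a simplified stand-in that fits a linear working model by least squares and takes the interval half-width to be the empirical quantile of the crossvalidated absolute residuals. The function names, the simulated data-generating model, and the choice of a constant-width interval are all assumptions made for illustration.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical data: the true mean is quadratic and the noise is
    heteroscedastic, so a linear working model is mis-specified."""
    x = rng.normal(size=n)
    y = 1.0 + x + 0.5 * x**2 + (0.5 + np.abs(x)) * rng.normal(size=n)
    return x, y


def kfold_interval_halfwidth(X, y, alpha=0.1, K=5, seed=0):
    """Fit a linear working model and calibrate a half-width c as the
    (1 - alpha) empirical quantile of K-fold crossvalidated absolute
    residuals. Returns (beta, c), with beta fitted on all the data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    folds = np.array_split(rng.permutation(n), K)
    abs_res = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit on K-1 folds, evaluate residuals on the held-out fold.
        beta_k, *_ = np.linalg.lstsq(Xd[train_mask], y[train_mask], rcond=None)
        abs_res[test_idx] = np.abs(y[test_idx] - Xd[test_idx] @ beta_k)
    c = np.quantile(abs_res, 1 - alpha)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta, c


def predict_interval(beta, c, X_new):
    """Interval [fitted - c, fitted + c] for each new covariate value."""
    Xd = np.column_stack([np.ones(len(X_new)), X_new])
    centre = Xd @ beta
    return centre - c, centre + c


def average_coverage(beta, c, X_new, y_new):
    """Fraction of new responses falling inside their intervals."""
    lo, hi = predict_interval(beta, c, X_new)
    return np.mean((y_new >= lo) & (y_new <= hi))
```

Calling `kfold_interval_halfwidth` on a sample from `simulate` and then `average_coverage` on a fresh sample gives the coverage averaged over the covariate distribution, which is the unconditional sense of coverage the abstract refers to. Coverage at any single covariate value may still differ from the nominal level, since the working model is wrong and the half-width is constant.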




