Biometrika Advance Access originally published online on November 19, 2007
Biometrika 2007 94(4):841-860; doi:10.1093/biomet/asm070
Estimation of Regression Models for the Mean of Repeated Outcomes Under Nonignorable Nonmonotone Nonresponse
Department of Applied Mathematics and Computer Sciences Ghent University, 9000 Ghent, Belgium stijn.vansteelandt{at}ugent.be
Department of Economics, Di Tella University, Buenos Aires, Argentina arotnitzky{at}utdt.edu
Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, U.S.A. robins{at}hsph.harvard.edu
Received for publication 1 April 2005.
Revision received 1 April 2007.
Abstract
We propose a new class of models for making inference about the mean of a vector of repeated outcomes when the outcome vector is incompletely observed in some study units and missingness is nonmonotone. Each model in our class is indexed by a set of unidentified selection-bias functions which quantify the residual association of the outcome at each occasion t and the probability that this outcome is missing after adjusting for variables observed prior to time t and for the past nonresponse pattern. In particular, selection-bias functions equal to zero encode the investigator's a priori belief that nonresponse of the next outcome does not depend on that outcome after adjusting for the observed past. We call this assumption sequential explainability. Since each model in our class is nonparametric, it fits the data perfectly well. As such, our models are ideal for conducting sensitivity analyses aimed at evaluating the impact that different degrees of departure from sequential explainability have on inference about the marginal means of interest. Although the marginal means are identified under each of our models, their estimation is not feasible in practice because it requires the auxiliary estimation of conditional expectations and probabilities given high-dimensional variables. We henceforth discuss the estimation of the marginal means under each model in our class assuming, additionally, that at each occasion either one of the following two models holds: a parametric model for the conditional probability of nonresponse given current outcomes and past recorded data or a parametric model for the conditional mean of the outcome on the nonrespondents given the past recorded data. We call the resulting procedure 2T-multiply robust as it protects at each of the T time points against misspecification of one of these two working models, although not against simultaneous misspecification of both. 
We extend our proposed class of models and estimators to incorporate data configurations which include baseline covariates and a parametric model for the conditional mean of the vector of repeated outcomes given the baseline covariates.
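The sensitivity-analysis idea described above can be illustrated with a deliberately simplified sketch. This is not the authors' estimator: it assumes a single occasion (T = 1), no covariates, and a hypothetical linear selection-bias function, logit P(R = 0 | Y) = h + alpha * Y, where R = 1 marks an observed outcome. Under that assumption the nonrespondents' outcome density is the respondents' density tilted by exp(alpha * y), so the marginal mean is identified once alpha is fixed. The function name `tilted_mean` and the parameter `alpha` are illustrative choices, not notation from the paper.

```python
import numpy as np

def tilted_mean(y, r, alpha):
    """Marginal mean of Y for a fixed selection-bias parameter alpha.

    Assumes (hypothetically) logit P(R=0 | Y) = h + alpha * Y, with a single
    occasion and no covariates. alpha = 0 corresponds to nonresponse that
    does not depend on the outcome.
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=bool)      # True where the outcome is observed
    y_obs = y[r]                       # missing entries of y are never read

    # Exponential-tilt weights; centring y_obs only rescales the weights,
    # which cancels in the ratio below and avoids overflow.
    w = np.exp(alpha * (y_obs - y_obs.mean()))
    mean_nonresp = np.sum(w * y_obs) / np.sum(w)   # E[Y | R=0] under the tilt

    p_obs = r.mean()                   # estimated P(R=1)
    return p_obs * y_obs.mean() + (1.0 - p_obs) * mean_nonresp

# Sensitivity analysis: report the estimate over a grid of alpha values.
y = np.array([1.0, 2.0, 3.0, np.nan, np.nan])
r = np.array([1, 1, 1, 0, 0])
for alpha in (-1.0, 0.0, 1.0):
    print(alpha, tilted_mean(y, r, alpha))
```

Varying alpha away from zero shows how strongly the estimated mean depends on the assumed departure from outcome-independent nonresponse. The sketch needs at least one respondent and does not address the high-dimensional working models or the multiple robustness discussed in the abstract.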
Key Words: Double robustness; Generalized estimating equation; Intermittent missingness; Longitudinal study; Missing at random; Semiparametric inference