Biometrika Advance Access originally published online on November 3, 2008
Biometrika 2008 95(4):933-946; doi:10.1093/biomet/asn042
Multiple imputation when records used for imputation are not used or disseminated for analysis
Department of Statistical Science, Duke University, Durham, North Carolina 27708-0251, U.S.A. jerry@stat.duke.edu
Received for publication 1 July 2007. Revision received 1 March 2008.
When some of the records used to estimate the imputation models in multiple imputation are not used or available for analysis, the usual multiple imputation variance estimator has positive bias. We present an alternative approach that enables unbiased estimation of variances and, hence, calibrated inferences in such contexts. First, using all records, the imputer samples m values of the parameters of the imputation model. Second, for each parameter draw, the imputer simulates the missing values for all records n times. From these mn completed datasets, the imputer can analyse or disseminate the appropriate subset of records. We develop methods for interval estimation and significance testing for this approach. Methods are presented in the context of multiple imputation for measurement error.
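The two-stage generation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a toy imputation model (independent normal observations with a noninformative prior), marks missing values with NaN, and uses the hypothetical names `two_stage_impute` and `keep`. It shows only how the mn completed datasets arise and how a subset of records can be analysed; the paper's variance estimator and reference distributions are not reproduced here.

```python
import numpy as np

def two_stage_impute(y, m, n, rng):
    """Return an (m, n, len(y)) array of completed copies of y.

    Assumption: y is modelled as i.i.d. normal with a noninformative
    prior, and NaN marks a missing value. Illustrative toy model only.
    """
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    obs = y[~missing]
    k = obs.size
    ybar, s2 = obs.mean(), obs.var(ddof=1)

    completed = np.empty((m, n, y.size))
    for i in range(m):
        # Stage 1: draw (mu, sigma^2) once from the posterior, using all records.
        sigma2 = (k - 1) * s2 / rng.chisquare(k - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / k))
        for j in range(n):
            # Stage 2: n independent imputations given this parameter draw.
            filled = y.copy()
            filled[missing] = rng.normal(mu, np.sqrt(sigma2), size=missing.sum())
            completed[i, j] = filled
    return completed

rng = np.random.default_rng(0)
y = np.array([1.2, np.nan, 0.7, 2.1, np.nan, 1.5, 0.9, np.nan])
# Hypothetical subset of records that is released or analysed.
keep = np.array([False, True, False, False, True, True, True, True])

datasets = two_stage_impute(y, m=5, n=3, rng=rng)
# One point estimate (here, the subset mean) per completed dataset.
q = datasets[:, :, keep].mean(axis=2)
q_bar = q.mean()
```

The nested layout, with n imputations inside each of m parameter draws, keeps the variation between parameter draws separate from the variation within a draw, which is the information a two-stage combining rule needs.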
Key Words: Combining data; Confidentiality; Measurement error; Missing data; Multiple imputation