5. Discussion

Qi Dong, Michael R. Elliott and Trivellore E. Raghunathan

In this paper, we propose a new method to combine information from multiple complex surveys. We apply the new method to combine information about health insurance status from the 2006 NHIS, MEPS, and BRFSS. Results show that the combined estimate is more precise than the estimates from the individual surveys. As previous work has shown (Dong et al. 2014), there is little information loss, in the sense that the sampling properties of inferences from the synthetic population and from the actual sample are very similar. Thus, when we combine the estimates from the three samples, the combined estimate is substantially more efficient than the estimates from the individual surveys. (We note that this application is primarily for illustrative purposes; similar inferences could be made by computing the design-based estimates and variances for each of the surveys, then applying the combining rule in (3.2) to the design-based estimates.)
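To illustrate why pooling independent estimates of the same quantity improves precision, the following is a minimal sketch of a generic inverse-variance (precision-weighted) combination. It is an assumption for illustration only and is not claimed to reproduce the combining rule in (3.2); the function name and the numbers are hypothetical and are not results from the NHIS, MEPS, or BRFSS.

```python
def combine_precision_weighted(estimates, variances):
    """Combine independent estimates of one quantity by inverse-variance weighting.

    Generic illustration only; not necessarily the combining rule (3.2).
    Returns the combined estimate and its variance.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("estimates and variances must be non-empty and of equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Each survey is weighted by its precision (the reciprocal of its variance).
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    combined = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    combined_var = 1.0 / total_precision
    return combined, combined_var


# Hypothetical estimates of one proportion from three surveys (made-up numbers).
survey_estimates = [0.17, 0.16, 0.18]
survey_variances = [0.0004, 0.0009, 0.0001]

combined, combined_var = combine_precision_weighted(survey_estimates, survey_variances)
print(combined, combined_var)
```

Under this weighting the combined variance is always smaller than the smallest individual variance, provided the surveys are independent and target the same population quantity.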

This new method for combining surveys has three major advantages over the existing methods. First, the approach used here to generate synthetic populations, discussed in detail in Dong et al. (2014), accounts for the complex sample design nonparametrically using extensions of finite population Bayesian bootstrap methods. Since the resulting synthetic populations can be analyzed as simple random samples, information from other surveys can be used to adjust for nonsampling errors and/or to fill in missing variables. Second, the method places no limit on the number of surveys to be combined, as long as the surveys share the same underlying population. The adjustment for the complex sampling design features can be applied to each survey independently. After the missing information is imputed, regardless of the number of surveys to be combined, we only need to combine the estimates from each survey using the combining rule developed in this manuscript. Third, the synthetic populations generated by the nonparametric method preserve the item-missing values in the actual data. This potentially fills a gap in the multiple imputation literature: existing imputation methods typically ignore the complex sampling design features in the data and impute the missing values as if the data were a simple random sample. We will consider this application in future work.
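As background for the first advantage, the following is a minimal sketch of the basic Bayesian bootstrap of Rubin (1981) for a simple random sample. It is not the weighted finite population extension of Dong et al. (2014), which additionally accounts for sampling weights, strata, and clusters; the function name and data are hypothetical.

```python
import random


def bayesian_bootstrap_means(values, n_draws=1000, seed=0):
    """Draw posterior replicates of a mean with the basic Bayesian bootstrap.

    Each replicate reweights the observed values with Dirichlet(1, ..., 1)
    weights, generated here by normalizing independent Gamma(1, 1) variates.
    Simple-random-sample version only; complex design features are ignored.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        gammas = [rng.gammavariate(1.0, 1.0) for _ in values]
        total = sum(gammas)
        draws.append(sum(g * v for g, v in zip(gammas, values)) / total)
    return draws


# Hypothetical 0/1 indicator (for example, insured or not) for ten respondents.
sample = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
replicates = bayesian_bootstrap_means(sample, n_draws=2000, seed=42)
posterior_mean = sum(replicates) / len(replicates)
print(posterior_mean)
```

The spread of the replicates approximates posterior uncertainty about the mean without assuming a parametric model for the data, which is the property the nonparametric synthetic population approach builds on.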

References

Cohen, M.P. (1997). The Bayesian bootstrap and multiple imputation for unequal probability sample designs. Proceedings of the Survey Research Methods Section, American Statistical Association, 635‑638.

Dong, Q. (2012).  Unpublished PhD thesis, University of Michigan.

Dong, Q., Elliott, M.R. and Raghunathan, T.E. (2014). A nonparametric method to generate synthetic populations to adjust for complex sampling design features. Survey Methodology, 40(1), 29-46.

Elliott, M.R. and Davis, W.W. (2005). Obtaining cancer risk factor prevalence estimates in small areas: combining data from two surveys. Journal of the Royal Statistical Society C: Applied Statistics, 54, 595-609.

Ezzati-Rice, T.M., Rohde, F. and Greenblatt, J. (2008). Sample design of the medical expenditure panel survey household component, 1998–2007. Methodology Report No. 22. Agency for Healthcare Research and Quality, Rockville, MD. Accessed at: http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr22/mr22.pdf, February 2014.

Hartley, H.O. (1974). Multiple frame methodology and selected applications. The Indian Journal of Statistics, C, 38, 99-118.

Lo, A.Y. (1986). Bayesian statistical inference for sampling a finite population. Annals of Statistics, 14, 1226-1233.

Lohr, S.L. and Rao, J.N.K. (2000). Inference from dual frame surveys. Journal of the American Statistical Association, 95, 271-280.

National Center for Health Statistics (2007). Data file documentation, National Health Interview Survey, 2006 (machine readable data file and documentation). National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, Maryland. Accessed at: ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2006/srvydesc.pdf, February 2014.

Raghunathan, T.E., Reiter, J.P. and Rubin, D.B. (2003). Multiple imputation for statistical disclosure limitation. Journal of Official Statistics, 19, 1-16.

Raghunathan, T.E., Xie, D.W., Schenker, N., Parsons, V.L., Davis, W.W., Dodd, K.W. and Feuer, D.J. (2007). Combining information from two surveys to estimate county-level prevalence rates of cancer risk factors and screening. Journal of the American Statistical Association, 102, 474-486.

Reiter, J.P., Raghunathan, T.E. and Kinney, S.K. (2006). The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology, 32, 143-149.

Rubin, D.B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9, 131-134.

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley. 

Schenker, N., Gentleman, J.F., Rose, D., Hing, E. and Shimizu, I.M. (2002). Combining estimates from complementary surveys: A case study using prevalence estimates from national health surveys of households and nursing homes. Public Health Reports, 117, 393-407.

Schenker, N. and Raghunathan, T.E. (2007). Combining information from multiple surveys to enhance estimation of measures of health. Statistics in Medicine, 26, 1802-1811.

Schenker, N., Raghunathan, T.E. and Bondarenko, I. (2009). Improving on analyses of self-reported data in a large-scale health survey by using information from an examination-based survey. Statistics in Medicine, 29, 533-545.

Skinner, C.J. and Rao, J.N.K. (1996). Estimation in dual frame surveys with complex designs. Journal of the American Statistical Association, 91, 349-356.
