Estimation of domain discontinuities using Hierarchical Bayesian Fay-Herriot models
Section 5. Discussion

Redesigns of a survey process often result in discontinuities that disturb the comparability over time of the outcomes of a repeated survey. To avoid confounding real period-to-period change with differences in measurement bias, it is important that such discontinuities are quantified during the implementation of a new survey process. A straightforward approach is to collect data under the old and new designs in parallel for some period of time. Available budgets for parallel data collection often do not meet the minimum sample sizes that follow from power calculations to detect prespecified minimum differences at given significance and power levels. The affordable sample size might be sufficient for quantifying discontinuities at the national level but not at the domain level, even for the planned domains of the regular survey. To obtain more precise predictions for the domain discontinuities, a small area estimation approach based on hierarchical Bayesian Fay-Herriot (FH) models is proposed.

In an earlier paper (van den Brakel et al., 2016) a univariate FH model is proposed, where reliable direct domain estimates of the regular survey are considered as potential auxiliary variables in a step-forward model selection procedure to build adequate models for small domain prediction of the small sample assigned to the alternative survey. In this paper a bivariate FH model for the direct estimates obtained under both the regular and alternative survey is proposed as an alternative to obtain adequate predictions for domain discontinuities. In addition a univariate FH model applied to the direct estimates of the discontinuities is considered as a simple alternative. The methods are applied to a small scale parallel run conducted to quantify discontinuities in a survey process redesign of the Dutch Crime Victimization Survey (CVS).

Using direct estimates from the regular survey as auxiliary variables in models for small domains under the alternative approach results in a substantial improvement of precision, compared to univariate models that only use auxiliary variables from available registers. This can be expected since both surveys attempt to measure the same variables with a different survey approach. A drawback of the univariate approach is that the variance estimation procedure for the discontinuities is complex, since a non-negligible covariance between the direct estimates from the regular design and the model based predictions for the alternative design arises. The method is complex since a model-based MSE is combined with a design-based variance of a direct estimator. This might even result in negative variance estimates for the discontinuities. These complications are partially circumvented by developing a design-based estimator for the MSE of the small domain predictions and the covariance component (van den Brakel et al., 2016).

Under a bivariate FH model in a fully Bayesian framework, negative variance estimates are avoided since the variances for the discontinuities are derived from positive-definite covariance matrices of the bivariate model. The bivariate FH model improves the predictions for the domain discontinuities since the model improves the precision of the estimates of both the regular and alternative approach, and the strong positive correlation between the random domain effects further reduces the variance of the contrasts. For four out of five variables of the Dutch CVS, the bivariate FH model indeed resulted in more precise predictions for domain discontinuities compared to the univariate FH model. Another advantage of the bivariate model is that it improves the domain predictions of both the regular and alternative survey, while the univariate model assumes that the sample size of the regular survey is sufficiently large to make reliable direct domain estimates. The bivariate model is therefore also appropriate in parallel runs where, e.g., the sample size of the regular survey is reduced in order to increase the sample size for the alternative survey. Finally, the bivariate FH model avoids the complications of accounting for sampling error in the covariates, which is often required if the direct estimates of the regular survey are used as covariates in a univariate FH model.
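
The variance argument can be illustrated with a minimal sketch. It assumes a single domain with a 2x2 covariance matrix for the regular and alternative predictions and evaluates the variance of their contrast. The function name and the numerical values are hypothetical and are not taken from the CVS application. In the fully Bayesian analysis these quantities would be obtained from the posterior draws rather than from a closed-form expression.

```python
import math


def contrast_variance(sd_regular: float, sd_alternative: float, correlation: float) -> float:
    """Variance of (regular - alternative) implied by a 2x2 covariance matrix.

    With c = (1, -1) and Sigma = [[s_r^2, rho*s_r*s_a], [rho*s_r*s_a, s_a^2]],
    c' Sigma c = s_r^2 + s_a^2 - 2*rho*s_r*s_a. This is nonnegative whenever
    |rho| <= 1, i.e. whenever Sigma is a valid covariance matrix.
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    covariance = correlation * sd_regular * sd_alternative
    return sd_regular**2 + sd_alternative**2 - 2.0 * covariance


# Hypothetical standard deviations for one domain (illustrative only).
SD_REGULAR, SD_ALTERNATIVE = 1.0, 1.5

variances_by_correlation = {
    rho: contrast_variance(SD_REGULAR, SD_ALTERNATIVE, rho)
    for rho in (0.0, 0.5, 0.9)
}

for rho, var in variances_by_correlation.items():
    print(f"rho={rho:.1f}  var(discontinuity)={var:.3f}  sd={math.sqrt(var):.3f}")
```

In this sketch the variance of the contrast decreases as the correlation increases and never becomes negative. A negative estimate can only arise when the variance and covariance terms are estimated separately, as in the univariate approach.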

For one variable (satisfaction with police performance) no adequate model could be constructed with the available auxiliary variables from the registers only. For this variable the bivariate model seems to result in overshrinkage of the predictions for the domain discontinuities. The results of the univariate model are clearly better in this case since the direct estimates from the regular survey are the only auxiliary variables that result in an adequate model for small domain predictions.

The univariate FH model for the direct estimates of the domain discontinuities turns out to be a reasonable alternative. It avoids the complications of the univariate FH model for the alternative CVS, and the method is considerably simpler than the bivariate FH model. A point of concern is the extremely small shrinkage factors, which indicate that the model puts too much weight on the synthetic part of the domain predictions. The bias of these domain predictions is indeed larger than that of the bivariate FH model.
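
The role of the shrinkage factor can be sketched under the standard area-level FH model with known variance components, where the prediction is a weighted average of the direct estimate and the synthetic regression prediction. The function names and all numerical values below are hypothetical. In the hierarchical Bayesian setting the weights are averaged over the posterior of the variance parameters, so this is only an approximation of that behaviour.

```python
def shrinkage_factor(model_variance: float, sampling_variance: float) -> float:
    """gamma = sigma_v^2 / (sigma_v^2 + psi): weight given to the direct estimate."""
    if model_variance < 0.0 or sampling_variance <= 0.0:
        raise ValueError("model variance must be >= 0 and sampling variance > 0")
    return model_variance / (model_variance + sampling_variance)


def composite_prediction(direct: float, synthetic: float, gamma: float) -> float:
    """FH-type prediction: gamma * direct + (1 - gamma) * synthetic."""
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical values for one domain (illustrative only, not CVS results).
DIRECT_DISCONTINUITY = 10.0    # direct estimate of the discontinuity
SYNTHETIC_DISCONTINUITY = 6.0  # regression (synthetic) prediction
SAMPLING_VARIANCE = 4.0        # variance of the direct estimate
MODEL_VARIANCE = 0.05          # variance of the random domain effects

gamma = shrinkage_factor(MODEL_VARIANCE, SAMPLING_VARIANCE)
prediction = composite_prediction(DIRECT_DISCONTINUITY, SYNTHETIC_DISCONTINUITY, gamma)

print(f"gamma={gamma:.4f}  prediction={prediction:.3f}")
```

With a model variance that is small relative to the sampling variance, the shrinkage factor is close to zero and the prediction almost coincides with the synthetic part. Any misspecification of the regression part then translates directly into bias.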

A general problem in this application is that the step-forward model selection procedure, in which covariates are included in the model as long as the WAIC value is reduced, results in models with relatively large sets of covariates. With the limited number of domains in this application there is a real risk of overfitting the data. For some variables the covariates appear to be strong predictors for the domain variables, resulting in small random effects. Fitting a model without these covariates results in models with large random effects and strong positive correlations between the regular and alternative survey estimates. For other variables a model with a full covariance structure automatically results in parsimonious models for the fixed effect part, probably because the available covariates are weaker predictors for these target variables.

The aforementioned issue of selecting models with too many covariates is circumvented with an alternative step-forward selection approach. Since the WAIC values are estimated from the Gibbs sampler output, these values are observed with some degree of uncertainty. This is an argument not to include covariates if they only result in a small reduction of the WAIC. In the alternative step-forward selection approach, covariates are only selected if the decrease in the WAIC value exceeds the estimated standard error of the WAIC. With this approach parsimonious models are selected since it avoids the selection of one or more covariates that only marginally improve the WAIC. For variables where initially large sets of covariates were selected, this approach results in a reasonable compromise between model fit and model complexity. As an alternative, models with equal regression coefficients can be considered. Such models are, however, less appropriate for predicting domain discontinuities if the random effects are small. In such situations the dummy indicator is the only model component that discriminates between the regular and alternative approach. This results in synthetic predictions for domain discontinuities that are almost equal over the domains, and approximately equal to the direct estimator for the discontinuity at the national level. Depending on the type of changes in the survey process, it might be correct to assume that domain discontinuities are equal. In that case the best estimate is obtained with the direct estimator at the national level.
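
The alternative selection rule can be sketched as follows. The sketch is a generic illustration and not the implementation used for the CVS application. The callback `fit_waic` is a hypothetical stand-in for fitting the FH model with a given covariate set and returning the WAIC and its estimated standard error. The gains in `TOY_GAIN` are invented, and comparing the decrease with the standard error of the candidate model's WAIC is an assumption, since the text does not specify which model's standard error is used.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical interface: covariate names -> (WAIC, standard error of WAIC).
WaicFn = Callable[[Sequence[str]], Tuple[float, float]]


def forward_select(
    candidates: Iterable[str], fit_waic: WaicFn, use_se_rule: bool = True
) -> List[str]:
    """Step-forward selection on WAIC.

    With use_se_rule=True a covariate is added only if the WAIC decrease
    exceeds the standard error of the candidate model's WAIC (assumption).
    With use_se_rule=False any decrease is accepted.
    """
    selected: List[str] = []
    remaining = list(candidates)
    current_waic, _ = fit_waic(selected)
    while remaining:
        # Score every one-covariate extension of the current model.
        scored = [(fit_waic(selected + [name]), name) for name in remaining]
        (best_waic, best_se), best_name = min(scored, key=lambda item: item[0][0])
        threshold = best_se if use_se_rule else 0.0
        if current_waic - best_waic <= threshold:
            break  # improvement is not larger than the required threshold
        selected.append(best_name)
        remaining.remove(best_name)
        current_waic = best_waic
    return selected


# Toy stand-in for model fitting: invented WAIC gains per covariate.
TOY_GAIN = {"PR_assault": 12.0, "MBA_old": 1.5, "PR_drugs": 0.5}


def toy_fit_waic(covariates: Sequence[str]) -> Tuple[float, float]:
    waic = 100.0 - sum(TOY_GAIN[name] for name in covariates)
    return waic, 3.0  # constant, invented standard error


chosen = forward_select(TOY_GAIN, toy_fit_waic)
chosen_plain = forward_select(TOY_GAIN, toy_fit_waic, use_se_rule=False)
print("SE rule:   ", chosen)
print("plain rule:", chosen_plain)
```

In this toy setting the plain rule keeps adding covariates with marginal gains, whereas the standard-error rule stops after the single strong predictor.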

To better understand the properties and behaviour of the three models for estimating domain discontinuities, including the proposed model selection approach, a comprehensive simulation study is required. Such a study would clarify under which conditions each of the three modelling approaches is preferred, and is left for future research.

Acknowledgements

The views expressed in this paper are those of the authors and do not necessarily agree with the policy of Statistics Netherlands. We would like to thank the Associate Editor and the two referees for their careful reading of our manuscript and for their useful comments, which were very helpful in improving the quality of our paper.

Appendix


Table A.1
Overview auxiliary data
Variable        Description                                                         Source
MBA_benefit     Percentage of social benefit claimants                              MBA
MBA_immigr      Percentage of immigrants in population                              MBA
MBA_immigrnw    Percentage of non-western immigrants in population                  MBA
MBA_old         Percentage of elderly people (aged over 65)                         MBA
PR_assault      Reported physical assaults                                          PRRO
PR_propcrim     Property crimes                                                     PRRO
PR_threat       Reported threats                                                    PRRO
PR_weapon       Weapon offences                                                     PRRO
PR_drugs        Illicit drug offences                                               PRRO
CVSR_nuisance   Perceived nuisance in the neighbourhood                             regular survey
CVSR_victim     Percentage of people saying that they have been victim of a crime   regular survey
CVSR_funcpol    Opinion on functioning of the police on a 10-point scale            regular survey

References

Arora, V., and Lahiri, P. (1997). On the superiority of the Bayesian method over the BLUP in small area estimation problems. Statistica Sinica, 7, 1053-1063.

Bell, W.R. (1999). Accounting for uncertainty about variances in small area estimation. Bulletin of the International Statistical Institute, 52nd Session.

Benavent, R., and Morales, D. (2016). Multivariate Fay-Herriot models for small area estimation. Computational Statistics and Data Analysis, 94, 372-390.

Bollineni-Balabay, O., van den Brakel, J.A., Palm, F. and Boonstra, H.J. (2017). Multilevel hierarchical Bayesian vs. state space approach in time series small area estimation: The Dutch Travel Survey. Journal of the Royal Statistical Society, forthcoming.

Boonstra, H.-J. (2020). mcmcsae: Markov Chain Monte Carlo Small Area Estimation. R package version 0.5.0.

Boonstra, H.J., and van den Brakel, J.A. (2019). Estimation of level and change for unemployment using structural time series models. Survey Methodology, 45, 3, 395-425. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2019003/article/00005-eng.pdf.

Datta, G., Ghosh, M., Nangia, N. and Natarajan, K. (1996). Estimation of median income in four-person families: A Bayesian approach. In Bayesian Analysis in Statistics and Econometrics, (Eds., D.A. Berry, K.M. Chaloner and J.M. Geweke), New York: John Wiley & Sons, Inc., 129-140.

Datta, G.S., Lahiri, P., Maiti, T. and Lu, K.L. (1999). Hierarchical Bayes estimation of unemployment rates for the states of the U.S. Journal of the American Statistical Association, 94(448), 1074-1082.

Esteban, M.D., Morales, D., Perez, A. and Santamaria, L. (2012). Small area estimation of poverty proportions under area-level time models. Computational Statistics and Data Analysis, 56, 2840-2855.

Fay, R.E., and Herriot, R.A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74(366), 269-277.

Gelfand, A.E., and Smith, A.F.M. (1990). Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515-533.

Gelman, A., and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472.

Gelman, A., Van Dyk, D.A., Huang, Z. and Boscardin, W.J. (2008). Using redundant parameterizations to fit hierarchical models. Journal of Computational and Graphical Statistics, 17(1), 95-122.

Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2004). Bayesian Data Analysis. New York: Chapman and Hall.

Geman, S., and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattn Anal. Mach. Intell., 6, 721-741.

Gonzales-Manteiga, W., Lombardia, M.J., Molina, I., Morales, D. and Santamaria, L. (2008). Analytic and bootstrap approximations of prediction errors under a multivariate Fay-Herriot model. Computational Statistics and Data Analysis, 52, 5242-5252.

Gonzalez, M.E., and Waksberg, J. (1973). Estimation of the error of synthetic estimates. Technical report, paper presented at the first meeting of the International Association of Survey Statisticians, Vienna, 18-25 August, 1973.

Hawala, S., and Lahiri, P. (2018). Variance modelling for domains. Statistics and Applications, 16, 399-409.

Hirose, M., and Lahiri, P. (2018). Estimating variance of random effects to solve multiple problems simultaneously. Annals of Statistics, 46, 1721-1741.

Hodges, J.S., and Sargent, D.J. (2001). Counting degrees of freedom in hierarchical and other richly-parameterised models. Biometrika, 88(2), 367-379.

Lahiri, P., and Pramanik, S. (2019). Evaluation of synthetic small-area estimators using design-based methods. Austrian Journal of Statistics, 48, 43-57.

Li, H., and Lahiri, P. (2010). Adjusted maximum likelihood method for solving small area estimation problems. Journal of Multivariate Analysis, 101, 882-892.

Marhuenda, Y., Molina, I. and Morales, D. (2013). Small area estimation with spatio-temporal Fay-Herriot models. Computational Statistics and Data Analysis, 57, 308-325.

Marker, D. (1995). Small Area Estimation: A Bayesian Perspective. Ph.D. thesis, University of Michigan.

O’Malley, A.J., and Zaslavsky, A.M. (2008). Domain-level covariance analysis for multilevel survey data with structured nonresponse. Journal of the American Statistical Association, 103(484), 1405-1418.

Pfeffermann, D., and Ben-Hur, D. (2018). Estimation of randomisation mean square error in small area estimation. International Statistical Review, 87, 31-49.

Pfeffermann, D., and Burck, L. (1990). Robust small area estimation combining time series and cross-sectional data. Survey Methodology, 16, 2, 217-237. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1990002/article/14534-eng.pdf.

Pfeffermann, D., and Tiller, R. (2006). Small area estimation with state-space models subject to benchmark constraints. Journal of the American Statistical Association, 101, 1387-1397.

Polson, N.G., and Scott, J.G. (2012). On the half-Cauchy prior for a global scale parameter. Bayesian Analysis, 7(4), 887-902.

Rao, J.N.K., and Molina, I. (2015). Small Area Estimation. Wiley-Interscience.

Rao, J.N.K., and Yu, M. (1994). Small area estimation by combining time series and cross-sectional data. The Canadian Journal of Statistics, 22, 511-528.

Rao, J.N.K., Rubin-Bleuer, S. and Estevao, V.M. (2018). Measuring uncertainty associated with model-based small area estimators. Unpublished research paper.

Rivest, L.-P., and Belmonte, E. (2000). A conditional mean squared error of small area estimators. Survey Methodology, 26, 1, 67-78. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2000001/article/5179-eng.pdf.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society B, 64(4), 583-639.

Sugasawa, S., Tamae, H. and Kubokawa, T. (2017). Bayesian estimators for small area models shrinking both means and variances. Scandinavian Journal of Statistics, 44(1), 150-167.

van den Brakel, J.A. (2008). Design-based analysis of embedded experiments with applications in the Dutch Labour Force Survey. Journal of the Royal Statistical Society, 171, 581-613.

van den Brakel, J.A., and Boonstra, H.J. (2018). Hierarchical Bayesian bivariate Fay-Herriot model for estimating domain discontinuities. Discussion paper, Statistics Netherlands.

van den Brakel, J.A., and Krieg, S. (2016). Small area estimation with state-space common factor models for rotating panels. Journal of the Royal Statistical Society, 179, 763-791.

van den Brakel, J.A., Buelens, B. and Boonstra, H.J. (2016). Small area estimation to quantify discontinuities in repeated sample surveys. Journal of the Royal Statistical Society, 179, 229-250.

van den Brakel, J.A., Smith, P.A. and Compton, S. (2008). Quality procedures for survey transitions - experiments, time series and discontinuities. Survey Research Methods, 2, 123-141.

Vehtari, A., Gelman, A. and Gabry, J. (2015). loo: Efficient Leave-one-out Cross-validation and WAIC for Bayesian Models. R package version 0.1.3.

Vehtari, A., Gelman, A. and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413-1432.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571-3594.

Watanabe, S. (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14, 867-897.

Wolter, K. (2007). Introduction to Variance Estimation. Springer.

Ybarra, L.M.R., and Lohr, S.L. (2008). Small area estimation when auxiliary information is measured with error. Biometrika, 95(4), 919-931.

You, Y. (2008). Small area estimation using area level models with model checking and applications. Proceedings of the Survey Methods Section of the Statistical Society of Canada.

You, Y., and Chapman, B. (2006). Small area estimation using area level models and estimated sampling variances. Survey Methodology, 32, 1, 97-103. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2006001/article/9263-eng.pdf.

You, Y., and Rao, J.N.K. (2000). Hierarchical Bayes estimation of small area means using multi-level models. Survey Methodology, 26, 2, 173-181. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2000002/article/5537-eng.pdf.

